Proceedings of the Fourteenth International Conference on Machine Learning
This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve performance on minority class predictions. Each variant creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
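As a rough illustration of the idea the abstract describes, the sketch below builds a weight vector from path-specific information gain: features tested along the test case's decision-tree path are weighted by the information gain computed on the subset of cases reaching that node, and the weights are then used in a weighted-overlap nearest-neighbor retrieval. All function names, the zero weight assigned to off-path features, and the toy data are assumptions for illustration, not the authors' exact formulation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(cases, labels, f):
    """Information gain of splitting `cases` on feature index `f`."""
    n = len(cases)
    remainder = 0.0
    for v in set(c[f] for c in cases):
        sub = [y for c, y in zip(cases, labels) if c[f] == v]
        remainder += len(sub) / n * entropy(sub)
    return entropy(labels) - remainder

def path_specific_weights(cases, labels, path, n_features):
    """Weight vector from a decision-tree path.

    `path` is a list of (feature index, test case's value) pairs, one per
    node the test case passes through.  Each path feature gets the gain
    computed on the cases reaching that node; off-path features get zero
    (one plausible reading of the abstract's description).
    """
    weights = [0.0] * n_features
    cur_cases, cur_labels = cases, labels
    for f, v in path:
        weights[f] = information_gain(cur_cases, cur_labels, f)
        keep = [i for i, c in enumerate(cur_cases) if c[f] == v]
        cur_cases = [cur_cases[i] for i in keep]
        cur_labels = [cur_labels[i] for i in keep]
    return weights

def retrieve(case, cases, labels, weights):
    """1-nearest-neighbor retrieval under a weighted overlap distance."""
    def dist(a, b):
        return sum(w for w, x, y in zip(weights, a, b) if x != y)
    best = min(range(len(cases)), key=lambda i: dist(case, cases[i]))
    return labels[best]

# Toy illustration: feature 0 fully determines the class, feature 1 is noise.
cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "b", "b"]
weights = path_specific_weights(cases, labels, path=[(0, 1)], n_features=2)
predicted = retrieve((1, 0), cases, labels, weights)  # -> "b"
```

In the toy run, the path-specific weights are high for the discriminating feature and zero for the noisy one, so retrieval ignores the noise dimension when finding the nearest case.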
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
© the authors
Cardie, Claire and Howe, Nicholas, "Improving Minority Class Prediction Using Case-Specific Feature Weights" (1997). Computer Science: Faculty Publications, Smith College, Northampton, MA.