Document Type

Conference Proceeding

Publication Date

1997

Publication Title

Proceedings of the Fourteenth International Conference on Machine Learning

Abstract

This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.

Rights

© the authors

Comments

Author’s submitted manuscript.

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.