Proceedings of the Seventeenth International Conference on Machine Learning
Many collections of data do not come packaged in a form amenable to the ready application of machine learning techniques. Nevertheless, there has been only limited research on the problem of preparing raw data for learning, perhaps because widespread differences between domains make generalization difficult. This paper focuses on one common class of raw data, in which the entities of interest actually comprise collections of (smaller pieces of) homologous data. We present a technique for processing such collections into high-dimensional vectors, suitable for the application of many learning algorithms including clustering, nearest neighbors, and boosting. We demonstrate the abilities of the method by using it to implement similarity metrics on two different domains: natural images and measurements from ocean buoys in the Pacific.
This work is licensed under a Creative Commons Attribution 4.0 License.
© Nicholas Howe
Howe, Nicholas, "Data as Ensembles of Records: Representation and Comparison" (2000). Computer Science: Faculty Publications, Smith College, Northampton, MA.