Classifying Dog Breeds Through Color Histograms
Identifying dog breeds is a difficult task. A truly robust breed classification model requires a nuanced review of each dog in question, and can result in non-intuitive groupings. Take, for instance, the Fédération Cynologique Internationale (FCI) classification of sheepdogs and cattle dogs, which includes German and Belgian shepherds but excludes the Swiss Mountain Cattle dog (certainly unexpected!).
I recently completed Harvard's CSCI S-109A Introduction to Data Science course, where I had the opportunity to work alongside classmates to develop a series of classification models for identifying various dog breeds.
Stanford University offers an image dataset containing photos of 120 breeds of dogs from around the world. While the dataset was built for the task of fine-grained image categorization, it quickly became apparent that image pre-processing would be required to properly classify breeds. For instance, photos in this dataset varied widely in their pixel dimensions.
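One way to sidestep the varying photo dimensions is to describe each image by a normalized color histogram, since the proportions of pixels in each color bin do not depend on how many pixels the image has. The snippet below is a minimal sketch of that idea using NumPy only; the function name color_histogram, the choice of 8 bins per channel, and the random stand-in images are assumptions for illustration, not our group's actual pipeline.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Return a normalized per-channel color histogram for an RGB image.

    image: uint8 array of shape (height, width, 3).
    The result has length 3 * bins, and each channel's block sums to 1.
    """
    image = np.asarray(image)
    features = []
    for channel in range(3):
        # Count pixels per intensity bin for this channel.
        counts, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        # Divide by the pixel count so image size does not affect the feature.
        features.append(counts / counts.sum())
    return np.concatenate(features)


# Random stand-in images of different sizes (hypothetical data, not real photos).
rng = np.random.default_rng(0)
small = rng.integers(0, 256, size=(60, 80, 3), dtype=np.uint8)
large = rng.integers(0, 256, size=(300, 500, 3), dtype=np.uint8)

small_features = color_histogram(small)
large_features = color_histogram(large)

# Tiling an image changes its size but not its color proportions.
tiled_features = color_histogram(np.tile(small, (2, 2, 1)))
```

Both images yield a feature vector of the same length regardless of their dimensions, which is what a downstream classifier needs.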
To reduce the risk of not ending up with a working classifier, our group focused on distinguishing between the German Shepherd and Boston Bull breeds. When distinguishing between these breeds by eye, one quickly sees their comparatively distinct visual qualities (for instance, fur color). We ultimately opted for a suite of classification models based on material covered in class: a K-Nearest Neighbors classifier, an AdaBoost classifier, a Decision Tree, a Random Forest classifier, and finally a logistic regression meta-model (based on the performance of all prior models).
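For readers who want to see what such a suite looks like in practice, here is a rough sketch using scikit-learn. It is not our group's code: the data is synthetic (generated with make_classification as a stand-in for histogram features of two breeds), every hyperparameter is an arbitrary assumption, and using StackingClassifier for the logistic regression meta-model is one plausible reading of "meta-model", not necessarily how ours was built.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for per-image features (assumption).
X, y = make_classification(
    n_samples=300, n_features=24, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The four base models named in the text; hyperparameters are illustrative.
base_models = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("adaboost", AdaBoostClassifier(random_state=0)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Fit each base model on its own and record held-out accuracy.
scores = {}
for name, model in base_models:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Logistic regression meta-model trained on cross-validated predictions
# of the base models (StackingClassifier refits clones of them internally).
meta_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
meta_model.fit(X_train, y_train)
scores["stacked"] = meta_model.score(X_test, y_test)
```

The scores dictionary then holds one held-out accuracy per model, which makes it easy to compare the base classifiers against the stacked meta-model on the same split.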
The results from these classification models, our exploratory data analysis, and a further in-depth literature review can be found at the following website.