Classification

23. Classification

Susanna Lange

In general, classification is a type of supervised learning. A classification algorithm takes labeled data as input and uses it to classify new, unlabeled data points.

This is similar to how you might have learned growing up. You walk past a dog and your parent ‘labels’ this new animal by telling you, “This is a dog!” You see a cat in a book and your sister ‘labels’ it by pointing and saying, “Cat!” Every time you see such an animal, you process both the animal and the label you hear from the outside world. Then the day comes when your kindergarten teacher asks, “What animal is this?” Now you are seeing the animal with no label, and you must classify it yourself. Is it similar to the dogs and cats you have seen before? Does it have a long tail and whiskers, or floppy ears? Maybe you haven’t seen this particular breed before and you have to guess based on what you know. This is the idea behind classification algorithms.

More formally, classification refers to assigning a label, drawn from a finite set of categorical values, to an unlabeled example, based on previously labeled data. Some examples of classification include:

spam detection
disease prediction
image classification

Note

This is in contrast to unsupervised learning, which takes unlabeled data and finds patterns within it. Chapter 25 has more information about models that fall into this category.

There are many algorithms that can be used for classification. These include logistic models, k-nearest neighbor, decision trees, and neural networks, to name a few. This chapter focuses on k-nearest neighbor to begin your journey into classification; neural networks are discussed in more detail in a later chapter. The first two sections provide an intuitive view of how this algorithm works, and the last section provides a more in-depth example.
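To make the idea concrete before the detailed sections, here is a minimal sketch of k-nearest neighbor classification in plain Python. It is an illustration under stated assumptions, not the implementation used later in the chapter: the function name `knn_classify`, the toy `animals` data, the two made-up features, the choice of Euclidean distance, and `k=3` are all hypothetical.

```python
from collections import Counter
import math


def knn_classify(labeled_points, new_point, k=3):
    """Predict a label for new_point by majority vote among its k nearest labeled points."""
    # Sort the labeled examples by Euclidean distance to the new point
    # (Euclidean distance is an assumption; other distances are possible).
    by_distance = sorted(
        labeled_points,
        key=lambda pair: math.dist(pair[0], new_point),
    )
    # Keep only the labels of the k closest examples.
    nearest_labels = [label for _, label in by_distance[:k]]
    # Return the label that appears most often among those neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled data: (tail length in cm, ear floppiness from 0 to 1) -> animal.
animals = [
    ((30.0, 0.1), "cat"),
    ((28.0, 0.0), "cat"),
    ((25.0, 0.2), "cat"),
    ((35.0, 0.9), "dog"),
    ((40.0, 0.8), "dog"),
    ((38.0, 1.0), "dog"),
]

# Classify two new, unlabeled animals.
print(knn_classify(animals, (27.0, 0.1)))  # the three nearest neighbors are all cats
print(knn_classify(animals, (39.0, 0.9)))  # the three nearest neighbors are all dogs
```

The labeled pairs play the role of the animals you were told about as a child, and the new point is the animal your teacher asks you to name. Note that in this toy data the tail length dominates the distance because the two features are on very different scales; a real analysis would usually need to account for that.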