Building an image classifier for waste sorting

Why waste sorting?

Recycling contamination occurs when waste is incorrectly disposed of, like recycling a greasy pizza box (it belongs in the compost), or when waste is correctly sorted but incorrectly prepared, like recycling unrinsed jam jars.

Contamination is a huge problem in the recycling industry that can be mitigated with automated waste sorting. Just for kicks, I thought I’d try my hand at prototyping an image classifier to classify trash and recyclables — this classifier could have applications in an optical sorting system.

Building an image classifier

I’ll train a convolutional neural network to classify an image as either cardboard, glass, metal, paper, plastic, or trash with the fastai library (built on PyTorch). I used an image dataset collected manually by Gary Thung and Mindy Yang. If you’re following along, download their dataset here, then move it to the same directory as the notebook. (Note: you’ll want to use a GPU to speed up training.)

My modeling pipeline:

  1. Download and extract the images
  2. Organize the images into different folders
  3. Train model
  4. Make and evaluate test predictions
  5. Next steps

1. Extract data

First, we need to extract the contents of “dataset-resized.zip”.
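
In Python this is straightforward with the standard library; here's a minimal sketch, assuming dataset-resized.zip sits next to the notebook:

```python
import zipfile
from pathlib import Path

# Unzip the TrashNet images into the working directory.
with zipfile.ZipFile("dataset-resized.zip") as zf:
    zf.extractall(".")

# The extracted folder contains one subfolder per material category.
print(sorted(p.name for p in Path("dataset-resized").iterdir() if p.is_dir()))
```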

2. Organize images into different folders

Now that we've extracted the data, I'm going to split the images into train, validation, and test folders with a 50-25-25 split. I defined some helper functions to build this quickly, which you can check out in the notebook.

Next, I'm going to create the destination folders according to the ImageNet directory convention. This means there's an outer folder (I called it data) with three subfolders: train, validation, and test. Within each of those, there's one folder per class: cardboard, glass, metal, paper, plastic, and trash. The notebook code for this is mundane, so I'll skip over it; a rough sketch of the idea is below.
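
The exact helpers live in the notebook; the rough idea, sketched here with my own (hypothetical) paths, is to shuffle each category's files and copy them into data/train, data/validation, and data/test:

```python
import random
import shutil
from pathlib import Path

random.seed(42)
categories = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
src = Path("dataset-resized")   # extracted TrashNet images, one folder per category
dst = Path("data")              # ImageNet-style destination: data/{train,validation,test}/{category}

for category in categories:
    files = sorted((src / category).glob("*.jpg"))
    random.shuffle(files)
    n = len(files)
    splits = {
        "train":      files[: n // 2],             # 50%
        "validation": files[n // 2 : 3 * n // 4],  # 25%
        "test":       files[3 * n // 4 :],         # 25%
    }
    for split, split_files in splits.items():
        out_dir = dst / split / category
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in split_files:
            shutil.copy(f, out_dir / f.name)
```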

ImageDataBunch.from_folder() tells fastai to load the training, validation, and test data from folders arranged in this ImageNet-style structure.

The batch size bs is how many images you train on at a time. Choose a smaller batch size if your machine has less memory.

You can use the get_transforms() function to augment your data.
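
Putting those pieces together, creating the DataBunch with fastai v1 might look something like this (the image size, batch size, and ImageNet normalization are my assumptions):

```python
from fastai.vision import *  # fastai v1

tfms = get_transforms()  # default augmentations: flips, small rotations, zooms, lighting changes
data = ImageDataBunch.from_folder(
    "data",                                   # the folder layout built above
    train="train", valid="validation", test="test",
    ds_tfms=tfms,
    size=224,                                 # resize to 224x224, the usual resnet input size
    bs=16,                                    # batch size; lower this if you run out of memory
).normalize(imagenet_stats)

print(data.classes)  # ['cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']
```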

Here’s an example of what the data looks like:
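
With the DataBunch in hand, show_batch displays a grid of (augmented) training images with their labels:

```python
# Sample a batch and plot it with the class names as titles.
data.show_batch(rows=4, figsize=(10, 8))
```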

3. Model training

Specifying the CNN in one line of code
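
In fastai v1 that single line looks something like this (using error_rate as the metric is my assumption; in older fastai releases the same function is called create_cnn):

```python
from fastai.metrics import error_rate

# Transfer learning: start from a resnet34 pretrained on ImageNet and
# attach a new classification head for our six categories.
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
```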

What is resnet34?

A residual neural network is a convolutional neural network (CNN) with lots of layers. In particular, resnet34 is a CNN with 34 layers that’s been pretrained on the ImageNet database. A pretrained CNN will perform better on new image classification tasks because it has already learned some visual features and can transfer that knowledge over (hence transfer learning).

Since they can describe more complex functions, deep neural networks should in theory perform at least as well as shallow ones on the training data. In practice, though, very deep plain networks often perform worse, even on the training data (the degradation problem).

Resnets were created to circumvent this problem using a trick called shortcut connections. The idea: if a layer's output needs correcting, adjust the weights and biases; if it's already optimal (its residual is 0), why not leave it alone? Adjustments are only made on an as-needed basis, when there are non-zero residuals.

The shortcut connection carries a block's input forward unchanged (the identity function) and adds it to the block's output, so when no adjustment is needed the block simply passes information through. This effectively shortens the network where possible, letting resnets have very deep architectures while behaving more like shallow networks. The 34 in resnet34 just refers to the number of layers.
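
To make the shortcut idea concrete, here's a minimal PyTorch sketch of a basic residual block (simplified; the real resnet34 blocks also handle downsampling and changing channel counts):

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        # The shortcut adds the unchanged input back in; if the optimal
        # residual is zero, the block can simply pass x through.
        return F.relu(residual + x)
```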

Anand Saha gives a great, more in-depth explanation here.

Finding a learning rate

I’m going to find a learning rate for gradient descent to make sure that my neural network converges reasonably quickly without missing the optimal error. For a refresher on the learning rate, check out Jeremy Jordan’s post on choosing a learning rate.
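
In fastai v1 the learning rate finder is built into the Learner:

```python
# Run a short mock training pass with exponentially increasing learning rates,
# then plot loss against learning rate to pick a value just before the loss explodes.
learn.lr_find()
learn.recorder.plot()
```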

The learning rate finder suggests a learning rate of 5.13e-03. With this, we can train the model.

Training

I ran my model for 20 epochs. What’s cool about this fitting method is that the learning rate decreases with each epoch, allowing us to get closer and closer to the optimum. At 8.6%, the validation error looks super good… let’s see how it performs on the test data though.
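
Assuming the standard fastai v1 one-cycle fit (my assumption for how the 20 epochs were run), the call would look like:

```python
# 20 epochs, with the learning rate suggested by the finder as the peak rate.
learn.fit_one_cycle(20, max_lr=5.13e-03)
```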

First, we can take a look at which images were most incorrectly classified.

Visualizing most incorrect images
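
Continuing with the trained learner, fastai v1's ClassificationInterpretation makes this easy (by default it inspects the validation set):

```python
interp = ClassificationInterpretation.from_learner(learn)
# Show the highest-loss images with predicted label, actual label, loss, and probability.
interp.plot_top_losses(9, figsize=(15, 11))
```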

The images that the classifier performed worst on were actually degraded; it looks like the photos were overexposed or otherwise distorted, so this isn't really a fault with the model!

This model most often confused plastic with glass and metal with glass. The list of the most confused pairs is below.
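
The pairs come from the same interpretation object; most_confused counts how often each (actual, predicted) combination occurred:

```python
# (actual, predicted, count) tuples, most frequent mix-ups first.
interp.most_confused(min_val=2)
```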

4. Make new predictions on test data

To see how this model really performs, we need to make predictions on the test data. First, I'll make predictions with the learner.get_preds() method.

Note: learner.predict() only predicts on a single image, while learner.get_preds() predicts on a set of images. I highly recommend reading the documentation to learn more about predict() and get_preds().

The ds_type argument in get_preds(ds_type) takes a DatasetType value; example values are DatasetType.Train, DatasetType.Valid, and DatasetType.Test. I mention this because I made the mistake of passing in actual data (learn.data.test_ds), which gave me the wrong output and took embarrassingly long to debug.

Don’t make this mistake! Don’t pass in data — pass in the dataset type!
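
With fastai v1's DatasetType enum, the call looks like this:

```python
from fastai.basic_data import DatasetType

# Predicted probabilities for every image in the test folder (order matches test_ds).
preds, _ = learn.get_preds(ds_type=DatasetType.Test)
print(preds.shape)  # (number of test images, 6 categories)
```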

These are the predicted probabilities for each image. This tensor has 365 rows — one for each image — and 6 columns — one for each material category.

Now I’m going to convert the probabilities in the tensor above to a vector of predicted class names.
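
Each row's largest probability gives a class index, which maps into data.classes:

```python
# Index of the highest-probability class per image, mapped to its name.
pred_idxs = preds.argmax(dim=1).tolist()
pred_labels = [data.classes[i] for i in pred_idxs]
print(pred_labels[:5])
```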

These are the predicted labels of all the images! Let’s check if the first image is actually glass.

It is!

Next, I’ll get the actual labels from the test dataset.
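
Since the test folder keeps the class subfolders, one way to recover the true labels (an assumption on my part, not necessarily how the notebook does it) is to read each file's parent directory name:

```python
from pathlib import Path

# learn.data.test_ds.x.items holds the test image paths; the parent folder is the true class.
true_labels = [Path(p).parent.name for p in learn.data.test_ds.x.items]
print(true_labels[:5])
```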

It looks like the first five predictions match up!

How does this model perform overall? We can use a confusion matrix to find out.

Test confusion matrix

I’m going to make this matrix a little bit prettier:
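
The fastai interpretation object above is tied to the validation set, so for the test set I'd build the matrix by hand; here's a sketch with scikit-learn and pandas (my tooling choice, not necessarily the notebook's):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(true_labels, pred_labels, labels=data.classes)
cm_df = pd.DataFrame(cm, index=data.classes, columns=data.classes)
cm_df.index.name, cm_df.columns.name = "actual", "predicted"
print(cm_df)

# Overall test accuracy from the confusion matrix diagonal.
print(f"test accuracy: {cm.diagonal().sum() / cm.sum():.1%}")
```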

Again, the model seems to have confused metal for glass and plastic for glass. With more time, I’m sure further investigation could help reduce these mistakes.

I ended up achieving an accuracy of 92.1% on the test data, which is pretty great: the original creators of the TrashNet dataset achieved a test accuracy of 63% with a support vector machine on a 70-30 train-test split (the neural network they trained only reached a test accuracy of 27%).

5. Next steps

If I had more time, I’d go back and reduce classification error for glass in particular. I’d also delete photos from the dataset that are overexposed since those images are just bad data.

This was just a quick and dirty mini-project to show that training an image classification model doesn't take much, and it's pretty amazing how quickly you can approach state-of-the-art results with the fastai library. If you have an application you're interested in but don't think you have the machine learning chops, this should be encouraging.

Source of motivation: Medium

Thank you for reading 🙂