CNN Experiments for Fish Species Classifier

July 5, 2022

This post focuses exclusively on the CNN experiments I ran; for more context on the project overall, check out my project description here. I initially limited myself to 92 species that had sufficient data in FishBase and Google Images. It is also worth noting that I used the FishBase images exclusively for the test and validation sets, because they were the only data I knew to be correctly labeled (a Google Images result could show the wrong species).

92 Species Classifier Experiments

For each species, I scraped Google Images for both its scientific and common name. For much of my initial exploration, I trained my models only on the scientific-name images, because I believed these were more likely to be labeled correctly. I began by trying a variety of pretrained models: ResNet18, EfficientNet B0, and ConvNet Tiny. Fine-tuning these pretrained models on my scientific-name dataset gave the following accuracies on my test set.

Model Test Accuracy Train Accuracy
ResNet18 63.68% 90.55%
EfficientNet B0 64.18% 86.34%
ConvNet Tiny 66.57% 90.27%
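
For reference, the basic fine-tuning setup looks roughly like the sketch below in PyTorch/torchvision (shown with ResNet18; the weights API assumes a recent torchvision, and the details of my actual training code live in the repo linked at the end):

```python
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 92

# Load an ImageNet-pretrained backbone (torchvision >= 0.13 weights API).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the 1000-class ImageNet head with a 92-way species head.
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)

# EfficientNet B0 and ConvNet Tiny expose their head under `model.classifier`
# rather than `model.fc`, but the idea is the same.
```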

I also tried a feature-extraction approach with the pretrained versions of these models: I froze all but the last layer and trained only that. Doing so gave the following accuracies. It is worth noting that while freezing layers resulted in worse accuracies, it also significantly reduced training times.

Model Test Accuracy Train Accuracy
ResNet18 41.39% 67.55%
EfficientNet B0 48.86% 59.43%
ConvNet Tiny 58.77% 63.19%
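
The feature-extraction variant only changes which parameters receive gradients; a minimal sketch (again with ResNet18, and with illustrative hyperparameters):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# A freshly created layer defaults to requires_grad=True.
model.fc = nn.Linear(model.fc.in_features, 92)

# Hand the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1, momentum=0.9
)
```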

I also tested several initial learning rates for the ResNet18 model to make sure I had an accurate picture of what a last-layer-only approach could do. I found that:

Learning Rate Test Accuracy Train Accuracy
0.1 53.04% 67.55%
0.01 52.81% 61.75%
0.001 41.67% 49.67%

Since fish identification is quite different from the domain these models were originally trained on, I decided to try unfreezing more layers, which gave the following results.

Model Test Accuracy Train Accuracy
ResNet18 63.90% 94.51%
EfficientNet B0 60.39% 89.26%
ConvNet Tiny 70.86% 96.27%
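
Partial unfreezing extends the same idea: keep the early layers frozen and let the later blocks plus the head adapt. A sketch with ResNet18 (which blocks to unfreeze is the knob being turned; the choice of `layer4` here is illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze everything, then re-enable gradients for the last residual stage
# so the task-specific layers can adapt while the early features stay fixed.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

model.fc = nn.Linear(model.fc.in_features, 92)  # new head, trainable by default

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01, momentum=0.9
)
```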

I then tried a different learning rate scheduler. Instead of a step-decay approach, I used a decay-on-plateau approach (with respect to validation loss), without any frozen weights, and got the following results.

Model Test Accuracy Train Accuracy
ResNet18 64.18% 98.36%
EfficientNet B0 65.68% 96.65%
ConvNet Tiny 68.86% 97.85%
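
In PyTorch, decay on plateau is typically ReduceLROnPlateau; a minimal sketch (the factor and patience values are illustrative, not necessarily the ones I used):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 92)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Cut the learning rate by 10x when validation loss stops improving,
# instead of decaying on a fixed epoch schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3
)

# At the end of each epoch, pass the scheduler that epoch's validation loss:
#     scheduler.step(val_loss)
```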

As we can see from the above experiments, both partially unfreezing layers and decay on plateau improved performance for ConvNet Tiny, so I decided to try different variations of unfreezing combined with the decay-on-plateau scheduler.

Model Test Accuracy Train Accuracy
CnT PF I (ConvNet Tiny Partial Frozen I) 72.31% 96.95%
CnT PF II 72.31% 97.17%
CnT PF III 72.31% 96.72%
CnT PF IIA 77.38% 97.55%

While training the models above, I noticed that sometimes the validation accuracy would plateau while the training accuracy kept increasing (a symptom of overfitting), and other times both accuracies would plateau. To tackle this, I decided to try L2 regularization. For values of λ greater than or equal to 0.005, my model's performance decreased with each epoch.

λ Test Accuracy Train Accuracy
0.002 71.25% 74.77%
0.001 75.15% 86.16%
0.0005 76.43% 93.27%
0.0003 75.54% 92.07%
0.0001 77.44% 96.57%
0.00007 76.04% 97.08%
0.00005 76.21% 96.27%
0.00001 76.99% 96.31%

As we can see above, adding L2 regularization did not really improve model performance on the test set (the differences were within the margin of error). With that tested, I decided to focus on improvements on the image/training side of things. After reading this paper, I saw that random crops had the potential to greatly help model accuracy, so I tried adding them to the best classifier thus far (ConvNet Tiny Partial Frozen IIA with decay on plateau) to see if that helped. My resulting training accuracy was 97.84% and my test accuracy was 79.39%, the best result thus far!
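
The random crops go in the training transform only, so the test/validation pipeline stays deterministic. Something like the following, where the horizontal flip and the normalization constants are the usual ImageNet defaults and are assumptions on my part; the random crop is the change being tested:

```python
from torchvision import transforms

# ImageNet normalization constants used by the pretrained backbones.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),   # the random-crop augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
```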

As I previously mentioned, all of the above experiments used images scraped from the scientific-name results, as I believed they were higher quality. To test this hypothesis, I trained the same model on the common-name dataset alone and on the mixed dataset (both scientific and common). The results were as follows:

Training Data Source Test Accuracy Train Accuracy
Scientific 79.39% 98.36%
Common 80.33% 96.65%
Mixed 80.78% 97.85%
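
One way to build the mixed dataset is to concatenate the two ImageFolder datasets, provided both use the same species sub-folder names so their class indices line up; a sketch with hypothetical paths:

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

train_tf = transforms.Compose([transforms.RandomResizedCrop(224),
                               transforms.ToTensor()])

# Hypothetical layout: one ImageFolder per scrape source, with identical
# species folders so both map each class to the same index.
scientific = datasets.ImageFolder("data/train/scientific", transform=train_tf)
common = datasets.ImageFolder("data/train/common", transform=train_tf)

mixed = ConcatDataset([scientific, common])
train_loader = DataLoader(mixed, batch_size=64, shuffle=True, num_workers=4)
```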

The dataset experiment above showed that my hypothesis was incorrect; the best results ultimately came from the mixed dataset. It turns out, to no one's real surprise, that more data is better. The last model update I decided to try for the 92-species classifier was fine-tuning the trained models on novel data; for example, fine-tuning the model trained on the scientific-name data using the common-name data, and vice versa, to see whether this would lead to a better overall result.

Initial Dataset Fine Tuning Dataset Initial Learning Rate Initial Test Accuracy Ending Test Accuracy
Scientific Common 0.001 79.39% 77.33%
Scientific Mixed 0.001 79.39% 77.60%
Scientific Mixed 0.0001 79.39% 76.88%
Common Mixed 0.0001 80.33% 77.21%
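
Mechanically, this amounts to reloading the trained weights and continuing training on the other dataset at a reduced learning rate; a sketch with a hypothetical checkpoint path:

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical checkpoint from the scientific-name run; the head is already 92-way.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 92)
model.load_state_dict(torch.load("checkpoints/scientific_best.pt"))

# Continue training on the common-name (or mixed) images at a lower learning
# rate so the new data refines, rather than overwrites, what was learned.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```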

After all of this testing, the best model I found was the partially frozen ConvNet Tiny trained on the mixed data. This model has an 80.78% accuracy and a 90.03% top-3 accuracy, so it performs very solidly overall. With these experiments concluded, I was ready to move on to building an expanded species classifier.

Expansion to 286 Species

After all of these experiments, I decided to expand my classifier to even more species. Unfortunately, of the 1,350 species in the FishBase database, only 1,287 had images. For this expanded model I lowered my threshold for the number of FishBase images from 25 to 10, and my threshold for scientific/common scraped images from 50 to 30 each. This change resulted in 286 valid species instead of the initial 92. With all of this done, I was ready to train a new model on this dataset. I tried two different approaches: transfer learning from my 92-species model and retraining a model from scratch. For both, I used the most performant architecture from my previous experiments.
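
The transfer-learning variant starts from the 92-species weights and swaps in a 286-way head before training on the expanded dataset; roughly the following (hypothetical checkpoint path, and ResNet18 shown for brevity; other architectures swap their `classifier` head instead):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load the 92-species weights (hypothetical checkpoint path).
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 92)
model.load_state_dict(torch.load("checkpoints/92_species_best.pt"))

# Swap in a 286-way head; the backbone keeps the fish-specific features
# it learned on the smaller label set.
model.fc = nn.Linear(model.fc.in_features, 286)
```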

Model Initial Learning Rate Test Accuracy
Transfer Model 0.001 65.17%
Transfer Model 0.0001 64.94%
From Scratch 0.001 63.45%

The transfer-learned model with an initial learning rate of 0.001 had the best test accuracy. Its top-3 and top-5 accuracies are 82.26% and 87.40%, respectively.
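
For completeness, top-k accuracy is straightforward to compute from the model's logits; a small helper (the name and signature are my own):

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices             # (batch, k)
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # true label anywhere in top k?
    return hits.float().mean().item()
```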


P.S. If you'd like to access the code I wrote for this project, you can do so here!
