CNN Experiments for Fish Species Classifier

July 5, 2022

This post focuses exclusively on the CNN experiments I ran; for more context on the project overall, check out my project description here. I initially limited myself to 92 species that had sufficient data in FishBase and Google Images. It is also worth noting that I used the FishBase images exclusively for the test and validation sets, because they were the only data I knew to be correctly labeled (a Google Images result could show the wrong species).

92 Species Classifier Experiments

For each species, I scraped Google Images for both its scientific and common name. For much of my initial exploration, I trained my models only on the scientific-name images, because I believed these were more likely to be labeled correctly. I began by trying a variety of pretrained models: ResNet18, EfficientNet B0, and ConvNet Tiny. Fine-tuning these pretrained models on my scientific-name dataset gave the following accuracies on my test set.

Model Test Accuracy Train Accuracy
ResNet18 63.68% 90.55%
EfficientNet B0 64.18% 86.34%
ConvNet Tiny 66.57% 90.27%
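
For reference, the basic fine-tuning setup looks roughly like the sketch below in PyTorch/torchvision (shown with ResNet18; the weights API assumes a recent torchvision, and the details of my actual training code live in the repo linked at the end):

```python
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 92

# Load an ImageNet-pretrained backbone (torchvision >= 0.13 weights API).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the 1000-class ImageNet head with a 92-way species head.
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)

# EfficientNet B0 and ConvNet Tiny expose their head under `model.classifier`
# rather than `model.fc`, but the idea is the same.
```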

I also tried a feature-extraction approach with the pretrained versions of these models: I froze all but the last layer and trained only that. Doing so gave the following accuracies. It is worth noting that while freezing layers resulted in worse accuracies, it also significantly reduced training times.

Model Test Accuracy Train Accuracy
ResNet18 41.39% 67.55%
EfficientNet B0 48.86% 59.43%
ConvNet Tiny 58.77% 63.19%
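
The feature-extraction variant only changes which parameters receive gradients; a minimal sketch (again with ResNet18, and with illustrative hyperparameters):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# A freshly created layer defaults to requires_grad=True.
model.fc = nn.Linear(model.fc.in_features, 92)

# Hand the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1, momentum=0.9
)
```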

I also tested several initial learning rates for the ResNet18 model to make sure I had an accurate picture of what a last-layer-only approach could do. I found that:

Learning Rate Test Accuracy Train Accuracy
0.1 53.04% 67.55%
0.01 52.81% 61.75%
0.001 41.67% 49.67%

Since fish identification is quite different from the domain these models were originally trained on, I decided to try unfreezing more layers, which gave the following results.

Model Test Accuracy Train Accuracy
ResNet18 63.90% 94.51%
EfficientNet B0 60.39% 89.26%
ConvNet Tiny 70.86% 96.27%
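
Partial unfreezing extends the same idea: keep the early layers frozen and let the later blocks plus the head adapt. A sketch with ResNet18 (which blocks to unfreeze is the knob being turned; the choice of `layer4` here is illustrative):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze everything, then re-enable gradients for the last residual stage
# so the task-specific layers can adapt while the early features stay fixed.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

model.fc = nn.Linear(model.fc.in_features, 92)  # new head, trainable by default

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01, momentum=0.9
)
```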

I then tried a different learning rate scheduler. Instead of a step-decay approach, I used a decay-on-plateau approach (with respect to validation loss), without any frozen weights, and got the following results.

Model Test Accuracy Train Accuracy
ResNet18 64.18% 98.36%
EfficientNet B0 65.68% 96.65%
ConvNet Tiny 68.86% 97.85%
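
In PyTorch, decay on plateau is typically ReduceLROnPlateau; a minimal sketch (the factor and patience values are illustrative, not necessarily the ones I used):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 92)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Cut the learning rate by 10x when validation loss stops improving,
# instead of decaying on a fixed epoch schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3
)

# At the end of each epoch, pass the scheduler that epoch's validation loss:
#     scheduler.step(val_loss)
```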

As we can see from the above experiments, both partially unfreezing layers and decay on plateau improved performance for ConvNet Tiny, so I decided to try different variations of unfreezing combined with the decay-on-plateau scheduler.

Model Test Accuracy Train Accuracy
CnT PF I (ConvNet Tiny Partial Frozen I) 72.31% 96.95%
CnT PF II 72.31% 97.17%
CnT PF III 72.31% 96.72%
CnT PF IIA 77.38% 97.55%

While training the models above, I noticed that sometimes the validation accuracy would plateau while the training accuracy kept increasing (a symptom of overfitting), and other times both accuracies would plateau. To tackle this, I decided to try L2 regularization. For values of λ greater than or equal to 0.005, my model's performance decreased with each epoch.

λ Test Accuracy Train Accuracy
0.002 71.25% 74.77%
0.001 75.15% 86.16%
0.0005 76.43% 93.27%
0.0003 75.54% 92.07%
0.0001 77.44% 96.57%
0.00007 76.04% 97.08%
0.00005 76.21% 96.27%
0.00001 76.99% 96.31%

As we can see above, adding L2 regularization did not really improve model performance on the test set (the differences were within the margin of error). With that tested, I decided to focus on improvements on the image/training side of things. After reading this paper, I saw that random crops had the potential to greatly help model accuracy, so I tried adding them to the best classifier thus far (ConvNet Tiny Partial Frozen IIA with decay on plateau) to see if that helped. My resulting training accuracy was 97.84% and my test accuracy was 79.39%, the best result thus far!
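
The random crops go in the training transform only, so the test/validation pipeline stays deterministic. Something like the following, where the horizontal flip and the normalization constants are the usual ImageNet defaults and are assumptions on my part; the random crop is the change being tested:

```python
from torchvision import transforms

# ImageNet normalization constants used by the pretrained backbones.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),   # the random-crop augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
```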

As I previously mentioned, all of the above experiments used images scraped from the scientific-name results, as I believed they were higher quality. To test this hypothesis, I trained the same model on the common-name dataset alone and on the mixed dataset (both scientific and common). The results were as follows:

Training Data Source Test Accuracy Train Accuracy
Scientific 79.39% 98.36%
Common 80.33% 96.65%
Mixed 80.78% 97.85%
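
One way to build the mixed dataset is to concatenate the two ImageFolder datasets, provided both use the same species sub-folder names so their class indices line up; a sketch with hypothetical paths:

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

train_tf = transforms.Compose([transforms.RandomResizedCrop(224),
                               transforms.ToTensor()])

# Hypothetical layout: one ImageFolder per scrape source, with identical
# species folders so both map each class to the same index.
scientific = datasets.ImageFolder("data/train/scientific", transform=train_tf)
common = datasets.ImageFolder("data/train/common", transform=train_tf)

mixed = ConcatDataset([scientific, common])
train_loader = DataLoader(mixed, batch_size=64, shuffle=True, num_workers=4)
```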

The dataset experiment above showed that my hypothesis was incorrect; the best results ultimately came from the mixed dataset. It turns out, to no one's real surprise, that more data is better. The last model update I decided to try for the 92-species classifier was fine-tuning the trained models on novel data; for example, fine-tuning the model trained on the scientific-name data using the common-name data, and vice versa, to see whether this would lead to a better overall result.

Initial Dataset Fine Tuning Dataset Initial Learning Rate Initial Test Accuracy Ending Test Accuracy
Scientific Common 0.001 79.39% 77.33%
Scientific Mixed 0.001 79.39% 77.60%
Scientific Mixed 0.0001 79.39% 76.88%
Common Mixed 0.0001 80.33% 77.21%
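
Mechanically, this amounts to reloading the trained weights and continuing training on the other dataset at a reduced learning rate; a sketch with a hypothetical checkpoint path:

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical checkpoint from the scientific-name run; the head is already 92-way.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 92)
model.load_state_dict(torch.load("checkpoints/scientific_best.pt"))

# Continue training on the common-name (or mixed) images at a lower learning
# rate so the new data refines, rather than overwrites, what was learned.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```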

After all of this testing, the best model I found was the partially frozen ConvNet Tiny trained on the mixed data. This model has an 80.78% accuracy and a 90.03% top-3 accuracy, so it performs very solidly overall. With these experiments concluded, I was ready to move on to building an expanded species classifier.

Expansion to 286 Species

After all of these experiments, I decided to expand my classifier to even more species. Unfortunately, of the 1,350 species in the FishBase database, only 1,287 had images. For this expanded model I lowered my threshold for the number of FishBase images from 25 to 10, and my threshold for scientific/common scraped images from 50 to 30 each. This change resulted in 286 valid species instead of the initial 92. With all of this done, I was ready to train a new model on this dataset. I tried two different approaches: transfer learning from my 92-species model and retraining a model from scratch. For both, I used the most performant architecture from my previous experiments.
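
The transfer-learning variant starts from the 92-species weights and swaps in a 286-way head before training on the expanded dataset; roughly the following (hypothetical checkpoint path, and ResNet18 shown for brevity; other architectures swap their `classifier` head instead):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load the 92-species weights (hypothetical checkpoint path).
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 92)
model.load_state_dict(torch.load("checkpoints/92_species_best.pt"))

# Swap in a 286-way head; the backbone keeps the fish-specific features
# it learned on the smaller label set.
model.fc = nn.Linear(model.fc.in_features, 286)
```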

Model Initial Learning Rate Test Accuracy
Transfer Model 0.001 65.17%
Transfer Model 0.0001 64.94%
From Scratch 0.001 63.45%

The transfer-learned model with an initial learning rate of 0.001 had the best test accuracy. Its top-3 and top-5 accuracies are 82.26% and 87.40%, respectively.
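
For completeness, top-k accuracy is straightforward to compute from the model's logits; a small helper (the name and signature are my own):

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices             # (batch, k)
    hits = (topk == labels.unsqueeze(1)).any(dim=1)  # true label anywhere in top k?
    return hits.float().mean().item()
```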


P.S. If you'd like to access the code I wrote for this project, you can do so here!
