Building a Fish Species Classifier
June 29, 2022
For as long as I can remember, I have been enamored with the ocean and all of its inhabitants; fish in particular have always fascinated me. Since I began SCUBA diving, identifying them has become something I am very invested in. While I can often ask my dive buddies for help, many times they don’t know either. This motivated me to see whether I could build an ML model that could tackle the problem for me. While the idea behind this project was conceptually simple, getting the data and building the best model turned out to be anything but.
Initially, I set out with the goal of using fishbase.se (a helpful site that catalogues species of fish with images) as my primary data source, but I ran into two key issues:
- Fishbase contains 34,800 species. While it is possible to train a model on that many species, I simply did not have the hardware or resources for it. Thus, I decided to limit myself to a much smaller subset: fish present in Hawaii. Of the 34,800 species in the database, 1,287 are present in Hawaii.
- Most of these species had very few images. Of the 1,287 Hawaiian species, only 99 had 10 or more images and only 14 had 50 or more. Ideally, training a convolutional neural network requires at least 100 images per class, so fishbase clearly did not have enough data on its own to train a useful species classifier. Thus, I decided to augment my data with images scraped from Google.

To do this, I utilized Selenium along with a table of species names scraped from fishbase. Each species had both a common and a scientific name, so I scraped images for both of these terms and stored them separately. Unfortunately, the images scraped from Google were not as clean as those from fishbase; there were many non-fish images within the results. To tackle this, I elected to train another CNN classifier to determine whether or not an image contained a fish.
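For reference, the scraping step looked roughly like the sketch below. This is a minimal, illustrative version: the results URL format, the CSS selector, and the helper name `scrape_images` are assumptions (Google’s markup changes often), not the exact code I ran.

```python
import os
import time
import urllib.request
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_images(query: str, out_dir: str, max_images: int = 60) -> None:
    """Save up to max_images thumbnails from a Google Images search."""
    os.makedirs(out_dir, exist_ok=True)
    driver = webdriver.Chrome()
    driver.get(f"https://www.google.com/search?q={query}&tbm=isch")
    # Scroll a few times so more thumbnails load.
    for _ in range(5):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(1)
    thumbnails = driver.find_elements(By.CSS_SELECTOR, "img")  # selector is an assumption
    saved = 0
    for thumb in thumbnails:
        src = thumb.get_attribute("src")
        if src and src.startswith("http"):
            urllib.request.urlretrieve(src, os.path.join(out_dir, f"{saved}.jpg"))
            saved += 1
        if saved >= max_images:
            break
    driver.quit()

# Example: scrape by scientific name and store separately from common-name images.
scrape_images("Zanclus cornutus", "images/scientific/Zanclus cornutus")
```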
To train this classifier, I downloaded mini-ImageNet and separated the fish images within it from the non-fish ones. This process involved doing a key-term search among the classes and then manually sorting through the remaining results. I then augmented the fish class using the images from fishbase. For the classifier model itself, I took a simple transfer learning approach: I started from a ResNet18 with pretrained weights and trained it on the aforementioned data. In doing so, I was able to get 97.17% accuracy on a withheld test set.
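The transfer learning setup here is essentially the standard PyTorch recipe; a minimal sketch is below. The directory layout, batch size, learning rate, and epoch count are placeholders rather than the exact settings used.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing so the pretrained weights see familiar inputs.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Assumes images sorted into data/is_fish/train/fish and data/is_fish/train/not_fish.
train_data = datasets.ImageFolder("data/is_fish/train", transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)

# ResNet18 pretrained on ImageNet, with a new two-way (fish vs. not-fish) head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```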
With the is_fish classifier built, I was now ready to build my species classifier. For this classifier, I elected to use the fishbase images as my validation and test images, as these were the only images I knew to be correctly labelled. With this in mind, I initially decided to limit myself to species that had at least 25 images in fishbase and at least 50 scraped fish images each for the common name and the scientific name. Of the 1,287 species in Hawaii, only 92 met these thresholds.
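To make the filtering concrete, the threshold check amounts to something like the following, assuming per-species image counts have been tallied into a table (the file and column names here are hypothetical):

```python
import pandas as pd

# Hypothetical table of per-species image counts:
# columns are species, n_fishbase, n_common, n_scientific.
counts = pd.read_csv("image_counts.csv")

selected = counts[
    (counts["n_fishbase"] >= 25)
    & (counts["n_common"] >= 50)
    & (counts["n_scientific"] >= 50)
]
print(len(selected))  # 92 species met these thresholds
```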
For this model, I also used a transfer learning approach, but I experimented with many more aspects of the training setup to find the best model that I could (a sketch of how several of these pieces fit together follows the list). In my experiments, I tested:
- different pretrained architectures (ResNet18, EfficientNet-B0, ConvNeXt Tiny)
- varying numbers of frozen layers (from no frozen layers to all but the last layer frozen)
- different initial learning rates
- different learning rate schedulers (stepwise decay and decay on plateau)
- L2 Regularization
- varied data augmentation (rotation, reflection, random cropping, normalization, etc.)
- different training sets (common name images only, scientific name images only, both image sets together)
- expanding training sets (train on one set [e.g., common names] and then fine-tune using the other set(s) [scientific or mixed])
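As a rough illustration of how several of these knobs combine in PyTorch, here is a sketch using ResNet18 (one of the candidates, not necessarily the winner) with all but the final layer frozen; the augmentations mirror the ones listed, while the learning rate, weight decay, and scheduler settings are placeholders rather than the values that produced the best model.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: random cropping, reflection, rotation, and normalization.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# One end of the freezing spectrum: freeze the whole backbone,
# leaving only the new 92-way classification head trainable.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 92)

# L2 regularization enters through weight_decay; the initial learning
# rate is one of the values that would be swept over.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, weight_decay=1e-4
)

# Stepwise decay; ReduceLROnPlateau is the decay-on-plateau alternative.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
```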
With this model built, I wanted to see if I could build a classifier for even more classes, so I lowered the thresholds used for filtering fish classes to 10+ images in fishbase and 30+ scientific and common images each, which allowed for 286 different species. For this model, I used the same architecture as the 92-species classifier and tried both training from scratch and taking a curriculum learning approach, in which the model was initialized from the 92-species classifier. The curriculum learning approach resulted in a better model, with an accuracy of 65.17%, a top-3 accuracy of 82.26%, and a top-5 accuracy of 87.40%.
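The curriculum step amounts to warm-starting the larger model from the smaller one. A minimal sketch of that hand-off is below, with ResNet18 standing in for whichever architecture the 92-species model used; the checkpoint path is hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import models

# Rebuild the 92-species model and load its trained weights.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 92)
model.load_state_dict(torch.load("fish_92_species.pt"))  # hypothetical checkpoint path

# Swap in a fresh 286-way head; the backbone keeps what it learned on 92 species.
model.fc = nn.Linear(model.fc.in_features, 286)

# Training then continues on the 286-species data exactly as before.
```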
Overall, this project turned out to be a fantastic learning experience for me. In completing it, I learned about a wide variety of topics including, but not limited to: getting CUDA to work on Linux (a non-trivial task), PyTorch, web scraping with Selenium, convolutional neural networks, transfer learning, neural network fine-tuning, and curriculum learning. In the future, I hope to expand my model to more species, try other architectural improvements, and deploy a web application to enable easy access to my model. Until then, most of my interactions with these fascinating animals will be underwater.