Itransition delivered a PoC solution for plankton detection and classification, proving the feasibility of the suggested ML approach.
Our customer is a US-based company, created by a group of scientists and engineers to design and build advanced underwater measurement and observation systems that can operate in the most severe ocean environments. Striving to advance the knowledge of our planet’s aquatic environments, they built integrated instrumentation platforms to observe all kinds of physical and chemical parameters in real time. From this data, scientists and companies from various industries get a better understanding of complex dependencies in ecosystems and make informed decisions.
One of the issues the customer addresses is understanding plankton composition in ocean water by acquiring data on plankton quantity and types. Plankton detection, extraction, and classification were based on a supervised learning model built on convolutional neural networks (CNNs) and ran in real time on the customer’s NVIDIA Jetson embedded AI computing device, coupled with a machine vision camera. However, their self-developed C++ plankton detection and recognition software limited the image processing speed to 8-10 FPS, while the camera could capture up to 30 FPS. In addition, the accuracy of plankton detection and recognition, which was critical to the business, was unsatisfactory, as the system relied on older versions of CNN algorithms.
Considering the challenges the customer faced, they wanted to make the following changes:
The customer was searching for a trusted and competent partner that would take up the project and deliver a robust solution to improve the company’s digital performance. In the end, they turned to Itransition, as we proved to be the right match considering our strong ML consulting expertise and experience with solving similar challenges.
Based on the provided documentation and the code file, our experts concluded that there were two possible ways to achieve the customer’s goal:
As for improvements to the current solution, our experts prepared a list of recommendations to speed up code and algorithm execution. Still, we and the customer eventually decided that developing a completely new system with a new architecture and selected technology stack was a much faster and easier-to-maintain option. It would also bring the following positive changes:
According to the proposed approach, the solution detects objects in images using the YOLO object detection system. Then the EfficientNet CNN is applied to the detected objects, creating a vector for each of them. After that, the solution compares the resulting vectors against the reference ones, leveraging the Faiss or Annoy algorithm, and assigns the class based on the minimum distance between them.
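A minimal sketch of this detect–vectorize–search flow, assuming detections have already been cropped from the frame; the torchvision EfficientNet-B0 backbone, the reference files, and the helper names are illustrative assumptions rather than the project’s actual code:

```python
import numpy as np
import torch
import faiss
from PIL import Image
from torchvision import models, transforms

# Embedding backbone: EfficientNet with its classifier head removed, so the
# forward pass returns a feature vector for each detected-object crop.
backbone = models.efficientnet_b0(weights="IMAGENET1K_V1")
backbone.classifier = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(crop: Image.Image) -> np.ndarray:
    """Turn one detected plankton crop (e.g. a YOLO bounding box cut-out) into a vector."""
    with torch.no_grad():
        vec = backbone(preprocess(crop).unsqueeze(0))
    return vec.squeeze(0).numpy().astype("float32")

# Reference vectors (one per labelled reference image) live in a Faiss index;
# labels[i] holds the class of the i-th reference vector. File names are assumed.
reference_vectors = np.load("reference_vectors.npy").astype("float32")
labels = np.load("reference_labels.npy", allow_pickle=True)
index = faiss.IndexFlatL2(reference_vectors.shape[1])
index.add(reference_vectors)

def classify(crop: Image.Image) -> str:
    """Assign the class of the nearest reference vector (minimum L2 distance)."""
    query = embed(crop).reshape(1, -1)
    _, nearest = index.search(query, 1)
    return labels[nearest[0][0]]
```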
We opted for Python as the main programming language because of its popularity and quicker, easier development compared to the previously used C++, while still delivering comparable performance.
For training the detection and classification models, we considered using TensorFlow, PyTorch, and Darknet, an open-source neural network framework written in C and CUDA. Initially, we wanted to ensure the uniformity of the leveraged libraries and utilize TensorFlow for both tasks, especially considering that Darknet models could be easily converted to TensorFlow. Using one framework would ensure model type consistency and reduce possible risks. Both TensorFlow and Darknet can run on the Jetson’s CPU, but the maximum performance boost comes from running them on the device’s GPU.
However, after testing the frameworks on the customer’s Jetson module, we found out that TensorFlow was unable to work with the GPU properly given the module’s non-standard ARM architecture. As it was impossible to make the most of TensorFlow’s capabilities, we discarded it in favor of Darknet for the detection process. For training the classification model, we chose PyTorch, as it performed well on the customer’s machine and, unlike TensorFlow, could leverage the device’s GPU resources.
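A quick way to confirm this on the device itself is to check GPU visibility from each framework; a minimal sketch of such a check (nothing project-specific, just the frameworks’ standard diagnostics):

```python
import tensorflow as tf
import torch

# TensorFlow: an empty list here means it cannot see a usable CUDA device
# and will silently fall back to the CPU.
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))

# PyTorch: reports whether CUDA is usable and which device it sees.
print("PyTorch CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("PyTorch device:", torch.cuda.get_device_name(0))
```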
When deciding which algorithm to apply for the Nearest Neighbor Search (NNS) process, we were choosing between Annoy and Faiss. We tested both options and decided that Faiss would be a much better fit considering the customer’s specifics and the relatively small number of reference values (30+ classes with 100 images per class, i.e. only a few thousand reference vectors).
To evaluate the new plankton detection and recognition process and fine-tune the solution before development, Itransition suggested creating a Proof of Concept (PoC). A PoC would allow us to elaborate on implementation details, solve potential issues, meet the customer’s specific technology requirements, and agree on the most suitable technical approach and budget. Before the start of PoC development, the customer:
The main purpose of the PoC was to ensure satisfactory interoperability of the new app and the customer’s hardware. We also wanted to try out the selected CNN algorithms and determine whether the level of accuracy they provide would be sufficient. To achieve the set PoC phase goals, our team performed the following activities:
Our project team consisted of a project manager and backend and ML engineers. A developer from the customer’s side also got involved whenever we hit blockers while working with the customer’s machine. We held weekly meetings where we made go/no-go decisions, sent the customer weekly updates, promptly notified them about blockers, and invited them to demos.
Since the PoC development project had many unknowns, we defined and decomposed the project scope, identified the main milestones, and set their deadlines to make activity planning easier, more efficient, and more manageable. Before development began, we also created an architecture diagram and had it approved by the customer.
Throughout the project, we carried out app testing on the customer’s equipment to make sure the solution’s modules worked properly both locally and when deployed on their machine. For this, we launched the solution, viewed logs, collected performance metrics and statistics for each module, and made changes and improvements where needed. At first, we tested the solution on our systems with images from a folder, and later connected it to the camera and tested the system using a live video stream.
During PoC development, we adapted our work pace in line with the encountered complexities, such as selecting the necessary libraries for deploying the app on the customer’s machine and connecting it to the camera or choosing the frameworks for model training. We also measured the speed of the solution’s various components to identify bottlenecks and optimize them.
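As an illustration of this kind of measurement, per-module latency can be accumulated with a simple decorator; the stage names and the placeholder `detect` function below are hypothetical, not the actual PoC code:

```python
import time
from collections import defaultdict

# Accumulates total time and call count per pipeline stage so that average
# latency (and the FPS ceiling it implies) can be reported per module.
timings = defaultdict(lambda: {"total": 0.0, "calls": 0})

def timed(stage):
    """Decorator that records how long each call to a pipeline stage takes."""
    def wrapper(func):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            timings[stage]["total"] += time.perf_counter() - start
            timings[stage]["calls"] += 1
            return result
        return inner
    return wrapper

@timed("detection")          # hypothetical stage name; other modules get their own
def detect(frame):
    ...                      # placeholder for the actual detection call

def report():
    """Print average latency per stage and the throughput it would allow in isolation."""
    for stage, t in timings.items():
        avg = t["total"] / t["calls"]
        print(f"{stage}: {avg * 1000:.2f} ms/frame (~{1 / avg:.0f} frames/s)")
```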
| Challenge | Description | Solution |
| --- | --- | --- |
| New software installation and environment setup on the customer's machine | To run the application, suitable libraries and packages had to be installed, since NVIDIA Jetson has a non-standard ARM architecture. We faced multiple obstacles while installing libraries because not all versions were supported. | We researched and installed the corresponding libraries (NumPy and Pillow for processing multidimensional arrays, OpenCV for resizing images, etc.) suitable for the customer’s machine. We set up the environment so that we could use these libraries, as they were crucial for image processing. We tested different library versions to identify which worked best on the customer’s device and ensured optimal configuration of both the libraries and the device itself. |
| Establishing camera connection | We needed to figure out which libraries and library versions were supported by the camera. | We researched libraries and found the versions suitable for the FLIR Grasshopper camera and the developed solution. To enable reading frames from the camera, we applied the PyCapture library. |
| Performance optimization | While measuring the necessary performance metrics, we found several bottlenecks. | We found alternative ways to convert frames to a suitable format and improved the solution’s performance. |
| Setting up detection and vectorization on the customer's machine GPU | We found out that TensorFlow utilized the GPU poorly and required too much RAM. | We decided to stick to Darknet and PyTorch for detection and vectorization respectively, as they run faster and utilize resources better. |
| Architectural design | The limited available RAM constrained the app’s queues and required thoughtful solution architecture design. | We experimented with the number of queue entities and their size to determine whether the solution would work properly without freezing or crashing. We also designed an architecture that allows the app to optimize RAM use (a minimal sketch of this queue-based approach follows the table). |
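To illustrate that queue-based design, here is a hedged sketch of bounded inter-process queues; the stage functions, queue sizes, and timings are assumptions for illustration, not the delivered architecture:

```python
import time
from multiprocessing import Process, Queue

def grab_frame():
    """Placeholder for the real frame source (camera or image folder)."""
    time.sleep(0.03)          # ~30 FPS producer, for illustration only
    return b"frame-bytes"

def run_detector(frame):
    """Placeholder for the real detection step."""
    time.sleep(0.1)           # a deliberately slower consumer
    return ("detections-for", frame)

def read_frames(out_queue):
    while True:
        # put() blocks once the queue holds `maxsize` frames, so the reader
        # back-pressures instead of filling the Jetson's limited RAM.
        out_queue.put(grab_frame())

def detect_objects(in_queue, out_queue):
    while True:
        out_queue.put(run_detector(in_queue.get()))

if __name__ == "__main__":
    # Bounded queues between pipeline stages; sizes are illustrative.
    frame_queue = Queue(maxsize=4)       # reader -> detector
    detection_queue = Queue(maxsize=4)   # detector -> downstream classifier
    Process(target=read_frames, args=(frame_queue,), daemon=True).start()
    Process(target=detect_objects, args=(frame_queue, detection_queue), daemon=True).start()
    for _ in range(5):
        print("got:", detection_queue.get())
```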
The delivered solution includes the following modules:
This step is performed before the main classification step because the solution looks up the class of each detected plankton object using the Faiss index and the JSON file during the “Class search” step:
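For illustration only, the index-plus-JSON lookup described above could be wired together roughly as follows; the file names and JSON structure are assumptions, not the delivered module:

```python
import json
import faiss
import numpy as np

# One-time index creation: reference vectors from the vectorization step are added
# to a Faiss index, and a JSON file stores the class name for each vector position.
vectors = np.load("reference_vectors.npy").astype("float32")        # assumed file
with open("reference_classes.json") as f:
    classes = json.load(f)                                          # assumed: list of class names, one per vector
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "plankton.index")                          # assumed file name

# "Class search" at runtime: load the persisted index, then return the class
# of the nearest reference vector for each query vector.
runtime_index = faiss.read_index("plankton.index")

def class_search(query_vector: np.ndarray) -> str:
    _, idx = runtime_index.search(query_vector.reshape(1, -1).astype("float32"), 1)
    return classes[idx[0][0]]
```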
There are two sources the solution can get frames from – a folder with images and the camera:
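As a rough illustration, both sources can expose the same generator interface; the folder branch uses OpenCV, while the camera branch is only indicative, since actual camera reading relies on the PyCapture library (the `grab` method below is a hypothetical stand-in):

```python
import glob
import cv2

def frames_from_folder(folder):
    """Yield frames from an image folder, mirroring what the camera source provides."""
    for path in sorted(glob.glob(f"{folder}/*.png")):      # image format assumed
        frame = cv2.imread(path)
        if frame is not None:
            yield frame

def frames_from_camera(camera):
    """Yield frames from the connected camera (actual reading is done via PyCapture)."""
    while True:
        yield camera.grab()   # hypothetical method standing in for the PyCapture retrieve call
```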
The statistics module is responsible for gathering all the information collected at runtime based on the detected data. After the thread is initialized, the following actions take place:
Itransition defined the next improvement steps for the PoC solution to increase its performance and the accuracy of the detection and classification models. To improve model accuracy, we suggested the following:
To increase processing speed, we suggested:
At the end of PoC development and after the final demo with the customer, we provided executable files and instructions on how to run the app. After the customer has sufficiently used the solution in the real environment and collected data, we will develop a full-scale software suite.
Itransition delivered a PoC solution for plankton detection and classification. The PoC allowed the customer to test the quality and performance of the approach we suggested without investing in a full-scale solution. By implementing the PoC, we proved the feasibility of the suggested approach and achieved the set goals: