ML PoC for a plant pathology recognition solution

Itransition’s team delivered a proof of concept for an ML plant pathology recognition solution, enabling the customer to secure investment and partner with several scientific institutes for further work on the solution.

Context

Our customer is a software startup that had an innovative idea for an image recognition and comparison solution. They wanted to create an app for field scientists and laboratories to scan, analyze, and compare plant photos and then determine the presence and types of pathologies. Most existing solutions on the market could compare plant samples but offered no further analysis, such as grouping samples and filtering out irrelevant data. In addition, such solutions didn’t connect to IoT devices for pathology research on the go.

The customer wanted to create a comprehensive solution that would simplify and enhance the process of plant pathology identification with the help of machine learning. They partnered with Itransition thanks to our extensive experience in delivering mobile solutions and web solutions equipped with ML features.

Solution

Itransition’s team suggested starting with a proof of concept (PoC) to evaluate the idea of a new solution before developing it fully. The PoC phase would give us an opportunity to elaborate on the details of the ML features and study possible limitations that we’d need to overcome when creating a full-blown solution. As the customer already had a scanning device, the PoC also had to be fully interoperable with the scanner. The main goal of this phase was to deliver a functioning solution that scientific institutions could work with and give feedback on for further development.

ML approach

We created ML pipelines with a combination of C, Python, and .NET because the solution relies on complex math and comparison algorithms. Python was the main language thanks to its wide adoption in ML and the fast development it allows.

For the comparison algorithm to work properly, the images must be correctly preprocessed. The sample type should be identified first because we use different comparison methods for different sample types. Our team performed exploratory data analysis and prepared datasets for training by removing outliers, labeling data, and balancing datasets.
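As an illustration of the dataset balancing step mentioned above, here is a minimal sketch that uses random oversampling. The case study does not say which balancing method the team used, so the technique, the `oversample_to_balance` helper, and the sample labels are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(samples, seed=0):
    """Duplicate minority-class items at random until every class
    matches the size of the largest class.

    `samples` is a list of (image_id, label) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for image_id, label in samples:
        by_label[label].append((image_id, label))

    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the missing items with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Toy data: three "leaf" samples and one "root" sample (made-up labels).
samples = [("img1", "leaf"), ("img2", "leaf"), ("img3", "leaf"), ("img4", "root")]
balanced = oversample_to_balance(samples)
counts = Counter(label for _, label in balanced)
```

After the call, `counts` holds the same number of items for each label. Oversampling is only one option; undersampling the majority class or weighting the loss function are common alternatives.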

We trained two neural networks (NNs):

  • An image classifier based on the ResNet-50 model that identifies sample types
  • A multilabel classifier based on the ResNet-50 model that predicts pathology types
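The difference between the two networks lies in how their outputs are read: a single-label classifier picks exactly one class, while a multilabel classifier can report several pathologies at once, or none. The sketch below shows that difference on raw output scores (logits) in plain Python. The class names, the 0.5 threshold, and the function names are hypothetical and not taken from the project.

```python
import math

# Hypothetical label sets, for illustration only.
SAMPLE_TYPES = ["leaf", "stem", "root"]
PATHOLOGIES = ["rust", "mildew", "blight"]


def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map one logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_sample_type(logits, class_names=SAMPLE_TYPES):
    """Single-label decision: return the one most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]


def predict_pathologies(logits, label_names=PATHOLOGIES, threshold=0.5):
    """Multilabel decision: keep every label whose probability passes the threshold."""
    return [name for x, name in zip(logits, label_names) if sigmoid(x) >= threshold]


sample_type = predict_sample_type([0.2, 2.1, -1.0])
pathologies = predict_pathologies([1.5, -0.3, 0.8])
```

In a PyTorch setup this usually corresponds to training the first network with a cross-entropy loss and the second with a per-label binary cross-entropy loss, though the case study does not state which losses were used.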

As the customer didn’t have enough reference photos, we generated synthetic data based on 2,000 provided photos to extend the available training dataset. Our specialists:

  • Performed image augmentation (random hue, contrast, coarse dropout, blur, image quality, adding random noise)
  • Tested different input image types (RGB, grayscale, composite image)
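To make the augmentation idea concrete, the following sketch applies two of the listed transformations, random noise and coarse dropout, to a grayscale version of a tiny synthetic image. It is written in plain Python so it runs without dependencies; a real pipeline would more likely rely on an image augmentation library. The image size, noise amplitude, and hole sizes are assumptions chosen for the example.

```python
import random


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance with ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def add_noise(gray, rng, amplitude=10):
    """Add uniform random noise to each pixel and clamp to the 0..255 range."""
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
            for row in gray]


def coarse_dropout(gray, rng, holes=2, size=2):
    """Zero out a few square patches at random positions."""
    out = [row[:] for row in gray]
    height, width = len(out), len(out[0])
    for _ in range(holes):
        top = rng.randrange(height - size + 1)
        left = rng.randrange(width - size + 1)
        for y in range(top, top + size):
            for x in range(left, left + size):
                out[y][x] = 0
    return out


def augment(rgb_image, seed):
    """Produce one augmented grayscale variant; the seed makes it reproducible."""
    rng = random.Random(seed)
    gray = to_grayscale(rgb_image)
    return coarse_dropout(add_noise(gray, rng), rng)


# A uniform 8x8 RGB image stands in for a real plant photo.
image = [[(120, 200, 80)] * 8 for _ in range(8)]
variants = [augment(image, seed) for seed in range(4)]
```

Each seed yields a different variant of the same source photo, which is how a small set of originals can be stretched into a larger training set.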

Using Jupyter and PyTorch, we gathered statistics on the dataset and the predictions. We used Azure virtual machines for NN training and validation. Our specialists created a script that augments the dataset and feeds it to the NN. Considering the limited initial data, Itransition’s team prepared data for the NNs and grouped it into three categories for training.

We validated the trained NNs on a given dataset, collected metrics and histograms, and analyzed model errors. As a result, we reached the KPI of 80% correct pathology identification.
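The case study does not specify how "correct pathology identification" was measured. As one possible reading, the sketch below computes two common multilabel metrics: exact-match accuracy (every pathology on a sample predicted correctly) and per-label precision and recall. The function names and the toy labels are hypothetical, and the numbers are made up for the example rather than taken from the project.

```python
def exact_match_accuracy(y_true, y_pred):
    """Share of samples whose predicted label set equals the true label set."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return hits / len(y_true)


def per_label_precision_recall(y_true, y_pred, labels):
    """Return {label: (precision, recall)} computed over all samples."""
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = (precision, recall)
    return report


# Toy validation set: five samples with made-up pathology labels.
y_true = [["rust"], ["mildew", "blight"], [], ["rust", "blight"], ["mildew"]]
y_pred = [["rust"], ["mildew"], [], ["rust", "blight"], ["rust"]]

accuracy = exact_match_accuracy(y_true, y_pred)
report = per_label_precision_recall(y_true, y_pred, ["rust", "mildew", "blight"])
```

Per-label figures are useful for error analysis because they show which pathologies the model misses (low recall) and which it over-reports (low precision).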

Mobile solution

Itransition’s team developed a mobile application that connects to the scanning device and uploads the scans to Azure Blob Storage. The mobile app was written in Swift, with a .NET backend deployed in Azure.

A mobile phone is connected to the customer’s device. A plant is put onto a flat surface on the other end of the device and scanned.

Disclaimer: Under the non-disclosure agreement that we signed with the customer, we cannot reveal screenshots of the real system. In this case study, we provide a few similar screenshots created to give the reader an idea of the solution developed by Itransition.

Home screen and submitting plant details
Pathology review and uploading scans

The application works with RAW and ProRAW formats for maximum photo quality. These photos are then uploaded to the backend and further analyzed by the NNs according to the predefined steps.

The solution validates and confirms that all the data has been uploaded to the backend. A user can also see the list of photos that have been taken but not yet uploaded. To speed up uploading, several photos are processed at a time.
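The upload behavior described above, several photos sent in parallel with a list of those still pending, can be sketched as follows. The real app is written in Swift; this Python version only illustrates the logic. The `upload_photo` stand-in, the file names, and the limit of three parallel uploads are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_photo(name):
    """Hypothetical stand-in for the real upload call.

    It fails for names starting with "bad" to simulate a network error.
    """
    if name.startswith("bad"):
        raise IOError(f"upload failed: {name}")
    return name


def upload_batch(taken, max_parallel=3, upload=upload_photo):
    """Upload photos a few at a time and report which ones are still pending."""
    uploaded = set()
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(upload, name): name for name in taken}
        for future in as_completed(futures):
            # A photo counts as uploaded only if its call raised no error.
            if future.exception() is None:
                uploaded.add(futures[future])
    pending = [name for name in taken if name not in uploaded]
    return uploaded, pending


taken = ["scan_01.dng", "scan_02.dng", "bad_03.dng", "scan_04.dng"]
uploaded, pending = upload_batch(taken)
```

Keeping `pending` as the difference between taken and confirmed photos means a failed or interrupted upload stays visible and can be retried later.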

Web application

Itransition’s team also delivered a web application for scientific experts to work with. The experts analyze the data processed and predicted by the NNs and give the final opinion on the plant pathology identification.

The web application is multi-tenant, allowing different expert organizations to register, and includes an admin portal for managing laboratories and users. The solution is built with React and Tailwind CSS.

Pathology determination
Grouped pathologies

Results

Itransition’s team delivered a plant pathology recognition solution, and as a result of our cooperation, the customer:

  • Successfully demonstrated the solution to investors 8 months after the project’s start
  • Partnered with 2 scientific institutes to develop the solution further