This project is the work undertaken by Catalyst AI for Greenvale involving the classification of potatoes into midsize bands. This was achieved by using only the weight of the sample to be classified and using a KNN classifier fed historical data to decide the most likely band the sample should fall in.
In order to run this container you'll need docker installed.
Simply build the container
docker build -t greenvale-knn .
And run it whilst opening port 8000
docker run -p 8000:8000 greenvale-knn
There are currently two api endpoints offering a simple response with additional statistics and another that includes mu, CoV, and k in the response.
Takes an array of tuber samples and returns their classified midsize bands.
- URL
/classify
- Method
POST
- Data Params
Currently the classifier uses the first sample as reference to the variety being classified. This creates the requirement that for any single request, all the samples must be of the same variety.
[
{
"variety": "MarisPiper",
"tuber_id": "00001",
"sample_id": "510c1fb0",
"tuber_weight": 113.1
},
{
"variety": "MarisPiper",
"tuber_id": "00002",
"sample_id": "510c1fb0",
"tuber_weight": 103.3
}
]
- Response
- Code:
200
Content
- Code:
[
{
"sample_id": "510c1fb0",
"tuber_details": [
{
"variety": "MarisPiper",
"tuber_weight": 20.1,
"size_band": 27.5,
"tuber_id": "00154"
}
]
}
]
Takes an array of tuber samples and returns their classified midsize bands, along with calculated mu, CoV, and k for the input sample collection.
- URL
/expanded-classify
- Method
POST
- Data Params
Currently the classifier uses the first sample as reference to the variety being classified. This creates the requirement that for any single request, all the samples must be of the same variety.
[
{
"variety": "MarisPiper",
"tuber_id": "00001",
"sample_id": "510c1fb0",
"tuber_weight": 113.1
},
{
"variety": "MarisPiper",
"tuber_id": "00002",
"sample_id": "510c1fb0",
"tuber_weight": 103.3
}
]
- Response
- Code:
200
Content
- Code:
[
{
"sample_id": "510c1fb0",
"tuber_details": [
{
"variety": "MarisPiper",
"tuber_weight": 20.1,
"size_band": 27.5,
"tuber_id": "00154"
}
]
}
]
The training data is stored per variety in .csv
files within the training_data
folder. If more data is avaiable it can be added to the end of the csv in any order as long as the format is maintained.
When adding new varieties, the variety name and k-value for the knn classifier should be added to the constants.py
file. Also, the data relevant to the variety should be added to a new file named {variety_name}_TuberData.csv
with the data it contains in the same format as the pre-existing files.