The frontend application is built with Next.js 12 and TypeScript rather than Next.js 13.4, because `sharp`, a dependency of `@xenova/transformers`, could not be resolved under Next.js 13.4 (this also meant migrating from the app router back to the pages router). Styling is based on shadcn/ui and Tailwind CSS.
The model, trained with the Hugging Face `Trainer` API, is exported to ONNX via `torch.onnx.export`; the exported graph expects `input_ids` and `attention_mask` as inputs and outputs a single regression value named `logits`.
The ONNX model is then further quantized to int8 precision using `quantize_dynamic("", "", weight_type=QuantType.QUInt8)`.
To reduce deployment cost, edge deployment is used: the ONNX model is loaded directly on the client side. However, the model does not tokenize text out of the box. To address this, see how text is tokenized in JS/TS in model/inference.ts. Not every tokenizer is supported; see the supported ones. The ingredients can then be tokenized and fed as forward-pass inputs to the ONNX model.
Extracting the ingredients from a URL is fairly manual: I used cheerio to scrape the elements near the `h1` ... `h6` tags and return them as a list. See how I scrape the ingredients as an API.
The dataset is normalized and can be downloaded from the Hugging Face Hub. See how I processed the dataset in process.ipynb, and see data/README.md for more details.
Note: multiply the model output by 13.3627190349059 and add 10.85810766787474 to obtain the actual sugar level (this inverts the label normalization applied to the dataset).
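The de-normalization is a simple linear rescale. A minimal sketch (the raw value below is illustrative, not a real model prediction):

```python
# Scale and offset from the dataset's label normalization (see the note above).
SCALE = 13.3627190349059
OFFSET = 10.85810766787474

def denormalize(raw: float) -> float:
    """Invert the label normalization to recover the actual sugar level."""
    return raw * SCALE + OFFSET

# Illustrative raw output only; not produced by the real model.
print(denormalize(0.5))
```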
The model is a fine-tuned version of distilbert-base-uncased-finetuned-sst-2-english and achieves a loss of 0.069200. See ziq/ingbetic on the Hugging Face Hub, and see here for how the model was trained with the Hugging Face `Trainer` API.
Thanks to all the framework and library maintainers.