FoodDatasetsQA

Research & code into how to assess dataset quality for food-related data

Primary language: Jupyter Notebook. License: MIT.

Quality research for Food Data

This repository stores the code and supplementary material for a study on the quality of food-related data in consumer nutrition applications.

Data for testing code

You can

Abstract

Developers of consumer nutrition applications need to accumulate food-related data. However, studies of methods for assessing the quality of such data are scarce. This study lays a foundation for further research into the quality of food-related data. The central part of the research is a solution to the merge problem: combining multiple datasets into one while keeping the number of duplicates to a minimum. The study solves it using a lexical similarity function. It also proposes a range of metrics that can be used to gauge the quality of a dataset and demonstrates their results.
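
To make the merge problem concrete, here is a minimal sketch of deduplicating food names across datasets with a lexical similarity function. It is an illustration only, not the method used in the study: the abstract does not say which similarity function is used, so this sketch substitutes Python's standard-library difflib.SequenceMatcher. The names lexical_similarity and merge_datasets, the 0.85 threshold, and the sample food names are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two food names.

    Assumption: names are compared case-insensitively with whitespace
    collapsed. SequenceMatcher is a stand-in for the study's function.
    """
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def merge_datasets(datasets, threshold=0.85):
    """Merge lists of food names, skipping likely duplicates.

    A name is kept only if it is not similar (ratio >= threshold) to any
    name already kept. The threshold is a hypothetical value.
    """
    merged = []
    for dataset in datasets:
        for name in dataset:
            if not any(lexical_similarity(name, kept) >= threshold for kept in merged):
                merged.append(name)
    return merged


# Hypothetical sample data, not taken from the repository.
dataset_a = ["Whole Milk", "Cheddar cheese", "Brown rice"]
dataset_b = ["whole  milk", "Cheddar Cheese", "Rye bread"]

merged = merge_datasets([dataset_a, dataset_b])
print(merged)
```

The pairwise comparison is quadratic in the number of names, which is acceptable for a sketch but would likely need blocking or indexing on larger datasets.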