Binary classification models that determine recipe relevancy, trained on manually labelled Reddit comments from r/Cooking.
Please see the report for annotation guidelines, model methodology, metrics, and other details.
🗣️ Team: Eric, Sydney, Jake, Kristen, Yaxin
- A writeup for the assignment (Report.pdf)
- A proposal for a multi-class recipe problem (Proposal.pdf)
- Some Prodigy recipes (outputs not viewable in GitHub preview) (Code.ipynb)
- Models for experiments 1 and 2
- All data used:
  - Unlabeled training set (homework2_train.jsonl)
  - Unlabeled evaluation set (homework2_eval.jsonl)
  - Labeled evaluation set, uncombined (hmwk2-eval-1000.jsonl)
  - Labeled evaluation set, combined (hmwk2-eval-final.jsonl)
  - Labeled training set (hmwk2-train-final.jsonl)
- Python dependencies (requirements.txt)
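
The data files listed above use the `.jsonl` extension, which conventionally means one JSON object per line. A minimal sketch for loading one of them is shown here; `read_jsonl` is a hypothetical helper, not part of this repository, and the record schema is not assumed (inspect the keys of the first record before relying on any field name).

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts.

    Hypothetical helper for illustration: assumes one JSON object per
    line and UTF-8 encoding, and skips blank lines.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl("hmwk2-train-final.jsonl")` would return the labeled training records as a list, assuming the file sits in the current working directory.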