nasa-petal/PeTaL-labeller

The PeTaL labeler labels journal articles with biomimicry functions.

Jupyter Notebook · Unlicense

Issues

Preprocess_golden.py
#88 opened 2 years ago by pjuangph · 0 comments
Get to 90% precision
#22 opened 3 years ago by bruffridge · 1 comment
Create a multi-label classification model for the most used labels.
#70 opened 3 years ago by bruffridge · 0 comments
Change output of labeller from absolute "selection" to "ranking" using confidence scores.
#83 opened 3 years ago by bruffridge · 1 comment
Match vocab file PeTaL.emb
#86 opened 3 years ago by pjuangph · 1 comment
Use open-source Snorkel to create labelling functions to expand our training dataset.
#65 opened 3 years ago by bruffridge · 2 comments
Look into Label Studio to help increase the size of our labelled dataset for training
#80 opened 3 years ago by bruffridge · 0 comments
Explore tokenizer optimization strategies to improve precision/recall.
#84 opened 3 years ago by bruffridge · 0 comments
See if using an ensemble method with MATCH improves performance.
#75 opened 3 years ago by bruffridge · 0 comments
Use a large ensemble of single-label classifiers (treating every label independently and ignoring the hierarchy, which gives about a hundred separate yes/no tasks) and see whether this works better than MATCH
#74 opened 3 years ago by bruffridge · 0 comments
Create a binary classification filtering model for the top 25% of labels
#69 opened 3 years ago by bruffridge · 0 comments
Compare MATCH results to auto-sklearn
#56 opened 3 years ago by bruffridge · 1 comment
Try using a tree of multilabel classifiers
#81 opened 3 years ago by bruffridge · 1 comment
Run MATCH on level-1 labels only
#73 opened 3 years ago by bruffridge · 4 comments
See how replacing random weights with pretrained and fine-tuned weights in MATCH affects performance
#72 opened 3 years ago by bruffridge · 4 comments
Produce metrics to show which labels are being classified correctly and which aren't, and how they're being misclassified.
#61 opened 3 years ago by bruffridge · 5 comments
Add a description of metrics to MATCH's README
#77 opened 3 years ago by bruffridge · 1 comment
Try Google Cloud AutoML Natural Language's multilabel text classification on the golden dataset
#78 opened 3 years ago by bruffridge · 1 comment
Try using a language model such as GPT-2 for its general-purpose language understanding capabilities, then find a way to integrate it with the MATCH classification task
#76 opened 3 years ago by bruffridge · 0 comments
Build a data pipeline for running unlabelled papers through the labeller
#45 opened 3 years ago by bruffridge · 5 comments
Analyze our training dataset to find ways to improve it and, in turn, labelling accuracy
#52 opened 3 years ago by bruffridge · 1 comment
Compare performance when only leaf labels are included in the dataset.
#60 opened 3 years ago by bruffridge · 1 comment
Look into using a relevancy threshold vs. top k for labelling
#55 opened 3 years ago by bruffridge · 1 comment
Look into using MATCH to improve the labeler
#42 opened 3 years ago by bruffridge · 6 comments
Does adding MeSH terms and/or MAG fields of study improve accuracy?
#53 opened 3 years ago by bruffridge · 3 comments
Rerun ablation study
#68 opened 3 years ago by bruffridge · 1 comment
Prepend the title to the abstract before running it through the ML model.
#23 opened 3 years ago by bruffridge · 0 comments
Replace MAG topics with another topic taxonomy
#59 opened 3 years ago by bruffridge · 1 comment
See if we can expand our training dataset by leveraging NLM MeSH labels or Microsoft Academic topics
#21 opened 3 years ago by bruffridge · 1 comment
Create a POC for integrating a Colab classification model with Weights & Biases
#44 opened 3 years ago by bruffridge · 0 comments
Do k-fold cross validation to generate ablation study results for including MAG and MeSH labels.
#58 opened 3 years ago by bruffridge · 1 comment
Plot a graph that shows how much adding additional training data improves labeller accuracy.
#40 opened 3 years ago by bruffridge · 1 comment
Only 409 papers have labels in cleaned_lens_output.json; there should be 701
#50 opened 3 years ago by bruffridge · 1 comment
Create a label hierarchy for MATCH input
#47 opened 3 years ago by bruffridge · 0 comments
Verify input format expected by MATCH for unlabelled papers
#48 opened 3 years ago by bruffridge · 3 comments
Prepare train/test data for MATCH
#43 opened 3 years ago by bruffridge · 0 comments
CORE dataset vs. Semantic Scholar
#17 opened 3 years ago by bruffridge · 1 comment
Look into how we might use SPECTER to improve our labeller
#41 opened 3 years ago by bruffridge · 4 comments
Make use of the citation graph to improve labeling
#37 opened 3 years ago by hschilling · 1 comment
Identify the characteristics of a paper describing thermal management in nature
#39 opened 3 years ago by bruffridge · 0 comments
Test out how well the ML model abstains from applying labels to abstracts that don't belong within the biomimicry taxonomy.
#19 opened 3 years ago by bruffridge · 2 comments
Duplicate code?
#31 opened 3 years ago by bruffridge · 4 comments
Investigate TF-IDF
#36 opened 3 years ago by pjuangph · 0 comments
Adapt Hugging Face predictor to PubMed
#28 opened 3 years ago by pjuangph · 1 comment
Add code to Shruti's labeller to estimate accuracy on the train and validation datasets
#27 opened 3 years ago by pjuangph · 0 comments
Look into the most cost-effective way to run the SageMaker endpoint
#33 opened 3 years ago by bruffridge · 0 comments
ERROR: No matching distribution found for torch==1.8.1+cpu
#32 opened 3 years ago by bruffridge · 0 comments
Model Error: Please provide a model_fn implementation
#26 opened 3 years ago by bruffridge · 1 comment
Look into domain adaptive pre-training
#24 opened 3 years ago by bruffridge · 0 comments
Instrument labeller using WandB
#18 opened 3 years ago by pyvelepor · 0 comments