Helper scripts for the backyard-worlds-planet-9 project
- Install dependencies
- Python 2.7
- pip
- astropy
- Clone repo
git clone https://github.com/dancaselden/backyard-worlds-scripts.git
cd backyard-worlds-scripts
- Extract annotations from Zooniverse Classifications CSV
python -m byw.formats.annotationextract data/backyard-worlds-planet-9-classifications.csv data/test_annotations.csv
- Extract subjects from Zooniverse Subjects CSV
python -m byw.formats.subjectextract data/backyard-worlds-planet-9-subjects.csv data/test_subjects.csv
- [Optional] De-duplicate subjects. Each subject is listed once per workflow, so you can remove the repeated rows to cut the set of samples down
sort -r data/test_subjects.csv | uniq > data/test_subjects.csv.bak; mv data/test_subjects.csv.bak data/test_subjects.csv
(-r sorts in reverse order so that the header line stays at the top)
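The sort/uniq pipeline reorders the rows and depends on the header happening to sort first in reverse order. As an alternative, here is a minimal Python sketch that treats the first line as the header and drops exact duplicate rows while preserving the original order. The name dedupe_keep_header is hypothetical and is not part of the byw package; the code is meant to run on Python 2.7 as well as Python 3.

```python
def dedupe_keep_header(lines):
    """Return lines with exact duplicate rows removed.

    The first line is treated as the header and always kept.
    Remaining rows keep their original order; only the first
    occurrence of each row survives.
    """
    lines = list(lines)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    seen = set()
    unique = []
    for row in rows:
        # Compare without line endings so a final row that lacks a
        # trailing newline still matches an earlier identical row.
        key = row.rstrip("\r\n")
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return [header] + unique


# Example with in-memory data. For a real file, pass the result of
# open(path).readlines() and write the returned list back out.
sample = ["id,ra,dec\n", "2,10.0,5.0\n", "1,20.0,6.0\n", "2,10.0,5.0\n"]
deduped = dedupe_keep_header(sample)
```

Unlike uniq, this also catches duplicates that are not adjacent, so no sorting is needed first.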
- [Optional] Download your own set of brown dwarfs (BDs) from SIMBAD. Otherwise, use data/l_to_t_bds.txt
- Extract SIMBAD entries from the SIMBAD ASCII format
Unfortunately, even though there are only ~2000 entries, this step is slow: it uses astropy, which has a few bugs that cause extraordinary performance issues
python -m byw.formats.bdextract data/l_to_t_bds.txt data/test_bds.csv
- Join BDs and Subjects CSVs
python -m byw.formats.bdsubjectjoin data/test_bds.csv data/test_subjects.csv data/test_bdsubjects.csv
wc -l data/test_bdsubjects.csv
655 data/test_bdsubjects.csv
TODO: investigate why the output contains 4 duplicate items:
uniq data/test_bdsubjects.csv | wc -l
651
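Note that uniq only collapses adjacent duplicate lines, so the difference of 4 counts only repeats that sit next to each other. To help track down where the duplicates come from, a small sketch like the following lists every repeated line together with its count, adjacent or not. The function names are hypothetical and not part of the byw package.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return {line: count} for every line that occurs more than once."""
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}


def surplus_count(lines):
    """Number of lines that would disappear if all duplicates were removed."""
    return sum(n - 1 for n in duplicate_lines(lines).values())


# Example with in-memory data. For the real file, pass
# open("data/test_bdsubjects.csv").readlines() instead.
sample = ["a,1\n", "b,2\n", "a,1\n", "c,3\n", "c,3\n", "c,3\n"]
dupes = duplicate_lines(sample)
extra = surplus_count(sample)
```

If the surplus reported for the joined file is larger than 4, some duplicates are not adjacent and the uniq count understates the problem.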
- Execute Step 1) from JOIN BD & Subjects reproduction steps
- Run stats.usercount script
python -m byw.stats.usercount data/test_annotations.csv
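What byw.stats.usercount prints is not described here. If a per-user tally is what you need, the sketch below shows one way to compute it independently. It assumes the annotations CSV has a column identifying the user; the column name user_name is an assumption (it matches raw Zooniverse classification exports, but the extracted file may differ), and count_rows_per_user is a hypothetical helper, not the project's implementation.

```python
from collections import Counter


def count_rows_per_user(rows, user_column="user_name"):
    """Tally rows per user.

    rows: iterable of dict-like records, e.g. from csv.DictReader.
    user_column: ASSUMED column name; adjust to the real header.
    """
    return Counter(row[user_column] for row in rows)


# Example with in-memory records. For a real file, something like
# csv.DictReader(open("data/test_annotations.csv")) yields suitable rows.
sample_rows = [
    {"user_name": "alice", "subject_id": "1"},
    {"user_name": "bob", "subject_id": "2"},
    {"user_name": "alice", "subject_id": "3"},
]
tally = count_rows_per_user(sample_rows)
top_users = tally.most_common(1)
```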