This is the GitHub repository for the social media phenotype paper, currently live as a preprint here. Please contact k.slater [AT] bham.ac.uk with any questions.
- The processed results of the analysis, including the BL-DP and SM-DP associations, can be found in ./data.json.
- You can also browse the results here, in a slightly friendlier format.
- The remaining files are scripts and code for evaluation (in scripts/) and intermediate or external public data (in data/).
- The original transactions are not made available, in accordance with our licence agreement with White Swan.
You can skip this phase, but if you want to regenerate the literature phenotypes and their mappings to DOID, you can run the following:
This merges the raw CSV files provided by White Swan (WS) into a single file.
We will also edit it to remove DOIDs that do not exist from the file.
Result: data/raw_transactions.tsv
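The merge step can be sketched in Python as follows. This is not the repository's script: it assumes the WS exports share one header and contain a DOID column (called `doid` here, a hypothetical name), and that the set of valid DOIDs has already been loaded, for example from the DOID ontology.

```python
import csv


def merge_transactions(csv_paths, out_path, valid_doids, doid_column="doid"):
    """Merge CSV files that share a header into one TSV file.

    Rows whose DOID is not in valid_doids are dropped.
    Returns the number of rows written.
    """
    header = None
    writer = None
    written = 0
    with open(out_path, "w", newline="") as out:
        for path in csv_paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # The first file defines the columns of the merged output.
                    header = reader.fieldnames
                    writer = csv.DictWriter(out, fieldnames=header, delimiter="\t")
                    writer.writeheader()
                elif reader.fieldnames != header:
                    raise ValueError(f"{path} does not share the expected header")
                for row in reader:
                    # Skip DOIDs that are not in the supplied set of valid IDs.
                    if row[doid_column] in valid_doids:
                        writer.writerow(row)
                        written += 1
    return written
```

Failing loudly on a header mismatch is a deliberate choice in this sketch: silently merging files with different columns would misalign the output.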
This takes data/raw_transactions.tsv, propagates phenotypes, calculates NPMI, and creates transaction profiles for permutations.
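A minimal sketch of the propagation and NPMI steps, assuming each transaction is a set of DOID and HP codes and that the ontology is available as a hypothetical `{term: [parent, ...]}` map. NPMI is computed as ln(p(x,y) / (p(x)p(y))) / -ln p(x,y), using the usual conventions of -1 for pairs that never co-occur and 1 for pairs present in every transaction. The permutation profiles are not covered here.

```python
import math
from collections import Counter
from itertools import product


def ancestors(term, parents):
    """Return every superclass of term, given a {term: [parent, ...]} map."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(transactions, parents):
    """Return copies of the transactions with all ancestors of each term added."""
    propagated = []
    for items in transactions:
        expanded = set(items)
        for term in items:
            expanded |= ancestors(term, parents)
        propagated.append(expanded)
    return propagated


def npmi(transactions, xs, ys):
    """Normalised PMI for every (x, y) pair with x in xs and y in ys."""
    n = len(transactions)
    single = Counter()
    joint = Counter()
    for items in transactions:
        items = set(items)
        single.update(items)
        present_x = [x for x in xs if x in items]
        present_y = [y for y in ys if y in items]
        joint.update(product(present_x, present_y))
    scores = {}
    for x, y in product(xs, ys):
        p_xy = joint[(x, y)] / n
        if p_xy == 0:
            scores[(x, y)] = -1.0  # never co-occur
        elif p_xy == 1:
            scores[(x, y)] = 1.0  # co-occur in every transaction; -ln(p_xy) would be 0
        else:
            p_x = single[x] / n
            p_y = single[y] / n
            scores[(x, y)] = math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)
    return scores
```

Propagating before counting means a disease associated with a specific phenotype also counts towards that phenotype's more general ancestors.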
Run the R script!
This contains the code to look at perplexity, facets, radar charts, etc.
This will create the similarity matrix between diseases in the BL-DP and the SM-DP.
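To illustrate the shape of the BL-DP versus SM-DP comparison, the sketch below builds a disease-by-disease matrix from phenotype sets. It swaps in a plain Jaccard index as the similarity measure; the repository may well use an ontology-aware semantic similarity instead, so treat the measure as an assumption.

```python
def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(bldp, smdp):
    """Compare {disease: set of HP terms} profiles from the two sources.

    Rows are BL-DP diseases and columns are SM-DP diseases, both sorted by ID.
    """
    rows = sorted(bldp)
    cols = sorted(smdp)
    matrix = [[jaccard(bldp[r], smdp[c]) for c in cols] for r in rows]
    return rows, cols, matrix
```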
Calculate the average IC for constitutional symptoms.
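One way to compute an average IC, assuming an annotation-frequency definition IC(t) = -ln(n_t / N) over profiles whose annotations are already propagated, where n_t is the number of profiles annotated with t and N is the total number of profiles. The Klarigi commands in this README use `--resnik-ic`, but whether this step uses the same IC definition is an assumption.

```python
import math
from collections import Counter


def annotation_ic(profiles):
    """IC(t) = -ln(n_t / N) for every term in {entity: set of terms} profiles.

    Annotations are assumed to be propagated to ancestors already.
    """
    n = len(profiles)
    counts = Counter(t for terms in profiles.values() for t in set(terms))
    return {t: -math.log(c / n) for t, c in counts.items()}


def average_ic(ic, terms):
    """Mean IC over the given terms, ignoring terms with no IC value (NaN if none)."""
    values = [ic[t] for t in terms if t in ic]
    return sum(values) / len(values) if values else float("nan")
```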
We don't include the original JSON files containing the responses, but the data sheet produced from them is stored in data/review/responses.tsv.
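The JSON-to-TSV conversion could look roughly like this. The response format is not documented here, so the sketch assumes each JSON file holds one respondent's answers as a flat object; the `source_file` column is a hypothetical addition for provenance.

```python
import csv
import json
from pathlib import Path


def responses_to_tsv(json_paths, out_path):
    """Flatten one-object-per-file JSON responses into a single TSV.

    Columns are the sorted union of keys across files; missing answers are blank.
    Returns the number of responses written.
    """
    records = []
    for path in json_paths:
        with open(path) as f:
            record = json.load(f)
        record["source_file"] = Path(path).name  # hypothetical provenance column
        records.append(record)
    columns = sorted({key for record in records for key in record})
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```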
This is produced using Klarigi:
klarigi --data data/klarigi_input/input.tsv --resnik-ic --debug --min-exclusion=0 --min-ic=0.6 -o data/hp.owl --output-scores --output-type=latex --egl --min-inclusion=0.02 --scores-only
klarigi --debug --data data/create_facet_counts/bldp_constitutional.tsv -o data/hp.owl --verbose --output-type=latex --output-scores --scores-only --egl --min-exclusion=0 --min-ic=0 --min-inclusion=0.04 --include-only-classes=HP:0025142
klarigi --debug --data data/create_facet_counts/smdp_facet_profiles.tsv --group="smdp_all_constitutional" -o data/hp.owl --verbose --output-type=latex --output-scores --scores-only --egl --min-exclusion=0 --min-ic=0 --min-inclusion=0.04 --include-only-classes=HP:0025142