hubmapconsortium/ccf-asct-reporter

Add Biomarkers to Azimuth ASCT+B Tables

Closed this issue · 6 comments

I see the mapping files provided by Jaison have Biomarkers listed as well. See the tables here:
https://github.com/DarshalShetty/asctb-azimuth-data-comparison/tree/main/data/azimuth_ct_tables

Use this information to add biomarkers to the generated Azimuth ASCT+B tables.

@bherr2 @DarshalShetty @katyb
Yes on mapping the markers to HGNC IDs and labels. The labels should actually be identical, so we just need the HGNC IDs.

One thing to note is that these are top 10 highly expressed genes in these cell types based on single cell RNA sequencing.

We discussed this with both Rahul and the community and this does NOT necessarily equate to being the “unique markers to identify cell type”

Be careful to NOT equate what we have in the ASCT+B tables to Rahul’s Azimuth since they are NOT necessarily the same information.

Top 10 list is NOT the same as markers that uniquely identify cell type.

Big difference.

They MIGHT have overlap, but they do NOT have to have overlap.

Ok, so sounds worth doing. Btw, will he need to look up what type of biomarker they are as well or are they one specific type?

Do the B* columns in the ASCT+B tables require an exhaustive list of Biomarkers?

Btw, @macvogelsang is factoring in a lookup service for ontologies that you might want to use to look up the HGNC IDs eventually. It's not there yet, but you can take a look at his PR to see how he's looking up HGNC info by ID.

https://github.com/hubmapconsortium/ccf-asct-reporter/pull/168/files#diff-04f45f58ebe1316ad836c85ab0cd35787e132ae6fe2f7d2f5f5d1f5b1aacfbf7R15

I think this endpoint should do the trick: http://rest.genenames.org/search/symbol/KLF4. In a browser it returns XML, but I believe if you fetch with the Accept header set to application/json it returns JSON instead. See Mac's PR again.
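A minimal sketch of that lookup, in Python for illustration. The function names are hypothetical, and the response shape (a `response.docs` list whose entries carry `symbol` and `hgnc_id`) is an assumption that should be checked against a real reply. The request asks for JSON through the `Accept` header, which is the standard HTTP way to negotiate a response format.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Endpoint quoted in the thread; the symbol is appended to it.
HGNC_SEARCH_URL = "http://rest.genenames.org/search/symbol/"


def build_hgnc_request(symbol: str) -> urllib.request.Request:
    """Build a GET request that asks the service for JSON rather than XML."""
    url = HGNC_SEARCH_URL + urllib.parse.quote(symbol)
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def extract_hgnc_id(payload: dict, symbol: str) -> Optional[str]:
    """Return the hgnc_id of the doc whose symbol matches exactly, else None.

    Assumed payload shape (not confirmed by the thread):
    {"response": {"docs": [{"symbol": "...", "hgnc_id": "..."}]}}
    """
    docs = payload.get("response", {}).get("docs", [])
    for doc in docs:
        if doc.get("symbol") == symbol:
            return doc.get("hgnc_id")
    return None


def lookup_hgnc_id(symbol: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the search result for one symbol and pull out its HGNC ID."""
    request = build_hgnc_request(symbol)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return extract_hgnc_id(payload, symbol)
```

Calling `lookup_hgnc_id("KLF4")` would perform the network request. Keeping the parsing in `extract_hgnc_id` makes it possible to test against a saved response without hitting the service, and a `None` result flags a marker that needs manual follow-up.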

@bherr2 @DarshalShetty

There are approximately 20,000 genes in humans, so that’s what these are. Each has a unique name. If he runs into any that do not match anything, please let me know.

Paul had some problems with the API not always finding everything, but that may happen if you don't ask for the correct information.

I actually downloaded a spreadsheet from HUGO HGNC and put it here on Google Drive:
https://drive.google.com/drive/u/1/folders/1urXkywnrAv4ESeuGXcDmDRGP6kx8zIEf
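If the API misses some symbols, the downloaded spreadsheet could serve as an offline lookup. The sketch below assumes it has been exported as a tab-separated file with `symbol` and `hgnc_id` columns. Those column names, the file format, and both function names are assumptions, not something the thread confirms.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_symbol_to_hgnc_id(
    path: str, symbol_col: str = "symbol", id_col: str = "hgnc_id"
) -> Dict[str, str]:
    """Read a tab-separated export and map each gene symbol to its HGNC ID."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Skip rows that have no symbol value.
        return {row[symbol_col]: row[id_col] for row in reader if row.get(symbol_col)}


def annotate_markers(
    markers: Iterable[str], lookup: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Split markers into those with a known HGNC ID and those without."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for marker in markers:
        symbol = marker.strip()  # tolerate stray whitespace from table cells
        if symbol in lookup:
            matched[symbol] = lookup[symbol]
        else:
            unmatched.append(symbol)
    return matched, unmatched
```

The `unmatched` list collects exactly the symbols that "do not match anything", so they can be reported back rather than silently dropped.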

NO, this is NOT required to be exhaustive, but it IS required to be uniquely identifying cell type.

As said, Paul had some problems using the HGNC API, so let me know if you have problems and can't find everything.