gosling-lang/gos

feat: add `gosling.datasets`

manzt opened this issue · 4 comments

manzt commented

It might be nice to add some convenience exports for reusable example datasets for gosling. This could remove some of the boilerplate in the examples:

import gosling as gos
- from gosling.data import multivec
+ from gosling.datasets import cistrome_multivec


- data = multivec(
-    url="https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec",
-    row="sample",
-    column="position",
-    value="peak",
-    categories=["sample 1", "sample 2", "sample 3", "sample 4"],
-    binSize=5,
- )
- base_track = gos.Track(data, width=800, height=100)

+ base_track = gos.Track(cistrome_multivec, width=800, height=100)

This would be really useful! We can refer to the list of public data used in the JS editor:

https://github.com/gosling-lang/gosling.js/blob/97befe6be38eaa64fc0f79ced194c8094d5bfd9b/src/editor/example/gosling-data.ts

manzt commented

Great, thank you! It would make sense to export the "complete" datasets, unless these URLs can be interpreted differently.

E.g.,

cistrome = multivec(
  url="https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec",
  row="sample",
  column="position",
  value="peak",
  categories=["sample 1", "sample 2", "sample 3", "sample 4"],
  binSize=5,
)

vs:

cistrome = "https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec"

Yes, I think it makes sense since those datasets are used across multiple examples with the same data configs. I guess one can also use cistrome['url'] to access the URL and use different configs.
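A minimal sketch of the reuse idea above, assuming the exported dataset is a plain dict with a `url` key. The local `multivec` function is a hypothetical stand-in defined only for illustration and is not the real `gosling.data.multivec`; the `coarse` and `coarse_alt` names are also made up here.

```python
# Hypothetical stand-in for gosling.data.multivec (an assumption for this
# sketch): it simply packs its keyword arguments into a plain dict.
def multivec(url, **options):
    return {"type": "multivec", "url": url, **options}


# The "complete" dataset, with the config shared across examples.
cistrome = multivec(
    url="https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec",
    row="sample",
    column="position",
    value="peak",
    categories=["sample 1", "sample 2", "sample 3", "sample 4"],
    binSize=5,
)

# Reuse only the URL and supply a different config.
coarse = multivec(
    url=cistrome["url"],
    row="sample",
    column="position",
    value="peak",
    categories=["sample 1", "sample 2", "sample 3", "sample 4"],
    binSize=10,
)

# Alternatively, copy the dict and override a single field.
coarse_alt = {**cistrome, "binSize": 10}
```

Both variants leave the original `cistrome` dict unchanged, so a shared export could be reused across examples without one example's config leaking into another.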

manzt commented

In retrospect, I don't think we should have "magic" datasets in Gos. It somewhat obscures the use of the API and might be confusing to new users. I'm going to close this for now.