Sage-Bionetworks/schematic

In JIRA: Lightly refactor schematic synapse storage to reduce querying file view or stop downloading fileview as a CSV

linglp opened this issue · 4 comments

Is your feature request related to a problem? Please describe.
Loren is working on a recurring script that will update the DataFlow manifest. The script calls manifest/download, storage/projects, and then storage/project/manifests for each storage project under a given asset view. But every time we query the file view, the Synapse Python client downloads the file view table as a CSV under the hood. On AWS, Fargate tasks have 20 GB of ephemeral storage by default, and even if we increased this to 200 GB, it would not be enough. In addition, querying the file view for a large project like HTAN is very expensive, so we want to avoid doing it as much as possible.
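One possible way to avoid the CSV download is sketched below. It assumes that synapseclient's `Synapse.tableQuery` accepts `resultsAs="rowset"`, which, as we read the client's documentation, pages rows back through the REST API instead of writing a CSV file into the local cache. The helper name `query_fileview_rows`, selecting only the needed columns, and filtering on the file view's `projectId` column are illustrative assumptions, not schematic's current code. Paging JSON rows can be slower than one CSV download for very large views, so this trades speed for disk usage.

```python
from typing import Iterable, Optional


def query_fileview_rows(syn, view_id: str, columns: Iterable[str],
                        project_id: Optional[str] = None):
    """Hypothetical helper: query a Synapse file view without a CSV download.

    `syn` is assumed to be a logged-in synapseclient.Synapse instance.
    Returns whatever TableQueryResult.asDataFrame() returns (a pandas DataFrame).
    """
    # Select only the columns we actually need instead of SELECT *.
    query = f"SELECT {', '.join(columns)} FROM {view_id}"
    if project_id is not None:
        # Assumption: restrict to one storage project to shrink the result.
        query += f" WHERE projectId = '{project_id}'"
    # resultsAs="rowset" asks the client to page rows back as JSON
    # rather than downloading the whole result as a CSV file.
    results = syn.tableQuery(query, resultsAs="rowset")
    return results.asDataFrame()
```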

To do:

  • Thoroughly review the Synapse storage class and reduce calls to _query_fileview as much as possible
  • Look through the Synapse Python client and see if we can avoid downloading the file view as a CSV
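For the first item, one low-effort option is to memoize file view queries for the lifetime of a single request or script run, so that storage/projects followed by storage/project/manifests for each project can reuse one result instead of re-querying. A minimal sketch follows; `CachedFileviewQuery` and the injected `query_fn` are hypothetical stand-ins for `_query_fileview`, whose real signature may differ.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Key = Tuple[Tuple[str, ...], Optional[str]]


class CachedFileviewQuery:
    """Hypothetical wrapper that memoizes file view query results.

    `query_fn(columns, where)` stands in for whatever actually hits Synapse
    (e.g. something like _query_fileview); its signature is an assumption.
    """

    def __init__(self, query_fn: Callable[[Tuple[str, ...], Optional[str]], Any]):
        self._query_fn = query_fn
        self._cache: Dict[Key, Any] = {}

    def query(self, columns: Sequence[str], where: Optional[str] = None) -> Any:
        key: Key = (tuple(columns), where)
        if key not in self._cache:
            # Only the first request for a given (columns, filter) pair hits Synapse.
            self._cache[key] = self._query_fn(key[0], where)
        return self._cache[key]

    def invalidate(self) -> None:
        # Call after writing to Synapse so later reads see fresh data.
        self._cache.clear()
```

A cache like this should stay scoped to one request or run; keeping it across runs would risk serving stale file view contents after new uploads.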

This is related to the Jira issue here: https://sagebionetworks.jira.com/browse/DCA-101


How important is this feature?
• 🌗 Medium - can do work without it, but it's important (e.g. to save time or for convenience)

When will use cases depending on this become relevant?
• Mid-term - 2-4 months

Additional context

@afwillia suggested that we could potentially delete the .synapseCache directory. I will look into this too.
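If we go the route of deleting the cache, a minimal sketch could clear the cache directory after each file view query so that downloaded CSVs do not pile up on Fargate's ephemeral storage. This assumes the client's default cache location of `~/.synapseCache`; the function name `clear_synapse_cache` is hypothetical. Deleting the cache also throws away files we might otherwise reuse, and the directory may hold other client state, so its contents are worth checking before relying on this.

```python
import shutil
from pathlib import Path
from typing import Optional, Union


def clear_synapse_cache(cache_dir: Optional[Union[str, Path]] = None) -> int:
    """Hypothetical helper: delete everything inside the Synapse cache directory.

    Defaults to ~/.synapseCache (assumed default location). Keeps the root
    directory itself and returns the number of top-level entries removed.
    """
    root = Path(cache_dir) if cache_dir is not None else Path.home() / ".synapseCache"
    if not root.is_dir():
        # Nothing cached yet (or wrong path): nothing to delete.
        return 0
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Cached downloads live in nested subdirectories.
            shutil.rmtree(entry)
        else:
            # Plain files and symlinks are unlinked directly.
            entry.unlink()
        removed += 1
    return removed
```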

Hi @linglp, thanks for logging this ticket. I noticed this is marked for sprint 6, and FAIR is planning to move our sprints to JIRA in that sprint. Could I move this to JIRA?

@MiekoHash: Sure. I have moved the issue here: https://sagebionetworks.jira.com/browse/FDS-62

Awesome. Thanks, @linglp. This will likely go through today's refinement meeting and then move on to the dev triage meeting this coming Monday.