In JIRA: Lightly refactor schematic synapse storage to reduce querying file view or stop downloading fileview as a CSV

Question

linglp opened this issue 2 years ago · 4 comments

Is your feature request related to a problem? Please describe.
Loren is working on a recurring script that will update the DataFlow manifest. The script calls manifest/download, storage/projects, and then storage/project/manifests for each storage project underneath a given asset view. But every time we query the file view, the Synapse Python client downloads the file view table as a CSV under the hood. On AWS, Fargate has 20 GB of ephemeral storage by default, and even if we could increase this to 200 GB, it would not be enough. In addition, querying the file view for a big project like HTAN is very expensive, so we want to avoid doing it as much as possible.

To do:

• Thoroughly review the synapse storage class and reduce calls to _query_fileview as much as possible
• Look through the Synapse Python client and see if we could avoid downloading the file view as a CSV
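One way to cut down on repeated file view queries is to memoize the result per asset view for a short time. The following is only a minimal sketch under stated assumptions: `FileViewCache`, `query_fn`, and `ttl_seconds` are hypothetical names, not part of schematic or the Synapse Python client, and `query_fn` stands in for whatever function actually runs the query (for example a wrapper around _query_fileview).

```python
import time
from typing import Any, Callable, Dict, Tuple


class FileViewCache:
    """Cache file view query results per asset view ID (hypothetical helper)."""

    def __init__(
        self,
        query_fn: Callable[[str], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # query_fn is an assumption: any callable that takes an asset view ID
        # and returns the query result (e.g. a DataFrame or list of rows).
        self._query_fn = query_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, asset_view_id: str, force_refresh: bool = False) -> Any:
        """Return a cached result if it is still fresh, otherwise re-query."""
        now = self._clock()
        entry = self._entries.get(asset_view_id)
        if not force_refresh and entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Cache miss, expired entry, or explicit refresh: run the expensive query.
        result = self._query_fn(asset_view_id)
        self._entries[asset_view_id] = (now, result)
        return result
```

With something like this, a script that lists the manifests for every storage project under one asset view would trigger a single query per TTL window instead of one per call. The trade-off is staleness: callers that need up-to-date results after an upload would have to pass `force_refresh=True`.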

This is related to the Jira issue here: https://sagebionetworks.jira.com/browse/DCA-101

How important is this feature? Select from the options below:
• 🌗 Medium - can do work without it; but it's important (e.g. to save time or for convenience)

When will use cases depending on this become relevant? Select from the options below:
• Mid-term - 2-4 months

Answer 1 · 2023-03-10T00:04:03.000Z

@afwillia suggested that we could potentially delete .synapseCache. I will look into this too.
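A rough sketch of what clearing the cache between runs could look like is below. It assumes the cache lives in a single directory (the Synapse Python client's default is ~/.synapseCache, but the location is configurable, so the path should be checked for the deployment in question). `clear_cache_dir` is a hypothetical helper, not an existing schematic or synapseclient function.

```python
import shutil
from pathlib import Path
from typing import Union


def clear_cache_dir(cache_dir: Union[str, Path]) -> int:
    """Delete everything under cache_dir and return the number of bytes removed.

    The directory itself is kept so later downloads can reuse it.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return 0
    freed = 0
    for child in cache_dir.iterdir():
        if child.is_dir() and not child.is_symlink():
            # Sum file sizes before removing the subtree.
            freed += sum(p.stat().st_size for p in child.rglob("*") if p.is_file())
            shutil.rmtree(child)
        else:
            freed += child.lstat().st_size
            child.unlink()
    return freed


# Example call (path is an assumption; adjust to the actual cache location):
# freed_bytes = clear_cache_dir("~/.synapseCache")
```

This only frees space after a query has finished, so it would not help if a single file view CSV is larger than the available ephemeral storage.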

Answer 2 · 2023-03-10T01:01:05.000Z

Hi @linglp, thanks for logging this ticket. I noticed this is marked for sprint 6, and FAIR is planning to move our sprints to JIRA during that sprint. Could I move this to JIRA?

Answer 3 · 2023-03-10T16:29:23.000Z

@MiekoHash: Sure. I have moved the issue here: https://sagebionetworks.jira.com/browse/FDS-62

Answer 4 · 2023-03-10T16:35:28.000Z

Awesome. Thanks, @linglp. This will likely go through today's refinement meeting and move on to the dev triage meeting this coming Monday.