In JIRA: Lightly refactor schematic synapse storage to reduce file view queries or stop downloading the file view as a CSV
linglp opened this issue · 4 comments
Is your feature request related to a problem? Please describe.
Loren is working on a recurring script that will update the DataFlow manifest. The script calls manifest/download, storage/projects, and then storage/project/manifests for each storage project under a given asset view. But every time we query the file view, the Synapse Python client downloads the file view table as a CSV under the hood. On AWS, Fargate has 20 GB of ephemeral storage by default, and even if we increased this to 200 GB, it would not be enough. In addition, querying the file view for a big project like HTAN is very expensive, so we want to avoid doing it as much as possible.
To do:
- Thoroughly review the synapse storage class and reduce calls to _query_fileview as much as possible
- Look through the Synapse Python client and see if we could avoid downloading the file view as a CSV
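One way the first to-do item could look is sketched below: cache the file view result per asset view so that repeated endpoint calls in the same run reuse it instead of querying again. This is only an illustration. `FileViewCache` and `run_query` are hypothetical names, not part of schematic or the Synapse Python client; `run_query` stands in for whatever function actually performs the file view query.

```python
from typing import Callable, Dict, List


class FileViewCache:
    """Memoize file view query results per asset view ID (hypothetical helper)."""

    def __init__(self, run_query: Callable[[str], List[dict]]) -> None:
        # run_query is an assumed stand-in for the real file view query.
        self._run_query = run_query
        self._results: Dict[str, List[dict]] = {}
        self.query_count = 0  # number of real queries issued so far

    def get(self, asset_view_id: str, force_refresh: bool = False) -> List[dict]:
        """Return cached rows for the asset view, querying only when needed."""
        if force_refresh or asset_view_id not in self._results:
            self._results[asset_view_id] = self._run_query(asset_view_id)
            self.query_count += 1
        return self._results[asset_view_id]
```

With something like this, the per-project storage/project/manifests calls could share one query result for the asset view, and `force_refresh=True` would cover the case where the file view is known to have changed.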
This is related to the Jira issue here: https://sagebionetworks.jira.com/browse/DCA-101
Describe the solution you'd like
A clear and concise description of what you want to happen.
How important is this feature? Select from the options below:
• 🌗 Medium - can do work without it; but it's important (e.g. to save time or for convenience)
When will use cases depending on this become relevant? Select from the options below:
• Mid-term - 2-4 months
Additional context
@afwillia suggested that we could potentially delete .synapseCache. I will look into this too.
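A rough sketch of the .synapseCache idea, assuming the cache lives at ~/.synapseCache (the path may be configured differently in a given deployment) and that it is safe to delete between runs. `clear_synapse_cache` is a hypothetical helper written only with the standard library; it is not an existing schematic or Synapse client function.

```python
import shutil
from pathlib import Path
from typing import Optional


def clear_synapse_cache(cache_dir: Optional[Path] = None) -> int:
    """Delete the cache directory and return the number of bytes freed."""
    # Assumed default location; pass cache_dir explicitly if it differs.
    target = Path(cache_dir) if cache_dir is not None else Path.home() / ".synapseCache"
    if not target.is_dir():
        return 0  # nothing to clean up
    # Add up file sizes before deleting so the caller can log the savings.
    freed = sum(p.stat().st_size for p in target.rglob("*") if p.is_file())
    shutil.rmtree(target)
    return freed
```

Calling this after each file view query (or at the end of each script run) would keep downloaded CSVs from piling up in ephemeral storage, though it does not reduce the cost of the query itself.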
Hi @linglp, thanks for logging this ticket. I noticed this is marked for sprint 6, and FAIR is planning on moving our sprint to JIRA in that sprint. Could I move this to JIRA?
@MiekoHash: Sure. I have moved the issue here: https://sagebionetworks.jira.com/browse/FDS-62