Carto cannot handle "large and complex" requests
shaunanoordin opened this issue · 1 comments
Functionality Issue / External Dependency Issue
Initial reports:
In WildCam Gorongosa Lab's map explorer, users cannot download the CSV results if too many filters are active. (See below)
NOTE: this problem would apply to any WildCam Lab that uses Carto, which mean Darién should also be affected.
Investigation
Testing Steps
- Go to WildCam Gorongosa's map explorer - https://classroom.zooniverse.org/#/wildcam-gorongosa-lab/explorers/map/
- Set filters to "Species: Baboons"
- Click "Download". A CSV file will be downloaded, as expected. This is the baseline.
- Set filters to "Species: Baboons, Habitats: Limestone Gorge"
- Click "Download".
⚠️ Instead of downloading a CSV file, an error message will appear instead.
Inspecting the network requests shows that the error is coming from the external map database provider, Carto. The error is as follows:
Request:
GET https://MY-URL.carto.com/api/v2/sql?format=CSV&q=MY-SQL-QUERY
Response:
429 Too Many Requests
body: {
context: "limit"
detail: "datasource"
error: [
"You are over platform's limits: SQL query timeout error. Refactor your query before running again or contact CARTO support for more details."
]
}
Analysis
To sum up the problem, our external map database provider is experiencing timeout issues when it processes "large and complex" database requests.
-
What constitutes "large and complex" to the database is arbitrary but predictable. (Example: "show me photos that includes baboons AND limestone" is complex to the database, "baboons OR bushpigs" is simple. "baboons AND limestone" is large with > 500 photos, "bushpigs AND limestone" is small with < 500 photos)
-
Any database request that the external map database provider considers "large and complex" fails after a 5-second cutoff for processing time. This 5-second limit seems newly introduced, as we've never encountered an issue like this. Large and complex queries have been successfully downloaded before, albeit after a 10-20 second processing window.
Workarounds
Users can still download the WHOLE dataset (i.e. no filters, just click Download) and then perform their own filtering on their spreadsheet programs. I think the term is "using pivot tables in Excel"?
Status
Major problem.
One potential solution, I think, is to flatten the database (instead of camera, subjects, and aggregations tables, just merge them into one flat table, to avoid runtime JOIN functions) but that has its own problems. 1. there's no guarantee that the flattened database will work any better (maybe it's not the multiple runtime JOINs that are complex to Carto but the AND clauses??) given the time required to set it up, and 2. the problem is still with Carto's 5 second timeout.
Another solution is to move away from Carto altogether. Hmm!
Carto has been replaced with a Zooniverse-hosted map database server. Woohoo!