URL parameters can break temp files
GordianDziwis opened this issue · 6 comments
Describe the bug
2023-04-13 15:51:05,819 INFO Fetching from: https://statistik.leipzig.de/opendata/api/values?kategorie_nr=5&rubrik_nr=2&periode=y&format=csv...
[pid: 6|app: 0|req: 2/2] 127.0.0.1 () {32 vars in 501 bytes} [Thu Apr 13 15:51:05 2023] GET /job/54ebecf7-2a52-4347-a241-e6f651b7f9e0 => generated 930 bytes in 2 msecs (HTTP/1.1 200) 2 headers in 72 bytes (1 switches on core 1)
2023-04-13 15:51:06,124 ERROR Job "push_to_datastore (trigger: date[2023-04-13 15:51:05 UTC], next run at: 2023-04-13 15:51:05 UTC)" raised an exception
Traceback (most recent call last):
File "/usr/lib/datapusher-plus/lib/python3.10/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "/usr/lib/datapusher-plus/src/datapusher-plus/datapusher/jobs.py", line 471, in push_to_datastore
tmp = tempfile.NamedTemporaryFile(suffix="." + resource.get("format").lower())
File "/usr/lib/python3.10/tempfile.py", line 698, in NamedTemporaryFile
file = _io.open(dir, mode, buffering=buffering,
File "/usr/lib/python3.10/tempfile.py", line 695, in opener
fd, name = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/usr/lib/python3.10/tempfile.py", line 395, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmppdo2j9wn.text/csv'
To Reproduce
Steps to reproduce the behavior:
- Create a resource with this link:
https://statistik.leipzig.de/opendata/api/values?kategorie_nr=5&rubrik_nr=2&periode=y&format=csv
- Upload resource with datapusher-plus
Thanks for the report @BonaBeavis .
Can confirm this is a bug.
Where are the path for creating and reading the temp files created? I can look into this.
Looking at the error, it seems that we need to add some logic to sanitize the file extension.
The resource format attribute can sometimes be the full MIME type which contains a slash, and not just the extension.
@BonaBeavis , can you confirm and fix this?
Sure can you point me to the line, where the path is built?
Great!
The whole download section is here - check line 471 where we just grab resource.format.
Perhaps we should check that the resulting temp file is valid, and handle the case when the full MIME type is stored instead of just the file extension (e.g. application/csv
and not just csv
)
datapusher-plus/datapusher/jobs.py
Lines 429 to 555 in 3a3c696
Thanks!!