Failed to copy file in harvest job
Opened this issue · 2 comments
Getting a lot of variants on this theme recently:
A RuntimeError occurred in background at 2023-11-08 22:40:12 UTC :
Failed to move /data/harvester_to_do/harvest_428/[REDACTED].flac to any of [#<Pathname:/data/original_audio/de/deaa90c7-32e7-458c-b380-db45aae829f4.flac>]
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/file_info.rb:106:in `copy_to_any'
-------------------------------
Backtrace:
-------------------------------
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/file_info.rb:106:in `copy_to_any'
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/jobs/harvest/harvest_job.rb:557:in `block in copy_file'
/usr/local/bundle/gems/semantic_logger-4.11.0/lib/semantic_logger/base.rb:374:in `measure_internal'
/usr/local/bundle/gems/semantic_logger-4.11.0/lib/semantic_logger/base.rb:97:in `measure_info'
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/jobs/harvest/harvest_job.rb:556:in `copy_file'
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/jobs/harvest/harvest_job.rb:491:in `harvest_file'
/usr/local/bundle/gems/semantic_logger-4.11.0/lib/semantic_logger/loggable.rb:110:in `harvest_file'
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/jobs/harvest/harvest_job.rb:233:in `process_steps'
/usr/local/bundle/gems/semantic_logger-4.11.0/lib/semantic_logger/loggable.rb:110:in `process_steps'
/home/baw_web/baw-server/lib/gems/baw-workers/lib/baw_workers/jobs/harvest/harvest_job.rb:171:in `perform'
/usr/local/bundle/gems/activejob-7.0.3.1/lib/active_job/execution.rb:59:in `block in _perform_job'
The errors occasionally happen.
Originally, I thought #660 was the root cause (a double harvest item was triggering a double harvest job).
But that doesn't match up with database state.
- we have one harvest item
- the associated audio_recording is set and linked to audio_recording_id
- the final result is audio_recordings.status is
aborted
becausecopy_file
throws - this can't be a uuid reuse situation because the audio recording is already saved to the database and the unique index would have prevented duplicates
- the files appear to exist already
- the existing file has a different hash to the source hash
- the existing file has a recent modification date!
- the existing file is empty! 0 bytes!
Ok theory:
- the file is opened
- something fails... somehow
- then we... somehow... attempt the copy again
- this bit confuses me though because we'd have to retry the operation somehow
- the job execution count is 0
- checked logs the job only ran on one server and only ran once
- it looks like we could be:
baw-server/lib/gems/baw-workers/lib/baw_workers/file_info.rb
Lines 82 to 106 in 51616e2
- but targets is an array of one path? so no
- ✅ which fails because the file exists (but is empty)
- This was confirmed by the logs
2023-11-08T22:40:10.671 W [96661:9300] [BawWorkers::Jobs::Harvest::HarvestJob] [harvest_job:item=774983:should_harvest=true] BawWorkers -- Failed to move /data/harvester_to_do/harvest_428/[REDACTED].flac to /data/original_audio/de/deaa90c7-32e7-458c-b380-db45aae829f4.flac: Failed to copy /data/harvester_to_do/harvest_428/[REDACTED] to /data/original_audio/de/deaa90c7-32e7-458c-b380-db45aae829f4.flac, exists?:true, size:0, source_size:94646884
This would match up with our recent media issues #663 which are also suffering because of 0 byte files. This is likely tied to our recent network storage change.
Possible courses of action:
- change copy method (FileUtils.copy seems to provide no error information at all)
- check to see if path exists before copying
- diagnostically useful for this bug
- but also allows us to confidently know we're not mutating data from another file
- clean up after ourselves if copy fails (remove the zero byte stub)
- check to see if this an artefact of the current file system in use by Ecosounds
- repair files that have failed unnecessarily
- find any status aborted recordings from the last month
- delete the audio at the path if they are zero-bytes files
- can i retry the harvest job?
- looks like I can
baw-server/lib/gems/baw-workers/lib/baw_workers/jobs/harvest/harvest_job.rb
Lines 430 to 439 in 349ff8d
- looks like I can
An important update: our vendor noticed partial files were copied. Can't rely on the destination being 0 bytes as a reliable indicator of this problem.