C0D3D3V/Moodle-DL

Same file with identical content gets downloaded again every time Moodle-DL is executed

Opened this issue · 9 comments

Description of the bug

There is a file in my Moodle account that gets downloaded again every time I execute Moodle-DL. The old copy is then renamed with a suffix such as _old_01.md appended.
For example:
XXX_old_28.md
XXX_old_27.md
XXX_old_26.md

Steps to reproduce the issue

no argument

2024-01-21 21:42:08  DEBUG  {task}  [0] Starting Task: Task (0, File (module_id: 1, section_name: "Topic 1", section_id: "<id are the same>", module_name: "Lesson 1 exer", content_filepath: /, content_filename: "Lesson 1 exer", content_fileurl: "", content_filesize: 1790, content_timemodified: 0, module_modname: label, content_type: description, content_isexternalfile: False, saved_to: "", time_stamp: 0, modified: True, moved: False, deleted: False, notified: False, hash: <all hash are different>, file_id: None, old_file_id: None), Course (id: 5845, fullname: "Course Title", overwrite_name_with: "None", create_directory_structure: True, files: 26), TaskStatus(state=<TaskState.STARTED: 'STARTED'>, bytes_downloaded=0, external_total_size=0, error=None, yt_dlp_failed_with_error=False, yt_dlp_used_generic_extractor=False, yt_dlp_current_file=None, yt_dlp_total_size_per_file={}, yt_dlp_bytes_downloaded_per_file={}))
2024-01-21 21:42:08  DEBUG  {task}  [0] Renaming old file
2024-01-21 21:42:08  DEBUG  {task}  [0] Starting downloading of: Course Title/Topic 1/Lesson 1 exer.md
2024-01-21 21:42:08  DEBUG  {task}  [0] Creating a description file
2024-01-21 21:42:08  DEBUG  {task}  [1] Starting Task: Task (1, File (module_id: 2, section_name: "Topic 1", section_id: "<id are the same>", module_name: "Lesson 1 exer", content_filepath: /, content_filename: "Lesson 1 exer", content_fileurl: "", content_filesize: 1823, content_timemodified: 0, module_modname: label, content_type: description, content_isexternalfile: False, saved_to: "", time_stamp: 0, modified: True, moved: False, deleted: False, notified: False, hash: <all hash are different>, file_id: None, old_file_id: None), Course (id: 5845, fullname: "Course Title", overwrite_name_with: "None", create_directory_structure: True, files: 26), TaskStatus(state=<TaskState.STARTED: 'STARTED'>, bytes_downloaded=0, external_total_size=0, error=None, yt_dlp_failed_with_error=False, yt_dlp_used_generic_extractor=False, yt_dlp_current_file=None, yt_dlp_total_size_per_file={}, yt_dlp_bytes_downloaded_per_file={}))
2024-01-21 21:42:08  DEBUG  {task}  [1] Renaming old file
2024-01-21 21:42:08  DEBUG  {task}  [1] Starting downloading of: Course Title/Topic 1/Lesson 1 exer_01.md
2024-01-21 21:42:08  DEBUG  {task}  [1] Creating a description file
2024-01-21 21:42:08  DEBUG  {task}  [2] Starting Task: Task (2, File (module_id: 3, section_name: "Topic 1", section_id: "<id are the same>", module_name: "Lesson 2 exer", content_filepath: /, content_filename: "Lesson 2 exer", content_fileurl: "", content_filesize: 1802, content_timemodified: 0, module_modname: label, content_type: description, content_isexternalfile: False, saved_to: "", time_stamp: 0, modified: True, moved: False, deleted: False, notified: False, hash: <all hash are different>, file_id: None, old_file_id: None), Course (id: 5845, fullname: "Course Title", overwrite_name_with: "None", create_directory_structure: True, files: 26), TaskStatus(state=<TaskState.STARTED: 'STARTED'>, bytes_downloaded=0, external_total_size=0, error=None, yt_dlp_failed_with_error=False, yt_dlp_used_generic_extractor=False, yt_dlp_current_file=None, yt_dlp_total_size_per_file={}, yt_dlp_bytes_downloaded_per_file={}))
2024-01-21 21:42:08  DEBUG  {task}  [2] Renaming old file
2024-01-21 21:42:08  DEBUG  {task}  [2] Starting downloading of: Course Title/Topic 1/Lesson 2 exer.md
2024-01-21 21:42:08  DEBUG  {task}  [2] Creating a description file
2024-01-21 21:42:08  DEBUG  {task}  [3] Starting Task: Task (3, File (module_id: 5, section_name: "Topic 1", section_id: "<id are the same>", module_name: "Lesson 2 exer", content_filepath: /, content_filename: "Lesson 2 exer", content_fileurl: "", content_filesize: 1779, content_timemodified: 0, module_modname: label, content_type: description, content_isexternalfile: False, saved_to: "", time_stamp: 0, modified: True, moved: False, deleted: False, notified: False, hash: <all hash are different>, file_id: None, old_file_id: None), Course (id: 5845, fullname: "Course Title", overwrite_name_with: "None", create_directory_structure: True, files: 26), TaskStatus(state=<TaskState.STARTED: 'STARTED'>, bytes_downloaded=0, external_total_size=0, error=None, yt_dlp_failed_with_error=False, yt_dlp_used_generic_extractor=False, yt_dlp_current_file=None, yt_dlp_total_size_per_file={}, yt_dlp_bytes_downloaded_per_file={}))
2024-01-21 21:42:08  DEBUG  {task}  [3] Renaming old file
2024-01-21 21:42:08  DEBUG  {task}  [3] Starting downloading of: Course Title/Topic 1/Lesson 2 exer_02.md
2024-01-21 21:42:08  DEBUG  {task}  [3] Creating a description file
2024-01-21 21:42:08  DEBUG  {task}  [4] Starting Task: Task (4, File (module_id: 4, section_name: "Topic 1", section_id: "<id are the same>", module_name: "Lesson 2 exer", content_filepath: /, content_filename: "Lesson 2 exer", content_fileurl: "", content_filesize: 1769, content_timemodified: 0, module_modname: label, content_type: description, content_isexternalfile: False, saved_to: "", time_stamp: 0, modified: True, moved: False, deleted: False, notified: False, hash: <all hash are different>, file_id: None, old_file_id: None), Course (id: 5845, fullname: "Course Title", overwrite_name_with: "None", create_directory_structure: True, files: 26), TaskStatus(state=<TaskState.STARTED: 'STARTED'>, bytes_downloaded=0, external_total_size=0, error=None, yt_dlp_failed_with_error=False, yt_dlp_used_generic_extractor=False, yt_dlp_current_file=None, yt_dlp_total_size_per_file={}, yt_dlp_bytes_downloaded_per_file={}))
2024-01-21 21:42:08  DEBUG  {task}  [4] Renaming old file
2024-01-21 21:42:08  DEBUG  {task}  [4] Starting downloading of: Course Title/Topic 1/Lesson 2 exer_01.md
2024-01-21 21:42:08  DEBUG  {task}  [4] Creating a description file
2024-01-21 21:42:08  DEBUG  {task}  [1] Download finished
2024-01-21 21:42:08  DEBUG  {task}  [0] Download finished
2024-01-21 21:42:08  DEBUG  {task}  [2] Download finished
2024-01-21 21:42:08  DEBUG  {task}  [3] Download finished
2024-01-21 21:42:08  DEBUG  {task}  [4] Download finished

It reports that files have changed, but the files are actually identical, with no change in content.
See the anonymized output below.

14 changes found for the configured Moodle-Account.
Course Title
<file that are not moved to _old_XX.md are redacted>
≠       Course Title/Topic 1/Lesson 1 exer.md
≠       Course Title/Topic 1/Lesson 3 exer.md
≠       Course Title/Topic 1/Lesson 1 exer_01.md
≠       Course Title/Topic 1/Lesson 2 exer_02.md
≠       Course Title/Topic 1/Lesson 2 exer_01.md
≠       Course Title/Topic 1/Lesson 3 exer_02.md
≠       Course Title/Topic 1/Lesson 3 exer_03.md
≠       Course Title/Topic 1/Lesson 2 exer.md
≠       Course Title/Topic 1/Lesson 3 exer_01.md

Expected behavior

A file is downloaded only once if its content is identical, without the old file being moved to _old_XX.md.

Possible Fix

Technical details

  • OS: Arch Linux with kernel 6.7.0-arch3-1
  • Moodle-DL version: moodle-dl 2.3.2.0

P.S. if you need a more detailed/anonymized log I can send it to you privately.

😅 There is probably something changing in the html file (that gets stripped out in markdown).
Do you know how to debug python?

Sorry, I am not really a programmer. I can code basic scripts, but not sophisticated software like Moodle-DL. However, I can probably get whatever info you need if you give me instructions.

Please send me a screenshot of your course. Can it be that you have multiple lessons with the same name?

Hm, I just tested having two lessons with the same name. For me it works without redownloading. So I guess something really is changing in the description of the lessons.
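
To illustrate the suspicion above: if a change check hashes the raw HTML of a description, an attribute that differs between runs changes the hash even though the text that ends up in the Markdown file is the same. The following is only a minimal sketch of that effect, not moodle-dl's actual code; the sample HTML, the changing `id` attribute, and the helper names `TextOnly`, `visible_text` and `digest` are assumptions made up for illustration.

```python
import hashlib
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment (tags and attributes are dropped)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so pure formatting differences do not matter.
    return " ".join("".join(parser.parts).split())


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical descriptions from two runs: only a generated id attribute differs.
run_1 = '<div class="no-overflow" id="yui_aaa"><p>Do exercise 1</p></div>'
run_2 = '<div class="no-overflow" id="yui_bbb"><p>Do exercise 1</p></div>'

raw_changed = digest(run_1) != digest(run_2)
text_changed = digest(visible_text(run_1)) != digest(visible_text(run_2))
print(raw_changed, text_changed)  # True False
```

If this is what happens, the saved files would look identical while the hash still reports a modification on every run.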

It's pretty funny that Moodle adds links to the other lessons with the same name to the description ^^ (at least for one other lesson with the same name). If you have more than two lessons with the same name in the same section, the links kind of make no sense. But moodle-dl downloads them correctly.
Edit: Correcting myself, Moodle does not refer to lessons with the same name. That was because I used the name of the lesson in the description. The links were generated by the auto-linking feature: https://docs.moodle.org/403/en/Auto-linking

So I need a call with you, so we can debug this together. Maybe on discord. Contact me via mail please

You could also provide me with the files that always get updated (including the old files); maybe I can see there what is changing.
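
For anyone who wants to check this locally first, a short script can compare a current file with its `_old` copy. This is a sketch using only the Python standard library; the function name `diff_files` and the demo file names and contents are made up for illustration.

```python
import difflib
import tempfile
from pathlib import Path


def diff_files(old_path: Path, new_path: Path) -> list[str]:
    """Return a unified diff between two text files (empty list if they match)."""
    old_lines = old_path.read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = new_path.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=str(old_path), tofile=str(new_path)
        )
    )


# Demo with two throwaway files standing in for XXX_old_01.md and XXX.md.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "XXX_old_01.md"
    new_file = Path(tmp) / "XXX.md"
    old_file.write_text("Lesson 1\nsame line\n", encoding="utf-8")
    new_file.write_text("Lesson 1\nsame line\n", encoding="utf-8")
    identical = diff_files(old_file, new_file)

    new_file.write_text("Lesson 1\nchanged line\n", encoding="utf-8")
    changed = diff_files(old_file, new_file)

print(identical)          # an empty list means the two files have the same content
print("".join(changed))   # otherwise the differing lines are marked with - and +
```

An empty diff between the `_old` copy and the new file would suggest that whatever changes is not part of the saved Markdown.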

I even wonder what "Lesson 1 exer.md" should be, because moodle-dl normally does not create such names.

Sorry for the late reply, I got a new account for a new Moodle instance and it is still having the issue:
SE (copy).md
SE (copy) (copy).md
SE (copy) (copy) (copy).md
SE (copy) (copy) (copy)_old.md
SE (copy) (copy)_old.md
SE (copy)_01.md
SE (copy)_01_old.md
SE (copy)_old.md

I suspect it is caused by multiple "sections" (not sure if this is the right term) with the same name.
(Sorry, I had to black out most of the things on screen to post this publicly, but I hope this is enough to give some context for the duplicated files.)

image

Those are not sections but labels. Labels on Moodle have a name in addition to the text the label contains. If you make a copy of a label on Moodle, it adds the "(copy)" suffix to the name of the copy.

If two labels have the same name, moodle-dl adds the _01 suffix to one of the labels.

If a label gets redownloaded, moodle-dl adds the _old suffix to the file name of the previously downloaded file.

I have to investigate how we could fix this. I probably need more information from you, e.g. via mail.
In any case, we probably need to add a feature that allows numbering the items in a section.

PS. You can turn off downloading of labels if you disable downloading descriptions.
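
The suffix rules described above can be sketched roughly as follows. This is a simplified illustration and not moodle-dl's real implementation; the helper names `dedupe` and `old_name` and the `taken` set are hypothetical, and the `_old_01` numbering is inferred from the file names shown in this issue.

```python
from pathlib import PurePosixPath


def dedupe(filename: str, taken: set[str]) -> str:
    """Return filename, or filename with _01, _02, ... before the extension if already taken."""
    path = PurePosixPath(filename)
    candidate = filename
    counter = 0
    while candidate in taken:
        counter += 1
        candidate = f"{path.stem}_{counter:02d}{path.suffix}"
    taken.add(candidate)
    return candidate


def old_name(filename: str, taken: set[str]) -> str:
    """Name for the previous copy of a file that is about to be replaced."""
    path = PurePosixPath(filename)
    return dedupe(f"{path.stem}_old{path.suffix}", taken)


taken: set[str] = set()

# Three labels in one section that share the same name.
names = [dedupe("Lesson 2 exer.md", taken) for _ in range(3)]
print(names)

# The same label is reported as modified on two consecutive runs.
first_old = old_name("Lesson 2 exer.md", taken)
second_old = old_name("Lesson 2 exer.md", taken)
print(first_old, second_old)
```

In this sketch, which label receives which suffix depends only on the order in which the names are claimed, so nothing ties a suffix to a particular label.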

Would it be possible to add an option to prefix or suffix the downloaded file name with the ID of the label? For example, I see each label has a unique ID such as data-id="XXXXXX" in the <li> of that label. So that the final file will be something like XXXXXX-TN.md or TN.XXXXXX.md.

image
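
A rough sketch of the naming proposed here, assuming some stable numeric id is available for each label. The function name `filename_with_id`, the ids, and the label name `TN` are placeholders for illustration only.

```python
def filename_with_id(label_id: int, label_name: str, extension: str = ".md") -> str:
    """Prefix the label name with its id, giving names of the form <id>-<name>.md."""
    return f"{label_id}-{label_name}{extension}"


# Three labels that share a name but have different (made-up) ids.
labels = [(101, "TN"), (102, "TN"), (103, "TN")]

forward = {label_id: filename_with_id(label_id, name) for label_id, name in labels}
backward = {label_id: filename_with_id(label_id, name) for label_id, name in reversed(labels)}

print(forward)
print(forward == backward)  # the result does not depend on processing order
```

With the id in the file name, same-named labels no longer collide, and each label maps to the same file on every run.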

Not exactly the data-id. The data-id does not come from the Moodle database, but probably from some sort of web framework like Angular.

But each label has an instance id and a module id; we could use these. Alternatively, we can use the sort order numbers, so that you would have a file structure like:

1. First Section
1. First Section / 1. Label in First Section
1. First Section / 2. First File in First Section
1. First Section / 3. Directory or Assignment
1. First Section / 3. Directory or Assignment / 1. File in that Directory or Assignment
1. First Section / 3. Directory or Assignment / 2. File in that Directory or Assignment
1. First Section / 4. Label in First Section
1. First Section / 5. Another Label in First Section
2. Second Section
2. Second Section / 1. Label in that Section
2. Second Section / 2. First File in that Section
2. Second Section / 3. Quiz
2. Second Section / 3. Quiz / 1. File in that Quiz
2. Second Section / 3. Quiz / 2. File in that Quiz

That is basically requested in #217
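
The sort-order layout listed above could be generated along these lines. This is a minimal sketch under the assumption that the course is available as a nested list in display order; the `Item` class and `numbered_paths` function are hypothetical names, not part of moodle-dl.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A section, label, file, or container, with children in display order."""
    name: str
    children: list["Item"] = field(default_factory=list)


def numbered_paths(items: list[Item], prefix: str = "") -> list[str]:
    """Prefix every item with its 1-based position among its siblings."""
    paths = []
    for position, item in enumerate(items, start=1):
        path = f"{prefix}{position}. {item.name}"
        paths.append(path)
        paths.extend(numbered_paths(item.children, prefix=f"{path} / "))
    return paths


course = [
    Item("First Section", [
        Item("Label in First Section"),
        Item("First File in First Section"),
        Item("Directory or Assignment", [
            Item("File in that Directory or Assignment"),
        ]),
    ]),
    Item("Second Section", [
        Item("Label in that Section"),
    ]),
]

paths = numbered_paths(course)
for path in paths:
    print(path)
```

Because the position is part of every path, two labels with the same name in one section end up with different paths without needing a _01 suffix.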