Garmelon/PFERD

Illegal Characters in file/directory names cause errors

JohnnyJayJay opened this issue · 15 comments

It appears that if a file or directory name in ILIAS contains a character that is illegal on the local file system (such as : on Windows), PFERD neither escapes nor replaces it, which leads to an error being thrown.

This is the message I get eventually when trying to sync one of my courses (running pferd_sync_url.exe from the latest release on Windows 10 Pro, 64 Bit):

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ PFERD\errors.py:32 in inner                                                                      │
│                                                                                                  │
│ PFERD\pferd.py:282 in ilias_kit_folder                                                           │
│                                                                                                  │
│ PFERD\pferd.py:99 in _ilias                                                                      │
│                                                                                                  │
│ PFERD\ilias\downloader.py:109 in download_all                                                    │
│                                                                                                  │
│ PFERD\ilias\downloader.py:119 in download                                                        │
│                                                                                                  │
│ PFERD\ilias\downloader.py:63 in download_modified_or_new                                         │
│                                                                                                  │
│ PFERD\location.py:35 in resolve                                                                  │
│                                                                                                  │
│ pathlib.py:1204 in resolve                                                                       │
│                                                                                                  │
│ pathlib.py:205 in resolve                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'D:\\syncurl_test\\Tutorien\\Tutorium 21: Alexander Hornig\\Linerare Algebra Tipps.pdf'

Normally a transform can solve that, but transforms are not available with sync_url. Maybe sync_url should pass a transform that just replaces special characters and hopes no file name collision happens.

Does this work? It should just silently drop forbidden characters. windows.zip

Now it just shows "Searching ..." but doesn't download/save anything to the provided directory.

Okay, and this?

Works. I'd suggest replacing illegal characters with _ instead of dropping them, though.

Can you try the newest release?

It appears that it prepends the directory name to the file names instead of actually creating sub-directories. I double-checked with the previous version, it happens there too.

So foo/bar.pdf -> foo_bar.pdf

Yeah, because \\ is the path separator on Windows and I sanitize it away, I guess. I am not sure whether I should also remove /, but we will see... I can't really test on Windows, so some of the more interesting file problems are a bit annoying to track down.

Can you try the latest release (same number) again?

Will do in a minute. Seems like sanitising each element of the path individually would be the best bet.
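
A minimal sketch of that per-element idea, not PFERD's actual code: the function names and the exact character set are assumptions for illustration. Each component of a relative path is cleaned on its own, so the separators between components are never touched and foo/bar.pdf keeps its sub-directory.

```python
import re
from pathlib import PurePath

# Assumed set: characters Windows rejects in file and directory names,
# plus the ASCII control characters 0-31.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_component(name: str, replacement: str = "_") -> str:
    """Replace forbidden characters in a single path component."""
    return _FORBIDDEN.sub(replacement, name)


def sanitize_relative_path(path: PurePath, replacement: str = "_") -> PurePath:
    """Sanitize every component of a relative path individually.

    Only meant for relative paths: the root or drive of an absolute path
    would be treated like any other component and rewritten.
    """
    return PurePath(*(sanitize_component(part, replacement) for part in path.parts))
```

Replacing with _ rather than dropping characters keeps names readable, but two different names can still collapse into the same sanitized name, so collisions remain possible.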

The latest release doesn't include any executable. Can you perhaps give instructions on how to fetch the latest patch and build it myself or run it straight with python? I'm not too familiar with the structure of this project yet, but that would probably make this easier for both of us.

PlPt commented

Hint: Just move the sanitize_path function to ilias downloader or location, then it works with ilias_desktop crawl too

How would you go about implementing sanitization using the transform? I'm not too familiar with regular expressions.
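
For what it's worth, regular expressions are not strictly needed. The sketch below uses a plain character translation table instead. It assumes a transform is simply a callable that takes a PurePath and returns a PurePath (or None to skip the file); the name replace_forbidden_chars and the character set are hypothetical, chosen only for this example.

```python
from pathlib import PurePath
from typing import Optional

# Assumed set of characters Windows forbids in names, plus control characters.
_FORBIDDEN_CHARS = '<>:"/\\|?*' + "".join(chr(code) for code in range(32))

# Translation table mapping every forbidden character to an underscore.
_TABLE = str.maketrans({char: "_" for char in _FORBIDDEN_CHARS})


def replace_forbidden_chars(path: PurePath) -> Optional[PurePath]:
    """Hypothetical transform: clean each component of a relative path.

    Working on path.parts means the separators between components are
    left alone, so the directory structure is preserved.
    """
    return PurePath(*(part.translate(_TABLE) for part in path.parts))
```

Under that assumption it would be passed like any other transform, for example as transform=replace_forbidden_chars.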

@PlPt I don't want to always apply it, as my linux system can handle those characters just fine. I could provide a convenience method in the transform file though, then you could just do

from PFERD.transform import sanitize_windows_path

ilias_kit(
    ...
    transform=sanitize_windows_path
)

or use it in a do / attempt chain:

do(
    sanitize_path,
    move_dir(...),
    ...
)

@BertilBraun I can make a helper as outlined above, but in the meantime you can use this transform and just pass it as transform=sanitize_path.

EDIT: This adds a new helper to transform

@I-Al-Istannen Thank you very much

PlPt commented

Thanks for this modification.
I changed the code on my own so that sanitize_path is executed without a transform.
Then I added multithreaded scanning and downloading, so it's much faster :)

@PlPt Yeah, not multithreading was a conscious decision after ILIAS started throttling calls and PFERD was making quite a few calls per second. We don't really want to be mean to the ILIAS instance, so it will likely stay single-threaded for the foreseeable future.