carderne/signal-export

[Windows 10] attachment filenames contain weird chars

Closed this issue · 7 comments

Hi. After struggling a bit I got Docker and your script running on my Windows 10 (Home) machine.

It seemed to work well, but I quickly found out that no images were displayed at all in the browser. The reason seems to be that the HTML source points to (for example) <img src="./media/2020-12-25T11:27:20.904000_00_None.jpeg">

Since filenames can't contain colons (on Windows at least, as far as I know), it is clear why no images are displayed in the browser: it simply can't find the files. (I tried both Chrome and Firefox.) I do, however, have a folder full of images, videos, etc. for each chat, but in place of the colons their filenames all contain a weird character that looks a bit like a dot:
[screenshot of the affected filenames]

Expected behaviour would be to replace the colons with another character like an underscore or append h, m, and s to the numbers.
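For illustration, here is a minimal Python sketch of the two options suggested above, applied to the timestamp from the example filename. The function names are hypothetical and this is not how signal-export actually builds its filenames; it only shows that either scheme yields a name without colons.

```python
from datetime import datetime


def with_underscores(ts: datetime) -> str:
    # Option 1: keep the ISO format but swap the colon separators for underscores.
    return ts.isoformat().replace(":", "_")


def with_units(ts: datetime) -> str:
    # Option 2: spell the time units out as h, m and s instead of using colons.
    return ts.strftime("%Y-%m-%dT%Hh%Mm%Ss")


ts = datetime(2020, 12, 25, 11, 27, 20, 904000)
print(with_underscores(ts))  # 2020-12-25T11_27_20.904000
print(with_units(ts))        # 2020-12-25T11h27m20s
```

Option 2 drops the sub-second part; whether that matters depends on whether two attachments can share the same second.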

Some feedback regarding the instructions:

I struggled a bit to figure out how to point the script at my Signal installation and at the output folder. From the looks of this part of the instructions:

[screenshot of the README instructions]

I assumed that I would need to enter the appropriate paths into some configuration file somewhere, but I didn't know where ("Then set your input location depending on your OS."). I tried to set them as environment variables, but the command failed. (EDIT: maybe I would have needed to reboot or at least log in again.) Only after a while did I realise that I could enter both paths directly as part of the command line. Then it worked as described above. Maybe you could spell this out a bit more clearly in the README (i.e. "set the paths as environment variables and reboot, or put them directly in the command line")?

Finally, thank you (and the other collaborators!) for tackling this problem! I hope this can be made even better.

The "dot" is Unicode character U+F03A. A lot of other people are also struggling with some of the quirks of WSL as can be seen here: microsoft/WSL#4609
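A small Python sketch for spotting and decoding these characters. It assumes, consistent with U+F03A standing in for `:` (U+003A), that WSL escapes each character Windows forbids in filenames by adding 0xF000 to its code point. The helper names are hypothetical, and since the decoded colons still can't be used on Windows, this is only useful for diagnosing the problem or renaming the files with a different separator.

```python
# Characters Windows does not allow in file names ('/' is omitted because it
# is a path separator on Linux as well).
RESERVED = '<>:"\\|?*'

# Assumption: WSL stores each reserved character c as chr(0xF000 + ord(c)).
ESCAPED = {chr(0xF000 + ord(c)): c for c in RESERVED}


def escaped_chars(name: str) -> list[str]:
    """List the private-use escape characters found in a file name."""
    return [f"U+{ord(ch):04X}" for ch in name if ch in ESCAPED]


def unescape_wsl(name: str) -> str:
    """Map private-use escapes back to the characters they stand for."""
    return "".join(ESCAPED.get(ch, ch) for ch in name)


name = "2020-12-25T11\uf03a27\uf03a20.904000_00_None.jpeg"
print(escaped_chars(name))  # ['U+F03A', 'U+F03A']
print(unescape_wsl(name))   # 2020-12-25T11:27:20.904000_00_None.jpeg
```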

@yringot Have fixed the colons in filenames issue, they are now replaced with hyphens. You may need to add --pull=always to the docker run ... command to make sure it pulls the latest version with these fixes.

About the env var instructions, can you confirm that, ultimately, copy-pasting the instructions from the README worked? Or did you need to do something different?

Have tweaked the README to be a bit more clear, but not sure it's much of an improvement, happy to hear more guidance.

> @yringot Have fixed the colons in filenames issue, they are now replaced with hyphens. You may need to add --pull=always to the docker run ... command to make sure it pulls the latest version with these fixes.

Yes, it works really well!

> About the env var instructions, can you confirm that, ultimately, copy-pasting the instructions from the README worked? Or did you need to do something different?
>
> Have tweaked the README to be a bit more clear, but not sure it's much of an improvement, happy to hear more guidance.

I again just used the paths directly in the commands. Will investigate further.

Whoops, apparently I spoke too soon. There are still voice messages whose filenames contain a colon:

[screenshot of a voice-message filename]

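It sits between the hours and minutes of the timestamp inside the original filename (语音消息 is Chinese for "voice message"). The HTML still contains the colon, while the file on disk has lost it. For example: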
HTML code: <source src="./media/2021-12-24T11-20-23.031_00_语音消息_12-24-21_19:20.m4a" type="audio/mp4">
filename: 2021-12-24T11-20-23.031_00_语音消息_12-24-21_1920.m4a
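A mismatch like this can be caught mechanically by checking that every `src` in the exported HTML resolves to an existing file. The sketch below uses only Python's standard library (`html.parser`, `os.path`, `urllib.parse`); `SrcCollector` and `missing_media` are hypothetical names, not part of signal-export, and it assumes the `src` values are relative paths that may be percent-encoded.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote


class SrcCollector(HTMLParser):
    """Collect src attributes from media tags in an exported chat page."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "source", "video", "audio"):
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def missing_media(html: str, base_dir: str) -> list[str]:
    """Return every src that does not point to an existing file under base_dir."""
    parser = SrcCollector()
    parser.feed(html)
    return [
        src
        for src in parser.srcs
        if not os.path.exists(os.path.join(base_dir, unquote(src)))
    ]
```

Running it over each exported chat's HTML file with that chat's folder as `base_dir` would list exactly the broken links, like the voice message above.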

Hopefully fixed now in f47ffea with some more defensive replacing.
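A minimal sketch of what such defensive replacement could look like, assuming the goal is to replace every character Windows rejects in filenames with a hyphen and to derive the HTML reference from the same sanitized name, so the two cannot drift apart. This is not the code in f47ffea; `safe_filename`, `media_entry`, and the `./media/` prefix (taken from the examples above) are illustrative.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str) -> str:
    """Replace every character Windows rejects with a hyphen."""
    return _UNSAFE.sub("-", name)


def media_entry(raw_name: str) -> tuple[str, str]:
    """Return (name on disk, src for the HTML) built from one sanitized value."""
    name = safe_filename(raw_name)
    return name, f"./media/{name}"


raw = "2021-12-24T11:20:23.031_00_语音消息_12-24-21_19:20.m4a"
print(safe_filename(raw))  # 2021-12-24T11-20-23.031_00_语音消息_12-24-21_19-20.m4a
```

Sanitizing the whole name, rather than only the timestamp the exporter generates, also covers colons that arrive in the original attachment names, as with the voice messages here.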

Looking good now!