Pustur/whatsapp-chat-parser

Right-to-left languages, media limitations and fixed chat constants

reembar opened this issue · 5 comments

As I'm not a programmer, I'm not sure whether I should post these issues here or on whatsapp-chat-parser-website.
I wanted to use the program to display a single fixed chat on my website, and ran into the following issues:

  1. When Hebrew or Arabic text is mixed with English words, the text becomes garbled. This is because the parser cannot handle right-to-left languages. It also does not align such messages to the right.
  2. I wanted to display my chat with the original full-size images, not the ones reduced by WhatsApp (or omitted altogether). I found that the parser does not recognize PNG images as media, cannot handle large videos (they are converted to base64 instead of being loaded directly from the file), and does not understand filenames that follow a different media naming pattern (for example, if you rename the images to give them more descriptive names).
  3. I'm missing an option to define the chat's filename and parameters as fixed constants, without user intervention.

Thank you for your great project!

Hi @reembar,

Point number 1 is the only one that applies to whatsapp-chat-parser, so I'll address it here:

Thanks for your report. It is indeed possible that RTL languages don't play nice with the library as I have not extensively tested them yet. Would you be able to provide an example chat that I can use to test it? If you could send a sample directly to loris.bettazza@gmail.com that would be perfect.

It is also possible that the issue lies in the whatsapp-chat-parser-website project, as the RTL information may be correctly parsed but incorrectly displayed. So I'll have to investigate.


Related: Pustur/whatsapp-chat-parser-website#19

Thanks. I sent the message to your Gmail address and am also quoting it here:
Attached is an example chat with a single message of mixed Hebrew and English.
When Hebrew (and I guess also Arabic) text is mixed with English words, the sentence's order is garbled because the parser does not understand right-to-left order.
Messages in only one language come out alright, but they are aligned to the left, and emoticons or punctuation placed at the end of a message appear on the left instead of the right, again because of the same issue.
Thanks for addressing the issue!
Reem
WhatsApp Chat to demonstrate RTL problem.txt

Hey again @reembar,

I have another question. This is how the chat looks when I open it in the default Mac text editor. Is this the correct display?

[Screenshot 2022-07-24 at 21.53.19: the example chat opened in the default Mac text editor]

I guess the text should be aligned to the right of the window, but the Hebrew characters flowing from right to left is correct, right?

Yes, this display is correct.
As for the alignment, I think WhatsApp decides based on the first letter: if it's a Hebrew letter, the message is aligned to the right; otherwise (for example, if you send only a URL), it is aligned to the left.
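The rule described above resembles the first-strong-character heuristic that the Unicode Bidirectional Algorithm uses to pick a paragraph's base direction. The sketch below assumes WhatsApp works this way, which nobody in this thread has confirmed. `detectDirection` is a hypothetical helper: it approximates "strong" characters by letters and treats only Hebrew and Arabic letters as right-to-left.

```javascript
// Any Unicode letter, and letters from the two RTL scripts discussed here.
// Assumption: letters stand in for the bidi algorithm's "strong" characters.
const LETTER = /\p{L}/u;
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;

// Hypothetical helper: return 'rtl' or 'ltr' based on the first letter,
// skipping emoji, digits, punctuation and spaces
function detectDirection(text) {
  // for...of walks the string by code point, so emoji are not split
  for (const ch of text) {
    if (LETTER.test(ch)) {
      return RTL_LETTER.test(ch) ? 'rtl' : 'ltr';
    }
  }
  // No letter at all (e.g. only emoji): fall back to left-to-right
  return 'ltr';
}

console.log(detectDirection('😮 הודעת נסיון')); // rtl
console.log(detectDirection('https://snark.co.il')); // ltr
```

Skipping non-letters is what lets a message that starts with an emoji or digits still count as Hebrew, while a message that is only a URL starts with a Latin letter and comes out as `ltr`.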

Thanks for the info,

I can now confirm that this is a problem with the whatsapp-chat-parser-website project.
I made a script that parses your chat example and writes the info back in the same format as the input. Comparing the two files, I can see that they are exactly the same, so no information is lost there, which means the parser is doing its job correctly.

So, in conclusion, it's the website that does not take RTL languages into account.
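For reference, one common way for a web page to handle this is the HTML `dir="auto"` attribute, which makes the browser choose an element's direction from its first strong character, much like the behaviour described earlier in this thread. This is a minimal sketch, not code from whatsapp-chat-parser-website: `renderMessage`, `escapeHtml` and the class names are hypothetical, and the input is assumed to be a parsed message with `author` and `message` fields.

```javascript
// Escape the characters that are significant in HTML text and attributes
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical renderer for one parsed message.
// - <bdi> isolates the author name so an RTL or LTR name does not
//   reorder the text around it.
// - dir="auto" on the text block lets the browser pick 'rtl' or 'ltr'
//   for each message from its own content.
function renderMessage({ author, message }) {
  return (
    '<div class="message">' +
    `<bdi class="author">${escapeHtml(author)}</bdi>` +
    `<div class="text" dir="auto">${escapeHtml(message)}</div>` +
    '</div>'
  );
}

console.log(renderMessage({ author: 'ראם', message: 'הודעת נסיון (like this) 🙁' }));
```

Because the initial CSS value of `text-align` is `start`, a text block that resolves to `rtl` is also aligned to the right without extra styling.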

So I'll close this issue as invalid, but I'll keep the related issue open in the other repo.

Here's the small script for posterity:

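```javascript
const fs = require('node:fs');
const wcp = require('whatsapp-chat-parser');

// Sample message: Hebrew text mixing in English words, a URL, parentheses
// and emoji. In English it reads roughly: "Test message in Hebrew😮 to check
// mixing in English, for example when writing about videos on Youtube,
// writing about a site like snark.co.il, adding a remark in parentheses
// (like this) or adding an emoticon at the end of the message like this: 🙁"
const messages =
  wcp.parseStringSync(`7/23/22, 16:25 - ראם: הודעת נסיון בעברית😮 כדי לבדוק שילוב אנגלית, למשל כאשר כותבים על סרטונים ב-Youtube, כותבים על אתר כמו snark.co.il, מעירים הערה בסוגריים (like this) או מוסיפים סמליל בסוף ההודעה כך: 🙁

`);

// Rebuild each message in the same "M/D/YY, H:MM - Author: text" format as the input
const text = messages
  .map(m => {
    const year = m.date.getFullYear().toString().slice(-2);
    const date = `${m.date.getMonth() + 1}/${m.date.getDate()}/${year}`;
    // Pad minutes so that e.g. 16:05 is not written back as 16:5
    const minutes = m.date.getMinutes().toString().padStart(2, '0');
    const hour = `${m.date.getHours()}:${minutes}`;

    return `${date}, ${hour} - ${m.author}: ${m.message}`;
  })
  .join('\n');

// Write the reconstruction to disk so it can be compared with the original
fs.promises
  .writeFile('reconstructed-chat.txt', text, 'utf-8')
  .then(() => console.log('File written'))
  .catch(console.error);
```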
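The comparison itself was done by looking at the two files. Below is a minimal sketch of an automated version, assuming both files are UTF-8 and that differences in line endings, a leading byte-order mark, or trailing whitespace should be ignored. `normalizeChat`, `firstDifference` and `compareFiles` are hypothetical helpers.

```javascript
const fs = require('node:fs');

// Strip a leading byte-order mark if present, unify line endings and drop
// trailing whitespace, so a CRLF export and an LF reconstruction can match
function normalizeChat(text) {
  return text.replace(/^\uFEFF/, '').replace(/\r\n/g, '\n').trimEnd();
}

// Return the 1-based number of the first differing line, or 0 if identical
function firstDifference(original, reconstructed) {
  const a = normalizeChat(original).split('\n');
  const b = normalizeChat(reconstructed).split('\n');
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) return i + 1;
  }
  return 0;
}

// Read both files and report whether they match
function compareFiles(originalPath, reconstructedPath) {
  const original = fs.readFileSync(originalPath, 'utf-8');
  const reconstructed = fs.readFileSync(reconstructedPath, 'utf-8');
  const line = firstDifference(original, reconstructed);
  console.log(line === 0 ? 'Files match' : `First difference at line ${line}`);
  return line === 0;
}
```

After running the script above, `compareFiles('WhatsApp Chat to demonstrate RTL problem.txt', 'reconstructed-chat.txt')` would report whether the reconstruction matches the attached example chat.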