Pustur/whatsapp-chat-parser

media attachments


Hello, congratulations on the fantastic project. Would it be possible to add support for WhatsApp attachments (images, videos, audio messages, etc.) in the form of links to the corresponding files? The media files can be exported together with the chat.txt file. Thank you.

Thanks, glad you like the project :)

That's a cool idea, but I'm not sure I want to add it.
I have to do some more research, because it could add more complexity than I would like.

Current solution - Do it yourself

As it stands, you shouldn't have any problem parsing a file that contains file "links" like:

‎[23.09.17, 01:14:15] Name Surname: ‎<attached: 00000209-PHOTO-2017-09-23-01-14-15.jpg>

Once you have the parsed messages, you can go through them, check whether each one is a file link (with a regex, for example), and if it is, treat that message differently.
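A minimal sketch of that do-it-yourself approach, assuming each parsed message has a message string and that attachments follow the <attached: FILENAME> pattern from the sample line above (other WhatsApp versions and locales may format them differently). The regex also tolerates the invisible left-to-right mark (U+200E) that appears before the "<" in that sample. getAttachedFileName is a hypothetical helper, not part of the library.

```javascript
// Matches "<attached: FILENAME>", optionally preceded by a U+200E
// left-to-right mark as in the sample export line.
const ATTACHED_RE = /^\u200E?<attached: (.+)>$/;

// Hypothetical helper: returns the attached file name, or null when the
// message text is not an attachment "link".
function getAttachedFileName(text) {
  const match = ATTACHED_RE.exec(text.trim());
  return match ? match[1] : null;
}

// Example: treat attachment messages differently from plain text.
const messages = [
  { author: 'Name Surname', message: '\u200E<attached: 00000209-PHOTO-2017-09-23-01-14-15.jpg>' },
  { author: 'Name Surname', message: 'Hello!' },
];

for (const msg of messages) {
  const fileName = getAttachedFileName(msg.message);
  if (fileName) {
    console.log(`${msg.author} sent a file: ${fileName}`);
  } else {
    console.log(`${msg.author}: ${msg.message}`);
  }
}
```

If your export uses a different wording for attachments, only the regex needs to change.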

Proposal for future solution

Now, if I were to add this functionality, what would you expect as output?
I think the message that contains the attachment should have a few new properties. Something like this maybe?

{
  date: '2017-09-23T01:14:15.000Z', // Date object
  author: 'Name Surname',
  message: '<attached: 00000209-PHOTO-2017-09-23-01-14-15.jpg>',
  attachment: {
    fileName: '00000209-PHOTO-2017-09-23-01-14-15.jpg',
    mimeType: 'image/jpeg'
  }
},

Would the mimeType be necessary?
It would probably be determined by looking at the file extension.
That could also have a non-negligible impact on package size.
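To illustrate where the size cost comes from, here is a minimal sketch of extension-based guessing, assuming a hand-written lookup table. The table is deliberately incomplete and purely illustrative; a complete mapping would have to list every extension WhatsApp might produce. guessMimeType is a hypothetical name.

```javascript
// Illustrative, incomplete extension-to-MIME table; a real one would be much larger.
// A Map avoids accidental hits on inherited properties like "constructor".
const MIME_BY_EXTENSION = new Map([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['webp', 'image/webp'],
  ['mp4', 'video/mp4'],
  ['pdf', 'application/pdf'],
]);

// Returns the guessed MIME type, or undefined for unknown or missing extensions.
function guessMimeType(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return MIME_BY_EXTENSION.get(ext);
}

console.log(guessMimeType('00000209-PHOTO-2017-09-23-01-14-15.jpg')); // image/jpeg
```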

With this structure you could easily check if a message is an attachment:

const isAttachment = message.hasOwnProperty('attachment'); // true only for messages with an attachment
if (isAttachment) {
  console.log(
    message.attachment.fileName,
    message.attachment.mimeType
  );
}

Also, when a chat is exported without attachments, should the <Media omitted> messages be categorized as attachments? These messages are also different in each language...

EDIT3: Regarding the above: no. Attachments in chats exported "without attachments" are hard to identify. Sometimes they are surrounded by <> and sometimes they are not. They also differ by language, so it would be too hard to do reliably.

IDK, it seems like a nice feature, but I can see it being a bit more complex than it looks.
Any input is welcome.

EDIT: Another thing to consider is how many file types WhatsApp supports, and to what extent that would affect this feature, if at all.

EDIT2: I also have to check what happens when you send multiple images at once. I think WhatsApp treats them as separate messages, but I have to make sure. (Yes, they are separate messages.)

Thank you so much for your prompt response!! Unfortunately, I am not well versed in the technical aspects of the potential solution. I run the parser through http-server and have no experience with JavaScript. I was simply envisioning that, while parsing the chat.txt file, the script would provide a URL to the attached file whenever an attachment is encountered, for example "http://localhost/whatsapp-parser/media/00000209-PHOTO-2017-09-23-01-14-15.jpg".
Thanks again.

Well, in that case the URL probably depends on your environment (how you run the server, which port you use, where the media files are stored, etc.), so I don't think I could provide a full link to the attachment.

What I could do, as described above, is at least differentiate the messages that contain an attachment and make it easier for the user to build the link themselves.
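For example, once the attachment messages are identified, building the link yourself can be a one-liner. This sketch assumes a hypothetical baseUrl pointing at wherever your server exposes the exported media folder (it must end with a slash); the file name is percent-encoded in case it contains spaces or other special characters. attachmentUrl is a hypothetical helper.

```javascript
// Resolve an attachment file name against the base URL of the media folder.
// baseUrl depends on your own setup and must end with "/".
function attachmentUrl(fileName, baseUrl) {
  return new URL(encodeURIComponent(fileName), baseUrl).href;
}

console.log(
  attachmentUrl(
    '00000209-PHOTO-2017-09-23-01-14-15.jpg',
    'http://localhost/whatsapp-parser/media/'
  )
); // http://localhost/whatsapp-parser/media/00000209-PHOTO-2017-09-23-01-14-15.jpg
```

Keeping the base URL as a parameter is exactly what lets the environment-specific part stay outside the parser.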


Anyway, I finally found a repo I had been looking for since yesterday. It's a React web app where you can upload a zip file with your chat + media, and it replaces the attachments with actual <img> or <video> tags etc. (maybe you're trying to do something similar?)
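The core idea of such a viewer, choosing an element based on the file type, can be sketched as follows. This is not the viewer's actual code: the extension lists and the renderAttachment name are assumptions, and the src URL is expected to be built separately (for example from the media folder or from a blob URL).

```javascript
// Assumed extension groups; adjust to the files your exports actually contain.
const IMAGE_EXTENSIONS = new Set(['jpg', 'jpeg', 'png', 'gif', 'webp']);
const VIDEO_EXTENSIONS = new Set(['mp4', 'mov', '3gp']);
const AUDIO_EXTENSIONS = new Set(['opus', 'm4a', 'mp3', 'ogg']);

// Escape characters that would break out of an HTML attribute or text node.
function escapeHtml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: returns an HTML snippet for one attachment.
function renderAttachment(fileName, src) {
  const ext = fileName.slice(fileName.lastIndexOf('.') + 1).toLowerCase();
  const s = escapeHtml(src);
  const name = escapeHtml(fileName);
  if (IMAGE_EXTENSIONS.has(ext)) return `<img src="${s}" alt="${name}">`;
  if (VIDEO_EXTENSIONS.has(ext)) return `<video src="${s}" controls></video>`;
  if (AUDIO_EXTENSIONS.has(ext)) return `<audio src="${s}" controls></audio>`;
  // Anything else (e.g. PDFs or other documents) falls back to a plain link.
  return `<a href="${s}">${name}</a>`;
}

console.log(renderAttachment('00000209-PHOTO-2017-09-23-01-14-15.jpg', 'media/00000209-PHOTO-2017-09-23-01-14-15.jpg'));
```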

It uses my library (an old version), but it could be useful to see how he does it:

Main file:
https://github.com/iddan/whatsapp-chat-viewer/blob/master/src/App.js

Website:
https://whatsapp-chat-viewer.netlify.app


There's also a question on Stack Overflow trying something similar. This time, instead of converting the media files to base64, he converts them to a blob and generates the URL with URL.createObjectURL.
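A rough sketch of the two approaches side by side, assuming you already have an attachment's raw bytes as a Uint8Array (for example, read from the exported zip) and know its MIME type. toDataUrl and toObjectUrl are hypothetical helpers; Blob, URL.createObjectURL, URL.revokeObjectURL and btoa are standard in browsers and available as globals in recent Node.js versions (18+).

```javascript
// Approach 1: base64 data URL. The whole file is inlined in the string,
// which makes it roughly 4/3 of the original size.
function toDataUrl(bytes, mimeType) {
  let binary = '';
  // Build the binary string byte by byte to avoid spreading huge arrays into a call.
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Approach 2: Blob + object URL. Only a short "blob:" reference is created;
// call URL.revokeObjectURL(url) once the media is no longer displayed.
function toObjectUrl(bytes, mimeType) {
  const blob = new Blob([bytes], { type: mimeType });
  return URL.createObjectURL(blob);
}

const bytes = new Uint8Array([104, 105]); // the ASCII bytes of "hi"
console.log(toDataUrl(bytes, 'text/plain')); // data:text/plain;base64,aGk=

const url = toObjectUrl(bytes, 'text/plain');
console.log(url.startsWith('blob:')); // true
URL.revokeObjectURL(url);
```

Either string can be used as the src of an <img> or <video> element; the object URL avoids holding a second, larger base64 copy of every media file in memory.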

https://stackoverflow.com/questions/62318128/


This is probably a bit overwhelming if you have no experience with JavaScript, but hopefully you'll find something useful there. If you have any questions you can always reach me here or even by email.

Yes, thanks again for the info. The project and viewer linked above show exactly how I envisioned the attachments being parsed. However, that project is quite outdated and buggy; it would be great if you implemented a similar solution in your own code and design.

OK, I just want to let you know that even if I decide to implement this feature, it won't be anytime soon. I have work to do this month, and then I'll (hopefully) get a PS5 on the 19th, so that's gonna take me out of coding for a while 😄.

I don't think I'm gonna be able to work on it before mid/late december, we'll see. I won't make promises on a timeline for now.

Just FYI, I've started developing the feature; you can find it in the feature/attachment-support branch.

Here are the built files if anyone wants to try it: whatsapp-chat-parser-dev.zip

The feature must be enabled with the parseAttachments option. When a message has an attachment, the output looks something like this:

{
  date: '2017-09-23T01:14:15.000Z',
  author: 'Name Surname',
  message: '<attached: 00000209-PHOTO-2017-09-23-01-14-15.jpg>',
  attachment: {
    fileName: '00000209-PHOTO-2017-09-23-01-14-15.jpg',
  },
}

I left out the mime type because it would be too complex and out of scope for the project. If you need it, you will have to use another package to guess the mime type from the file extension.
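One way to add the mime type yourself is to post-process the parsed messages. This sketch assumes messages shaped like the output above (produced with parseAttachments enabled) and takes the lookup function as a parameter, so any extension-based library can be plugged in; for instance, the lookup function of the mime-types npm package returns a MIME type string, or false when the extension is unknown. withMimeTypes is a hypothetical helper, and falling back to application/octet-stream is a design choice, not library behavior.

```javascript
// Adds attachment.mimeType to every message that has an attachment, returning
// new objects instead of mutating the parser output. lookup(fileName) should
// return a MIME type string, or a falsy value for unknown extensions.
function withMimeTypes(messages, lookup) {
  return messages.map((msg) => {
    if (!msg.attachment) return msg;
    const mimeType = lookup(msg.attachment.fileName) || 'application/octet-stream';
    return { ...msg, attachment: { ...msg.attachment, mimeType } };
  });
}

// Usage with a stand-in lookup; with the mime-types package you could pass
// require('mime-types').lookup instead.
const parsed = [
  {
    date: new Date('2017-09-23T01:14:15.000Z'),
    author: 'Name Surname',
    message: '<attached: 00000209-PHOTO-2017-09-23-01-14-15.jpg>',
    attachment: { fileName: '00000209-PHOTO-2017-09-23-01-14-15.jpg' },
  },
  { date: new Date('2017-09-23T01:15:00.000Z'), author: 'Name Surname', message: 'Nice!' },
];
const fakeLookup = (name) => (name.toLowerCase().endsWith('.jpg') ? 'image/jpeg' : false);

const enriched = withMimeTypes(parsed, fakeLookup);
console.log(enriched[0].attachment.mimeType); // image/jpeg
console.log(enriched[1] === parsed[1]); // true
```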

I will test the feature in the upcoming days/weeks to look for edge cases, and eventually I'll release it as 3.1.0.

Released as 3.1.0