jbostoen/itop-jb-mail-to-ticket-automation-v2

Feature: only keep new parts

jbostoen opened this issue · 2 comments

Note: this feature can be sponsored!

Use case: an agent responds to the client from iTop, and the e-mail is sent to the client. The client hits "reply", and the mail client automatically includes the agent's original message in the reply.

In an e-mail based ticket system, it may be useful to keep only the "new" part of the e-mail and strip the quoted parts.

Scenarios to consider, which complicate the implementation:

Just splitting by a basic pattern or occurrence of certain HTML is a bad idea:

  • people will answer inline in the original text.
  • people will sometimes forward e-mails. Some e-mail clients then add the same HTML markup that we could have used to detect "the original message" in our use case, so the forwarded content would be stripped incorrectly.
  • different e-mail clients use different HTML tags and sometimes manipulate content

Ideally, case log entries could be queried separately. However, iTop still doesn't support this.

Other difficulties: the current implementations of Mail to Ticket (Combodo's and this fork) build a description and also process inline images etc., adding new links with random IDs.

So the basic approach, considering the current limitations:

  • fetch the case log entries and process them in reverse order (most recent first, as that is the one most likely replied to). Do not consider only the latest one.
  • build the description of the new e-mail. Replace (some/all?) URL structures. Compare against the case log entries. Once there's a match, strip the matched part and stop processing.
  • also consider that during the original processing, inline images may be added which belong to the part that gets stripped (and these should be removed again!)
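The comparison step above could be sketched roughly as follows. This is only an illustration in Python, not the extension's code: `normalize`, `keep_new_part` and the `min_length` threshold are hypothetical names, the tag stripping is deliberately naive, and the result is plain text rather than HTML. URL replacement and inline image cleanup are left out.

```python
import html
import re
from typing import Iterable

_TAG = re.compile(r"<[^>]+>")
_WS = re.compile(r"\s+")


def normalize(fragment: str) -> str:
    """Reduce HTML to comparable plain text: drop tags, decode entities, collapse whitespace."""
    text = _TAG.sub(" ", fragment)
    text = html.unescape(text)
    return _WS.sub(" ", text).strip()


def keep_new_part(new_email_html: str, case_log_entries: Iterable[str],
                  min_length: int = 20) -> str:
    """Return the text of the new e-mail up to the first quoted case log entry.

    case_log_entries is assumed to be ordered oldest first; entries are
    checked most recent first, and processing stops at the first match.
    """
    new_text = normalize(new_email_html)
    for entry in reversed(list(case_log_entries)):
        quoted = normalize(entry)
        if len(quoted) < min_length:
            # Short entries match too easily (see the known limitations).
            continue
        position = new_text.find(quoted)
        if position != -1:
            return new_text[:position].strip()
    # Nothing matched: keep the whole message.
    return new_text
```

Note that any header block the mail client inserts before the quoted text (From/Sent/To/Subject) is still part of the result here; it would need separate handling.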

Known limitations:

  • might not work well with short/similar replies, but this would be rare.
  • if two identical emails get sent for some reason, it's possible an empty case log entry would be created?

Investigation needed:

  • how many of the tags that clients add are removed by iTop? Is different formatting (whitespace) an issue?
    • seems to be covered by iTop
  • what if the original message is within a div with some markup: should this be configurable for removal? What are the defaults? This might be related to the "basic"/non-intelligent approach to stripping content.
  • what if there's just a plain reply where, for instance, " >" is added in front of each line?
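For the last question, one option might be to remove the quote markers before comparing against the case log. A minimal sketch, assuming plain text with one quoted line per line; `unquote` is a hypothetical helper, and it would also alter lines in the new reply that legitimately start with ">":

```python
import re

# Optional leading whitespace followed by one or more ">" markers,
# each optionally followed by a single whitespace character.
_QUOTE_PREFIX = re.compile(r"^\s*(?:>\s?)+")


def unquote(text: str) -> str:
    """Strip leading '>' quote markers (including nested ones) from every line."""
    return "\n".join(_QUOTE_PREFIX.sub("", line) for line in text.splitlines())
```

The unquoted text could then go through the same comparison as an HTML reply.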

Another interesting note for the design: MS Outlook, for example, adds this HTML:

<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Some Name &lt;some@address.ext&gt;<br>
<b>Sent:</b> Friday, December 15, 2023 9:59:10 AM<br>
<b>To:</b> Jeffrey Bostoen &lt;some@address.ext&gt;<br>
<b>Subject:</b> RE: something</font>
<div>&nbsp;</div>
</div>
...
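As an illustration only, the `divRplyFwdMsg` id from this sample could be located in the original (unsanitized) HTML like this. `find_outlook_quote_start` is a hypothetical helper; the marker appears to be used for forwards as well as replies (the "RplyFwd" in the id suggests as much), so this is at best one signal among several and not a safe place to cut on its own.

```python
import re

# Matches the opening tag of the div Outlook puts in front of the quoted message.
_OUTLOOK_MARKER = re.compile(r'<div[^>]*\bid=["\']divRplyFwdMsg["\']', re.IGNORECASE)


def find_outlook_quote_start(original_html: str) -> int:
    """Return the offset of the divRplyFwdMsg div, or -1 when it is absent."""
    match = _OUTLOOK_MARKER.search(original_html)
    return match.start() if match else -1
```

The `<hr>` in front of the div is not covered by the returned offset.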

Note that iTop sanitizes HTML, and that this has already changed a bit over time (for example: ordered/unordered lists).

https://www.itophub.io/wiki/page?id=latest:admin:rich_text_limitations

So when developing this, it may be worth comparing against the original HTML, which means also (temporarily) storing the original HTML rather than only the sanitized version.

French version:

De : Abc Def support@x.be
Envoyé : vendredi 29 décembre 2023 14:42
À : Abc Def x@x.ma
Cc : y@x.ma
Objet : [ R-000322 ] iTop: xxx

In iTop, after sanitizing, all the above content was in a SPAN element.
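The English and French header blocks shown above have the same shape with different labels. A rough sketch of detecting either one in plain text, assuming one header per line (which may not hold after sanitizing, given the SPAN observation). `HEADER_LABELS`, `find_reply_header` and the `window` size are hypothetical, and only the two locales from this issue are listed:

```python
import re

# Labels taken from the two samples in this issue; other locales are not covered.
HEADER_LABELS = {
    "en": ("From", "Sent", "To", "Subject"),
    "fr": ("De", "Envoyé", "À", "Objet"),
}


def _starts_with_label(line: str, label: str) -> bool:
    return re.match(rf"\s*{re.escape(label)}\s*:", line) is not None


def _matches_block(lines, labels, window: int = 6) -> bool:
    """True if lines[0] has the first label and the rest follow in order nearby."""
    if not lines or not _starts_with_label(lines[0], labels[0]):
        return False
    remaining = list(labels[1:])
    for line in lines[1:window]:
        # Extra lines such as "Cc" are simply skipped.
        if remaining and _starts_with_label(line, remaining[0]):
            remaining.pop(0)
    return not remaining


def find_reply_header(text: str) -> int:
    """Return the offset where a known header block starts, or -1."""
    lines = text.splitlines(keepends=True)
    offset = 0
    for index, line in enumerate(lines):
        for labels in HEADER_LABELS.values():
            if _matches_block(lines[index:], labels):
                return offset
        offset += len(line)
    return -1
```

As with any pattern-based split, a forwarded e-mail contains the same block, so a match here should not be treated as proof that everything after it is old content.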

Also a point to consider: inline images (links).