foens/hpop

Exception reading message 2

krumok opened this issue · 9 comments

Hi, I get this error when reading a message:
System.FormatException: Input string was not in a correct format.
at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
at System.Int32.Parse(String s, IFormatProvider provider)
at OpenPop.Mime.Decode.EncodingFinder.FindEncoding(String characterSet)
at OpenPop.Mime.MessagePart.ParseBodyEncoding(String characterSet)
at OpenPop.Mime.MessagePart..ctor(Byte[] rawBody, MessageHeader headers)
at OpenPop.Mime.Message..ctor(Byte[] rawMessageContent, Boolean parseBody)
at OpenPop.Mime.Message..ctor(Byte[] rawMessageContent)
at OpenPop.Pop3.Pop3Client.GetMessage(Int32 messageNumber)

I am using the GetMessage method.
This happens only with one specific message; other messages are read successfully.
For now I skip this message with a try/catch so that POP retrieval isn't blocked, but I cannot read the affected message.

Can you define a fallback decoder that prints out the character set string and then post it here?
Something like:

EncodingFinder.FallbackDecoder = delegate(string characterSet)
{
    Console.WriteLine(characterSet);
    return null; 
};

Where do I have to add the FallbackDecoder?
Here is my code (simplified):

    Dim Pop3C As New OpenPop.Pop3.Pop3Client
    Dim TotMail, CurrMail
    Dim IsConnected As Boolean

        Try
            Pop3C.Connect(cH, 110, False)
            Pop3C.Authenticate(cU, cP)
            IsConnected = True
        Catch ex As Exception
            IsConnected = False
        End Try

    If IsConnected And uscita <> "ERR-SRV" Then

        Dim EmailIds As New List(Of Integer)
        TotMail = Pop3C.GetMessageCount

        Dim delmail = 0
        Dim i = 1
        CurrMail = 1

        Dim cMessage As OpenPop.Mime.Message
        logstr = logstr & "connesso pec " & IsPEC.ToString & " totmail " & TotMail & Chr(13) & Chr(10)

        OpenPop.Mime.Decode.EncodingFinder.AddMapping("ISO8859-15", System.Text.Encoding.UTF8)

        If TotMail > 0 Then
            If TotMail > MaxEmails Then
                uscita = "ERR-SRV RESULT:=8 "
            Else
                For i = 1 To CInt(6)
                    Try
                        cMessage = Pop3C.GetMessage(CurrMail)
                        delmail = parseMessage(cMessage, iddip, utente_nome, 0, IsPEC)
                        If delmail > 0 Then
                            If delmail = 1 Then Pop3C.DeleteMessage(CurrMail)
                            i = i - 1
                        End If
                    Catch ex As Exception
                        logstr = logstr & "ERRORE Lettura email #" & CurrMail & ": " & ex.ToString & Chr(13) & Chr(10)
                    End Try

                    CurrMail = CurrMail + 1
                    If CurrMail > TotMail Then Exit For
                Next
                uscita = "OK RESULT:=3 "
                If delmail = 2 Then uscita = "OK-NONEW RESULT:=2 "
            End If
        Else
            uscita = "OK-NONEW RESULT:=2 "
        End If

        Pop3C.Disconnect()

I get the error "System.FormatException: Input string was not in a correct format. ..."
on the line cMessage = Pop3C.GetMessage(CurrMail)

I'd put it in the same place that you've put the custom mapping.

OK, I added a FallbackDecoder, but with this particular message the delegate is not called and an exception is thrown instead. I've tried other messages: there the delegate is called correctly and I can see characterSet in the console log.
Here is the source of the message, extracted from webmail:

Return-Path: <--------------------------->
Delivered-To: --------------------
Received: from localhost (localhost [127.0.0.1])
by -------------------- (Postfix) with ESMTP id A586829A077
for <------------------->; Mon, 22 Jun 2015 08:30:15 +0200 (CEST)
X-Virus-Scanned: ------- AntiSpam System
X-Spam-Flag: YES
X-Spam-Score: 7.21
X-Spam-Level: *******
X-Spam-Status: Yes, score=7.21 tagged_above=3.9 required=5
tests=[DNS_FROM_AHBL_RHSBL=2.025, HTML_MESSAGE=0.001,
HTML_MIME_NO_HTML_TAG=1.052, MIME_HTML_ONLY=1.672, SPF_SOFTFAIL=0.654,
SUBJ_ALL_CAPS=1.806]
Received: from --------------------- ([127.0.0.1])
by localhost (--------------------- [127.0.0.1]) (amavisd-new, port 10024)
with LMTP id GEi-54BQYLjN for <--------------------->;
Mon, 22 Jun 2015 08:30:06 +0200 (CEST)
Received: from smtpcmd02102.aruba.it (smtpcmd02102.aruba.it [62.149.158.102])
by --------------------- (Postfix) with ESMTP id 72E5129A071
for <--------------------->; Mon, 22 Jun 2015 08:30:06 +0200 (CEST)
Received: from BTGS.COM ([62.10.178.126])
by smtpcmd02.ad.aruba.it with bizsmtp
id jJW11q01E2k0jgm01JW2cY; Mon, 22 Jun 2015 08:30:05 +0200
MIME-Version: 1.0
From: "Giulia" <--------------------->
Reply-To: ---------------------
To: ---------------------
Subject: [Spam: MED] BANDI PER I SETTORI INDUSTRIA, TURISMO, COMMERCIO,
ARTIGIANATO.
Content-Type: text/html; charset="windows-1252http-equivContent-Type"
Content-Transfer-Encoding: quoted-printable
X-Mailer: SendBlaster.1.5.5
Date: Mon, 22 Jun 2015 08:28:39 +0200
Message-ID: 22722539762401220130843@Tiscali

I finanziamenti agevolati per le le aziende e le PMI,  rappresentano=
una risorsa fondamentale per il sostegno degli investimenti per la cre= scita e il consolidamento. Clicca qui.
Consulta il Portale Italiano.
Per cancellarti dalle news  sul = sito.

I've obscured the sensitive data.

If you take a look at the charset, it's "windows-1252http-equivContent-Type", which should just be "windows-1252". Add another custom mapping, like:

OpenPop.Mime.Decode.EncodingFinder.AddMapping("windows-1252http-equivContent-Type", System.Text.Encoding.GetEncoding(1252))

OK, thank you.
With the custom mapping that message is now read correctly.
Is this something that will be fixed in a future release?

This is not something that is easily solvable, as the email you received is clearly invalid.

It would be possible to have this case handled by OpenPop by automatically stripping non-numerical characters from the string if it starts with "windows-" or "cp-".
I don't have access to a C# dev environment at the minute, so you're welcome to submit a PR.

Having said that, it's a very specific case and the charset is obviously wrong, so it may be better to leave it throwing an exception.
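For anyone who wants to try the stripping idea, here is a minimal sketch. It is not OpenPop code: the class CharsetSanitizer and the method TryGetCodePage are hypothetical names made up for this example. It keeps only the leading run of digits after a "windows-" or "cp-" prefix, and it assumes that whatever follows those digits is junk.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; it is not part of OpenPop.
public static class CharsetSanitizer
{
    private static readonly string[] Prefixes = { "windows-", "cp-" };

    // Tries to recover a numeric code page from names such as
    // "windows-1252http-equivContent-Type". Returns false (codePage = -1)
    // when the name has no known prefix or no digits follow the prefix.
    public static bool TryGetCodePage(string characterSet, out int codePage)
    {
        codePage = -1;
        if (string.IsNullOrEmpty(characterSet))
            return false;

        string name = characterSet.Trim();
        foreach (string prefix in Prefixes)
        {
            if (!name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                continue;

            // Keep only the leading run of ASCII digits after the prefix.
            string digits = new string(
                name.Substring(prefix.Length)
                    .TakeWhile(c => c >= '0' && c <= '9')
                    .ToArray());

            int parsed;
            if (digits.Length > 0 &&
                int.TryParse(digits, NumberStyles.None,
                             CultureInfo.InvariantCulture, out parsed))
            {
                codePage = parsed;
                return true;
            }
            return false;
        }
        return false;
    }
}
```

When this returns true, the code page could be passed to System.Text.Encoding.GetEncoding and registered with EncodingFinder.AddMapping under the broken name, as in the custom mapping above. Names without one of the two prefixes, such as "iso-2022-jp", are left alone.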

The way that MimeKit handles this is to:

  1. avoid the use of int.Parse() and instead use int.TryParse(): https://github.com/jstedfast/MimeKit/blob/master/MimeKit/Utils/CharsetUtils.cs#L218
  2. if that fails (in this case, it would), then it returns a codepage of -1
  3. when the codepage is -1, attempt to convert using UTF-8, followed by the user's default charset, followed by ISO-8859-1: https://github.com/jstedfast/MimeKit/blob/master/MimeKit/Utils/CharsetUtils.cs#L451
  4. since the user's default charset can be overridden on the ParserOptions, it's possible for the user to override it at parse time (but defaults to Encoding.Default).
  5. after the message is parsed, since each Header has a GetValue() method allowing you to specify a fallback charset to use, you can once again override it if the user decides that the text doesn't look right and wants to try another charset encoding - and no need to re-parse the entire message again, simply re-decode the individual header value(s) that the user wants.
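Steps 1 to 3 can be sketched roughly as follows. This is only an illustration of the idea and not MimeKit's actual code: FallbackDecoding, ParseCodePage and DecodeWithFallbacks are hypothetical names, and the strict-decoding approach is an assumption about how one could detect that a candidate charset does not fit.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical sketch; the names below do not exist in MimeKit or OpenPop.
public static class FallbackDecoding
{
    // Steps 1 and 2: parse the numeric suffix with int.TryParse and
    // return -1 instead of throwing when it is not a number.
    public static int ParseCodePage(string characterSet, string prefix)
    {
        if (characterSet == null ||
            !characterSet.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return -1;

        int codePage;
        string suffix = characterSet.Substring(prefix.Length);
        return int.TryParse(suffix, NumberStyles.None,
                            CultureInfo.InvariantCulture, out codePage)
            ? codePage
            : -1;
    }

    // Step 3: try UTF-8, then the caller's default charset, then ISO-8859-1.
    public static string DecodeWithFallbacks(byte[] raw, Encoding userDefault)
    {
        if (raw == null)
            throw new ArgumentNullException("raw");

        // Strict UTF-8 throws on invalid bytes instead of inserting U+FFFD.
        try
        {
            return new UTF8Encoding(false, true).GetString(raw);
        }
        catch (DecoderFallbackException) { }

        if (userDefault != null)
        {
            try
            {
                // Clone() returns a writable copy, so the fallback can be changed.
                Encoding strict = (Encoding)userDefault.Clone();
                strict.DecoderFallback = DecoderFallback.ExceptionFallback;
                return strict.GetString(raw);
            }
            catch (DecoderFallbackException) { }
        }

        // ISO-8859-1 maps every byte to a character, so this cannot fail.
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

With the charset from this issue, ParseCodePage("windows-1252http-equivContent-Type", "windows-") gives -1, so the body would go through DecodeWithFallbacks instead of raising a FormatException.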

MimeKit follows the philosophy that exceptions parsing messages should be avoided if possible and a sane fallback should be taken, but allowing the user to get access to the raw data and "re-try" if things didn't parse exactly right.

That said, @foens is correct that this particular message is invalid and trying to get the correct charset encoding out of that string is probably not worth the trouble. While in this particular case, stripping off text after the last numeric character might work, there are other charsets such as "iso-2022-jp" where you can obviously not do that and expect things to work.