karastojko/mailio

Crash when parsing the header of this MIME part!

Closed this issue · 14 comments

------=_Part_1448452_458257080.1707905695133
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;
	name*=UTF-8''%E8.xlsx
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
	filename*=UTF-8''%E8.xlsx

Crashing function:
void mime::parse_header_value_attributes(const string& header, string& header_value, attributes_t& attributes) const

Crash details:
it crashes when parsing the header

Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;
	name*=UTF-8''%E8.xlsx

It crashed. I debugged it for a long time but could not figure out how to fix it.

Let me try to debug it.

So, this is the RFC 2231 section 4 feature (parameter value character set and language information), which is not implemented. Since there is already an implementation of the parameter value continuations from section 3, let me see whether I can easily add this one.
Thanks for reporting, hopefully I will have something to show in the following days.
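For reference, an RFC 2231 section 4 value such as `UTF-8''%E8.xlsx` has the form `charset'language'percent-encoded-octets`. The following is a minimal standalone sketch of decoding it, not mailio's implementation; `extended_value`, `hex_value` and `decode_extended_value` are hypothetical names. It stops at raw octets, since charset conversion is a separate step. Note that `%E8` decodes to the single octet 0xE8, which is the lead byte of a three-byte UTF-8 sequence and is not valid UTF-8 on its own, so whatever converts the result to text has to tolerate invalid input.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical result type: the three parts of an RFC 2231 extended value.
struct extended_value
{
    std::string charset;
    std::string language;
    std::string bytes; // percent-decoded octets, still encoded in `charset`
};

// Converts one hex digit to its value, or returns -1 if it is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

// Decodes a value such as UTF-8''%E8.xlsx (the part after `name*=`).
std::optional<extended_value> decode_extended_value(const std::string& value)
{
    const auto first = value.find('\'');
    if (first == std::string::npos)
        return std::nullopt;
    const auto second = value.find('\'', first + 1);
    if (second == std::string::npos)
        return std::nullopt;

    extended_value result;
    result.charset = value.substr(0, first);
    result.language = value.substr(first + 1, second - first - 1);
    for (std::string::size_type i = second + 1; i < value.size(); ++i)
    {
        if (value[i] != '%')
        {
            result.bytes += value[i];
            continue;
        }
        // a '%' must be followed by exactly two hex digits
        if (i + 2 >= value.size())
            return std::nullopt;
        const int high = hex_value(value[i + 1]);
        const int low = hex_value(value[i + 2]);
        if (high < 0 || low < 0)
            return std::nullopt;
        result.bytes += static_cast<char>(high * 16 + low);
        i += 2;
    }
    return result;
}
```

For the header in this issue, the charset is `UTF-8`, the language is empty and the bytes are 0xE8 followed by `.xlsx`.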

Thanks a lot. I just wondered why you need a state machine to implement this.
Wouldn't it be better to parse the mail message with regular expressions?
I'm not an expert at this, it's just a thought.
And could I join your team?
By the way, you made a good mail library.
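One reason a character-by-character state machine is often preferred over a single regular expression here: parameter values may be quoted strings that contain `;` or escaped quotes, so a pattern such as `[^;]+` splits in the wrong place. The sketch below is a hypothetical `split_parameters`, not mailio code; it tracks three states: outside quotes, inside quotes, and right after a backslash.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits the parameter part of a header such as
//   attachment; filename="a;b.txt"; size=10
// on ';', ignoring ';' characters inside a quoted-string.
// Quotes and surrounding whitespace are kept in the returned pieces.
std::vector<std::string> split_parameters(const std::string& header)
{
    enum class state { plain, quoted, escaped };
    state st = state::plain;
    std::vector<std::string> parts;
    std::string current;
    for (char c : header)
    {
        switch (st)
        {
        case state::plain:
            if (c == ';')
            {
                parts.push_back(current);
                current.clear();
                continue; // the separator itself is not kept
            }
            if (c == '"')
                st = state::quoted;
            break;
        case state::quoted:
            if (c == '\\')
                st = state::escaped;
            else if (c == '"')
                st = state::plain;
            break;
        case state::escaped:
            // the character after a backslash is taken literally
            st = state::quoted;
            break;
        }
        current += c;
    }
    parts.push_back(current);
    return parts;
}
```

A regex can express the same grammar, but the explicit states make it easier to add the later steps (unfolding, RFC 2231 decoding) without one very large pattern.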

I'm working on a separate branch to test possible approaches; there is more work to do.
There is no formal team, only contributors. You can create a PR for defects or features if you want to participate.

I know you're very busy, but I'd just like to know when the issue will be fixed. Can I help with anything?

Can you please try the latest changes from the branch and check whether they work for you?

The branch has been merged into main.

I've tested the branch you mentioned. Even though there are no errors, it doesn't work well. I guess something is still not right in parsing
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; name*=UTF-8''%E8.xlsx
It is hard for me to debug it.

Finally I found that these dates cause the errors:

First example: Date: Wed, 14 Feb 2024 18:14:55 +0800 (GMT+08:00)
Second example: Date: Wed, 07 Feb 2024 07:19:36 GMT

Could you fix it? Here is the code I modified, though it does not look very efficient.

```cpp
// date format to be parsed is like "Thu, 17 Jul 2014 10:31:49 +0200 (CET)"
regex r(R"(([A-Za-z]{3}[\ \t]*,)[\ \t]+(\d{1,2}[\ \t]+[A-Za-z]{3}[\ \t]+\d{4})[\ \t]+(\d{2}:\d{2}:\d{2}[\ \t]+([+-])\d{4}).*)");
// added for the Gmail date format like "Wed, 07 Feb 2024 07:19:36 GMT"
regex r_extra_1(R"(([A-Za-z]{3}[\ \t]*,)[\ \t]+(\d{1,2}[\ \t]+[A-Za-z]{3}[\ \t]+\d{4})[\ \t]+(\d{2}:\d{2}:\d{2}[\ \t]+.*))");
smatch m;
if (regex_match(date_str, m, r))
{
    // TODO: regex manipulation to be replaced with time facet format?
    // if day has single digit, then prepend it with zero;
    // the zone "+0200" is rewritten as "+02:00" for %ZP
    string dttz = m[1].str() + " " + (m[2].str()[1] == ' ' ? "0" : "") + m[2].str() + " " + m[3].str().substr(0, 12) + ":" + m[3].str().substr(12);
    stringstream ss(dttz);
    // the locale takes ownership of the facet
    local_time_input_facet* facet = new local_time_input_facet("%a %d %b %Y %H:%M:%S %ZP");
    ss.exceptions(std::ios_base::failbit);
    ss.imbue(locale(ss.getloc(), facet));
    local_date_time ldt(not_a_date_time);
    ss >> ldt;
    return ldt;
}
else if (regex_match(date_str, m, r_extra_1))
{
    // TODO: regex manipulation to be replaced with time facet format?
    // no numeric zone (e.g. "GMT"), so UTC is assumed
    string t_default_zone = "+00:00";
    // if day has single digit, then prepend it with zero
    string dttz = m[1].str() + " " + (m[2].str()[1] == ' ' ? "0" : "") + m[2].str() + " " + m[3].str().substr(0, 8) + " " + t_default_zone;
    stringstream ss(dttz);
    local_time_input_facet* facet = new local_time_input_facet("%a %d %b %Y %H:%M:%S %ZP");
    ss.exceptions(std::ios_base::failbit);
    ss.imbue(locale(ss.getloc(), facet));
    local_date_time ldt(not_a_date_time);
    ss >> ldt;
    return ldt;
}
return local_date_time(not_a_date_time);
```

Hi, does the test fail in your case?

Regarding the header `Date: Wed, 14 Feb 2024 18:14:55 +0800 (GMT+08:00) \n Date: Wed, 07 Feb 2024 07:19:36 GMT`, why do you have it twice on the same line? (The line delimiter is `\r\n`, not `\n`.) Furthermore, the second date header is in an invalid format, so the parser will fail.
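To illustrate the delimiter point: header lines end with CRLF, and a line starting with a space or tab is a folded continuation of the previous header (RFC 5322 section 2.2.3), as in the `Content-Type` header at the top of this issue. Below is a minimal sketch, assuming the raw header block is available as one string; `unfold_headers` is a hypothetical helper, not a mailio function. A bare `\n` is not treated as a delimiter, so two `Date` values separated only by `\n` stay in one header.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits a raw header block on CRLF and unfolds continuation lines:
// a line that starts with a space or tab belongs to the previous header.
// Stops at the first empty line, which separates the headers from the body.
std::vector<std::string> unfold_headers(const std::string& raw)
{
    std::vector<std::string> headers;
    std::string::size_type start = 0;
    while (start < raw.size())
    {
        auto end = raw.find("\r\n", start);
        if (end == std::string::npos)
            end = raw.size();
        const std::string line = raw.substr(start, end - start);
        start = end + 2;
        if (line.empty())
            break; // end of the header block
        if ((line[0] == ' ' || line[0] == '\t') && !headers.empty())
            headers.back() += line; // unfolding: drop the CRLF, keep the whitespace
        else
            headers.push_back(line);
    }
    return headers;
}
```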

My mistake. In fact, it does not contain the line break.
It failed when parsing the date and time.
The code I added just shows you the date formats I've encountered,
especially when a mail is sent from Gmail; these date formats cause the errors:
Date: Mon, 5 Feb 2024 21:45:21 +0000 (GMT)
Date: Wed, 07 Feb 2024 07:19:36 GMT
Date: Wed, 07 Feb 2024 07:19:07 +0000
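RFC 5322 section 4.3 classifies `GMT` and `UT` (along with North American names such as `EST` and `PDT`) as obsolete zone syntax, which section 4 says must not be generated but must be accepted by a conformant receiver; a trailing comment such as `(GMT+08:00)` may also follow the zone. As an alternative to one regex per format, here is a sketch with a single `std::regex` that accepts the three Gmail examples above and returns the UTC offset in minutes. `utc_offset_minutes` is a hypothetical name, unknown zone names simply yield no value, and turning the offset into the `+HH:MM` string used with the `%ZP` facet is left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <regex>
#include <string>

// Returns the UTC offset of an RFC 5322 date value in minutes.
// Accepts numeric zones (+0800), the obsolete named zones (GMT, UT, EST, ...)
// and an optional trailing comment such as "(GMT+08:00)".
std::optional<int> utc_offset_minutes(const std::string& date)
{
    static const std::regex re(
        R"(\s*(?:[A-Za-z]{3}\s*,\s*)?\d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+\d{2}:\d{2}(?::\d{2})?\s+([+-]\d{4}|[A-Za-z]+)\s*(?:\(.*\))?\s*)");
    std::smatch m;
    if (!std::regex_match(date, m, re))
        return std::nullopt;

    const std::string zone = m[1].str();
    if (zone[0] == '+' || zone[0] == '-')
    {
        const int minutes = std::stoi(zone.substr(1, 2)) * 60 + std::stoi(zone.substr(3, 2));
        return zone[0] == '-' ? -minutes : minutes;
    }
    // obsolete zone names from RFC 5322 section 4.3 (military zones not handled)
    static const std::map<std::string, int> named = {
        {"UT", 0}, {"GMT", 0},
        {"EST", -5 * 60}, {"EDT", -4 * 60},
        {"CST", -6 * 60}, {"CDT", -5 * 60},
        {"MST", -7 * 60}, {"MDT", -6 * 60},
        {"PST", -8 * 60}, {"PDT", -7 * 60}};
    const auto it = named.find(zone);
    if (it == named.end())
        return std::nullopt;
    return it->second;
}
```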

And when I parsed the content below, it failed without any debug information, so it is hard for me to figure out the error. I attached the failing MIME data to this post. I'd be glad if you could fix the problem.

test.txt

A digression: is it possible to parse the MIME data after receiving the whole message from the server, rather than parsing it while receiving? It always fails when receiving messages from the server, because there are so many different non-standard formats. For a popular library, I think it's better to reduce the coupling between the mime class and the pop3 class. It's just my unprofessional idea.

The second date example is invalid; how did you get it? The test.txt example has it in the correct format.

Regarding the mime parser, do you suggest that POP3 should not parse line by line but at once?

First, I think it is better to receive the whole MIME data from the server and then parse whatever data the user needs, i.e. add a function named `source_data` or similar that returns the whole data without parsing.
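```cpp
// the method which fetches the message
void pop3::fetch(unsigned long message_no, message& msg, bool header_only)
{
    ......
    bool empty_line = false;
    // added a new variable that holds the whole raw message
    std::string the_whole_data;
    while (true)
    {
        line = dlg->receive();
        // a line with a single dot ends the POP3 multi-line response
        if (line == ".")
            break;
        // byte-stuffed lines start with an extra dot that must be removed
        if (!line.empty() && line[0] == '.')
            line.erase(0, 1);
        the_whole_data += line + "\r\n";
    }
    // here's the end of the method; the MIME data is then parsed from the_whole_data by another method
    ......
}
```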
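The buffering idea can be made self-contained: per RFC 1939 section 3, a multi-line POP3 response ends with a line containing a single `.`, and any line that begins with `.` is sent with an extra `.` prepended, which the receiver removes. In the sketch below, `collect_multiline` is hypothetical and a vector of lines (without CRLF) stands in for successive `dlg->receive()` calls; the returned string could then be handed to the MIME parser in one go.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects the lines of a POP3 multi-line response into one raw message,
// restoring CRLF line endings and undoing the byte-stuffing.
std::string collect_multiline(const std::vector<std::string>& lines)
{
    std::string raw;
    for (const std::string& line : lines)
    {
        if (line == ".")
            break; // termination line, not part of the message
        if (!line.empty() && line[0] == '.')
            raw += line.substr(1); // remove the stuffed leading dot
        else
            raw += line;
        raw += "\r\n";
    }
    return raw;
}
```

Separating receiving from parsing this way means a malformed header can be reported by the parser without interrupting the network dialog.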
And the second date format I mentioned comes from the Gmail server. Maybe different regions use different standards?

Second, line-by-line parsing always stops on non-standard MIME formats. So my unprofessional idea is that it might be better to parse the data with regular expressions rather than with a state machine.

By the way, parsing test.txt failed at the header. I will post the details a bit later.

I think the loop below has a bug: the `filename*=xxxx` pattern in the MIME parts causes an error. Could you test it?

```cpp
void mime::merge_attributes(attributes_t& attributes) const
{
    map<string, map<int, string_t>> attribute_parts;
    for (auto attr = attributes.begin(); attr != attributes.end();)
    {
        auto full_attr_name = attr->first;
        auto attr_value = attr->second;
        string::size_type asterisk_pos = full_attr_name.find(ATTRIBUTE_MULTIPLE_NAME_INDICATOR);
        if (asterisk_pos != string::npos)
        {
            string attr_name = full_attr_name.substr(0, asterisk_pos);
            try
            {
                if (!full_attr_name.substr(asterisk_pos + 1).empty())
                {
                    int attr_part = stoi(full_attr_name.substr(asterisk_pos + 1));
                    attribute_parts[attr_name][attr_part] = attr_value;
                    // erase() already returns the next element, so skip the increment below
                    attr = attributes.erase(attr);
                    continue;
                }
            }
            catch (const std::invalid_argument& exc)
            {
                throw mime_error("Parsing attribute failure at `" + attr_name + "`.");
            }
            catch (const std::out_of_range& exc)
            {
                throw mime_error("Parsing attribute failure at `" + attr_name + "`.");
            }
        }
        // I removed the `else` that was here, and then there were no errors: with it, a name
        // such as `filename*` (asterisk but no index) was neither erased nor advanced
        ++attr;
    }
    ......
}
```
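A standalone sketch of the same merge using the usual erase-while-iterating idiom: advance the iterator either through the return value of `std::map::erase` or through `++`, never both. `attributes_map` and `merge_continuations` are hypothetical stand-ins for mailio's `attributes_t` and `merge_attributes`. Only all-digit indexes (`filename*0`, `filename*1`) are merged, so a name like `filename*` is left alone, and the `*0*` form that combines continuations with RFC 2231 section 4 encoding is not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical stand-in for mailio's attributes_t.
using attributes_map = std::map<std::string, std::string>;

// Merges RFC 2231 continuations (filename*0, filename*1, ...) into a single
// `filename` entry. Names whose suffix after '*' is not all digits are kept.
void merge_continuations(attributes_map& attributes)
{
    // name -> (index -> piece); std::map keeps the indexes in numeric order
    std::map<std::string, std::map<int, std::string>> parts;
    for (auto attr = attributes.begin(); attr != attributes.end();)
    {
        const auto pos = attr->first.find('*');
        const std::string index =
            pos == std::string::npos ? std::string() : attr->first.substr(pos + 1);
        const bool is_continuation = !index.empty() &&
            std::all_of(index.begin(), index.end(),
                        [](unsigned char c) { return std::isdigit(c) != 0; });
        if (is_continuation)
        {
            parts[attr->first.substr(0, pos)][std::stoi(index)] = attr->second;
            attr = attributes.erase(attr); // erase() already returns the next element
        }
        else
        {
            ++attr; // advance only when nothing was erased
        }
    }
    for (const auto& [name, pieces] : parts)
    {
        std::string value;
        for (const auto& [number, piece] : pieces)
            value += piece;
        attributes[name] = value;
    }
}
```

Keying the pieces by `int` rather than by the suffix string keeps `*10` after `*2`.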
 

And in the MIME part below, the filename is decoded into garbled characters. How can I fix it?

------=_Part_83623_1885261339.1710385977427
Content-Type: application/octet-stream; 
	name="=?GBK?B?MaHG2r3p0N3Fqcnf0NAyMDI0xOqhsfXWwPe3eLeeNPC0uM=?=
 =?GBK?B?x7DQfGxyte8vr+qw8W67dK1zvG+usj8u+62r9W9saguZG94A==?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="=?GBK?B?MTHG2r3p0N3FqcnM0NAyMDI0xOqhsPXWwPe33LeiINPC0uM=?=
 =?GBK?B?x7DQfGxyte8vr+qw8W67dK1zvG+usj8u+62r9W9saguZG94A==?="
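Two RFC 2047 rules are relevant to this sample: whitespace (including a fold) between adjacent encoded-words is ignored (section 6.2), and an encoded-word must not appear inside a quoted-string (section 5), which this sample does anyway, so a strict parser may leave it undecoded. Some senders also split a multi-byte character across two words, which RFC 2047 forbids, so a tolerant approach is to base64-decode every `B` word, concatenate the raw bytes, and only then convert from GBK. The sketch below covers only the decoding step; `base64_decode` and `decode_b_words` are hypothetical, text between the words is dropped, the `Q` encoding and mixed charsets are not handled, and the GBK to UTF-8 conversion (for example with iconv) is left out.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Decodes standard base64 (RFC 4648 alphabet) into raw bytes.
// Stops at the first '=' padding character; any other character outside
// the alphabet makes the whole input invalid.
std::optional<std::string> base64_decode(const std::string& in)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buffer = 0;
    int bits = 0;
    for (char c : in)
    {
        if (c == '=')
            break;
        const auto pos = alphabet.find(c);
        if (pos == std::string::npos)
            return std::nullopt;
        // append 6 bits; at most the low 12 bits are ever needed
        buffer = ((buffer << 6) | static_cast<unsigned int>(pos)) & 0xFFFu;
        bits += 6;
        if (bits >= 8)
        {
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFFu);
        }
    }
    return out;
}

// Decodes all "B" encoded-words in `text` and concatenates their bytes
// before any charset conversion. Returns (charset of first word, bytes).
std::optional<std::pair<std::string, std::string>> decode_b_words(const std::string& text)
{
    static const std::regex word(R"(=\?([^?]+)\?[Bb]\?([^?]*)\?=)");
    std::string charset;
    std::string bytes;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end; it != end; ++it)
    {
        if (charset.empty())
            charset = (*it)[1].str();
        const auto decoded = base64_decode((*it)[2].str());
        if (!decoded)
            return std::nullopt;
        bytes += *decoded;
    }
    if (charset.empty())
        return std::nullopt; // no encoded-word found
    return std::make_pair(charset, bytes);
}
```

In the test below, the GBK character 中 (bytes 0xD6 0xD0) is deliberately split across two words and comes back whole.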

Let me try it.