Writing streams seems to have an off-by-one buffer overflow
devilsclaw opened this issue · 5 comments
In the example below, the code has already found all the indirect IDs for the Annots entries (AP -> N -> indirect ID).
Each indirect ID points to an ePDFObjectIndirectObjectReference, which contains an ePDFObjectStream.
I am modifying all of them identically in this test until I can move on to the next section.
What I have found is that whatever the last byte/character sent to the stream is, it gets duplicated.
In this example the last byte is a \n (0x0A). I write only one, but two of them show up. I thought maybe this is how the stream works, so I removed the \n, and then I could see two C's at the end of the stream.
for(size_t index = 0; index < annots_ids.size(); index++) {
    annot_t* annot = annots_id[annots_ids[index]];
    objects_context.StartModifiedIndirectObject(annot->ap_n_id);
    PDFObjectCastPtr<PDFStreamInput> stream = parser->ParseNewObject(annot->ap_n_id);
    PDFStream* pdfStream = objects_context.StartPDFStream();
    IByteWriter* cmapWriteContext = pdfStream->GetWriteStream();
    std::string value = "";
    value += "/Tx BMC\n";
    value += "q\n";
    value += "BT\n";
    value += annot->da + " 1 0 0 1 264.12 -0.09 Tm " + "\n";
    value += "(Testing) Tj\n";
    value += "ET\n";
    value += "Q\n";
    value += "EMC\n";
    IOBasicTypes::Byte c[5] = {0};
    for(size_t i = 0; i < value.size(); i++) {
        c[0] = value.at(i);
        cmapWriteContext->Write(c, 1);
        printf("%c", value.at(i));
    }
    objects_context.EndPDFStream(pdfStream);
    objects_context.EndIndirectObject();
}
With the \n at the end
ePDFObjectIndirectObjectReference: Start : value = 207
ePDFObjectStream:
2F 54 78 20 42 4D 43 0A 71 0A 42 54 0A 30 20 30
20 30 20 72 67 20 2F 48 65 42 6F 20 31 34 2E 30
30 33 20 54 66 20 31 20 30 20 30 20 31 20 32 36
34 2E 31 32 20 2D 30 2E 30 39 20 54 6D 20 0A 28
54 65 73 74 69 6E 67 29 20 54 6A 0A 45 54 0A 51
0A 45 4D 43 0A 0A
ePDFObjectIndirectObjectReference: End : value = 207
With the \n removed from the end
ePDFObjectIndirectObjectReference: Start : value = 207
ePDFObjectStream:
2F 54 78 20 42 4D 43 0A 71 0A 42 54 0A 30 20 30
20 30 20 72 67 20 2F 48 65 42 6F 20 31 34 2E 30
30 33 20 54 66 20 31 20 30 20 30 20 31 20 32 36
34 2E 31 32 20 2D 30 2E 30 39 20 54 6D 20 0A 28
54 65 73 74 69 6E 67 29 20 54 6A 0A 45 54 0A 51
0A 45 4D 43 43
ePDFObjectIndirectObjectReference: End : value = 207
When I edit the PDF with Okular on Linux, the result does not have this problem of two 0x0A bytes at the end.
Okular output
ePDFObjectIndirectObjectReference: Start : value = 207
ePDFObjectStream:
2F 54 78 20 42 4D 43 0A 71 0A 42 54 0A 30 20 30
20 30 20 72 67 20 2F 48 65 42 6F 20 31 34 2E 30
30 33 20 54 66 20 31 20 30 20 30 20 31 20 32 36
34 2E 31 32 20 2D 30 2E 30 39 20 54 6D 20 0A 28
54 65 73 74 69 6E 67 29 20 54 6A 0A 45 54 0A 51
0A 45 4D 43 0A
ePDFObjectIndirectObjectReference: End : value = 207
When I set SetCompressStreams to false the problem goes away, so it's related to compression. I am not sure where that code is, because when I tried to follow it, all I found were stubs and interfaces and the one instance of it being called.
Hmm. If I suspected stream writing, I'd go check the end-stream code, which flushes some output.
But it might be something I'm seeing in the pdf_info code instead.
It seems to assume that as long as stream->NotEnded() is true, it's going to get at least one char. This is not guaranteed.
In the reading code:
IOBasicTypes::LongBufferSizeType read = stream->Read(&c, 1);
Try verifying that read comes back as 1.
If, for instance, the return is 0, this totally explains why you are seeing a double char: you're just rewriting what you read in the previous iteration.
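To illustrate the point above, here is a minimal, self-contained sketch. It does not use the real library classes: `MockReader` is a hypothetical stand-in that assumes a reader whose NotEnded() stays true until a Read() call has actually hit the end of the data, so the final Read() returns 0. Under that assumption, a loop that ignores the return value re-emits the previous byte once, while a loop that checks it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a decoding reader (NOT the library's class).
// Assumption: NotEnded() only turns false after a Read() has come up short.
class MockReader {
public:
    explicit MockReader(const std::string& data) : data_(data) {}

    // Copies up to `size` bytes into `buffer`; returns the number copied.
    std::size_t Read(unsigned char* buffer, std::size_t size) {
        assert(buffer != nullptr);
        std::size_t n = 0;
        while (n < size && pos_ < data_.size()) {
            buffer[n++] = static_cast<unsigned char>(data_[pos_++]);
        }
        if (n < size) {
            hit_end_ = true;  // end detected only when a read falls short
        }
        return n;
    }

    bool NotEnded() const { return !hit_end_; }

private:
    std::string data_;
    std::size_t pos_ = 0;
    bool hit_end_ = false;
};

// Ignores the return value of Read(): the last byte is emitted twice.
std::string dump_unchecked(MockReader reader) {
    std::string out;
    unsigned char c = 0;
    while (reader.NotEnded()) {
        reader.Read(&c, 1);  // a 0-byte read leaves the old value in c
        out.push_back(static_cast<char>(c));
    }
    return out;
}

// Stops as soon as Read() does not deliver exactly one byte.
std::string dump_checked(MockReader reader) {
    std::string out;
    unsigned char c = 0;
    while (reader.NotEnded()) {
        if (reader.Read(&c, 1) != 1) {
            break;
        }
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

With the mock, the unchecked loop turns "EMC" into "EMCC" and "EMC\n" into "EMC\n\n", which matches the two dumps in the report; the checked loop returns the input unchanged.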
I added that check to my test version and it still shows the double character.
IOBasicTypes::LongBufferSizeType read;
if((read = stream->Read(&c, 1)) != 1) {
    break;
}
Also, as I stated, this only happens when compression is on; when I turn off compression it goes away.
objects_context.SetCompressStreams(true); //double character at the end
objects_context.SetCompressStreams(false); //expected output is shown
Never mind. I had added the check only to the pre-check loop in my code, the one that tests whether the stream is all ASCII, which is where I put the first break. I also needed to add it to the loop that produces the actual output, and then the problem went away. It's still a bit weird that it happens only with the compressed version. It's also weird that it does not happen with the Okular-modified version.
Anyway, you were correct: I needed to check the return value of the read and handle it correctly.
case PDFDictionary::ePDFObjectStream: {
    PDFObjectCastPtr<PDFStreamInput> _value = value;
    if(!dry_run) {
        Byte c;
        IByteReader* stream = parser.StartReadingFromStream(_value.GetPtr());
        std::string prefix = string_format("%%-%lus", (depth + 1) * 2);
        bool is_hex = false;
        // First pass: decide whether the stream is plain ASCII or needs a hex dump.
        while(stream->NotEnded()) {
            IOBasicTypes::LongBufferSizeType read;
            if((read = stream->Read(&c, 1)) != 1) {
                break;
            }
            if(!isascii(c)) {
                is_hex = true;
                break;
            }
        }
        int pos = 0;
        printf("\n");
        printf(prefix.c_str(), "");
        stream = parser.StartReadingFromStream(_value.GetPtr());
        //is_hex = true; //force hex style
        // Second pass: print the content, stopping on a short read.
        while(stream->NotEnded()) {
            IOBasicTypes::LongBufferSizeType read;
            // Parenthesized so that read holds the byte count, not the comparison result.
            if((read = stream->Read(&c, 1)) != 1) {
                break;
            }
            if(is_hex) {
                printf("%02X", c & 0x0FF);
                if(((pos + 1) % 16) == 0) {
                    printf("\n");
                    printf(prefix.c_str(), "");
                    pos = 0;
                } else {
                    printf(" ");
                    pos++;
                }
            } else {
                printf("%c", c & 0x0FF);
            }
        }
        printf("\n");
    }
    break;
}
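The hex branch of the case above could also be factored into a standalone helper, which makes the 16-bytes-per-line layout easier to check in isolation. The following is only a sketch under stated assumptions: `hex_dump` is a hypothetical name, it works on a byte vector that has already been read rather than on the library's reader, and unlike the loop above it emits no indentation prefix and no trailing space.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats bytes as upper-case hex pairs, separated by
// spaces, with a line break after every `per_line` bytes.
std::string hex_dump(const std::vector<unsigned char>& bytes,
                     std::size_t per_line = 16) {
    if (per_line == 0) {
        per_line = 16;  // guard against division by zero
    }
    std::string out;
    char cell[3];  // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(cell, sizeof cell, "%02X",
                      static_cast<unsigned>(bytes[i]));
        out += cell;
        if (i + 1 == bytes.size()) {
            break;  // no separator after the final byte
        }
        out += ((i + 1) % per_line == 0) ? '\n' : ' ';
    }
    return out;
}
```

For example, the four bytes of "EMC\n" format as "45 4D 43 0A", the same tail that appears in the Okular dump.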