r-barnes/ArcRasterRescue

Invalid Position in Large Rasters

WilliamKappler opened this issue · 6 comments

We're running into an issue with some of our larger rasters (around 4 GB), where reading the file goes wrong around the 2 GB mark.

We tracked down what is happening to this bit of logic:

    GotoPosition(geoDatabaseIndex, 16 + m_featureIndex*featureTableOffsets);
    int32_t featureOffset = ReadInt32(geoDatabaseIndex);

where featureOffset is returned as a negative value. After that, everything goes wrong and it eventually ends up stuck in an endless loop trying to read from an invalid stream position.

That's a 32-bit integer overflow issue.
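To illustrate the rollover, here is a minimal sketch. It is not code from the repository: `LoadLE32` and `AsSigned32` are hypothetical helpers invented for this example. It assembles four little-endian bytes into a 32-bit value and shows how the same bits look when treated as signed.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ArcRasterRescue): assemble a
// little-endian 32-bit value from four raw bytes, independent of the
// host's byte order.
uint32_t LoadLE32(const unsigned char b[4]) {
  return static_cast<uint32_t>(b[0])
       | (static_cast<uint32_t>(b[1]) << 8)
       | (static_cast<uint32_t>(b[2]) << 16)
       | (static_cast<uint32_t>(b[3]) << 24);
}

// Hypothetical helper: view the same 32 bits as a signed integer, which is
// what happens when an offset is stored in an int32_t. Values with the top
// bit set come out negative on two's-complement platforms.
int32_t AsSigned32(uint32_t v) {
  return static_cast<int32_t>(v);
}
```

For the bytes `00 00 00 80` (the first offset past 2 GiB), `LoadLE32` gives 2147483648, while `AsSigned32` of that value is negative, which matches the symptom described above.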

Could you provide the file and line number of the code?

Sorry, it looks like someone here made some changes to the file, so it didn't match up exactly with the official version. The specific place we are running into it is line 930, but there are two other uses of very similar logic in MasterTable::MasterTable and RasterBase::RasterBase.

I think you're right that it's rolling over.

@WilliamKappler: The value is coming from ReadInt32. Looking over the places that's called from, I think it would be acceptable in every instance (except perhaps here and here) to have ReadInt32 return an unsigned 32-bit integer.

That should double the maximum size of the file you can read. For files beyond that, I don't think we have the spec reverse-engineered well enough.

Could you substitute the following code in and test?

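    uint32_t ReadInt32(std::ifstream &fin){
      #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        // Unsigned, so offsets of 2 GiB and above do not wrap to negative values.
        return ReadThing<uint32_t>(fin);
      #else
        // Fail the build rather than fall off the end without a return value.
        #error "Big-endian unimplemented"
      #endif
    }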

You were right that it should be unsigned. Upon further inspection of our 4 GB files, I found that the 5th byte is in fact an additional part of the index value. Reading that byte and ORing it into the rest of the value lets us read huge rasters. I posted a pull request with our changes. Very much appreciate your help!
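As a rough sketch of the idea (not the code from the pull request): `ReadOffset40` is a hypothetical name, and the sketch assumes the 5th byte holds bits 32 to 39 of a little-endian offset, which is an inference from the description above and not a confirmed part of the format.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (name and layout are assumptions): read five bytes
// and combine them into a 64-bit offset. The first four bytes are the
// familiar little-endian 32-bit value; the fifth byte is ORed in above them.
uint64_t ReadOffset40(std::istream &in) {
  unsigned char b[5] = {0, 0, 0, 0, 0};
  in.read(reinterpret_cast<char *>(b), 5);
  uint64_t value = 0;
  for (int i = 0; i < 5; i++) {
    // Widen to 64 bits before shifting so the 5th byte is not lost.
    value |= static_cast<uint64_t>(b[i]) << (8 * i);
  }
  return value;
}
```

With the bytes `00 00 00 00 01` this yields 4294967296, i.e. an offset just past the 4 GiB limit of an unsigned 32-bit value. Whatever consumes the result also needs a 64-bit type for seeking, since storing it back into a 32-bit variable would reintroduce the truncation.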

Thanks! I've accepted the pull request and will close this issue for now. Feel free to contact me if you find any other problems. (I think there might still be issues with reading projections, for instance.)