asmaloney/libE57Format

Double precision for Data3D

MCvin opened this issue ยท 20 comments

MCvin commented

return FloatNode( imf_, 0., ( pointRangeScale < E57_NOT_SCALED_USE_FLOAT ) ? E57_DOUBLE : E57_SINGLE,

return FloatNode( imf_, 0., ( angleScale < E57_NOT_SCALED_USE_FLOAT ) ? E57_DOUBLE : E57_SINGLE,

Not sure if there is a mistake in these conditions but in my case I needed to swap E57_DOUBLE and E57_SINGLE to enable double precision in my point clouds:

return FloatNode( imf_, 0., ( pointRangeScale < E57_NOT_SCALED_USE_FLOAT ) ? E57_SINGLE : E57_DOUBLE,
                  data3DHeader.pointFields.pointRangeMinimum, data3DHeader.pointFields.pointRangeMaximum );
return FloatNode( imf_, 0., ( angleScale < E57_NOT_SCALED_USE_FLOAT ) ? E57_SINGLE : E57_DOUBLE,
                  data3DHeader.pointFields.angleMinimum, data3DHeader.pointFields.angleMaximum );

Hmmm...

I find the naming of these (E57_NOT_SCALED_USE_FLOAT & E57_NOT_SCALED_USE_INTEGER), their values, and the way they are used a bit confusing.

I think @ptc-jhoerner will have to answer this one.

(I don't mean he has to, just that he's probably the one who can ๐Ÿ˜„ )

Having looked at this a little, I think this is a holdover from the original E57Simple that has been adapted by @ptc-jhoerner.

Edit: This is wrong. See next comment.

It looks like maybe the intent is to set pointRangeScaledInteger to:

  • 0.0 if you want to use ScaledInteger (E57_NOT_SCALED_USE_FLOAT)
  • < 0.0 if you want to use FloatNode w/doubles
  • > 0.0 if you want to use FloatNode w/floats

I think pointRangeScaledInteger should just be a bool. Then whether we are writing double or float should be derived from whether we are using a Data3DPointsData or a Data3DPointsData_d.

(Same issue with angleScaledInteger and timeScaledInteger.)

OK. After a bunch of experimentation, I think this is the situation...

Set pointRangeScaledInteger to:

  • 0.0 (E57_NOT_SCALED_USE_FLOAT) if you want to use floats (or just leave as default )
  • < 0.0 if you want to use doubles
  • > 0.0 to use scaled integers with this number as the scale (e.g. 0.001)

@MCvin Does this fit with the problem you were having? Set it to -1.0 to use doubles.

This is pretty confusing so I will rework this or at least document it better.

MCvin commented

I'm not sure since I don't understand what is pointRangeScale (or angleScale) and the global constant E57_NOT_SCALED_USE_FLOAT.

I just saw that in my case the condition as it is written:
return FloatNode( imf_, 0., ( pointRangeScale < E57_NOT_SCALED_USE_FLOAT ) ? E57_DOUBLE : E57_SINGLE,
led to false so using E57_SINGLE although I used the class Data3DPointsData_d to get double precision.

My fix was to swap the result of the condition, i.e.
pointRangeScale < E57_NOT_SCALED_USE_FLOAT ) ? E57_SINGLE : E57_DOUBLE
but this is maybe not a good fix.

I'm not sure since I don't understand what is pointRangeScale (or angleScale) and the global constant E57_NOT_SCALED_USE_FLOAT.

That's what I'm trying to figure out/explain in my previous comment ๐Ÿ˜„

If you set pointRangeScaledInteger to -1.0 instead of changing the code does it do what you expect?

MCvin commented

Ah ah sorry I didn't give enough context maybe ^^

I use libE57Format as a library in another project so I was hoping using the class e57::Data3DPointsData_d with
e57::Writer from E57SimpleWriter.h would be enough to use double precision points and angles like this for example:

e57::Data3DPointsData_d readBuffer3dData;
            readBuffer3dData.cartesianX = new double[bufferSize];
            readBuffer3dData.cartesianY = new double[bufferSize];
            readBuffer3dData.cartesianZ = new double[bufferSize];

That's why I thought maybe the if condition was incorrect and I prefer not to hardcode the value pointRangeScaledInteger.

Does it makes any sense ?

I prefer not to hardcode the value pointRangeScaledInteger

? I don't understand. The whole purpose of the PointStandardizedFieldsAvailable struct is to describe the data you're going to be feeding the library. It's meant to be set.

If you set pointRangeScaledInteger to -1.0 instead of changing the code does it do what you expect?

MCvin commented

I never use PointStandardizedFieldsAvailable nor pointRangeScaledInteger.
I only use

        // Instantiate reader and writer object
        e57::Reader eReader(readFilePath);
        e57::Writer eWriter(writeFilePath);
...
        int64_t writeScanIndex = eWriter.NewData3D(readScanHeader);
...
            e57::Data3DPointsData_d readBuffer3dData;
            readBuffer3dData.cartesianX = new double[bufferSize];
            readBuffer3dData.cartesianY = new double[bufferSize];
            readBuffer3dData.cartesianZ = new double[bufferSize];
...
            e57::CompressedVectorWriter dataWriter = eWriter.SetUpData3DPointsData(writeScanIndex, bufferSize, readBuffer3dData);

I followed the simple API tutorial : 
http://libe57.org/TutorialSimpleAPI.html

I never use PointStandardizedFieldsAvailable nor pointRangeScaledInteger.

You do actually.

int64_t writeScanIndex = eWriter.NewData3D(readScanHeader);

Look at your readScanHeader. It's a e57::Data3D which contains:

PointStandardizedFieldsAvailable pointFields;

You have to be setting fields in it (or reading it from the file) in order for the library to read/write anything. It's how the library knows the available fields.

(In the case of reading you don't get to choose whether you are using double, float, or scaled ints - it uses whatever the file contains. When writing you can choose to use double, float, or scaled ints - by setting pointRangeScaledInteger!)

I followed the simple API tutorial

If you look at the writing example, the scanHeader.pointFields is a PointStandardizedFieldsAvailable struct and that is where you set pointRangeScaledInteger.

MCvin commented

Oh ok, I will have to look into this. But it is strange since I use the same scan header as the original file which contains
double precision positions for points.

My test consist in reading and writing the same file, with the scanHeader exactly the same as the input
and I end up with truncated values for points positions X, Y and Z.
That's how I ended up using e57::Data3DPointsData_d in the writing process hoping for it to fix the issue.
But it seems it is not sufficient.

I will have a look at the scanHeader and the value of pointRangeScaledInteger but my feeling is that something is strange since I just try to copy a file and use the exact same original header.

with the scanHeader exactly the same as the input

Aha! I think this is the issue. I think reading and writing are - in a sense - asymmetrical and this won't work. I agree that it should though.

If you look at ReaderImpl::SetUpData3DPointsData, when it creates the SourceDestBuffers it is explicitly setting doConversion to true. This means that it will convert to whatever the buffer type is (float or double) when reading - regardless of what's in the file.

So in your case what's probably happening is:

  • the file has stored floats (pointRangeScaledInteger is 0.0)
  • you're requesting doubles by using e57::Data3DPointsData_d
  • the floats get converted to doubles when reading
  • when you write it you aren't telling it you are using doubles by setting pointRangeScaledInteger
  • it writes FloatNode using floats

I looked at the original implementation. It stores everything as doubles and it looks like you control whether it writes floats or doubles using pointRangeScaledInteger (which is the code you referenced at the very beginning).

And looking at the history with this repo, I think this is what happened:

  • when the original "simple API" was brought over, a bunch of parameters were collapsed into a new struct called Data3DPointsData
  • but it was changed to use float buffers instead of doubles
  • someone wanted to use doubles, so they submitted a PR to make it a template
  • result: we now have two ways to deal with float/double which conflict with each other

I'm not sure what the correct solution is for this, but I think not having to explicitly manage all of these details would be better. The original way of using pointRangeScaledInteger as a combination of float/double/scaled int & also the scale if using scaled int is messy & confusing. The way the templates are done requires more management (as you've found out) which isn't obvious. And the (sort of hidden) implicit conversion on read doesn't help.

One hacky way to do it is to use the template type to set pointRangeScaledInteger appropriately, but this would really just hide the mess.

Since I'm making other breaking changes for 3.0, I might rework all of this if I get inspired.

(Also, you might find my last commit useful to require less boilerplate code and make some of this more robust.)

MCvin commented

Oh thank you for this feedback, it looks more complex than I thought initially.

I will try what you suggest, thanks a lot!

And good luck if you find the inspiration ^^

it looks more complex than I thought initially.

Me too ๐Ÿ˜† I didn't write any of this - I'm just the Bit Janitor - so I have to work through it as well.

The underlying API is flexible and the "simple" API was supposed build on it to provide a straightforward way to use it at the expense of that flexibility. Adding the double/float option to the Data3D pulls it back the other way and I think that's where the conflict lies.

I agree the way pointRangeScaledInteger is used is very confusing. We could simplify the API by removing the option to write scaled integer data and support only float and double. But even this might not achieve symmetric reading and writing - the essence of the problem is that for reading we have the correct prototypes for the data, so that we can convert all data to the float/double structures provided by the user correctly. This makes the read API quite simple.

For writing we need to know a bit more to support writing data with conversions, that is where this whole mess with pointRangeScaledInteger comes from. We could remove support for writing with conversions and always write just float or doubles, but that by itself does not guarantee symmetry with reading because of the conversions happening there. I would argue we should keep those conversions on reading, though, they are fully within the spirit of the simple API.

Thanks Jiri!

We could simplify the API by removing the option to write scaled integer data and support only float and double.

I don't want to lose that functionality - in the tests I'm working on now, using scaled int results in file sizes 1/4 the size of the double one, so it's pretty useful!

I would argue we should keep those conversions on reading, though, they are fully within the spirit of the simple API.

I agree.

I'm in the middle of writing up some tests for float/double/scaled so that I can make some more changes here. I think we can make it a little bit closer to symmetric read/write. (On read we need to set pointRangeScaledInteger properly which I think is what is causing @MCvin's example to fail.)

After that I will look at reworking how pointRangeScaledInteger is handled - it's jamming multiple concepts into one variable which is confusing. I think just making the options & intent explicit will help a lot with usability.

Thanks for looking into that!

@MCvin I think I fixed the main problem you were reporting waayyyy up there.

I added a rather long test in test/src/test_SimpleData.cpp that is meant to reproduce what you're doing. Please let me know if that's roughly what you have or if I'm missing anything.

Before I merge it, could you please try the branch to see how it works for you?

As outlined in my analysis, we don't handle all cases for all the scaled int fields, so we're still read/write asymmetrical...

MCvin commented

Thanks @asmaloney !

I was out for a few days so I didn't check yet, I will do it now.

MCvin commented

I just tried and your changes to master fixed it for my test case!
Hopefully for others too.

Thank you so much!

Excellent!

I've been making a lot of other changes to simplify the use of the API and help prevent mistakes/errors - see the tests for examples.

I have one more thing I want to do (automatically add some missing fields) then I'll let it sit for a while to wait for any feedback.