lune-org/lune

Floating point precision issue with CFrames (CoordinateFrame)

Opened this issue · 22 comments

Quenix commented

We are working with multiple places and integrating lune into our workflow. We are using lune to version the entire DataModel from each place. It works great for merging changes from different places, but I am having an issue with floating point precision.

As you can see in the image below, some instances change seemingly at random without any real change being made in the data model. It happens frequently when pulling changes from one place, syncing them into another, and then pulling back.

It doesn't seem to affect any game, but it generates unneeded staged files when committing.

Thank you very much! I am happy to provide more examples if needed!

[Image: Example 1]

Going to mark this as a bug since this looks like something we could probably fix pretty trivially by rounding to the nearest 1 / 2^x, and I don't really see any downsides to doing that as long as we round to a small enough fraction.
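
A minimal sketch of the rounding idea described above. This is not the code from the actual fix; the function name and the `bits` parameter are hypothetical, and the thread does not say which x was chosen.

```lua
-- Round `value` to the nearest multiple of 1 / 2^bits.
-- `bits` is a hypothetical parameter standing in for the "x" mentioned above.
local function roundToPowerOfTwoFraction(value, bits)
    local scale = 2 ^ bits
    return math.floor(value * scale + 0.5) / scale
end

-- A tiny error like the one reported later in this thread collapses to zero,
-- while an ordinary rotation component moves by less than one step.
print(roundToPowerOfTwoFraction(-0.000000007423843, 20))
print(roundToPowerOfTwoFraction(0.38269043, 20))
```

Rounding to a power-of-two fraction rather than to a decimal place keeps the result exactly representable as a binary float, so the rounded value should survive further round trips unchanged.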

Quenix commented

Thank you for the quick reply on this! That would definitely fix our issue!

Quenix commented

hey, hi! Thanks for the great work on Lune! We are really loving this tool here at Voldex.
About this fix you mentioned, do you have an estimate when you are pushing it?

Thank you very much!

About this fix you mentioned, do you have an estimate when you are pushing it?

Yep, was just now working on the fix :)

Will be in the next release - was fixed in bfcd78c

Quenix commented

Hey, thanks! I am still having issues with floating points even running lune locally with all the new changes.
So what I do is pretty much:

  • Download the place file from given place ID (Live, DEV or feature place)
  • Serialize every single instance into a rbxm file
  • Push everything to github

When I serialize, the floating point issue still shows up. If I always download the same place it isn't a problem, but when I try to merge from LIVE into DEV the floating point differences become more visible and generate thousands of file changes in git.

Is the fix applicable to this scenario, or would this need a different one?

Hey, thanks! I am still having issues with floating points even running lune locally with all the new changes.

Hmm, that's quite strange, I was convinced any cases of these tiny floating point errors would be fixed..
Do the components/numbers still look about the same as before? What does your merge process look like?

Quenix commented

Hey, @filiptibell! I made this repo to replicate, step by step and in simplified form, how we are doing things. Check the latest commit here: https://github.com/Quenix/lune-workflow-example

Steps:

This latest commit shows the floating point issue. Basically, I synced MAIN into DEV-A and then immediately pulled again directly from DEV-A.

Expected result: No changes on git to commit, since I pulled right after publishing over it.
What I got: A few changes with the floating point issue.

Thank you very much!

Thanks for the detailed repro @Quenix ! I'll check this out and try to also repro it locally using your steps as soon as I am able to

Just for the sake of clarity, does the changed value happen if you use a binary file as the first one (i.e. the one you pull from main and process using the sync script)? Something as obvious as going from -0.000000007423843 to 0 (as happened here) should be easy to spot if it happens with the binary format too and it would help pin down where in the process this bug is happening.

Quenix commented

If I sync the same files into main and pull again it will be all fine. It only happens with different place IDs.
So pulling from main -> sync into main again -> pull again, zero changes.

Quenix commented

Hey, guys! How is it going? Do you think a possible workaround would be me doing this float rounding during serialization?

You can give it a try; rounding any of the floats you're seeing change shouldn't make a noticeable difference in-game.

I think I have a suspicion as to what's happening though! To confirm my theory, can you try changing your script that pushes to DEV-A to use the XML format and see if you still get the diffs?

My suspicion is based on the fact that the only changes you're seeing are with the rotation parts of the CFrame and they're rounding towards orthogonal matrices. If a CFrame is orthogonal, it gets special representation in Roblox's binary format and network scheme, so in rbx_dom (which is the parsing library Lune uses for Roblox's file formats) we do some rounding to check whether they're almost right and if they are, we use the special representation. It's possible you're running into an edge case for that, but it would only happen when serializing the binary format.
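
To illustrate the kind of check described above, here is a rough sketch. It is not rbx_dom's actual code: the function names and the `EPSILON` tolerance are hypothetical, and it reads "orthogonal" as "every rotation component is close to -1, 0, or 1", which is an assumption about what the special representation covers.

```lua
-- Hypothetical tolerance; the threshold rbx_dom really uses may differ.
local EPSILON = 1e-5

-- Snap one component to -1, 0, or 1 if it is within EPSILON, otherwise return nil.
local function snapComponent(value)
    for _, target in ipairs({ -1, 0, 1 }) do
        if math.abs(value - target) <= EPSILON then
            return target
        end
    end
    return nil
end

-- `rotation` is a flat list of the nine components R00..R22.
-- Returns the snapped list when every component is near -1, 0, or 1,
-- or nil when the rotation has to be stored in full.
local function snapToAxisAligned(rotation)
    local snapped = {}
    for index = 1, 9 do
        local component = snapComponent(rotation[index])
        if component == nil then
            return nil
        end
        snapped[index] = component
    end
    return snapped
end

-- A nearly axis-aligned rotation: the tiny R02 error is rounded away.
local snapped = snapToAxisAligned({ 1, 0, -0.000000007423843, 0, 1, 0, 0, 0, 1 })
print(snapped and snapped[3])
```

A real implementation would also need to confirm that the snapped components form a valid rotation; the sketch only shows how a value such as -0.000000007423843 can turn into 0 when the file is written in the binary format.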

Quenix commented

We wanted readable XML, which is why I am calling it like this:

local file = roblox.serializeModel({ child }, true)
fs.writeDir(`{fullPath}`)
fs.writeFile(`{fullPath}/{child.Name}.rbxm`, file)

If I pass false to serializeModel, it produces the binary format, which is not readable. So I am already using XML.

I was thinking about the CFrame thing that I mentioned, but I came up with another possible approach. I recorded a video with a quick explanation of my idea: https://www.youtube.com/watch?v=VRmydaNaau0

Let me know if that makes sense.

Thank you very much for the support, guys!

Right, I understand that you're using the XML format but in the step where you push to Dev-A, you serialize the place as a binary place file here: https://github.com/Quenix/lune-workflow-example/blob/main/build/scripts/sync.lua#L120

That's the step I was interested in. If changing that to be an XML place file fixes the problem here, it's a bug on our end and we need to look into it.

I recognize that's not sustainable to do long term so I'm not asking you to do that, I just want to verify that the issue you're seeing is what I think it is.

Quenix commented

Gotcha! I modified it as you asked. First I pulled data from main again to "reset" the repo to its original state:
Quenix/lune-workflow-example@db7e8b8

Then I updated the sync process to use XML by passing true as the parameter in the function:
Quenix/lune-workflow-example@73babdc

  • I ran the sync script again from main to DEV-A.
  • I ran the pull-changes-from-place script to pull from DEV-A and compare with main.
  • The result was promising: no changes detected!
  • Then I opened the DEV-A place, moved a single object called RedwoodTree-Var01, and published my changes.
  • I ran the pull-changes-from-place script to pull from DEV-A again, and this is what I now see in the changes:
    Quenix/lune-workflow-example@11631fa

The floating point issue appeared once I opened, edited, and republished the place. But pushing as XML did indeed fix the occurrence of the issue right after syncing.

Let me know if you want me to provide more examples or chat; I'm happy to provide any help or context.

Thank you, guys!

Quenix commented

@Dekkonot does that make sense given what you asked me to test?

Quenix commented

Hey, just checking, is there anything else I could do to help with this issue?

Hi, sorry for the delay in getting back to you! I started a new job recently so things have been hectic in my life.

This is what I was expecting, and I'm not sure what the path forward for fixing it is. Our rounding appears as though it might be too aggressive, but we do want to round a little bit since it decreases the size of CFrames rather dramatically if they're orthogonal (13 bytes vs 49 bytes if they're even slightly off).

I'll look into whether we can make it less aggressive but it shouldn't be particularly strong as-is, so I'm rather confused by it. :-/
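
The 13 and 49 byte figures quoted above can be reproduced with a little arithmetic. The sketch below assumes that each component is a 32-bit float and that one byte identifies the rotation; the constant and function names are made up for illustration.

```lua
local FLOAT_BYTES = 4 -- assumed: each component is a 32-bit float
local ID_BYTES = 1    -- assumed: one byte identifies the rotation

-- Size of one CFrame, depending on whether the rotation can use the
-- compact (special) representation or needs all nine components.
local function cframeBytes(isSpecialRotation)
    local positionBytes = 3 * FLOAT_BYTES
    if isSpecialRotation then
        return ID_BYTES + positionBytes
    end
    return ID_BYTES + 9 * FLOAT_BYTES + positionBytes
end

print(cframeBytes(true), cframeBytes(false))
```

Under those assumptions the compact form comes to 13 bytes and the full form to 49, which matches the numbers in the comment.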

eAi commented

We're having a somewhat similar issue - basically, the values loaded in via the CFrame aren't the same as the ones that come out, so we end up with diffs that we don't want. We also see -0 turning into 0 and vice-versa.

Our workflow is also using a binary file for editing in Roblox Studio - it's basically:

rbxmx files -> Rojo (rbxm) -> lune (rbxm) -> Roblox Studio (rbxm) -> lune -> rbxmx files

For example:

            <CoordinateFrame name="CFrame">
              <X>-250.99507</X>
              <Y>-2385.6814</Y>
              <Z>-5072.552</Z>
              <R00>-0.9238739</R00>
              <R01>0</R01>
              <R02>0.38269043</R02>
              <R10>0</R10>
              <R11>1</R11>
              <R12>0</R12>
              <R20>-0.38269043</R20>
              <R21>0</R21>
              <R22>-0.92388916</R22>
            </CoordinateFrame>

vs

            <CoordinateFrame name="CFrame">
              <X>-250.99507</X>
              <Y>-2385.6814</Y>
              <Z>-5072.552</Z>
              <R00>-0.9238739</R00>
              <R01>0</R01>
              <R02>0.38269043</R02>
              <R10>0</R10>
              <R11>1</R11>
              <R12>0</R12>
              <R20>-0.38269043</R20>
              <R21>0</R21>
              <R22>-0.9238739</R22>
            </CoordinateFrame>

Our current workaround is to do two things:
a) Manually search the generated rbxmx file and replace all >-0</ instances with >0</
b) Run this script to check if the old version of the rbxmx (almost) matches the new one that we've just generated. We use a tolerance of 3 decimal places which seems to work so far, we may be able to get away with 4.

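-- Rounds every numeric value inside CFrame/CoordinateFrame blocks to
-- `precision` decimal places so two XML files can be compared loosely.
local function normalizeCoordinateFrames(xmlContent, precision)
    local function normalizeFrameBlock(frameBlock)
        -- Match each <Tag>value</Tag> pair inside the block.
        -- The outer parentheses drop gsub's second return value (the match count).
        return (frameBlock:gsub("(<[^>]+>)([^<]+)(</[^>]+>)", function(startTag, numberString, endTag)
            local numberValue = tonumber(numberString)
            if not numberValue then
                -- Not a number: leave the element untouched.
                return startTag .. numberString .. endTag
            end
            local roundedNumber = string.format("%." .. precision .. "f", numberValue)
            -- Strip trailing zeros only when there is a decimal point, so that
            -- "100" is not shortened to "1" when precision is 0.
            if roundedNumber:find(".", 1, true) then
                roundedNumber = roundedNumber:gsub("0+$", ""):gsub("%.$", "")
            end
            -- Treat negative zero as zero.
            if roundedNumber == "-0" then
                roundedNumber = "0"
            end
            return startTag .. roundedNumber .. endTag
        end))
    end

    local tags = { "CFrame", "CoordinateFrame" }

    for _, tag in ipairs(tags) do
        -- Non-greedy match of one whole <tag ...>...</tag> block.
        local framePattern = "(<" .. tag .. "[^>]*>.-</" .. tag .. ">)"
        xmlContent = xmlContent:gsub(framePattern, normalizeFrameBlock)
    end

    return xmlContent
end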

local function compareXML(xmlContent1, xmlContent2, precision)
    local normalizedContent1 = normalizeCoordinateFrames(xmlContent1, precision)
    local normalizedContent2 = normalizeCoordinateFrames(xmlContent2, precision)

    return normalizedContent1 == normalizedContent2
end
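
To see what the decimal-place tolerance does to the example above, the following sketch formats the two R22 values from the CoordinateFrame blocks at 3, 4, and 5 decimals and compares the resulting strings. The helper name is made up for illustration; it mirrors the string.format step of the script rather than calling it.

```lua
-- The two R22 values from the CoordinateFrame example above.
local before = -0.92388916
local after = -0.9238739

-- Format both values with the same number of decimals and compare the strings.
local function sameAtPrecision(a, b, precision)
    local pattern = "%." .. precision .. "f"
    return string.format(pattern, a) == string.format(pattern, b)
end

print(sameAtPrecision(before, after, 3)) -- both format as -0.924
print(sameAtPrecision(before, after, 4)) -- both format as -0.9239
print(sameAtPrecision(before, after, 5)) -- -0.92389 vs -0.92387
```

The values compare equal at 3 and 4 decimals and differ at 5. Note that comparing rounded strings is not a true tolerance check: two values that sit on either side of a rounding boundary can still format differently even when they are very close.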

This is the core of the logic that uses the above:

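            local filename = root_extract_path .. "/" .. modelPath .. ".rbxmx"
            -- The previous export of the same model is kept under temp/.
            local oldFilename = "temp/" .. filename
            local newFile = fixMinusZero(roblox.serializeModel({model}, true))
            if fs.isFile(oldFilename) then
                local oldFile = fs.readFile(oldFilename)
                if not compareXML(oldFile, newFile, 3) then
                    -- Real change: write the new serialization.
                    fs.writeFile(filename, newFile)
                else
                    -- Only float noise: keep the old contents so git sees no diff.
                    fs.writeFile(filename, oldFile)
                end
            else
                -- No previous version to compare against.
                fs.writeFile(filename, newFile)
            end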
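
The fixMinusZero helper called above is not shown in the thread. A minimal sketch, assuming it simply implements workaround (a), might look like this; the body is a guess, not the code eAi's team actually runs.

```lua
-- Hypothetical implementation of fixMinusZero; the real helper is not shown in the thread.
local function fixMinusZero(xmlContent)
    -- "%-" escapes the minus sign, which is otherwise a quantifier in Lua patterns.
    -- The outer parentheses drop gsub's second return value (the replacement count).
    return (xmlContent:gsub(">%-0</", ">0</"))
end

print(fixMinusZero("<R01>-0</R01>"))
```

Because the pattern requires the closing tag to follow the zero immediately, values such as -0.5 are left alone.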

Hi @eAi, thanks for posting your workaround. Sad to hear that this is still an issue to this day! How has the experience with these workarounds been so far? Is it better than managing your assets manually, by chance?

eAi commented

@Quenix We’ve got a more advanced version of the above script now that handles some cases this doesn’t (OptionalCoordinateFrame for example). I’ll share that tomorrow in case it’s useful. But it’s a bit of a pain doing this - obviously it’s a day or so of time to write the system plus it makes things a bit more brittle.

The end result though works fine for us, for now! We’ve not seen any random diffs slip through or any issues in the game.

I think a better fix would be to find some way to ensure that CFrame in Lune does no processing on the values it reads in unless necessary.

eAi commented

I've updated the post above with the current code we're using.