GeyserMC/MCProtocolLib

Problems with ChunkSection.read : IllegalArgumentException: Expected 1024 longs but got 0 longs

MatteCarra opened this issue · 13 comments

I have had some issues with ChunkSection.read on some servers.
The issue is very easily reproducible on mc.hypixel.net by just connecting to the server.

Disconnected: java.lang.IllegalArgumentException: Expected 1024 longs but got 0 longs
java.lang.IllegalArgumentException: Expected 1024 longs but got 0 longs
	at com.github.steveice10.mc.protocol.data.game.chunk.BitStorage.<init>(BitStorage.java:62)
	at com.github.steveice10.mc.protocol.data.game.chunk.DataPalette.read(DataPalette.java:46)
	at com.github.steveice10.mc.protocol.data.game.chunk.ChunkSection.read(ChunkSection.java:30)
	at Main$SessionAdapterImpl.packetReceived(Main.java:48)
	at com.github.steveice10.packetlib.tcp.TcpSession.callPacketReceived(TcpSession.java:156)
	at com.github.steveice10.packetlib.tcp.TcpSession.lambda$channelRead0$2(TcpSession.java:375)
	at io.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)

The issue seems to be that MCProtocolLib reads bitsPerEntry=16 and then this length is equal to 0.
I can't figure out where the problem is tho.
The BitStorage implementation seems fine and I was not able to find any issue prior to that read.
Also note that the issue happened on the first iteration of ChunkSection.read on a particular chunk.
The first bytes of the chunkData where: [0 0 16 0 0 35 4 -1.....], with the first two bytes being the blockCount, the third being bitsPerEntry and the fourth being the long array length varint.
The whole chunkData in this case had length 98646
Removing the throw IllegalArgumentException in BitStorage caused the ChunkSection.read to have different problems (like negative bitsPerEntry), meaning that bitsPerEntry should not be 16 or the 0 for the long array length varint is not meant to be 0.

Sample code here:

import com.github.steveice10.mc.auth.data.GameProfile;
import com.github.steveice10.mc.auth.util.UUIDSerializer;
import com.github.steveice10.mc.protocol.MinecraftProtocol;
import com.github.steveice10.mc.protocol.data.game.chunk.ChunkSection;
import com.github.steveice10.mc.protocol.data.game.chunk.DataPalette;
import com.github.steveice10.mc.protocol.packet.ingame.clientbound.ClientboundLoginPacket;
import com.github.steveice10.mc.protocol.packet.ingame.clientbound.level.ClientboundLevelChunkWithLightPacket;
import com.github.steveice10.packetlib.Session;
import com.github.steveice10.packetlib.event.session.DisconnectedEvent;
import com.github.steveice10.packetlib.event.session.SessionAdapter;
import com.github.steveice10.packetlib.io.stream.StreamNetInput;
import com.github.steveice10.packetlib.packet.Packet;
import com.github.steveice10.packetlib.tcp.TcpClientSession;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

import static java.util.concurrent.TimeUnit.MINUTES;

public class Main {
    public static void main(String[] args) {
        GameProfile gameProfile = new GameProfile(UUIDSerializer.fromString("REDACTED"), "MatteCarra");
        String accessToken = "REDACTED";
        Session session = new TcpClientSession("mc.hypixel.net", 25565, new MinecraftProtocol(gameProfile, accessToken));
        try {
            SessionAdapterImpl listener = new SessionAdapterImpl();
            session.addListener(listener);
            session.connect();

            listener.login.await(1, MINUTES);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            session.disconnect("Login test complete.");
        }
    }

    private static class SessionAdapterImpl extends SessionAdapter {
        @Override
        public void packetReceived(Session session, Packet packet) {
            if (packet instanceof ClientboundLevelChunkWithLightPacket) {
                StreamNetInput dataIn = new StreamNetInput(new ByteArrayInputStream(((ClientboundLevelChunkWithLightPacket) packet).getChunkData()));
                for(int i = 0; i < 16; i++) {
                    try {
                        ChunkSection.read(dataIn, DataPalette.GLOBAL_PALETTE_BITS_PER_ENTRY);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        @Override
        public void disconnected(DisconnectedEvent event) {
            System.err.println("Disconnected: " + event.getReason());
            if (event.getCause() != null) {
                event.getCause().printStackTrace();
            }
        }
    }
}

Note that the chunk section = 16 is correct in the hypixel hub, but it's not relevant because this fails at the first iteration of the for

I have just noticed that the PrismarineJS guys just ignore the varInt x that MCProtocolLib is using to read the x amount of longs: https://github.com/PrismarineJS/prismarine-chunk/blob/7fca7f3e27bb82f0f7c9f0772df5c9de7343de11/src/pc/common/PaletteChunkSection.js#L90

Is this the only way to go?
This is the entire pull request by @nickelpro : PrismarineJS/prismarine-chunk#150
They reference a bug in the Minecraft code

We ignore the varint because we know the size of the data based on the size of the palette. It's always 4096 block states packed into non-spanning 64-bit longs. So for example with 4-bits per block, we already know it's going to be 256 64-bit values, 5-bits per block is 342 64-bit values, etc. The varint is completely extraneous.

Mojang has two bugs in how they calculate the total buffer size for a chunk packet. This results in extra empty bytes at the end of the chunk data array for each Global and Single Value palette used in the chunk column:
https://bugs.mojang.com/browse/MC-131684
https://bugs.mojang.com/browse/MC-247438

Mojang has two bugs in how they calculate the total buffer size for a chunk packet. This results in extra empty bytes at the end of the chunk data array for each Global and Single Value palette used in the chunk column: https://bugs.mojang.com/browse/MC-131684 https://bugs.mojang.com/browse/MC-247438

Thanks for the clarification. that means that it has nothing to do with our problem

We ignore the varint because we know the size of the data based on the size of the palette. It's always 4096 block states packed into non-spanning 64-bit longs. So for example with 4-bits per block, we already know it's going to be 256 64-bit values, 5-bits per block is 342 64-bit values, etc. The varint is completely extraneous.

This is very clear to me, but I was surprised that the varInt (for the long length) does not match the length computed using the bitsPerEntry value.
MCProtocolLib currently reads a number of longs equal to the value of that varint and checks that it matches the value calculated using bitsPerEntry.

This is the bit of code that checks that they match:

I guess that the Minecraft client does not use that varint and that's why the mc client works on the hypixel hub, while MCProtocolLib doesn't.
As I said we are receiving a ClientboundLevelChunkWithLightPacket from Hypixel with the chunkData first bytes equal to: [0 0 16 0 0 35 4 -1 .....]
The first two bytes being the blockCount = 0, the third being bitsPerEntry=16 and the fourth being the long array length varint=0.
The code expects to find the long array length equal to 1024 and so it raises the exception.

Does Prismarine work in the Hypixel lobby? I can confirm that Geyser does not as of 1.18, but we get a different error I will link later. ViaVersion shares (tested through ViaAAS), essentially, the same error.

We have some bug related to how were handling palettes right now so I wouldn't use our implementation as a reference. The vanilla minecraft client itself uses the varint to verify the size of its long array. If you're getting invalid values you're desyncing somewhere in the stream. The relevant code from PalettedContainer.java:

public void read(FriendlyByteBuf friendlyByteBuf) {
    this.acquire();
    try {
        byte by = friendlyByteBuf.readByte();
        Data<T> data = this.createOrReuseData(this.data, by);
        data.palette.read(friendlyByteBuf);
        friendlyByteBuf.readLongArray(data.storage.getRaw());
        this.data = data;
    }
    finally {
        this.release();
    }
}

And then readLongArray:

public long[] readLongArray(@Nullable long[] lArray) {
    return this.readLongArray(lArray, this.readableBytes() / 8);
}

public long[] readLongArray(@Nullable long[] lArray, int n) {
    int n2 = this.readVarInt();
    if (lArray == null || lArray.length != n2) {
        if (n2 > n) {
            throw new DecoderException("LongArray with size " + n2 + " is bigger than allowed " + n);
        }
        lArray = new long[n2];
    }
    for (int i = 0; i < lArray.length; ++i) {
        lArray[i] = this.readLong();
    }
    return lArray;
}

The relevant lines being lArray.length != n2 and lArray = new long[n2];. Notchian relies on that varint being correct.

We have some bug related to how were handling palettes right now so I wouldn't use our implementation as a reference. The vanilla minecraft client itself uses the varint for verification. If you're getting invalid values you're desyncing somewhere in the stream.

I would say it's unlikely that we have a desync. The only option is we got the nbt reading wrong.
The reason is this:


We are just reading two ints (chunkX and chunkZ) and the nbt for the height map.
Right after that we read [0 0 16 0 0 35 4 -1 .....] for the chunkData (I have already explained why it was causing the problem)

public void read(FriendlyByteBuf friendlyByteBuf) {
    this.acquire();
    try {
        byte by = friendlyByteBuf.readByte();
        Data<T> data = this.createOrReuseData(this.data, by);
        data.palette.read(friendlyByteBuf);
        friendlyByteBuf.readLongArray(data.storage.getRaw());
        this.data = data;
    }
    finally {
        this.release();
    }
}

And then readLongArray:

public long[] readLongArray(@Nullable long[] lArray) {
    return this.readLongArray(lArray, this.readableBytes() / 8);
}

public long[] readLongArray(@Nullable long[] lArray, int n) {
    int n2 = this.readVarInt();
    if (lArray == null || lArray.length != n2) {
        if (n2 > n) {
            throw new DecoderException("LongArray with size " + n2 + " is bigger than allowed " + n);
        }
        lArray = new long[n2];
    }
    for (int i = 0; i < lArray.length; ++i) {
        lArray[i] = this.readLong();
    }
    return lArray;
}

Oh wow that code needs a bit of attention.
First and foremost they are only throwing an exception when varInt > expectedLength (the opposite of our case, so we should NOT throw the exception).

The second thing is that, when lArray.length != n2, lArray is NOT updated (because they do lArray = new long[n2]).
The new lArray is then returned, but it's ignored by PallettedContainer.read friendlyByteBuf.readLongArray(data.storage.getRaw());

Yes, PalettedContainer.java is a mess, really really terrible.

My point was just to show that the code, as you correctly observed, completely falls apart if the length of the array calculated for the underlying BitStorage is not the same as the length written for the varint. Vanilla expects these to be the same value. If they're not the same value for you, something has gone wrong and you're reading from the wrong place.

Thanks for the confirmation.

I am pretty sure that hypixel is sending the chunk wrong, but the Minecraft client does not crash and just ignores the wrong chunk because of that code (it was not designed to do that obv, it's just by chance).
I would bet that If I run the same check on prismarine (if varint != expectedLength) I would get the same result.

On Hypixel, we get a different error:

[09:44:08 ERROR] Could not translate packet ClientboundLevelChunkWithLightPacket
java.lang.NegativeArraySizeException: -2147483648
	at com.github.steveice10.mc.protocol.data.game.chunk.palette.ListPalette.<init>(ListPalette.java:21) ~[MCProtocolLib-6a23a780.jar:?]
	at com.github.steveice10.mc.protocol.data.game.chunk.palette.ListPalette.<init>(ListPalette.java:25) ~[MCProtocolLib-6a23a780.jar:?]
	at com.github.steveice10.mc.protocol.data.game.chunk.DataPalette.readPalette(DataPalette.java:117) ~[MCProtocolLib-6a23a780.jar:?]
	at com.github.steveice10.mc.protocol.data.game.chunk.DataPalette.read(DataPalette.java:42) ~[MCProtocolLib-6a23a780.jar:?]
	at com.github.steveice10.mc.protocol.data.game.chunk.ChunkSection.read(ChunkSection.java:30) ~[MCProtocolLib-6a23a780.jar:?]
	at org.geysermc.geyser.translator.protocol.java.level.JavaLevelChunkWithLightTranslator.translate(JavaLevelChunkWithLightTranslator.java:112) ~[classes/:?]
	at org.geysermc.geyser.translator.protocol.java.level.JavaLevelChunkWithLightTranslator.translate(JavaLevelChunkWithLightTranslator.java:78) ~[classes/:?]
	at org.geysermc.geyser.registry.PacketTranslatorRegistry.translate0(PacketTranslatorRegistry.java:86) ~[classes/:?]
	at org.geysermc.geyser.registry.PacketTranslatorRegistry.lambda$translate$0(PacketTranslatorRegistry.java:67) ~[classes/:?]
	at io.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54) [netty-transport-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.66.Final.jar:4.1.66.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.66.Final.jar:4.1.66.Final]
	at java.lang.Thread.run(Thread.java:831) [?:?]

The cause for this is that the global palette should be used as a fallback for any number outside of 0-8, and we read the palette bits as negative and use a list palette instead. Even after fixing that, though, both ViaVersion and us see a too-long VarInt. So the issue probably isn't specifically in the BitStorage but another encoding issue elsewhere.

No idea if its related, but this (from DataPalette.java) looks wrong to me:

private static Palette readPalette(PaletteType paletteType, int bitsPerEntry, NetInput in) throws IOException {
    if (bitsPerEntry > paletteType.getMaxBitsPerEntry()) {
        return new GlobalPalette();
    }
    if (bitsPerEntry == 0) {
        return new SingletonPalette(in);
    }
    if (bitsPerEntry <= paletteType.getMinBitsPerEntry()) {
        return new ListPalette(bitsPerEntry, in);
    } else {
        return new MapPalette(bitsPerEntry, in);
    }
}

When bitsPerEntry is <= the MinBitsPerEntry, you should use the MinBitsPerEntry.
Again from vanilla's PalettedContainer.java:

@Override
public <A> Configuration<A> getConfiguration(IdMap<A> idMap, int n) {
    Configuration configuration = switch (n) {
        case 0 -> new Configuration(SINGLE_VALUE_PALETTE_FACTORY, n);
        case 1, 2, 3, 4 -> new Configuration(LINEAR_PALETTE_FACTORY, 4);
        case 5, 6, 7, 8 -> new Configuration(HASHMAP_PALETTE_FACTORY, n);
        default -> new Configuration(GLOBAL_PALETTE_FACTORY, Mth.ceillog2(idMap.size()));
    };
    return configuration;
}

This is the configuration manager for BlockState sections, where n is the bitsPerEntry. As you can see, for any bitsPerEntry 1-4, the bits value passed to the Palette configuration is always 4.

For biomes its different, and the bitsPerEntry is always used as is when <= 3:

@Override
public <A> Configuration<A> getConfiguration(IdMap<A> idMap, int n) {
    return switch (n) {
        case 0 -> new Configuration(SINGLE_VALUE_PALETTE_FACTORY, n);
        case 1, 2, 3 -> new Configuration(LINEAR_PALETTE_FACTORY, n);
        default -> new Configuration(GLOBAL_PALETTE_FACTORY, Mth.ceillog2(idMap.size()));
    };
}

I see the issue. Minecraft itself is ignoring the length prefix of the long array and it can be a pretty much anything. It is reading the size of the palette. There may be a speical case, not investigating this, I'll implement a fix in my client.

@nickelpro indeed, that is wrong in the code

was it fixed by ecc04fd ??

Most likely. Thanks for the bump.