Chisel-Team/ConnectedTexturesMod

[1.16.5+] Client crash on loading due to concurrency issue

Opened this issue · 1 comments

hix3r commented

Hello,

Mod ver.: MC1.16.1-1.1.2.6
Forge ver.: 1.16.5-36.2.20
Minecraft ver.: 1.16.5
Forge crash log: https://gist.github.com/hix3r/52aceac059fcaeb935b9a07b80e37c68

This is the same issue that was reported in #166 , it happens intermittently, because it is caused by a race condition of multiple threads trying to modify a single Java HashMap. The cause of this issue is present in the current latest branch as well. Looking at the changes, the code logic responsible is the same since at least 5 years ago so all versions of the mod in that period are affected.

In team.chisel.ctm.client.util.TextureMetadataHandler the onModelBake() method subscribes to all ModelBakeEvent events. Events are dispatched asynchronously in multiple worker threads.

onModelBake() does the following at line number 156 in the most recent branch:

try { 
    meta = ResourceUtil.getMetadata(ResourceUtil.spriteToAbsolute(tex.texture()));
} catch (IOException e) {}

ResourceUtil.getMetadata() is a static method that modifies metadataCache which is a static final HashMap via its put() method on line 64 of the ResourceUtil class.

This means there is a possible scenario when two event threads will try to put() something into metadataCache at the same time. As a result during loading the cache could be: unaffected or its entries silently corrupted or cause a runtime crash depending on thread timing.

Java HashMap by itself is not implemented to handle concurrent access and modification. The implementation of it needs to maintain a consistent internal tree structure of nodes. The strange exception message class java.util.HashMap$Node cannot be cast to class java.util.HashMap$TreeNode is due to multiple threads trying to act on this internal tree structure at the same time, leaving it in a broken state.

Possible solutions:

  • change metadataCache in ResourceUtil into a ConcurrentHashMap, as this uses several lock-free optimizations to lessen the burden of synchronization, the retrieval operations like get() do not block, plus the internal hash array is divided into segments so blocking only occurs on one segment during a modification
  • use a synchronized block when accessing or modifying metadataCache
  • synchronize on the getMetadata() method
  • possibly use or implement some other cache data structure that is suitable for concurrent modification and retrieval

Hope this helps.

This bug is also present on 1.19.2-1.1.6+8

https://mclo.gs/NGys42F