vibe-d/vibe.d

Error on creating a mongo connection

gedaiu opened this issue · 7 comments

Hey everyone, I just updated my project to use the latest vibe version and I started getting this exception. Does anyone have an idea on how I could debug it?

creating new MongoConnection connection, all 1 are in use
TaskFiber getting terminated due to an uncaught core.exception.InvalidMemoryOperationError
core.exception.InvalidMemoryOperationError@core/lifetime.d(126): Invalid memory operation
----------------
??:? onInvalidMemoryOperationError [0x7f5d2016a83d]
??:? nothrow void* core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) [0x7f5d2017a023]
??:? _DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6mallocMFNbmkxC8TypeInfoZPv [0x7f5d2017d1a4]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/client.d:66 [0xe55eef]
../../../../.dub/packages/vibe-core/2.8.2/vibe-core/source/vibe/core/connectionpool.d:94 [0xe568ad]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/client.d:167 [0xe5600f]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/cursor.d:518 [0x809d34]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/cursor.d:124 [0x4e4a4d]
submodules/crate/source/crate/resource/gridfs.d:481 [0xb9e308]
??:? rt_finalize2 [0x7f5d201a4556]
??:? nothrow ulong core.internal.gc.impl.conservative.gc.Gcx.sweep() [0x7f5d2018034e]
??:? nothrow ulong core.internal.gc.impl.conservative.gc.Gcx.fullcollect(bool, bool, bool) [0x7f5d2017c6f9]
??:? nothrow void* core.internal.gc.impl.conservative.gc.Gcx.bigAlloc(ulong, ref ulong, uint, const(TypeInfo)) [0x7f5d2017f143]
??:? nothrow void* core.internal.gc.impl.conservative.gc.ConservativeGC.runLocked!(core.internal.gc.impl.conservative.gc.ConservativeGC.mallocNoSync(ulong, uint, ref ulong, const(TypeInfo)), core.internal.gc.impl.conservative.gc.mallocTime, core.internal.gc.impl.conservative.gc.numMallocs, ulong, uint, ulong, const(TypeInfo)).runLocked(ref ulong, ref uint, ref ulong, ref const(TypeInfo)) [0x7f5d2017a058]
??:? _DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZSQDd6memory8BlkInfo_ [0x7f5d2017d232]
??:? gc_qalloc [0x7f5d2018683b]
??:? core.memory.BlkInfo_const  rt.lifetime.__arrayAlloc(ulong, ref core.memory.BlkInfo_const , TypeInfo, const(TypeInfo)) [0x7f5d201a28fb]
??:? _d_arraysetlengthiT [0x7f5d201a5adc]
/usr/lib/ldc/x86_64-redhat-linux-gnu/include/d/core/internal/array/capacity.d:47 [0x80a71a]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/cursor.d:533 [0x80a6c3]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/connection.d:600 [0x80f304]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/connection.d:752 [0x4f7672]
source/ogm/api/api.d:590 [0x80ad28]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/connection.d:520 [0x80a38b]
source/ogm/api/api.d:464 [0x809a6c]
../../../../.dub/packages/vibe-d/0.10.0/vibe-d/mongodb/vibe/db/mongo/cursor.d:137 [0x4e4c04]
/usr/lib/ldc/x86_64-redhat-linux-gnu/include/d/std/range/primitives.d:1788 [0xb2836c]
submodules/crate/source/crate/base.d:732 [0xb282e6]
submodules/crate/source/crate/resource/gridfs.d:507 [0xb9e3f9]
source/ogm/api/service.d:82 [0x4ce8d3]
../../../../.dub/packages/vibe-core/2.8.2/vibe-core/source/vibe/core/task.d:780 [0x8115be]
../../../../.dub/packages/vibe-core/2.8.2/vibe-core/source/vibe/core/task.d:798 [0xf9590b]
../../../../.dub/packages/vibe-core/2.8.2/vibe-core/source/vibe/core/task.d:471 [0xf95129]
../../../../.dub/packages/vibe-core/2.8.2/vibe-core/source/vibe/core/task.d:411 [0xf94c7d]
??:? fiber_entryPoint [0x7f5d201911ea]
Error Program exited with code -6

Judging from the call stack, the cause looks to be that some GC allocated type in submodules/crate/source/crate/resource/gridfs.d is holding a MongoCursor, which in turn is trying to send a "killCursors" command to the server. The only difference in behavior that came to mind is this: https://github.com/vibe-d/vibe.d/pull/2769/files#diff-b86dcfeb19c6689b601a62860fdc40c9d8e38d5522c63cc0d9bf4fd1c1337978R516-R522

@WebFreak001: I failed to ask in the PR review, but did you change those lines intentionally in terms of semantics? Because this now causes repeated calls to killCursors() to connect to the server, even though the cursor has already been killed. In the review I was just thinking of it as a minor inefficiency and didn't realize that this would also affect the finalizer.

I don't think it's calling multiple times, (there is a guard) but just calling at all in the destructor - it might be relying on stuff that is already freed at that point though. (e.g. m_client) Additionally I

The stack trace is in handleReply: https://github.com/vibe-d/vibe.d/blob/v0.10.0/mongodb/vibe/db/mongo/cursor.d#L533 - failing because it's trying to allocate GC memory in a destructor (which is forbidden)

coming from a call to conn.killCursors: https://github.com/vibe-d/vibe.d/blob/v0.10.0/mongodb/vibe/db/mongo/cursor.d#L520

I swapped around the lines intentionally since it looked to me like a possible race condition that it could kill a cursor twice, so I changed the early-return check to be run only after locking. However the issue only happens in conn.killCursors, so I doubt this was affecting anything - if it did, it would have been a problem before as well but with slightly different timing that may have just been passed luckily/unluckily.

Should we just add a flag that we are in the destructor and in that case ignore any replies?

Okay, no, the locking doesn't really perform any synchronization in terms of the cursor, it just allocates a connection to use in that scope. The problem is that with this order it will attempt to allocate a connection on every call to killCursors, no matter whether the cursor is active or not. In this case, there is no connection in the pool, so it tries to allocate one from MongoCursor.~this.

I actually didn't notice the second part of the call stack, but the handleReply is not the one responsible for the InvalidMemoryOperationError, it is the in the upper part, which is nested inside Gcx.fullcollect, calling new MongoCollection at client.d:66.

oh right this looks like it's 2 stacktraces somehow molded into one?

here is the stacktrace, linked to source code on github for better overview:

core.exception.InvalidMemoryOperationError@core/lifetime.d(126): Invalid memory operation

so yeah it looks like it's coming from the lockConnection call in the destructor, which is not destructor safe (since it tries to allocate)

If it doesn't do any synchronization anyway we could just move it back then. I was just assuming it under the name but didn't actually check and thought it wouldn't cause this kind of issue.

My plan would be to swap it back, but also add a GC.inFinalizer check in ~this that results in a somewhat descriptive error log message instead of crashing the app. I'd hope that MongoDB will at some point clean up stale cursors anyway(?)... so that doesn't necessarily have to be a fatal error to not be able to clean up.

From the user code side of things this of course still means that the cursor must be fully processed (or explicitly killed), or needs to be destroyed deterministically instead of by the GC.