man-group/arctic

MongoDB 5.0

devqinves opened this issue · 6 comments

Hi, Arctic has a dependency on pymongo: "pymongo>=3.6.0, <= 3.11.0". MongoDB 5.0 is not compatible with pymongo 3.11.0. Are there any plans for Arctic to be able to run with pymongo>=3.12.0, so that it is compatible with MongoDB 5.0?
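For readers who want to see what that pin rules out, here is a minimal sketch that checks a pymongo version string against the range quoted above. The helper names (`parse_version`, `satisfies_pin`) are made up for illustration and are not part of Arctic or pymongo; the parser only handles plain release numbers such as "3.12.0", and real resolvers such as pip apply the full version-specifier rules.

```python
import re

# Bounds copied from the requirement quoted in the issue:
#   pymongo>=3.6.0, <= 3.11.0
PYMONGO_MIN = (3, 6, 0)
PYMONGO_MAX = (3, 11, 0)


def parse_version(text):
    """Turn a plain release string such as '3.12.0' into (3, 12, 0).

    Hypothetical helper: pre-release or local suffixes are ignored,
    and a missing patch number is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def satisfies_pin(version_text):
    """Return True if the version falls inside the quoted pymongo range."""
    return PYMONGO_MIN <= parse_version(version_text) <= PYMONGO_MAX


if __name__ == "__main__":
    for candidate in ("3.6.0", "3.11.0", "3.11.1", "3.12.0"):
        print(candidate, satisfies_pin(candidate))
```

Under this reading, 3.12.0 falls outside the range, and so does 3.11.1, because the upper bound is written as `<= 3.11.0` rather than `< 3.12`.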

@dunckerr

A software version is not automatically better or more "stable" (as in "fewer bugs", "fewer crashes", etc.) just because it's older. MongoDB is a big, established project with a more sophisticated release engineering process than simply slapping a tag on a commit and calling it a day.

What are the technical reasons to specifically avoid 5.0 now?

As it stands, arctic does not play nice with the rest of the Python ecosystem. Not supporting Python versions 3.8 onward, or recent pandas versions, for example, makes for a poor developer experience. It places a lot of constraints on the development environment. Not supporting the most recent versions of pymongo/MongoDB worsens the situation.

If the issue is simply a matter of implementation, then I would be interested in knowing the requirements. I may be able to contribute the necessary changes, if they are not yet being worked on by someone else.

If you think supporting more recent stuff would introduce too much maintenance burden, consider making up for that by dropping support for older stuff (such as Python 3.6, which went EOL a month ago now). It is the user's responsibility to keep their baseline stuff (e.g. Python) up to date, not yours to keep supporting deprecated/EOL stuff. Deprecated/EOL means "[support] may be removed at any time". The burden is on the user to deal with that, and with what "at any time" entails. Beyond the courtesy of providing a comfortable time window for migration for bigger changes, no one has the right to expect any more from you, unless they're paying for some kind of support contract, but then again that would be maintained separately...

There have been significant data corruption bugs in Mongo 4.4.x and 5.0.x. E.g. Have a look at the Jira regarding this and mitigation:
https://jira.mongodb.org/plugins/servlet/mobile#issue/WT-8395

The version ranges currently specified are the ones we have been able to test against. These aren’t the latest, but are known to work. We know there are some pandas performance issues with newer pandas (which need investigation and fixing), and we’ve had unexpected issues with newer versions of pymongo.

We are always interested in patches if you are able to test, fix and validate that the library works for your setup.

@jamesblackburn

There have been significant data corruption bugs in Mongo 4.4.x and 5.0.x. E.g. Have a look at the Jira regarding this and mitigation: https://jira.mongodb.org/plugins/servlet/mobile#issue/WT-8395

I took a look at that bug and I don't see much cause for alarm. The data corruption only happens under very specific circumstances, and there are lots of available remediations. Besides, it's already fixed, and the fix is included in version 5.0.6, which will be released soon.

I took a look around the issue tracker, and the only other active data corruption bug is this one: https://jira.mongodb.org/browse/WT-8695. Again, its impact seems very limited.

Regardless, these are generally problems for users, not for library authors (unless you're really unlucky and such particular bugs really mess with your implementation, or if the bugs are really egregious). Which brings me to my next point...

The version ranges currently specified are the ones we have been able to test against. These aren’t the latest, but are known to work.

How can 5.x ever become "known to work" if you never start testing and experimenting with it? It's a bit of a chicken-and-egg problem - it seems this project is waiting for users to give the green light to support 5.x, but the users are waiting for the project to implement support in order to exercise the 5.x version in their use cases. Bugs will always exist. Without thoroughly exercising a piece of software in diverse environments, they will remain undiscovered. The ecosystem moves faster if all projects keep up and report bugs to each other's upstream.

In case you're worried about the "blame game": just ignore those who blame you for an upstream bug that affected them, if that does happen. It's the users' responsibility to follow sane migration and backup strategies, to cover hypothetical worst cases like that.


We know there are some pandas performance issues with newer pandas (which need investigation and fixing),

Re: Pandas - seems like it's no longer an issue?

#887
#908

Quote from one of the PRs:

Various people have reported arctic working latest version of Pandas. Without this fix, arctic forces the use of older versions of Pandas which can lead to conflicts with other libraries.


and we’ve had unexpected issues with newer versions of pymongo.

Care to point me in the right direction, please? Searching through the open issues yielded no relevant results. I did come across #926, which I assume is at least one of the things you're referring to.


We are always interested in patches if you are able to test, fix and validate that the library works for your setup.

👍 Trying to gather information right now.

I work in the fund industry, so I understand first-hand people's hesitance to make changes, especially when things are working. But I also see the missed opportunities to improve the existing tools by not incorporating the latest features, fixes, etc. My view is that an easier way is probably to create something separate that co-exists (maybe a different branch, tag, or even a different library), that tries to move a bit faster and is not necessarily thoroughly tested, just for the brave. Kind of an insider preview. I'm sure people will be happy to try it out with some non-critical applications and potentially contribute.

@devqinves please note that we have merged mongo4.4/pymongo3.11.0 into master. I have also created a test branch with mongo5.0.14/pymongo==3.14.0, which is passing all tests. Note that we haven't performance tested this branch.