neuecc/Utf8Json

Started a fork/v2 at https://github.com/vilinet/Utf8Json-v2

vilinet opened this issue ยท 53 comments

I know it is not the nicest way to introduce a fork of a repo but i had no other choice. I asked the author but my messages were ignored.

The v2 can be found at https://github.com/vilinet/Utf8Json-v2/
I cleaned up the code a bit, added new frameworks.
And already fixex some issues but there are many left.

I hope we can make this stunning library even better :)
Its alive!

@vilinski
@RamType-0
@st-gwerner
@SaAnnka
@neon-izm
@Kiryushin-Andrey
@AndrewGumenyuk
@ pCYSl5EDgo

do you really have a time and energy to maintain the fork?
are you sure @neuecc has really stepped down?
I mean I would appreciate to see this lib evolving and bugfixed. But... not if it results in another "padleft" hell.

@neuecc said that he gonna work with this repo after he release UniTask v2

@neuecc said that he gonna work with this repo after he release UniTask v2

That is great news!

It does not matter to me which version just need to keep this stuff alive.

I made my repo private :)
Any idea how long it will take?

I have seen he is is active on many other projects.

I'm relieved to hear he may start giving this attention, but I would certainly be happier if there was more than one maintainer. Even if it means that it requires unanimous consensus on all pull requests, that'd be better than the feeling that things were abandoned for a couple years. There are a few fixes that we all can look at and agree would be good, but without him personally approving them, the issues linger, and that results in way too many forks with the same fix.

I would welcome a well maintained fork. The goal here is to have a low allocation UTF8 aware serializer for JSON in .NET.

The point in the MIT license in to foster innovation. If @neuecc (who has done a great job here) does get the time to come back to maintain this library, then all we will have is 2 great libraries to choose from. Maybe there will be some differences that will make each more suitable for different applications, but that also sounds great to me.

I think calling it the same thing is probably unfair though and could create confusion, you should probably come up with a new name, and then maintain recognition for the original base.

Would be much better if everyone could concentrate on one lib instead decentralizing manpower into 2 separated libs.

Best option really would be to share ownership with someone who could review and approve PR.
Or to have a "bugfix branch" that we can take care of. We could also release some bugfix packages in prelease status or just simply increasing the build version.. Whatever.

There are fundamental very basic and simple bugs and they have not been fixed in the last 2 years.
And it blocks many people from using it. (like its not supporting json escape sequences).

I know many people who look at the repo and they just dont use it as they dont thrust in an abandoned codebase that have these basic issues untouched.

And the best is how the owner does not even write a single comment ...

I don't see it such essential to justify a fork. Json serializing is a not complex operation. You can always take another format if it is not fast enough.
And there is already decent System.Text.Json.
And if you not aware there is also https://github.com/Dogwei/Swifter.Json
it claims to be even better than anything else, but you probably need google translate to read the docs ))

@vilinski, hello do you think you can share your private link? I'd like to see if we can use it while waiting for the author.

@vilinski, hello do you think you can share your private link?

@markmadlangbayan you surely mixed me up with @vilinet ? Don't know why he even has my nick :D

@vilinet hi we are encountering a critical & security issue in this library: #224 - this \u0003 doesn't serialeze properly causing the front end to crash. I hope you could help us and for the community as well, is there a fix or workaround for this?

@firephantomassasin Lets have a look at it :)

@vilinet Thank you so much.

Drop me a mail to the vilinet9@ the mail domain that google uses dottt com :)

Issue #224 sounds a lot like something several of us have fixed, including a PR I submitted #217.

@st-gwerner it works! Thank you so much.

Hey guys @vilinet, i think while the main author is not yet around and looks like this repo is not active anymore, it would be better to create a V2 Fork of this repo and include all the current pending Pull Requests especially those who are bug fixes.

@st-gwerner
Yeah definitely a fix. But also would make string writing much slower because of that condition check on every single character and also copying. Also that array access is not the fastest either.

I would do something like this: https://pastebin.com/xfYC7f3z
No additional if, no array access, conversion to hex is faster than the convert one. sure the common stuff could be "aggressivly" inlined though c# compiler most of the times does not care about that either :D

@firephantomassasin Yeah but also there are many PR that would slow it down serialization. Of course sometimes there is now way to avoid it but many of them can be improved.

@vilinet All the more reason that your idea of a single repo would be amazing, managed by multiple heads who can discuss and evaluate the best ways to do it. My focus was getting correct behavior with the same relative performance (in my testing of 40-50MB files, the check didn't slow performance much at all), but I would love it if we had a central repo where we can bat things around, do additional testing, etc.

I also agree with you also that not all "fixes" are good.

@st-gwerner Yeah :)
Also if the original author decides to continue he can use the actual up-to-date code as a base.
Maybe it should not be called v2 but something like utf8json-patches / fixes. Dont know.

Also i checked that Swifter.Json that was linked above and i need to stay that is much more faster then this lib(by 35% with my test data), but also has less feature. But this code base is much more "contributor" friendly and there are things we could use from that lib as well.

@vilinet just wonder, which features you miss in Swifter.Json compared to Utf8Json?
For me no support for F# types makes it currently useles, but author already works on it.

@vilinski No simple way for camel case output for example. You can use the callback but perfermance drops heavily. Or you rename your properties to use camel case.

No attribute support to ignore/rename properties.

@vilinet I opened an issue #209 a short while ago. Being able to easily define your own naming formats would be very handy in any fork, and not hard to expose without slowing anything down. We have to consume a lot of weird APIs at our shop, and having to put attributes on everything is annoying.

@vilinet not yet tried myself, but there are RWField and RWFielAccess attributes and also other JsonFormatterOptions

@vilinski you are right! I will have a try!

@st-gwerner @vilinet have you checked this another security issue in this library? #127

@vilinet If you think that you have time to continue this project, please go on with the fork. It is badly needed.

Okay guys. So i made this https://github.com/vilinet/Utf8Json-v2/ public again.
I already released a 1.5 nuget package with some crucial issue fix but raised a ticket also to have a discussion how things should go on(like versioning).

So please have a look :) We can also decide to stop it and we will switch to another library. (There are some nice candidates)

@vilinet

Do you really think someone will work on this new repo if you deleted the history and just copied all the source code in the initial commit!? https://github.com/vilinet/Utf8Json-v2/commit/fdf4515d7cac2bef8e4b700706ca98aad0a08301

  • It's not discoverable as a fork
  • It's very hard to understand what's going on in complex places without history
  • It's hard to see what and where you added/removed from original

Not related to this library, but recently I forked a 5-years-old near 1k-stars library and I spent almost a day just to clean and keep Git history, otherwise I could hardly navigate and understand what was going on in this 3rd party codebase. This is important.

In my fork of this library I try to force-push to have as few commits as possible and view them as patches rather than full-blown continuation of upstream development (I do not have skills and resources for that, do you?). There are multiple "natural" forks of this library and it's easy to cherry-pick bug-fixes from there (or from PRs) and keep the changes to upstream visible as isolated patches. It would be great if there is one place that accumulates all useful PRs and fork commits in one place, but keeps the development anchored to the upstream.

My fork is incompatible with the upstream (no explicit Unity support and I do not know if it works there, changed internals so it works with native memory and not only with byte[], serializes tuples as JS arrays and other changes; but faster), so it's not even a candidate for such a place. But I do periodically review if there are worthy changes and cherry-pick them. Your "fork" will be completely invisible to me.

So, ideally and in my opinion, a new centralized fork should have all worthy PRs and commits from other forks that are cherry-picked from their branches (or merged with their authors kept in git history). Anyone should be able to cherry-pick from this fork or rebase their fork on the new centralized fork. This also makes it possible to backport all community contributions to the upstream in a couple of clicks whenever it is active again. Without this any "artificial"/"forced" fork adds more hassle than value.

@vilinet what are those another library (There are some nice candidates) ? So far i've tried other libraries and this one gives the fastest performance.

@buybackoff I do agree. The creation of that "fork" is crap, not the way as it should have been done.

So the question is again: Who wants to fork it? or Do we choose an existing fork that has been already worked on?
And so likely there should be more owner on that project to avoid it getting killed again.

If you say okay, do it properly i am willing to fix my errors and create a new valid fork, clean it up, upgrade to latest libraries, etc.
Not quite sure about the history cleanup but maybe that can play too.

@vilinet from the "right" fork your are just one click on "Fork" button away. ) Wrong way is just mkdir and copy sources. You can also create an organization for the owner(s) and move the repo into it. This way the changes can be backported to the parent repo or cherry picked in other forks.
Even better were if @neuecc would give the ownership

@firephantomassasin some nice candidates are mentioned above in this thread, if you not tried them already

Yeah. There was a guy above how knows him and told he will work on it later maybe.
I also tried to email him the one email address i found, but no answer.

Aren't you concerned that MSFT will eventually make their new JSON serializer almost as good? And given that MessagePack serializer v2 was integrated with SignalR by @AArnott with @neuecc help, I won't be surprised if the same machinery that makes this library fast will be employed by System.Text.Json and @neuecc may also also participate in that. Personally, I stopped touching/improving my fork after the initial version of System.Text.Json. From one side, it's already good and complete enough; from another, maintaining something for which MSFT has it's own competing solution is not a great idea.

This library is not friendly to Span & Co and is based on byte[], which requires allocations and copying and byte array pooling if original data is in native memory or Span/Memory. MessagePack v2 works with IBufferWriter<byte> and ReadOnlySequence. Maybe a similar work is planned for this library. A fork will inevitably diverge here.

The worst thing in this story is the silence and absence of a roadmap from the author.

if it were my commercial project maybe I would close the sources exactly for this reason. MSFT has multiple times proven they understand nothing from doing open source, or supporting communities. This alone, and looking around, renders .NET as an almost obsolete plattform for me, even with all improvements last years.

But otherwise I would only welcome any improvements in BCL, cause I can only profit from it. System.Text.Json has a long way to the feature parity to Newtonsoft and in performance to this and some other libraries.

This doesn't stops me in any way to contribute to the libs I use, even it it is difficult to contribute.
Short answer - I don't care ))

I can't speak for @neuecc, but I'd guess this repo works for him as-is so his priority is low to improve it. MessagePack v2 happened because I wanted to pick it up for Visual Studio but couldn't in its v1 form. I started that with a fork and tried to keep communication open as much as possible with the original author to increase the odds of the fork eventually merging back into the main repo, which turned out to be a success story.
So far I haven't had a need for a super-fast JSON parser, so my attention hasn't turned to this repo (I'm only here because @buybackoff mentioned me). But if you really like it and intend to use it in an app such that you'll keep caring for it, and if you can't get the author to respond, if I were you I would proceed with the fork and improve it. For the first while I'd try to maintain it in the spirit of the original author in case a merge in the future is possible. But after some period of time (e.g. when my fork becomes more popular than the original) and the author still hasn't responded, I would start taking it in my own preferred direction.

Thanks @AArnott for replying and MsgPack v2 background! I mentioned you in case you know something more by being in contact with the author and from MSFT :)

BTW, interesting tweet from today showing difference of @giraffe-fsharp in Techempower benchmarks when using Utf8Json. https://twitter.com/dustinmoris/status/1309133995796463623

Maybe @dustinmoris could add something from F# side and join the discussion if Utf8Json is important for Giraffe.

@buybackoff Good stuff! And would worth considering switching/supporting Span.

Hey, thanks for tagging me. Interesting thread.

It's a pitty that @neuecc hasn't responded here or to any other issue yet. I totally understand that there are many good reasons why someone doesn't have the time/passion/motivation to work on an OSS project any more, but at least responding to these threads would have been helpful :(

I just hope that it's only a complete lack of interest and not something else which prevents @neuecc from responding here. Wishing him the best of luck!

Regarding an official fork I agree with the advice given by @AArnott.

In order to support an official fork I would like to see the following things:

  • Official fork rather than a copy of the repo, so that we can preserve the original history and also to properly attribute the original author of this project
  • The official fork could be easier PR'd back into this repo if @neuecc does want to take an active role again
  • Keep the original README with a notice at the top that it is an official fork and gradually also invest into improving the README and further documentation to strengthen the official fork (the official fork must become a one stop show. If a user has to go back and forth between the original repo and the fork in order to figure out how Utf8Json works then it's not good enough)
  • Decide on an official name and promote it consistently across the board from naming the repo accordingly, namespaces and NuGet package names. I think there's generally two options: Either go with Utf8Json+ (e.g. Utf8Json.V2) or a complete new name like e.g. ZeroJson (for zero allocation Json or similar), so that the fork can over time become it's own project with a forward looking name, which sounds good, can attract .NET developers and which would make things like having a website, wiki and other things easier possible as the name is simpler and less clunky than Utf8Json.V2.
  • Start the official fork as part of a new OSS organisation on GitHub so that it can become a community project rather than a single person's project. It's ok if just one person champions it in the beginning, but it would make it easier to share the maintenance burden over time with more people and allow power transfers more easily in case we run into the same problem again. We don't want to have Utf8Json.V3 creeping up one day :)

EDIT:

BTW I really like Utf8Json and I'm happy to assist with anything which needs to be done in order to get this thing properly off the ground. I didn't look into the ins and outs of this project enough to feel comfortable to maintain it myself, but I'd happily help in shaping the initial OSS organisation, writing some docs and helping with other things like setting up a GitHub Actions pipeline to deploy directly to NuGet on official releases and doing other "maintenance" work to help @vilinet or whoever wants to take the leadership of this project.

i think @vilinet can initiate a clean fork from this repo and re-apply some of your changes again and let's start from there the official fork version of this repo? This library should have a new official fork version, this is a nice library for .net.

@dustinmoris Very great comment!

So we could create this free organization and having this as one of its projects.
We just need a 2 good name :) github organization name and a nice project name.
I hope someone will come up with some good name.

Re name, I think a new name might have more longevity than trying to version Utf8Json. The reason why because if in a year's time the fork has become the new official repo because @neuecc hasn't responded or stepped in since then, then the project will likely start taking complete new shapes which will over time have less and less resemblance with Utf8Json and in 24 months you'll regret that the library is called Utf8Json.V2 and has a NuGet version of v5.1.2 or something like that.

I would suggest something like ZeroJson because for three reasons:

  • It aligns with @neuecc naming of other projects (e.g. https://github.com/neuecc/ZeroFormatter), so it's a rename which I think he could support of if he as to step in again
  • It's a really easy name and personally I think it sounds good
  • There is no NuGet package with that name yet

Sounds cool definitly. There is only a zeroJson user on github but it may not be that confusing.
Also the organization can be ZeroJson?

i think @vilinet can initiate a clean fork from this repo and re-apply some of your changes again and let's start from there the official fork version of this repo? This library should have a new official fork version, this is a nice library for .net.

I think a lot of things in this discussion are for creating a fork for the sake of creating a fork. There should be some plan/roadmap for what the new fork should include and wants to achieve.

There are several existing forks:

  • Elasticsearch-net: their work on the fork was visible, they were adding a lot of usability things, such as member attributes a la JSON.NET. Unfortunately, they copied the source from the fork to their main repo. Their fork with modifications is here, it has multiple PRs merged and some additional stuff. // cc: @russcam
  • In my fork, I did a lot of low level work to make this library working with native memory and make it even faster (things like inlining, SIMD char search, etc). Original (this) implementation works with byte[] and have some tricky bugs. It does not know about payload length, which prevents using an array pool or native memory without cleaning in some edge cases. E.g. if a pooled buffer contained "1.23456" and was not cleaned, and then we write "7.89" to it and try to deserialize, the returned value will be "7.89456". This library expects zero as the end of string. Adding support for buffer length required some low level changes. Yet, I couldn't do that with Span because it's ref struct and changing dynamic IL was not trivial. I went with a "normal" unsafe struct with pointer+length because it suited my requirements, but this in turn requires pinning when working with byte[]-based memory. Again, it was OK for me, but a proper support for Span or ReadOnlySequence without pinning is required. And this requires someone comfortable with rewriting dynamic IL implementation. Debugging this is not a pleasant thing.
  • This forks network map has multiple forks with a lot of continuous changes. I bet some of them are useful (yet many are duplicate bug fixes). There should be some agreed upon understanding what kind of changes should go into the new fork from there. Probably the fork authors should be explicitly pinged if their commits are included.

'+ PR bug fixes.

One thing that would turn me off from the new fork is if there will be changes for the sake of changes, e.g. no improvements in performance or functionality, but beautification and/or refactoring, moving code around for "better organization". It's so easy to do that with VS/R#/Rider in one click, but quite soon merging would be a mess. That should be done only after the fork is proven to be independent and maintained by multiple people.

E.g. even if the project is renamed to XxxJson, it should be the very last step to change the namespace everywhere. That change is almost as bad as copying the source, despite all the modern merge/diff tools. For that reason, I think Utf8Json2 is not a bad name.

I just hope that it's only a complete lack of interest and not something else which prevents @neuecc from responding here. Wishing him the best of luck!

That's a very kind thought. @neuecc is fine. He's active on MessagePack still, at least. I'm sure it's just a priority thing.

Keep the original README with a notice at the top that it is an official fork

I'm not sure what constitutes an "official fork" if you haven't got the blessing of the original repo author. Perhaps "active fork with several contributors" would be more accurate and still very encouraging to newcomers.

Start the official fork as part of a new OSS organisation on GitHub so that it can become a community project rather than a single person's project. It's ok if just one person champions it in the beginning, but it would make it easier to share the maintenance burden over time with more people and allow power transfers more easily in case we run into the same problem again. We don't want to have Utf8Json.V3 creeping up one day :)

Orgs can be abandoned if all the committers lose interest just as well, so I don't personally see that as a mitigation. Repos on personal accounts can have many collaborators as well, and if the sponsoring account's user loses interest, they can always transfer the repo to another user or org at that point, and github performs all the 301 redirects automatically so you don't lose traction. GitHub is all about open source, yet we don't see an org behind every repo. So I'm not against the idea, but it feels overkill to jump into it too fast.

I would also defer the rename of anything (nuget package, namespaces, etc.) till later. Invest in the fork, make it a worthwhile repo to maintain and tempting for the original author to merge back in. In the meantime, push with the original nuget package ID to some alternate feed somewhere (Azure Pipelines offers free NuGet feeds and I think GitHub does too) so you all can consume it. I would suggest you keep it with unstable versions (rather obviously forked ones, like 1.2.3-fork.5) until the ID changes or it merges back in.

One thing that would turn me off from the new fork is if there will be changes for the sake of changes, e.g. no improvements in performance or functionality, but beautification and/or refactoring, moving code around for "better organization". It's so easy to do that with VS/R#/Rider in one click, but quite soon merging would be a mess. That should be done only after the fork is proven to be independent and maintained by multiple people.

+10 to this, @buybackoff

Thanks for the ping @buybackoff.

Elasticsearch-net: their work on the fork was visible, they were adding a lot of usability things, such as member attributes a la JSON.NET. Unfortunately, they copied the source from the fork to their main repo. Their fork with modifications is here, it has multiple PRs merged and some additional stuff. // cc: @russcam

The Utf8Json fork that the client is using is in

https://github.com/elastic/elasticsearch-net/tree/7.x/src/Elasticsearch.Net/Utf8Json

The separate nullean/Utf8Json repository was originally used and IL-merged into the client, but then it became easier to manage by including the source in Elasticsearch.Net project. The fork in the project is quite different in places to fit the client's needs, and contains additional fixes such as handling all allowed offset formats in ISO8601 Date string formats (i.e. [+-]hh, [+-]hhmm in addition to [+-]hh:mm), and removes components that are not relevant to the client's usage.

So, I've been following this thread since the get-go. Is Utf8Json-v2 a go or not? The reason I'm asking if this v2-repo is still happening because it was created, then hidden, then public again, now it's gone again. Would love to see this project kept going strong ๐Ÿ‘

Since nothing has happened on this for a while and I have a need for it for my company and its projects, I've forked it.

Right now, I've favoured the use of netstandard and dropped the old frameworks. Executables and tests are targeting .NET 5.

I've already pulled in a few PRs such as #217 #129 #172 #193 and will be looking at bringing more in soon. Not sure when a nuget package will get built, what its version number will be, its name etc, but it will be soon. As such I welcome conversations from those with an interest in this project and need a working library. If you want to help maintain it, get in contact with me. Would like to build a consensus on what is the most pressing issues, get it all tested and working for everyone's platforms and sort those issues as they come in and then take it from there.

  • Gordon.

@cryptisk-grs I think it's a good case to activate discussions in your new repo

For folks wanting an up to date working package, I've released what I'm calling v1.4.0 and is now available on NuGet. See here for more information. https://github.com/Cryptisk/Utf8Json/releases/tag/v1.4.0