PyYoshi/cChardet

Support Python 3.10

decaz opened this issue · 17 comments

decaz commented

... and prepare for Python 3.11 (dev).

Any news on Python 3.10 support for this library?

After 6 months, will that PR get merged or even looked at?

Are future updates like Python 3.10 support coming, or has the project been dropped?

It seems as though cchardet has been abandoned, yet many large projects depend on it.

Are there any good alternatives to cchardet? If the repo is not getting any love then quite a few projects will need a replacement.

^, it is possible to install gcc, but this won't be the biggest issue forever; 3.11/12 may somehow break this

I assume this is for *nix users. I'm on Windows and it keeps throwing up the 'C++ 14 Required' error when I try to install. I assume that's because on Windows it tries to compile with Visual C++ instead of gcc.

Does anyone know if I can manually compile this on Windows using my MinGW gcc install? I'd rather not download multiple GB of Visual Studio just for one Python package.

@NebularNerd yes, this was for *nix. Here are the steps I found for Windows using MinGW:

  • add C:\MinGW\bin to PATH
  • create a distutils.cfg file in PYTHONPATH\Lib\distutils containing:
[build]
compiler=mingw32

https://stackoverflow.com/a/5051281
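
The second step can also be scripted. This is a minimal sketch, assuming a Python version that still ships distutils (it was removed from the standard library in Python 3.12). `write_distutils_cfg` is a hypothetical helper name, and the directory you pass in is whatever the `Lib\distutils` folder resolves to on your install.

```python
import configparser
from pathlib import Path


def write_distutils_cfg(distutils_dir, compiler="mingw32"):
    """Create or update distutils.cfg so builds use the given compiler.

    Hypothetical helper for illustration; not part of any library.
    """
    cfg_path = Path(distutils_dir) / "distutils.cfg"
    parser = configparser.ConfigParser()
    # Keep any settings already in the file (a missing file is ignored).
    parser.read(cfg_path)
    if not parser.has_section("build"):
        parser.add_section("build")
    parser.set("build", "compiler", compiler)
    with cfg_path.open("w") as fh:
        parser.write(fh)
    return cfg_path
```

The file it writes contains a [build] section with the compiler option set, equivalent to the two lines shown above.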

I ended up here when I was upgrading the project's Python version and started running into pip errors involving this package.

It depends on what you're trying to do. There's an MIT-licensed package called charset_normalizer that many seem to have switched to.

charset_normalizer focuses on giving you the actual text content in usable Unicode form.

cchardet, by contrast, seems to focus on telling you what encoding a text file uses. In a project I'm working on, the detected encoding is then passed to open().

charset_normalizer is like, "why bother with determining the exact encoding scheme?"

Instead, it figures out which original encoding is most likely to decode successfully into usable text content.

If you look, it compares itself specifically with this package and calls out cChardet's apparent use of a C++ binding. It also claims higher accuracy, though possibly lower speed.
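
To make the difference concrete, here is a minimal sketch of the detect-then-decode pattern described above. It assumes `cchardet.detect()` returns a dict with an `encoding` key and that `charset_normalizer.from_bytes(...).best()` returns a match object or `None`. `guess_encoding` and `decode_bytes` are hypothetical helper names, and the UTF-8 fallback is an assumption of this sketch, not something either library prescribes.

```python
def guess_encoding(data, default="utf-8"):
    """Return a best-guess encoding name for raw bytes (hypothetical helper)."""
    guess = None
    try:
        import cchardet  # C extension; may not build on newer Pythons
        guess = cchardet.detect(data).get("encoding")
    except ImportError:
        try:
            from charset_normalizer import from_bytes
            best = from_bytes(data).best()  # None when nothing fits
            guess = best.encoding if best is not None else None
        except ImportError:
            pass  # neither library installed; fall through to the default
    return guess or default


def decode_bytes(data):
    """Decode bytes with the guessed encoding, falling back to UTF-8."""
    try:
        return data.decode(guess_encoding(data), errors="replace")
    except LookupError:
        # The detector named a codec Python does not know.
        return data.decode("utf-8", errors="replace")
```

Code that only needs the text can call `decode_bytes()` directly, while code that passes an encoding name to open() would use `guess_encoding()`.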

Thanks @ooliver1 and @banagale for your replies. I'm going to take a good look at charset_normalizer, since making anyone install gcc just to compile cChardet for my small Subtotxt script seems a trifle excessive.

In the meantime I'll compile it with gcc as an interim bodge.

It was working for me on Python 3.10, but now fails to install on Python 3.11:

/usr/bin/clang -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -pipe -Os -isysroot/Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -Isrc/ext/uchardet/src -I/opt/local/Library/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-12.0-x86_64-cpython-311/src/cchardet/_cchardet.o
src/cchardet/_cchardet.cpp:196:12: fatal error: 'longintrepr.h' file not found
  #include "longintrepr.h"
           ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1

EDIT: Manually installing cython beforehand seems to fix the issue (possibly related to cython/cython#4461).

There are two PRs, #78 and #80, that will address this. @PyYoshi, can you merge and release a new build, please?

It is pretty well established that they have abandoned cchardet. See the PRs you referenced: #78 is nearly a year old.

Indeed. It's unfortunate since right now many downstream dependencies can't be completely installed with Python 3.11 due to build issues.

At this stage it comes down to either moving to charset_normalizer or, if someone is willing, forking this repo to make cchardet-ng or similar.

Might want to take a look at this: https://github.com/faust-streaming/cChardet

pip install faust-cchardet

I support Python 3.10 and 3.11 now, so we're good. I'll open a PR so that if @PyYoshi comes back to this project some day, he can update this.