neuml/paperai

Installation issues

albertY-C opened this issue · 16 comments

The system reports the error "UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 12007: illegal multibyte sequence" when I run the command "pip install paperai". I wonder whether Windows cannot decompress tar.gz packages.

I've seen errors like this when wheel isn't installed. Can you try:

pip install wheel
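wheel is only useful if it is installed into the same interpreter that pip runs under. A quick way to confirm this is to check from Python itself. This is a minimal sketch; the helper name `is_installed` is hypothetical and not part of pip or paperai:

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be found by this interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same python.exe that pip uses (e.g. "python -m pip").
    print("Interpreter:", sys.executable)
    print("wheel installed:", is_installed("wheel"))
```

If it prints False, running "python -m pip install wheel" installs wheel into that exact interpreter.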

GitHub Actions is being used for continuous integration. The build script may be helpful for debugging Windows installs: https://github.com/neuml/paperai/blob/master/.github/workflows/build.yml

#19 is another currently active issue about a Windows install. It may also be worth reviewing.

Thank you very much for your advice.

  • I have already tried the first solution, but I still get the same error as before.
  • I am not sure where this script is run.

The script runs on each check-in via GitHub Actions. I would try running just "pip install mdv" to help narrow down the issue.

Are you using Python 3?

Yes, I am. The version is Python 3.8.0.

The issue is codec-related, which is always a nagging issue on Windows.

Can you run the following in your command shell and try to reinstall?

set PYTHONUTF8=1
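Setting PYTHONUTF8=1 turns on Python's UTF-8 mode (PEP 540, Python 3.7+), which makes open() default to UTF-8 instead of the locale codec; on a Chinese-locale Windows install that locale codec is typically GBK (cp936). Note that "set" only affects the current CMD window, so pip must be run from that same window. The sketch below, with hypothetical helper names, reports which mode is active and shows that UTF-8 bytes which are perfectly valid can still fail to decode as GBK:

```python
import locale
import sys


def default_text_encoding() -> str:
    """Return the codec open() uses when no encoding= argument is passed."""
    # In UTF-8 mode (PYTHONUTF8=1) this reports "UTF-8"; on a Chinese-locale
    # Windows install without it, it is typically "cp936" (GBK).
    return locale.getpreferredencoding(False)


def decodes_cleanly(data: bytes, codec: str) -> bool:
    """Return True if data decodes with codec without raising."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
    print("Default open() encoding:", default_text_encoding())

    # UTF-8 bytes for U+2014 followed by a newline, as a README might contain.
    sample = "\u2014\n".encode("utf-8")
    print("decodes as utf-8:", decodes_cleanly(sample, "utf-8"))  # True
    print("decodes as gbk:", decodes_cleanly(sample, "gbk"))      # False
```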

Hi @davidmezzetti, thank you for your help. I have solved the character-set problem.
However, there is a new issue: "Building wheel for hnswlib (setup.py) ... error". I tried running "pip install hnswlib", but
it failed. Can you tell me how to install this package? Thanks.

Are you sure that is an error? I've seen that error before, but the package still gets installed.

A workaround would be to install pip 20.2.4 if the error is preventing the install.

Thank you for your advice. I found prebuilt packages for some libraries, such as fasttext and annoy, on the site "https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml". Unfortunately, I didn't find hnswlib there, and I am still trying to fix this problem.

You need to install C++ build tools. Please see this: https://github.com/neuml/txtai#troubleshooting

This will fix all the issues you've had with building native packages.

Thank you very much, @davidmezzetti, for your patient help. I have successfully installed paperai.
Here is a summary of the problems I encountered during installation:

  • "UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 12007: illegal multibyte sequence" error.
    (Solution: run the command 'set PYTHONUTF8' in CMD to change Python's character set.)
  • Errors installing some packages, such as annoy, hnswlib and fasttext.
    (Solution: install the C++ build tools from "https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2017" so these extension packages can be built.)
    You can also get prebuilt packages from "https://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml", but this site does not have every package, so choose the ones you need.

I think this problem is caused by the Windows platform. @albertY-C, thanks for your summary.

After I used this command, I still have this problem: UnicodeDecodeError: 'gbk' codec can't decode byte 0x82 in position 12007:

Did you run into this situation?

@LeoLRH-Grad, sorry about that; the command should be "set PYTHONUTF8=1". Good luck!

Thank you for documenting this. Hopefully, it helps others who try to install on Windows in the future!

Windows Troubleshooting

I encountered the same gbk 0x82 (position 12007) issue, and sadly "set PYTHONUTF8=1" didn't help me.
I looked into the mdv setup file, where the error comes from, and it already appears to use UTF-8.
I think it may be caused by the outdated version of mdv, so I used a forked project, mdv3, which supports Python 3.8 and 3.9.
That solved the gbk issue, but then I hit the error "OSError: [WinError 193] %1 is not a valid Win32 application" (I had already installed MS Build Tools 14 and ran the command there).
I kept researching; some people say this happens because the Python install was 64-bit.
I installed 32-bit Python, but then got a new error about PEP 517 and missing MS Build Tools.
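"[WinError 193] %1 is not a valid Win32 application" commonly points to a 32-bit/64-bit mismatch between the running process and a DLL or executable it tries to load. Before swapping Python installs, it can help to confirm which interpreter and architecture pip is actually using. A minimal sketch; `interpreter_bits` is a hypothetical helper:

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter: 32 or 64."""
    # A pointer ("P") is 4 bytes in a 32-bit build and 8 bytes in a 64-bit build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Python version:", platform.python_version())
    # Compiled extension modules must target this same width.
    print("Interpreter bits:", interpreter_bits())
```

Run it with the same python.exe that pip uses; compiled packages such as hnswlib then need to be built for that width.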

The errors continued, but they weren't the same anymore.
This time it showed "fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory".
So I looked it up; someone on Stack Overflow said this might be caused by not having the Windows SDK installed.
I installed it, tried pip install hnswlib again, and bam! hnswlib installed perfectly!
Then there was a new problem while installing torch. I just couldn't get the packages to install, whether locally or with the official pip command (maybe due to a missing CUDA environment).

So I used Anaconda to redo the whole process, and installing torch was a piece of cake.
Apart from the mdv problem, only the 3 packages @albertY-C mentioned failed to install. Just use the files he linked and that already works; for hnswlib, download it and install it locally with pip.
At one point while installing these 3 packages there was still a problem caused by MS C++ Build 14; just follow this answer and it'll fix your problem.

Glad you were able to work through these issues. There is a task in the backlog to upgrade mdv (#21) to avoid this manual step.

Windows installs do work, but they can be more challenging. The GitHub Actions script is a good reference - https://github.com/neuml/paperai/blob/master/.github/workflows/build.yml

Installing via the Dockerfile is another option to consider - https://github.com/neuml/paperai/blob/master/docker/Dockerfile

Thank you for documenting this!

Yeah, you are right. I later switched to WSL Ubuntu, and it works like a charm.