ofajardo/pyreadr

ImportError: DLL load failed while importing librdata: Can't find the specified module.

edg956 opened this issue · 39 comments

Describe the issue

Trying to use this package I got the same error as #23. I followed the steps instructed in the issue at pyreadstat referred to in #23, but still can't get it to work.

I have found all DLLs and added to PATH and made sure that the system can find them using Dependencies. See the screenshot:

DependenciesGui.exe screenshot

I had to set the PATH environment variable in order for Dependencies to find them. Here's the code printing the PATH environment variable and the PYTHONPATH environment variable:

PATH environment variable

PYTHONPATH environment variable

But importing pyreadr still fails

import pyreadr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Eugenio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pyreadr\__init__.py", line 1, in <module>
    from .pyreadr import read_r, list_objects, write_rds, write_rdata, download_file
  File "C:\Users\Eugenio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pyreadr\pyreadr.py", line 10, in <module>
    from ._pyreadr_parser import PyreadrParser, ListObjectsParser
  File "C:\Users\Eugenio\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pyreadr\_pyreadr_parser.py", line 17, in <module>
    from .librdata import Parser
ImportError: DLL load failed while importing librdata: The specified module could not be found.

To Reproduce

  1. Install python 3.9 as a Windows app.
  2. Install pandas and pyreadr with pip (python -m pip install pandas pyreadr)
  3. Open a PowerShell terminal
  4. (Optional, doesn't really make a difference) Set PYTHONPATH to match PATH, which points to the folders with the DLLs: $Env:PYTHONPATH = $Env:PATH
  5. Start a python shell
  6. Run import pyreadr

Expected behavior

Import should not fail.

Setup Information:

  • How did you install pyreadr? (pip, conda, directly from repo)
    python -m pip install pyreadr
  • Platform (windows, macOS, linux, 32 or 64 bit)
    Edition Windows 10 Pro
    Version 21H2
    Installed on 20/05/2021
    OS compilation 19044.1348
    Experience Windows Feature Experience Pack 120.2212.3740.0
  • Python Version
    3.9
  • Python Distribution
    Plain python from microsoft store
  • Using Virtualenv or condaenv?
    No

To whomever it may help: executing os.add_dll_directory to add the directories where the DLLs reside helped me to import this package.

I'm leaving open as it appears that the DLL import mechanism changed with python 3.8 (see this SO answer) and maybe the maintainers can provide a more appropriate solution.
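The workaround described above can be sketched roughly as follows. This is a minimal sketch, not the maintainers' fix: the helper name `register_dll_dirs` and the example folder are hypothetical, and the guard assumes that `os.add_dll_directory` is only available on Windows with Python 3.8 or later.

```python
import os
import sys


def register_dll_dirs(directories):
    """Add each existing directory to the Windows DLL search path.

    Returns the list of directories that were actually registered.
    """
    registered = []
    # os.add_dll_directory only exists on Windows with Python 3.8+.
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return registered
    for directory in directories:
        directory = os.path.abspath(directory)
        # The call fails for folders that do not exist, so skip those.
        if os.path.isdir(directory):
            os.add_dll_directory(directory)
            registered.append(directory)
    return registered


# Hypothetical location: replace it with the folder that really holds the DLLs.
dll_dirs = [r"C:\path\to\folder\with\dlls"]
print(register_dll_dirs(dll_dirs))

# Import the package only after the directories have been registered:
# import pyreadr
```

The directories have to be registered before `import pyreadr` runs, because the DLL lookup happens at import time.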

Thanks a lot for the report and the working solution. I'm going to read about the new dll importing mechanism, but it sounds as if calling os.add_dll_directory in __init__.py, for example, may solve the problem. I'm going to test.

Hi @edg956, I have prepared a new version 0.4.4 where I call os.add_dll_directory in __init__.py, but I have not been able to test it since my access to Windows is limited. I am wondering if you would be so kind as to test it on your setup?
If yes, you can find the new wheels here:
https://anaconda.org/ofajardo/pyreadr/files

Please download the one appropriate for your Python version and tell me how it goes! If it works I will release it on PyPI.

I managed to test and unfortunately for me it is not working. Also, manually adding the folder site-packages\pyreadr with os.add_dll_directory, to the Windows PATH and to sys.path does not help, neither alone nor in combination. Am I missing something?

Hi @ofajardo. Thank you for the support in this. I will try to replicate the issue this weekend and get back to you with any findings!

If it's of any help, there were some DLLs (e.g. python38.dll) which were in the system-wide Python installation directory instead of the user-specific directory, that is: C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.X.X.XXXXXXXXX. And that's when installing it as a Windows App. Not sure how it behaves when installing Python any other way.

OK I know what is happening:

The problem is that, depending on how you install Python, the folder structure of where the packages get installed looks different. For instance, if you install with miniconda, you get it in Miniconda3\Lib\site-packages\pyreadr. Then my installer copies the python code and the dlls into this folder. Everything works.
Now, when installing from the windows app store, what I get is that the package is installed into a weird place, LocalCache\local-packages\Python39\site-packages\pyreadr, while the dlls are installed into a different place, LocalCache\local-packages\Lib\site-packages\pyreadr; that is, the dlls are not together with the python code anymore. That is why it cannot find the dlls. The solutions are either copying the dlls to where the code is (then nothing else is needed), or calling os.add_dll_directory on the directory where the dlls are.

The definitive solution would be to be able to copy the dlls (data) to wherever the code goes and not somewhere else, automatically, without having to guess where the system is going to install them. No idea how to do that at the moment, but suggestions are welcome.

I agree. That's the real issue with the definitive solution. I am not sure if python sets that sort of information in environment variables, it anyway places them there for a reason... Not sure if I can commit to come up with something this weekend, but I'll try.

OK, I was installing/copying the dlls as data_files, as that seemed to me to be the most appropriate, but then it is very difficult to control where the files are copied to. I will try switching to copying the dlls as package data, which is not so nice, as they are not data, but at least the dlls will appear in the correct installation folder.

This stuff is complicated. If I put the dlls directly in the package folder (pyreadr) and tell setup.py to treat them as package data, then on windows it works like a charm: you can import without having to do anything else, as the dlls are effectively copied to the package directory. However, doing this breaks compilation on Mac for some reason.

Putting them into a subfolder of the package folder pyreadr does not work: the dlls are copied, but they are not found at import time. The folder can be added with os.add_dll_directory, but that does not work for python < 3.8, which is a problem right now. For python < 3.8, adding to sys.path or os.environ['PATH'] does not solve anything.

Using the data_files mechanism, I have not found a way to reliably find out where exactly the package is going to be installed. It is easy to get the root of the installation, but the subfolders will change depending on what python package you installed (windows app, conda, manually installing from python.org) etc.

Any insight on how to do this properly would be appreciated.

Apparently the way Miniconda does it is the canonical one, and the windows app is the one doing something very strange. Still, there will be differences depending on whether you install for your user only or as admin, in a virtual env or in a conda env ... lots of variables, and in the meantime it seems that there is no good way to introspect the path where the package is going to be installed before it is installed ... and also no way to make data_files automatically go into the package installation folder.
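To see how the layout differs between Python installations, the directories that the running interpreter reports can be printed with the standard library. This is only a diagnostic sketch (the function name `describe_install_locations` is made up for illustration). It assumes, as the packaging documentation describes, that relative data_files paths are resolved against the "data" root (or the user base for a user install), while packages go to "purelib"/"platlib" (or the user site).

```python
import site
import sysconfig


def describe_install_locations():
    """Return the directories this interpreter uses for packages and data files."""
    paths = sysconfig.get_paths()
    return {
        "purelib": paths["purelib"],      # pure-Python packages
        "platlib": paths["platlib"],      # packages with compiled extensions
        "data": paths["data"],            # root for relative data_files paths
        "user_base": site.getuserbase(),  # data root for "pip install --user"
        "user_site": site.getusersitepackages(),  # packages for "--user" installs
    }


for name, location in describe_install_locations().items():
    print(f"{name:10s} {location}")
```

If the package directory is not simply the data root plus Lib\site-packages, files shipped as data_files with that relative path end up in a different folder than the Python code, which would match the behaviour reported in this thread.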

I use python 3.8 with miniconda on windows 10, and seem to have the same issue.

image

aha, can you please check if in the site-packages\pyreadr folder there are the *.pyd dlls? if they are not there can you see where they went? If they are there, are you running python with the "Anaconda" command window? That one has all the paths fixed etc.

if in the site-packages\pyreadr folder there are the *.pyd dlls

yes, there are *.pyd files in the folder.
image

are you running python with the "Anaconda" command window

Not sure what the "Anaconda" command window is, so probably not. I'm running from miniconda's activate.bat, to activate the virtual environment.

And since it's running from virtual environment, not sure why it installs to system python.

OK, it seems, as you said, that your problem is that you are not using miniconda but the system python. What I would suggest is that you use the "Anaconda prompt" all the time, see screenshot. Use it for everything: for installing the library, for activating the virtual environment and for using ipython.
miniconda

In that case, no, I didn't use system python, and never did.

I use anaconda prompt, just running with a shortcut to its activate.bat, instead of from menu, but they should be the same thing. And installed from conda environment.

Weird. Does calling os.add_dll_directory solve the issue? If it does, then I could include that in __init__.py ... actually maybe you can try that for now and if it works I can put that in the next release.

Which path to pass to os.add_dll_directory? The dll's absolute path? Tried the dll installation path, doesn't seem to work for me.

image

OK sorry, I got confused. The dlls that are missing in your case are zlib.dll, iconv.dll and liblzma-5.dll, which is the same problem as initially reported in this issue, i.e. the dlls do not get installed in the same folder as the package. Could you please locate those files in your system? That is the folder you have to add with os.add_dll_directory.

The fundamental problem is that it seems not possible to predict where the package is going to be installed. Without that we cannot fix it, and you have to manually discover where the dlls went and add that with os.add_dll_directory.
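Locating the files by hand can be tedious, so a small search script may help. The following is a sketch under assumptions: the function name `find_dll_dirs` is hypothetical, the DLL names are the three quoted in this thread, and the search roots in the commented usage are only reasonable guesses.

```python
import os

# DLL names quoted in this thread as missing.
MISSING_DLLS = ("zlib.dll", "iconv.dll", "liblzma-5.dll")


def find_dll_dirs(search_roots, dll_names=MISSING_DLLS):
    """Walk each root and map every folder containing a wanted DLL to its hits."""
    wanted = {name.lower() for name in dll_names}
    found = {}
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            hits = sorted(f for f in filenames if f.lower() in wanted)
            if hits:
                found[dirpath] = hits
    return found


# Example usage (assumed search roots; walking a whole installation can be slow):
# import site, sys
# for folder, names in find_dll_dirs([site.getuserbase(), sys.prefix]).items():
#     print(folder, names)
```

Any folder it reports is a candidate to pass to os.add_dll_directory, or to copy the files from.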

I already see that the location of your installation is unconventional for Miniconda, I wonder why. Typically it is C:\Users\<username>\Miniconda3\lib\python3.8\site-packages. OK, maybe C:\Users\<username> can be somewhere else, and Miniconda3 can be another name, but I wonder why not lib\python3.8\site-packages.

The unconventional folder is where I found the dll it installed to, so I added in os.add_dll_directory.

The installation for miniconda is in typical location
image

OK, that is already strange, I don't understand why the package was installed to the unconventional folder. But besides that, in the screenshot you sent, I do not see the dlls in the unconventional folder, which means they should be somewhere else ...

Another solution would be to copy the files from the folder https://github.com/ofajardo/pyreadr/tree/master/win_libs/64bit in this repo into the pyreadr folder in your system.

@ofajardo, I had the same issue: "DLL load failed while importing librdata: The specified module could not be found" Problem solved by copying 6 dll and lib files here: C:*\AppData\Roaming\Python\Lib\site-packages\pyreadr
and pasting them here: C:*\AppData\Roaming\Python\Python39\site-packages\pyreadr
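The manual copy described above could also be scripted. This is a rough sketch, not an official fix: the function name `copy_dlls` is hypothetical, and the two folders in the commented usage are placeholders that must be replaced with the real source and destination on your machine.

```python
import shutil
from pathlib import Path


def copy_dlls(source_dir, package_dir, patterns=("*.dll", "*.lib")):
    """Copy files matching the patterns from source_dir into package_dir.

    Returns the sorted names of the files that were copied.
    """
    source_dir, package_dir = Path(source_dir), Path(package_dir)
    copied = []
    for pattern in patterns:
        for src in source_dir.glob(pattern):
            shutil.copy2(src, package_dir / src.name)
            copied.append(src.name)
    return sorted(copied)


# Example usage with placeholder paths (fill in the real folders):
# copy_dlls(r"<folder where the dll and lib files were installed>",
#           r"<site-packages folder that contains the pyreadr code>")
```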

thanks for the feedback!

Same issue, but IDK how this can be dealt with

This works for me!

@MrBeike any information on OS and library versions?

os:windows10 22H2
pyreadr:0.4.7

edg956 commented

@ofajardo did you think of adding a step to download the DLLs from a well-known source (i.e. this repo) and calling os.add_dll_directory on the directory where they are downloaded? Similar to what nltk does, but maybe less interactive and more straightforward.

Another option, about which I know very little to be honest, could be setting up a post-install script in setup.py to download the files to a defined directory that is sure to be discovered later. I'm not sure how that'd play out with conda or other tools.

@edg956 downloading would work I guess, that is a good idea. I would say by default I would try to import the python modules that require the dlls; if the import fails because of a missing dll, then suggest that the user download them manually. The manual download would go by default to the directory where the package is installed. If that fails, the user would have the option to download to a folder of her choice, but then she has to call os.add_dll_directory herself.

What do you think of this?

The idea is that the dlls should ideally be in the package directory so that you don't have to call os.add_dll_directory. For that reason I don't like the post-install script idea; in fact I already have one, which downloads the files somewhere ... and people still seem to fail to find and add them.
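The "try the import first, and explain what to do if it fails" part of this proposal might look roughly like the sketch below. It is only an illustration of the idea: `import_or_explain` and the help text are hypothetical, nothing here is part of pyreadr, and the download step itself is left out.

```python
import importlib


def import_or_explain(module_name, help_text):
    """Import a module; on failure re-raise ImportError with extra guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original message and append instructions for the user.
        raise ImportError(f"{exc}\n{help_text}") from exc


# Hypothetical guidance text, based on the workarounds discussed in this thread.
HELP_TEXT = (
    "The required DLLs may be missing. Copy them into the package folder, "
    "or call os.add_dll_directory on the folder that contains them."
)

try:
    import_or_explain("pyreadr", HELP_TEXT)
except ImportError as err:
    print(err)
```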

edg956 commented

Yeah I reckon it's far from perfect but a patch that could work. I'd say defaulting to downloading the DLLs if they're not found is a good way to go. And only in windows I assume.

Nevertheless, I started reading through this thread again and this got me interested:

This stuff is complicated. If I put the dlls directly in the package folder (pyreadr) and tell setup.py to treat them as package data, then on windows it works like a charm: you can import without having to do anything else, as the dlls are effectively copied to the package directory. However, doing this breaks compilation on Mac for some reason.

Putting them into a subfolder of the package folder pyreadr does not work: the dlls are copied, but they are not found at import time. The folder can be added with os.add_dll_directory, but that does not work for python < 3.8, which is a problem right now. For python < 3.8, adding to sys.path or os.environ['PATH'] does not solve anything.

Using the data_files mechanism, I have not found a way to reliably find out where exactly the package is going to be installed. It is easy to get the root of the installation, but the subfolders will change depending on what python package you installed (windows app, conda, manually installing from python.org) etc.

Any insight on how to do this properly would be appreciated.

You say that putting the dlls in the package folder worked like a charm. If I understand correctly that means placing all DLLs under the directory pyreadr, right? Could that be explained here?

If the previous is right, then the problem is that it broke the Mac build. Is it possible to remove them for Mac (or more precisely, only keep those files in Windows) in setup.py?

Just want to see if there's another way to access these files and not resort to downloading them.
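The suggestion above (ship the DLLs as package data on Windows only) could be sketched in setup.py along these lines. This is an assumption-laden illustration, not the project's actual setup.py: the helper `windows_only_package_data` and the file patterns are hypothetical, and whether setuptools honours such a selection in practice is exactly what is in question here.

```python
import sys


def windows_only_package_data(platform=sys.platform):
    """Return a package_data mapping that ships the DLLs only on Windows."""
    if platform == "win32":
        # Hypothetical patterns for the dll and lib files kept inside pyreadr/.
        return {"pyreadr": ["*.dll", "*.lib"]}
    return {}


print(windows_only_package_data())

# In setup.py the mapping would then be passed to setup(), for example:
# setup(name="pyreadr", packages=["pyreadr"],
#       package_data=windows_only_package_data(), ...)
```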

I can't remember the details, but I think I tried to do what you suggest and failed.

@edg956 I tried your suggestion again of having package data for windows but not for macos/linux, and this time I succeeded. Even though the documentation says you can control which files should be included and which ones excluded, in my hands that does not work; it is an all-or-nothing process. That means for unix the source files will not be included in the wheel, which is not recommended, but well, the package still works like that so I guess we can live with that. It is also bad because I cannot include test files in the package, which is the only way for conda to run tests, but again we can live without that.

So, here there are the new wheels and it would be great if you could test if in your hands the problem is solved. If it is, then I have to polish a couple of other things, but then I would do a release with these new wheels: https://anaconda.org/ofajardo/pyreadr/files

edg956 commented

Got it. I will try to test it this week. Is this only published in anaconda.org? I remember I installed it through pip

Yes, right now it is only on Anaconda, because to put it on pip you have to make a new version, and once a version is done you cannot change it. I would like to get it tested before doing a new version, which means you would need to manually download the wheel from Anaconda, and then you can run pip on the wheel on disk.

edg956 commented

It'll be difficult for me this weekend 😞. I'll keep you updated.

no problem, do it at your best convenience, and thanks actually for the suggestions and testing!

edg956 commented

So I have tried both in Windows (python installed from the microsoft App store) and Ubuntu (wsl) and it worked (cp39). I more or less followed the steps to reproduce this issue:

  • Install pyreadr (from the wheel you linked me to)
  • Created a script that runs import pyreadr; print("Hello, world!")
  • Ran python script.py and got the expected message (meaning, importing pyreadr works now)

Let me know if you want me to run any other test

Thanks a lot for testing, I think that demonstrates the wheels work. Next I will do a release and publish to PyPI. Only after I have a new release can I test conda forge. If everything works as expected, then that is the end of this; if for some reason conda does not work I have to adjust that, including the possibility of having to roll back the changes in order to get conda to work. Hopefully that is not the case!

The latest version 0.4.9 is out and should solve the issue. Conda is also working. I am closing this, but feel free to reopen if more issues are detected (and also please provide feedback if it is working =) )