pearu/pylibtiff

Unable to open image with Chinese path using TIFF.open

yang-521 opened this issue · 46 comments

Is there any way to solve this problem?

Could you provide some information about the issue? Any information at all? What error(s) are you seeing?

Whether or not this can be fixed comes down to whether the error comes from Python (this pylibtiff package) or from the libtiff C library underneath.

This problem comes from pylibtiff: you cannot use TIFF.open to open a file whenever its path contains Chinese characters.
```python
from libtiff import TIFF

file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
imgdir = TIFF.open(file_path)
print(imgdir)
```

This error occurs:

```
TIFFOpen: D:\python\tifpix\中文\2015-1-013-01.tif: Cannot open.
Traceback (most recent call last):
  File "D:\python\tifpix\tif11.py", line 4, in <module>
    imgdir = TIFF.open(file_path)
  File "D:\python\tifpix\venv\lib\site-packages\libtiff\libtiff_ctypes.py", line 484, in open
    raise TypeError('Failed to open file ' + repr(filename))
TypeError: Failed to open file b'D:\python\tifpix\\xe4\xb8\xad\xe6\x96\x87\2015-1-013-01.tif'
```

If the path or file name contains Chinese characters, the file cannot be opened; if there are no Chinese characters, the file opens normally.

pearu commented

Could you try:

file_path = "D:\python\tifpix\中文\2015-1-013-01.tif"

?

I am unable to reproduce this on an Ubuntu system with these exact characters copied and pasted. I tried putting the characters in the filename and in a directory name. I tried with raw strings (r"") and without the r. Note that I was typing these exact unicode characters into my ipython session so as far as python was concerned it was normal unicode.

I tried, but it didn't work

I checked the TIFF.open function. It seems there is a problem with the encoding. I tried changing `mode.encode("ascii")` to `mode.encode("utf-8")`, but it had no effect.

It failed with the exact same error?

If you're talking about this encode line:

tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))

Then this is only controlling/changing the mode parameter, not the filename. The main part of the code is here:

```python
try:
    try:
        # Python3: it needs bytes for the arguments of type "c_char_p"
        filename = os.fsencode(filename)  # no-op if already bytes
    except AttributeError:
        # Python2: it needs str for the arguments of type "c_char_p"
        if isinstance(filename, unicode):  # noqa: F821
            filename = filename.encode(sys.getfilesystemencoding())
except Exception as ex:
    # It's probably going to not work, but let it try
    print('Warning: filename argument is of wrong type or encoding: %s'
          % ex)
```

It uses this function to convert the Unicode string to a series of bytes that it can pass to libtiff (C) to access the file on the file system:

https://docs.python.org/3/library/os.html#os.fsencode

What do you get when you run this function:

https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding

But also, are you sure this is a valid file? Are you able to run `tiffinfo your_file.tif` (on a command line somewhere)? Depending on the size of the file, could you give us a link to it so we can test it?

This is either a problem with your file, your system/environment, or libtiff/pylibtiff on Windows.

I am sure it is a valid TIFF file, because if the file name and path do not contain Chinese characters, I can use any libtiff function on it. I found this problem while traversing a folder. After testing, a file whose path or name contains Chinese really cannot be opened. My system is Windows 10, and I use PyCharm.

I have also tried TiffFile, which can read paths and file names containing Chinese:

```python
from libtiff import TiffFile

file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
imgdir = TiffFile(file_path)
print(imgdir.get_info())
```

But TiffFile can't meet my needs, so I gave up using it.

Same error.

Can you run this:

python -c "import sys; print(sys.getfilesystemencoding())"

and let us know what the output is?

You could also try running:

import os
print(os.fsencode(file_path))

As for the TiffFile stuff, that might not be a good test since that is pure python as far as opening the file. Something about the encoded filename as bytes being passed to the C TIFF library isn't going well.

Like I said in my previous comment, you could run tiffinfo your_file.tif on a command line if you can figure out where tiffinfo is installed (not sure how this is installed on Windows).
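
The checks suggested so far can be gathered into one small diagnostic. This is a sketch, not something shipped with pylibtiff; `describe_path` is a hypothetical helper name and the path is simply the one from this issue. It reports the interpreter's filesystem encoding, the locale's preferred encoding, the bytes `os.fsencode` produces (which is what pylibtiff hands to `TIFFOpen`), and whether Python can see the file under both the `str` and the `bytes` spelling.

```python
import locale
import os
import sys


def describe_path(file_path):
    """Collect the encoding facts relevant to passing file_path to a char* C API."""
    encoded = os.fsencode(file_path)  # the bytes pylibtiff passes to TIFFOpen
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "fsencoded": encoded,
        "exists_str": os.path.exists(file_path),
        "exists_bytes": os.path.exists(encoded),
    }


if __name__ == "__main__":
    # Example path from this issue; replace it with a file on your machine.
    path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
    for key, value in describe_path(path).items():
        print(f"{key}: {value!r}")
```

If `exists_str` and `exists_bytes` are both True but `TIFFOpen` still fails, the bytes are being interpreted differently on the C side than by Python.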

utf-8

print(os.fsencode(file_path))

b'D:\python\tifpix\\xe4\xb8\xad\xe6\x96\x87\2015-1-013-01.tif'

I'm not sure whether "tiffinfo" refers to the `info` function. Here are the results:

```python
from libtiff import TIFF

file_path = r"D:\python\tifpix\tif\2015-1-013-01.tif"
pic = TIFF.open(file_path, mode="r")
inf = TIFF.info(pic)
print(inf)
```

This is the result after I replaced the Chinese directory name. If the path contains Chinese, the TIFF file cannot be opened.

Output:

filename: b'D:\python\tifpix\tif\2015-1-013-01.tif'
ImageWidth: 2240
ImageLength: 3112
RowsPerStrip: 3112
StripByteCounts: c_ulong(194677)
StripOffSets: c_ulong(8)
TileByteCounts: c_ulong(194677)
TileOffSets: c_ulong(8)
BitsPerSample: 8
Compression: COMPRESSION_JPEG
PhotoMetric: PHOTOMETRIC_YCBCR
PlanarConfig: PLANARCONFIG_CONTIG
ResolutionUnit: 2
JPEGQuality: 75
JPEGTablesMode: 3
XResolution: 300.0
YResolution: 300.0
ReferenceBlackWhite: [0.0, 255.0, 128.0, 255.0, 128.0, 255.0]

tiffinfo is a command line tool that you would run from a terminal/console, not from Python. It looks like .info gives the same information, but I was hoping you could try the command line tool directly, because that would make it obvious whether the problem is in the C libtiff or in pylibtiff.

I'll have to think about the fsencode stuff.

On my own machine:

In [10]: file_path = "/tmp/中文/2015-1-013-01.tif"

In [11]: os.fsencode(file_path)
Out[11]: b'/tmp/\xe4\xb8\xad\xe6\x96\x87/2015-1-013-01.tif'

In [12]: os.path.exists(os.fsencode(file_path))
Out[12]: True

I wonder what you get if you do the os.path.exists(os.fsencode(file_path)) line?

Output:
True

As for "tiffinfo", I'll first figure out how to test with the command line, which I rarely use.

The following is the result obtained with the command line tool

tiffinfo.cmd -r 2015-1-013-01.tif

Output:
TIFF Directory at offset 0x2f87e (194686)
Image Width: 2240 Image Length: 3112
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: JPEG
Photometric Interpretation: YCbCr
YCbCr Subsampling: 2, 2
Samples/Pixel: 3
Rows/Strip: 3112
Planar Configuration: single image plane
Reference Black/White:
0: 0 255
1: 128 255
2: 128 255

(remaining directories omitted)

TIFF Directory at offset 0x5cf5e8 (6092264)
Image Width: 2188 Image Length: 3100
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: JPEG
Photometric Interpretation: YCbCr
YCbCr Subsampling: 2, 2
Samples/Pixel: 3
Rows/Strip: 3100
Planar Configuration: single image plane
Reference Black/White:
0: 0 255
1: 128 255
2: 128 255

Sorry, I really thought I responded to this already. I'm not sure what happened to my comment.

Would it be possible for you to rerun the tiffinfo command, but give it the whole directory with the Chinese characters in it?

Could you also do:

import sys

print(sys.getdefaultencoding())

Right now it seems like your file system encoding is utf-8, your Python interpreter is using utf-8, and Python knows that the file exists (the os.path.exists check you ran before returned True). So I don't see anything that should be causing an issue. 😕

After changing the path and file name to Chinese, I reran the command-line tool and got the same result.

```python
import sys

print(sys.getdefaultencoding())
```

Output: utf-8

It looks like libtiff (the C library) has TIFFOpen which we're using, but also a TIFFOpenW for opening filenames that are unicode:

http://www.simplesystems.org/libtiff//functions/TIFFOpen.html#description

At this point I'm not sure how the code would need to be updated. It is very strange to me that this seems to work just fine on non-Windows.

pearu commented

It looks like this issue cannot be resolved by pinpointing a particular problem and fixing it.

However, here is an idea for a possible workaround, where TIFF.open would contain (untested):

        tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))
        if tiff.value is None and os.name == 'nt' and os.path.exists(filename):
            # see gh-152
            import tempfile
            with tempfile.NamedTemporaryFile() as tmp:
                tmp.write(open(filename).read())
                tmp.flush()
                tiff = libtiff.TIFFOpen(tmp.name, mode.encode('ascii'))
                # TODO: add a hook to remove tmp.name when tiff closes
        if tiff.value is None:
            raise TypeError('Failed to open file ' + repr(filename))
        return tiff
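
For reference, here is a standalone sketch of the same temporary-copy idea that avoids the pitfalls raised later in this thread: it copies in binary mode (`open(filename).read()` opens in text mode), keeps the copy with `delete=False` (on Windows a `NamedTemporaryFile` cannot be reopened by name while it is still open), and returns a path the caller removes afterwards. `ascii_temp_copy` is a hypothetical helper, not part of pylibtiff, and it only helps if the temporary directory's own path is ASCII.

```python
import os
import shutil
import tempfile


def ascii_temp_copy(filename, suffix=".tif"):
    """Copy `filename` to a new temporary file and return the copy's path.

    Hypothetical helper: tempfile generates ASCII names, so the copy is
    reachable through a char*-based API as long as the temp directory path
    is ASCII too. The caller is responsible for deleting the copy.
    """
    with open(filename, "rb") as src:  # binary mode, unlike open(filename)
        # delete=False: the copy must outlive the with-block below.
        tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
        try:
            with tmp:
                shutil.copyfileobj(src, tmp)
        except BaseException:
            os.remove(tmp.name)  # do not leave a half-written copy behind
            raise
    return tmp.name
```

Usage would be along the lines of `tmp_path = ascii_temp_copy(file_path)`, then `TIFF.open(tmp_path)`, then `os.remove(tmp_path)` once the TIFF is closed. Because the TIFF is opened on a copy, this only suits read-only access.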

I tried it, but another error occurred:

file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif" #I've tried both of them
file_path = "D:\python\tifpix\中文\2015-1-013-01.tif"

ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type

pearu commented

Where exactly is the ArgumentError raised?

Btw, one might need to use tempfile.NamedTemporaryFile(delete=False) to avoid deleting the tmp file when leaving the with block.

At this line: tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))

pearu commented

This does not make sense. The line

tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))

is the original line and according to the issue description it should succeed, otherwise

raise TypeError('Failed to open file ' + repr(filename))
TypeError: Failed to open file b'D:\python\tifpix\\xe4\xb8\xad\xe6\x96\x87\2015-1-013-01.tif'

would not be reached.

Could you check if filename and mode types are valid?

pearu commented

Re:

file_path = "D:\python\tifpix\中文\2015-1-013-01.tif"

Notice that the substring \t is interpreted as a tab character, not as a backslash followed by t.

Try:

file_path = "D:\\python\\tifpix\\中文\\2015-1-013-01.tif"
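
The escape-sequence point, made concrete: a raw string, doubled backslashes, and `pathlib.PureWindowsPath` all spell the same path, while a plain literal quietly turns `\t` into a tab and `\201` (the start of `\2015`) into an octal escape. The raw-string version had already failed in this issue, so this rules out one cause rather than explaining the error.

```python
from pathlib import PureWindowsPath

# Three equivalent spellings; each keeps the backslashes as literal characters.
raw_literal = r"D:\python\tifpix\中文\2015-1-013-01.tif"
doubled = "D:\\python\\tifpix\\中文\\2015-1-013-01.tif"
joined = str(PureWindowsPath("D:/", "python", "tifpix", "中文", "2015-1-013-01.tif"))
print(raw_literal == doubled == joined)  # True

# A plain literal: "\t" becomes a TAB and "\2015" becomes chr(0o201) + "5".
mistyped = "D:\tifpix\2015-1-013-01.tif"
print(repr(mistyped))
```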

Here is the code I tested (one run with each `file_path`):

```python
from libtiff import TIFF

file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
# Also tried: file_path = "D:\\python\\tifpix\\中文\\2015-1-013-01.tif"
imgdir = TIFF.open(file_path, mode="r")
print(imgdir)
```

Both give the same error. Strange.

I tried to set up a fresh environment for testing, but I couldn't complete the installation through pip:
“Building wheels for collected packages: pylibtiff
Building wheel for pylibtiff (pyproject.toml) ... error”
Then I remembered that I had originally installed it from a “.whl” file.
I don't know whether there is a problem with the installation, but I am sure that I can open a TIFF file whose path has no Chinese characters and read its data.

@yangyunlv What version of python are you using and what version of pylibtiff?

@pearu Theoretically we should be able to take the Python str filename and, instead of using TIFFOpen, use TIFFOpenW with a wchar_t *. I've never used ctypes to convert Python strings, but since all str objects are Unicode, it might make more sense in the long run to use wchar_t throughout.
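
A minimal ctypes sketch of the `TIFFOpenW` idea, assuming a Windows build of libtiff that exports `TIFFOpenW(const wchar_t *name, const char *mode)` as described in the libtiff `TIFFOpen` manual page. `ctypes.c_wchar_p` accepts a Python `str` directly, so no manual encoding step is needed. The handle is typed as an opaque `c_void_p` here, whereas pylibtiff uses its own `TIFF` pointer type, so this is not a drop-in patch; `open_tiff_unicode` is a hypothetical name.

```python
import ctypes
import os


def open_tiff_unicode(lib, filename, mode="r"):
    """Open a TIFF through TIFFOpenW, passing the path as wchar_t*.

    Assumptions: `lib` is a ctypes handle to a Windows build of libtiff that
    exports TIFFOpenW; `filename` is a str (or os.PathLike yielding str);
    the returned TIFF* is treated as an opaque void pointer.
    """
    tiff_open_w = lib.TIFFOpenW
    tiff_open_w.argtypes = [ctypes.c_wchar_p, ctypes.c_char_p]
    tiff_open_w.restype = ctypes.c_void_p
    # c_wchar_p converts the str itself, so no encoding step is needed here.
    handle = tiff_open_w(os.fspath(filename), mode.encode("ascii"))
    if not handle:
        raise OSError(f"TIFFOpenW failed to open {filename!r}")
    return handle
```

The returned handle would still need to be wrapped the way pylibtiff wraps `TIFFOpen` results and eventually passed to `TIFFClose`.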

Python 3.8.10, with pylibtiff installed from pylibtiff-0.4.4-cp38-cp38-win_amd64.whl

I don't expect this to change anything, but could you try installing the newest version of pylibtiff (0.5.1)?

After updating to version 0.5.1, "libtiff.dll" is missing, and I am not sure where to obtain this file.
The error is: “Could not find module 'libtiff.dll' (or one of its dependencies). Try using the full path with constructor syntax.”

How are you installing it and were there any errors during the installation?

I installed it with pip. The first installation reported an error because my Visual C++ Build Tools were outdated. After updating them, I repeated the pip installation without any errors.

Ok so now your installation works and you are able to run your pylibtiff code? And you get the same error as before with the filename with Chinese characters in it?

Although the update process did not report an error, the module no longer works: the dependency "libtiff.dll" is missing. I have tried redeploying the environment and reinstalling the modules on a new computer and in a virtual machine, and the problem occurs every time. Version 0.4.4 does not have this problem.

Could you please copy/paste the command you are using to install the package "pylibtiff" and the output of running that command? You may want to try uninstalling it and trying to install it again to see what the output is.

The error about the low version of the C++ build tools is why the .dll is missing (most likely). My guess is the build is being "cached" so when you run the install command again it doesn't try to compile the extension because it thinks it already has. However, after looking at it, there shouldn't be a libtiff.dll created by the installation of pylibtiff. The error is actually coming from pylibtiff trying to find the C libtiff library.

In version 0.5.x of pylibtiff this pull request was added:

#149

In this pull request, changes were made to check for libtiff (C) with the newer name for the library tiff.dll, but it should still check for libtiff.dll (as your error message suggests). So...there must be something else going on here. Very confusing.
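
When chasing a missing-DLL error like this, it can help to check which libtiff names ctypes can actually find. The sketch below uses `ctypes.util.find_library`, which on Windows searches the directories on `PATH` for `name` and `name.dll`; `locate_libtiff` is a hypothetical helper and the candidate names only mirror the `tiff.dll`/`libtiff.dll` pair mentioned above, not pylibtiff's actual lookup list. Separately, since Python 3.8 the load-time dependencies of a DLL loaded via ctypes are no longer resolved through `PATH`, so if libtiff's own dependencies live in a folder that is only on `PATH`, loading can still fail with "or one of its dependencies" until that folder is registered with `os.add_dll_directory()`.

```python
import ctypes.util

# Hypothetical candidate list: newer builds ship tiff.dll, older ones libtiff.dll.
CANDIDATE_NAMES = ("tiff", "libtiff")


def locate_libtiff(names=CANDIDATE_NAMES):
    """Return the first library path found by ctypes.util.find_library, or None."""
    for name in names:
        path = ctypes.util.find_library(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(locate_libtiff())
```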

Thanks, bro.
I'm gonna try to get it up and running

@yangyunlv @pearu @djhoese
Hello, I fixed this bug and opened PR #160.
This issue only occurs on Windows; Linux does not have it.
The cause is that os.fsencode uses the wrong encoding.

It works on my system (Windows 11 + Python 3.11):

```python
import locale
import sys

sys.getfilesystemencoding()    # returns 'utf-8', wrong encoding
locale.getpreferredencoding()  # returns 'cp936', correct encoding
```
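
The mismatch can be reproduced in plain Python, without libtiff. Since Python 3.6 (PEP 529), CPython on Windows reports `utf-8` as its filesystem encoding because it uses the wide-character Windows APIs internally, so `os.fsencode` yields UTF-8 bytes. A C library that takes a `char *` path presumably interprets those bytes in the active ANSI code page instead (an assumption about libtiff's Windows I/O layer, consistent with the results in this thread). The sketch decodes the UTF-8 bytes as cp936 to show that such a library would be looking for a different name.

```python
# Illustration only: how UTF-8 path bytes look to code that reads them as cp936.
path_part = "中文"

utf8_bytes = path_part.encode("utf-8")    # what os.fsencode() gives when the fs encoding is utf-8
cp936_bytes = path_part.encode("cp936")   # what a cp936 (GBK) code page expects

# Decode the UTF-8 bytes the way a cp936-based API would; the result is garbled text.
misread = utf8_bytes.decode("cp936", errors="replace")

print(utf8_bytes)
print(cp936_bytes)
print(misread == path_part)
```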

Thanks @One-sixth! I'm curious though, would this be considered a bug in Python's getfilesystemencoding()?

@djhoese I'm not sure, because the manual has some notes on this:
https://docs.python.org/3.11/library/sys.html#sys._enablelegacywindowsfsencoding

Here is the Windows 11 + Python 3.11 output:

```python
import sys
sys.getfilesystemencoding()      # returns 'utf-8'
sys._enablelegacywindowsfsencoding()
sys.getfilesystemencoding()      # returns 'mbcs'
```

pearu commented

I wonder whether defining the environment variable PYTHONIOENCODING=cp936 is equivalent to the fix in #160?

Thank you very much. After modification, the Chinese path can be read normally

@pearu

Update. I just tried setting PYTHONIOENCODING=cp950, but it doesn't seem to have any effect.

Test code:

```python
import locale, sys
s = '路径'
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(s.encode('mbcs') == s.encode(locale.getencoding()))
```

Test output:

```
cp936
utf-8
True
```

I don't think so. CP936 is the encoding for Simplified Chinese Windows; there are also CP950 for Traditional Chinese and CP932 for Japanese.

I found another method: the 'mbcs' encoding, which dynamically maps to the active system code page.

In simplified chinese windows
mbcs==cp936
In traditional chinese windows
mbcs==cp950
In japanese windows
mbcs==cp932

For example, on Simplified Chinese Windows:

```python
s = '路径'
print(s.encode('mbcs') == s.encode('cp936'))   # True
```
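
Following that suggestion, a path-encoding helper could pick `mbcs` on Windows and keep `os.fsencode` elsewhere. This is a sketch of the approach described in this comment, not the code in #160; `encode_path_for_c` is a hypothetical name. Characters that the active code page cannot represent still raise `UnicodeEncodeError` this way, which is where `TIFFOpenW` would be needed.

```python
import os


def encode_path_for_c(path):
    """Encode a path for a C function that takes `const char *filename`.

    On Windows, 'mbcs' is the active ANSI code page (cp936, cp950, cp932, ...),
    which char*-based file APIs expect; it raises UnicodeEncodeError for
    characters that code page cannot represent. Elsewhere use os.fsencode().
    """
    if isinstance(path, bytes):
        return path  # assume the caller already encoded it
    path = os.fspath(path)
    if os.name == "nt":
        return path.encode("mbcs")
    return os.fsencode(path)
```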