Unable to open image with Chinese path using TIFF.open
yang-521 opened this issue · 46 comments
Is there any way to solve this problem?
Could you provide some information about the issue? Any information at all? What error(s) are you seeing?
Whether or not this can be fixed comes down to if the error is from python (this pylibtiff package) or from the libtiff C library underneath.
This problem comes from pylibtiff. You cannot use TIFF.open to open a file if the path contains Chinese characters.
from libtiff import TIFF
file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
imgdir = TIFF.open(file_path)
print(imgdir)
The following error occurs:
TIFFOpen: D:\python\tifpix\中文\2015-1-013-01.tif: Cannot open.
Traceback (most recent call last):
File "D:\python\tifpix\tif11.py", line 4, in
imgdir = TIFF.open(file_path)
File "D:\python\tifpix\venv\lib\site-packages\libtiff\libtiff_ctypes.py", line 484, in open
raise TypeError('Failed to open file ' + repr(filename))
TypeError: Failed to open file b'D:\python\tifpix\\xe4\xb8\xad\xe6\x96\x87\2015-1-013-01.tif'
If the path or file name contains Chinese characters, the file cannot be opened. Without Chinese characters, it opens normally.
Could you try the following?
file_path = "D:\python\tifpix\中文\2015-1-013-01.tif"
I am unable to reproduce this on an Ubuntu system with these exact characters copied and pasted. I tried putting the characters in the filename and in a directory name. I tried with raw strings (r"") and without the r. Note that I was typing these exact unicode characters into my ipython session, so as far as python was concerned it was normal unicode.
I tried, but it didn't work
I checked the TIFF.open function. It seems that there is a problem with the encoding. I tried changing mode.encode("ascii") to mode.encode("utf-8"), but it had no effect.
It failed with the exact same error?
If you're talking about this encode line:
pylibtiff/libtiff/libtiff_ctypes.py
Line 540 in b67eae3
Then this is only controlling/changing the mode parameter, not the filename. The main part of the code is here:
pylibtiff/libtiff/libtiff_ctypes.py
Lines 527 to 538 in b67eae3
It is using this function to convert the unicode string to a series of bytes that it can pass to libtiff (C) to access the file on the file system:
https://docs.python.org/3/library/os.html#os.fsencode
What do you get when you run this function:
https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding
But also, are you sure this is a valid file? Are you able to run (on a command line somewhere) tiffinfo your_file.tif? Depending on the size of the file, can you give us a link to it so we can test it out?
This is either a problem with your file, your system/environment, or libtiff/pylibtiff on Windows.
I am sure it is a valid TIF file, because if the file name and path do not contain Chinese characters, I can use any libtiff function on this file. I found this problem while traversing a folder. After testing, I confirmed that a path or file name containing Chinese characters cannot be opened. My system is Windows 10, and I use PyCharm.
I have tried TiffFile, which can read a path or file name containing Chinese characters:
from libtiff import TiffFile
file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
imgdir = TiffFile(file_path)
print(imgdir.get_info())
But TiffFile can't meet my needs, so I gave up using it.
Yes, the same error.
Can you run this:
python -c "import sys; print(sys.getfilesystemencoding())"
and let us know what the output is?
You could also try running:
import os
print(os.fsencode(file_path))
As for the TiffFile stuff, that might not be a good test since that is pure python as far as opening the file. Something about the encoded filename as bytes being passed to the C TIFF library isn't going well.
Like I said in my previous comment, you could run tiffinfo your_file.tif on a command line if you can figure out where tiffinfo is installed (not sure how this is installed on Windows).
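The individual checks requested in this thread can be gathered in one place. The following is a minimal sketch, not part of pylibtiff; the helper name path_diagnostics is hypothetical, and it only uses standard-library calls.

```python
import locale
import os
import sys


def path_diagnostics(path):
    """Collect the encoding facts relevant to passing `path` to a C library."""
    return {
        # Encoding that os.fsencode() uses to turn str paths into bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Default str <-> bytes encoding of the interpreter.
        "default_encoding": sys.getdefaultencoding(),
        # Locale encoding; on Windows this reflects the ANSI code page.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Bytes that would be handed to a C function expecting char*.
        "fsencoded": os.fsencode(path),
        # Whether Python itself can see the file.
        "exists": os.path.exists(path),
    }
```

Calling path_diagnostics(file_path) and printing the result shows at a glance whether the filesystem encoding and the locale encoding disagree.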
utf-8
print(os.fsencode(file_path))
b'D:\python\tifpix\\xe4\xb8\xad\xe6\x96\x87\2015-1-013-01.tif'
I'm not sure if "tiffinfo" is the "info" function. Here are the results:
from libtiff import TIFF
file_path = r"D:\python\tifpix\tif\2015-1-013-01.tif"
pic = TIFF.open(file_path, mode="r")
inf = TIFF.info(pic)
print(inf)
This is the result after I removed the Chinese characters from the path. If the path contains Chinese characters, the TIFF file cannot be opened.
Output:
filename: b'D:\python\tifpix\tif\2015-1-013-01.tif'
ImageWidth: 2240
ImageLength: 3112
RowsPerStrip: 3112
StripByteCounts: c_ulong(194677)
StripOffSets: c_ulong(8)
TileByteCounts: c_ulong(194677)
TileOffSets: c_ulong(8)
BitsPerSample: 8
Compression: COMPRESSION_JPEG
PhotoMetric: PHOTOMETRIC_YCBCR
PlanarConfig: PLANARCONFIG_CONTIG
ResolutionUnit: 2
JPEGQuality: 75
JPEGTablesMode: 3
XResolution: 300.0
YResolution: 300.0
ReferenceBlackWhite: [0.0, 255.0, 128.0, 255.0, 128.0, 255.0]
tiffinfo is a command line tool that you would run from a terminal/console, not from Python. It looks like .info gives the same information, but I was hoping you could try directly from the command line tool, because that would make it obvious whether the problem is in the C libtiff library or in pylibtiff.
I'll have to think about the fsencode stuff.
On my own machine:
In [10]: file_path = "/tmp/中文/2015-1-013-01.tif"
In [11]: os.fsencode(file_path)
Out[11]: b'/tmp/\xe4\xb8\xad\xe6\x96\x87/2015-1-013-01.tif'
In [12]: os.path.exists(os.fsencode(file_path))
Out[12]: True
I wonder what you get if you run the os.path.exists(os.fsencode(file_path)) line?
Output:
True
As for "tiffinfo", I first need to work out how to test from the command line, which I rarely do.
The following is the result obtained with the command line tool:
tiffinfo.cmd -r 2015-1-013-01.tif
Output:
TIFF Directory at offset 0x2f87e (194686)
Image Width: 2240 Image Length: 3112
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: JPEG
Photometric Interpretation: YCbCr
YCbCr Subsampling: 2, 2
Samples/Pixel: 3
Rows/Strip: 3112
Planar Configuration: single image plane
Reference Black/White:
0: 0 255
1: 128 255
2: 128 255
(output omitted)
TIFF Directory at offset 0x5cf5e8 (6092264)
Image Width: 2188 Image Length: 3100
Resolution: 300, 300 pixels/inch
Bits/Sample: 8
Compression Scheme: JPEG
Photometric Interpretation: YCbCr
YCbCr Subsampling: 2, 2
Samples/Pixel: 3
Rows/Strip: 3100
Planar Configuration: single image plane
Reference Black/White:
0: 0 255
1: 128 255
2: 128 255
Sorry, I really thought I responded to this already. I'm not sure what happened to my comment.
Would it be possible for you to rerun the tiffinfo command, but give it the whole directory with the Chinese characters in it?
Could you also do:
import sys
print(sys.getdefaultencoding())
Right now it seems like your file system is utf-8, your python interpreter is using utf-8, and Python knows that the file exists (that os.path.exists check you ran before returned True). So I don't see anything that should be causing an issue. 😕
After changing the path and file name to Chinese, I reran the command-line tool and got the same result.
import sys
print(sys.getdefaultencoding())
Output: utf-8
It looks like libtiff (the C library) has TIFFOpen, which we're using, but also a TIFFOpenW for opening filenames that are unicode:
http://www.simplesystems.org/libtiff//functions/TIFFOpen.html#description
At this point I'm not sure how the code would need to be updated. It is very strange to me that this seems to work just fine on non-Windows.
It looks like this issue cannot be resolved by pinpointing a particular problem and fixing it.
However, here follows an idea for a possible workaround where TIFF.open would contain (untested):
tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))
if tiff.value is None and os.name == 'nt' and os.path.exists(filename):
    # see gh-152
    import tempfile
    with tempfile.NamedTemporaryFile() as tmp:
        # read in binary mode: TIFF data is not text
        tmp.write(open(filename, 'rb').read())
        tmp.flush()
        tiff = libtiff.TIFFOpen(tmp.name, mode.encode('ascii'))
        # TODO: add a hook to remove tmp.name when tiff closes
if tiff.value is None:
    raise TypeError('Failed to open file ' + repr(filename))
return tiff
I tried, but a different error occurred:
file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
# I tried both forms
file_path = "D:\python\tifpix\中文\2015-1-013-01.tif"
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
Where exactly is the ArgumentError raised?
Btw, one might need to use tempfile.NamedTemporaryFile(delete=False) to avoid deleting the tmp file when leaving the with block.
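The delete=False variant of the temporary-copy idea can be sketched on its own, independent of pylibtiff. The helper name copy_to_temp is hypothetical, and the approach only helps if the system temporary directory itself has a path the C library can open.

```python
import shutil
import tempfile


def copy_to_temp(path, suffix=".tif"):
    """Copy `path` to a new temporary file and return the temporary path."""
    # delete=False keeps the file on disk after the with-block closes it,
    # so a C library can still open it by name afterwards.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        with open(path, "rb") as src:  # binary mode: TIFF data is not text
            shutil.copyfileobj(src, tmp)
    return tmp.name
```

The caller is responsible for removing the returned file (for example with os.remove) once the TIFF handle has been closed.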
It is raised at this line:
tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))
This does not make sense. The line
tiff = libtiff.TIFFOpen(filename, mode.encode('ascii'))
is the original line, and according to the issue description it should succeed; otherwise
raise TypeError('Failed to open file ' + repr(filename))
TypeError: Failed to open file b'D:\python\tifpix\\xe4\xb8\xad\xe6\x96\x87\2015-1-013-01.tif'
would not be reached.
Could you check whether the types of filename and mode are valid?
Re:
file_path = "D:\python\tifpix\中文\2015-1-013-01.tif"
Notice that in a non-raw string literal the substring \t is a tab character, not a backslash followed by the letter t.
Try:
file_path = "D:\\python\\tifpix\\中文\\2015-1-013-01.tif"
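The difference between the three ways of writing a Windows path can be checked directly. This is a small illustration with a shortened, made-up path:

```python
# "\t" inside a normal string literal is a single tab character.
plain = "D:\tifpix"
# A raw string keeps the backslash and the letter t as two characters.
raw = r"D:\tifpix"
# Doubling the backslash gives the same string as the raw literal.
escaped = "D:\\tifpix"

print(repr(plain))     # 'D:\tifpix'   (repr shows the tab as \t)
print(repr(raw))       # 'D:\\tifpix'
print(raw == escaped)  # True
print("\t" in plain)   # True
```

Raw strings or doubled backslashes are therefore the safe choices for Windows paths.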
The following is the code I tested:
from libtiff import TIFF
file_path = r"D:\python\tifpix\中文\2015-1-013-01.tif"
imgdir = TIFF.open(file_path, mode="r")
print(imgdir)

from libtiff import TIFF
file_path = "D:\\python\\tifpix\\中文\\2015-1-013-01.tif"
imgdir = TIFF.open(file_path, mode="r")
print(imgdir)
Both give the same error. It is a strange problem.
I tried to set up a fresh environment for testing, but I couldn't complete the installation through pip:
"Building wheels for collected packages: pylibtiff
Building wheel for pylibtiff (pyproject.toml) ... error"
Then I remembered that I had previously installed it from a ".whl" file.
I don't know if there is a problem with the installation, but I am sure that I can open a TIFF file whose path contains no Chinese characters and read its data.
@yangyunlv What version of python are you using and what version of pylibtiff?
@pearu Theoretically we should be able to take the python str filename and, instead of using TIFFOpen, use TIFFOpenW with a wchar_t *. I've never used ctypes to convert python strings, but since all str objects are unicode it might just make more sense in the long run to use wchar_t completely.
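A rough sketch of what the wchar_t route could look like with ctypes follows. It is an illustration only, not pylibtiff's code: the function name open_tiff_wide is hypothetical, lib is assumed to be an already loaded libtiff handle that exports TIFFOpenW (which is only available in Windows builds of libtiff), and raising OSError is a choice made for this example.

```python
import ctypes


def open_tiff_wide(lib, filename, mode="r"):
    """Open `filename` through libtiff's wide-character entry point."""
    func = lib.TIFFOpenW
    # C prototype: TIFF* TIFFOpenW(const wchar_t* name, const char* mode)
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_char_p]
    func.restype = ctypes.c_void_p
    # ctypes converts the Python str to wchar_t* itself, so the filename
    # never goes through os.fsencode() or any byte encoding.
    handle = func(filename, mode.encode("ascii"))
    if not handle:
        raise OSError("Failed to open file " + repr(filename))
    return handle
```

Because the filename stays a str all the way to the C call, the question of which byte encoding to use does not arise on this path.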
Python 3.8.10, pylibtiff-0.4.4-cp38-cp38-win_amd64.whl
I don't expect this to change anything, but could you try installing the newest version of pylibtiff (0.5.1)?
After updating to version 0.5.1, "libtiff.dll" is missing. I am not sure where to obtain this file.
The error is as follows: "Could not find module 'libtiff.dll' (or one of its dependencies). Try using the full path with constructor syntax."
How are you installing it and were there any errors during the installation?
I installed it through pip. The first installation reported an error because my version of the Visual C++ Build Tools was too old. After updating them, I repeated the pip installation without any errors.
Ok so now your installation works and you are able to run your pylibtiff code? And you get the same error as before with the filename with Chinese characters in it?
Although the update process did not report an error, the module cannot function properly now: the dependency "libtiff.dll" is missing. I have tried redeploying the environment and installing the modules on a new computer and in a virtual machine, and the same problem occurs. Version 0.4.4 does not have this problem.
Could you please copy/paste the command you are using to install the package "pylibtiff" and the output of running that command? You may want to try uninstalling it and trying to install it again to see what the output is.
The error about the low version of the C++ build tools is most likely why the .dll is missing. My guess is that the build is being "cached", so when you run the install command again it doesn't try to compile the extension because it thinks it already has. However, after looking at it, there shouldn't be a libtiff.dll created by the installation of pylibtiff. The error is actually coming from pylibtiff trying to find the C libtiff library.
In version 0.5.x of pylibtiff this pull request was added:
In this pull request, changes were made to check for libtiff (C) under the newer library name tiff.dll, but it should still check for libtiff.dll (as your error message suggests). So there must be something else going on here. Very confusing.
Thanks, bro.
I'm going to try to get it up and running.
@yangyunlv @pearu @djhoese
Hello, I fixed this bug and opened PR #160.
This issue only occurs on Windows; Linux is not affected.
The cause is that os.fsencode uses the wrong encoding.
The fix works on my system (Windows 11 + Python 3.11):
sys.getfilesystemencoding()  # returns 'utf-8', the wrong encoding
locale.getpreferredencoding()  # returns 'cp936', the correct encoding
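To see why the choice of encoding matters, compare the bytes produced for the directory name from this issue. This is a standalone illustration; it does not call libtiff, and the reading of the bytes as cp936 is an assumption about how a char*-based API behaves on a Simplified Chinese Windows system.

```python
name = "中文"

utf8_bytes = name.encode("utf-8")
cp936_bytes = name.encode("cp936")

print(utf8_bytes)   # b'\xe4\xb8\xad\xe6\x96\x87'
print(cp936_bytes)  # b'\xd6\xd0\xce\xc4'

# An API that assumes the ANSI code page would read the UTF-8 bytes as
# cp936 and therefore look for a differently named directory.
misread = utf8_bytes.decode("cp936", errors="replace")
print(misread == name)  # False
```

The UTF-8 bytes match the ones shown in the TypeError message at the top of the thread, which is consistent with the explanation that the filename was encoded with the wrong codec.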
Thanks @One-sixth! I'm curious though, would this be considered a bug in Python's getfilesystemencoding()?
@djhoese I'm not sure, because the manual has some notes on this:
https://docs.python.org/3.11/library/sys.html#sys._enablelegacywindowsfsencoding
Here is the output on Windows 11 + Python 3.11:
import sys
sys.getfilesystemencoding() # return 'utf-8'
sys._enablelegacywindowsfsencoding()
sys.getfilesystemencoding() # return 'mbcs'
I wonder if defining the PYTHONIOENCODING=cp936 environment variable is equivalent to the fix in #160?
Thank you very much. After the modification, files under Chinese paths can be read normally.
@pearu
Update: I just tried setting PYTHONIOENCODING=cp950, but it doesn't seem to have any effect.
Test code:
import locale, sys
s = '路径'
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(s.encode('mbcs') == s.encode(locale.getencoding()))
Test output:
cp936
utf-8
True
I don't think so. CP936 is the encoding for Simplified Chinese Windows. There are also CP950 for Traditional Chinese and CP932 for Japanese.
I found another method: use the 'mbcs' encoding.
The 'mbcs' codec dynamically maps to the active system code page:
On Simplified Chinese Windows, mbcs == cp936
On Traditional Chinese Windows, mbcs == cp950
On Japanese Windows, mbcs == cp932
For example, on Simplified Chinese Windows:
s = '路径'
print(s.encode('mbcs') == s.encode('cp936'))  # True
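Put together, the 'mbcs' idea could be wrapped in a small helper like the one below. This is a sketch under stated assumptions, not the code from PR #160: the function name encode_filename is hypothetical, and it assumes the C function on Windows expects bytes in the active ANSI code page.

```python
import os


def encode_filename(filename):
    """Return the bytes to pass to a char*-based C API such as TIFFOpen."""
    if os.name == "nt":
        # 'mbcs' is a Windows-only codec that follows the active ANSI code page.
        return filename.encode("mbcs")
    # Elsewhere, os.fsencode() already produces what the C library expects.
    return os.fsencode(filename)
```

One limitation of this approach is that characters outside the active code page cannot be represented faithfully, which is a reason to prefer a wide-character API where one is available.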