Downloading text files with UTF-8 encoding

Question

Schleissheimer-Stieglitz opened this issue 5 years ago · 8 comments

Schleissheimer-Stieglitz commented 5 years ago

Downloading a text file with UTF-8 encoding from Artifactory doesn’t work correctly.
If I try it with an example file with the following content:

This is a test.
This file is encoded in utf-8.
Some special character: äöüß

the downloaded file contains strange symbols instead of the expected content.
Content of downloaded file:

�‹�      ��ÉÈ,V ¢D…’Ôâ�=^®��@ZfN*H45/9?%5E!3O¡´$M×�(�œŸ›ªP\�šœ™˜£�œ‘X”˜\’Zd¥pxÉám‡÷�žÏË� šbçhS
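The first bytes of that output (�‹�) look like the gzip magic number 0x1f 0x8b (0x8b is "‹" in Windows-1252). That suggests the file was saved while still gzip-compressed rather than mis-decoded as text. Assuming that is the cause, here is a minimal stdlib sketch for checking and repairing an already-downloaded file. The helper name repair_if_gzipped is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def repair_if_gzipped(data: bytes) -> bytes:
    """Return the decompressed payload if `data` is a gzip stream, else `data` unchanged.

    Hypothetical helper: assumes the server sent the file with
    Content-Encoding: gzip and the bytes were written to disk undecoded.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


if __name__ == "__main__":
    text = "Some special character: äöüß\n".encode("utf-8")
    downloaded = gzip.compress(text)  # stands in for an undecoded response body
    print(repair_if_gzipped(downloaded).decode("utf-8"), end="")
```

This only repairs a file after the fact. Decoding at download time, as in the workaround that follows, is the cleaner fix.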

As a workaround, I added this function to the _ArtifactoryAccessor class in artifactory.py:

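def writeto(self, fd, out):
    url = str(fd)
    # stream=True fetches the body lazily instead of loading it all at once
    res = fd.session.get(url, stream=True, verify=True, cert=None)
    if res.status_code != 200:
        raise RuntimeError(res.status_code)
    # iter_content() yields the body in 256-byte pieces and undoes any
    # gzip/deflate Content-Encoding applied by the server
    for chunk in res.iter_content(chunk_size=256):
        if chunk:
            out.write(chunk)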

and this function to the ArtifactoryPath class:

def writeto(self, out):
    self._accessor.writeto(self, out)

Now I use

with open(dest, "wb") as out:
    path.writeto(out)

instead of

with path.open() as fd:
    with open(dest, "wb") as out:
        out.write(fd.read())

to download the file, and it works fine for me.
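The workaround most likely helps because requests' Response.iter_content() undoes a gzip or deflate Content-Encoding while it streams. Reading the raw body, as fd.read() appears to do here, returns the compressed bytes as sent. If you would rather keep the path.open() loop, here is a stdlib sketch of the same incremental decoding. The helper copy_gunzipped is hypothetical and assumes the body really is gzip-compressed; zlib raises zlib.error on anything else.

```python
import io
import zlib


def copy_gunzipped(src, out, chunk_size=64 * 1024):
    """Copy a gzip-compressed stream from `src` to `out`, decompressing as it goes.

    Hypothetical helper: `src` is any binary object with read(n), assumed to
    yield a gzip body that was not decoded on the way in.
    """
    # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out.write(decoder.decompress(chunk))
    out.write(decoder.flush())  # write out anything still buffered


if __name__ == "__main__":
    payload = "Some special character: äöüß\n".encode("utf-8")
    enc = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)  # produce a gzip stream
    body = enc.compress(payload) + enc.flush()
    out = io.BytesIO()
    copy_gunzipped(io.BytesIO(body), out, chunk_size=8)
    print(out.getvalue().decode("utf-8"), end="")
```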

Answer 1 · 2020-01-06T20:29:19.000Z

I'm not able to reproduce this using dohq-artifactory==0.7.297 on top of Python 3.7.2 and Artifactory Version 6.14.0. The test was run on Windows 10.

The default download example, modified only to point to the test file, works OK.

from artifactory import ArtifactoryPath

path = ArtifactoryPath(
    "http://sampleaf/artifactory/testrepo-local/testfile.txt"
)

with path.open() as fd:
    with open("testfile.txt", "wb") as out:
        out.write(fd.read())

@Schleissheimer569, after uploading the file to Artifactory, if you download it through the browser, does it keep the correct encoding?

Answer 2 · 2020-01-10T10:20:40.000Z

If I download the file through the browser, it works correctly.

I used the default download example, too.

I used dohq-artifactory==0.7.311, Python 3.7.4 and Artifactory Version 6.16.0 on Windows 10.

Answer 3 · 2020-02-27T10:26:24.000Z

I had a different issue, which @Schleissheimer569's solution solved.
For me, on Python 3.8 (not 3.7 for some reason), some files would give this error when downloading:

  File "C:\Python38\lib\site-packages\urllib3\response.py", line 440, in read
    data = self._fp.read()
  File "C:\Python38\lib\http\client.py", line 467, in read
    s = self._safe_read(self.length)
  File "C:\Python38\lib\http\client.py", line 608, in _safe_read
    data = self.fp.read(amt)
  File "C:\Python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Python38\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Python38\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
OverflowError: Python int too large to convert to C long

Implementing Schleissheimer569's workaround solved it.
It would be nice if there were a built-in option to download in chunks instead of reading the whole stream directly.
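The traceback shows the failure inside a single read() that asks for the whole response body at once (self._safe_read(self.length)). Here is a minimal sketch of reading in bounded pieces instead, assuming src is any binary file-like object, such as the one path.open() returns. The helper name copy_in_chunks and the 1 MiB default are illustrative choices, not part of dohq-artifactory.

```python
import io
import shutil


def copy_in_chunks(src, out, chunk_size=1024 * 1024):
    """Copy a binary stream in bounded pieces.

    Hypothetical helper: shutil.copyfileobj() calls src.read(chunk_size)
    repeatedly, so no single read asks for the whole body at once.
    """
    shutil.copyfileobj(src, out, chunk_size)


if __name__ == "__main__":
    data = b"x" * 5_000_000
    out = io.BytesIO()
    copy_in_chunks(io.BytesIO(data), out, chunk_size=64 * 1024)
    print(len(out.getvalue()))
```

This only bounds the read sizes. On its own it does not decode a gzip Content-Encoding.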

Answer 4 · 2020-03-03T11:26:52.000Z

We could probably add these functions to our library; why not? It looks useful.

Could someone create a PR with these changes? I think we should also add the ability to customize chunk_size.
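Here is a minimal sketch of what a configurable chunk_size could look like. It is written as a standalone function around a requests.Response fetched with stream=True, not as a method on _ArtifactoryAccessor. The name writeto and the 256-byte default (taken from the workaround above) are illustrative, not an existing dohq-artifactory API.

```python
def writeto(response, out, chunk_size=256):
    """Write a streamed HTTP response body to `out` in pieces of `chunk_size` bytes.

    Hypothetical standalone version of the workaround; `response` is assumed
    to be a requests.Response obtained with stream=True.
    """
    if response.status_code != 200:
        raise RuntimeError(response.status_code)
    # iter_content() also undoes gzip/deflate Content-Encoding
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty chunks
            out.write(chunk)
```

Following the session usage in the workaround, a caller would open path.session.get(str(path), stream=True) and a destination file in "wb" mode, then pass both to writeto with whatever chunk_size suits the file sizes involved.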

Answer 5 · 2020-12-15T19:51:30.000Z

I have the same issue as the original poster.
When I try to download a YAML artifact, I get a strange set of symbols.

Python 3.7.9 // dohq-artifactory==0.7.468 // Artifactory from Jfrog Cloud

Answer 6 · 2020-12-15T20:11:05.000Z

What's more interesting: if I rename file.yaml to something like file.rpt, everything works.
It seems like Artifactory returns text/yaml files in some strange form.

Answer 7 · 2020-12-15T20:53:14.000Z

#204 @allburov @fuzzmz

Answer 8 · 2021-07-07T11:13:40.000Z

@allburov
Please close this issue; fixes are available in #204, and #242 could potentially also be used.