Python 3 Support
lxp opened this issue ยท 41 comments
Python 2 will be end-of-life in 2020 (https://www.python.org/dev/peps/pep-0373/#update), so we have to add support for Python 3.
Is Python 2 still needed? If not I can try port it.
I would be happy to have your help for porting to Python 3. The main goal is to have it running on Python 3. As a bonus it would be great, if it still runs on Python 2.7 with compatibility layers like python-future [1].
I already gave it a try last year using futurize
[2] and manual fixes. I have now pushed my work in the python3 branch. You can use this as a starting point, if you want. However, the test cases do not pass yet.
To run the unit and integration tests, you can use the following commands:
~/cfv/test$ python3 test.py --unit --exit-early
*** testing all_unittests_suite
test_lchoplen_compose (test_strutil.chopTestCase) ... ok
test_lchoplen_simple (test_strutil.chopTestCase) ... ok
test_lchoplen_wide (test_strutil.chopTestCase) ... ok
test_rchoplen_compose (test_strutil.chopTestCase) ... ok
test_rchoplen_simple (test_strutil.chopTestCase) ... ok
test_rchoplen_wide (test_strutil.chopTestCase) ... ok
test_compose_noprecombined (test_strutil.uwidthTestCase) ... ok
test_halfwidth (test_strutil.uwidthTestCase) ... ok
test_nonspacing (test_strutil.uwidthTestCase) ... ok
test_simple (test_strutil.uwidthTestCase) ... ok
test_wide (test_strutil.uwidthTestCase) ... ok
test_get_path_key (test_caching.AbsPathKeyTest) ... ok
test_rename (test_caching.AbsPathKeyTest) ... ok
test_nocase_findfile (test_caching.RelPathKeyTest) ... ok
test_nocase_findfile_parent (test_caching.RelPathKeyTest) ... ok
path_split (cfv.osutil)
Doctest: cfv.osutil.path_split ... ok
strippath (cfv.osutil)
Doctest: cfv.osutil.strippath ... ok
lchoplen (cfv.strutil)
Doctest: cfv.strutil.lchoplen ... ok
rchoplen (cfv.strutil)
Doctest: cfv.strutil.rchoplen ... ok
human_int (benchmark)
Doctest: benchmark.human_int ... ok
----------------------------------------------------------------------
Ran 20 tests in 0.072s
OK
.OK (False)
~/cfv/test$ python3 test.py --exit-early
>>> testing...
...................
>>> failed test: ~/cfv/test/cfv -ZNVRMUI --unquote=no --fixpaths="" --strippaths=0 --showpaths=auto-relative --progress=no --announceurl=url --noprivate_torrent -r -p /tmp/tmpgudzgbaz --renameformat="%(name)s-%(count)i%(ext)s" -C -t sha512
/tmp/tmpgudzgbaz/test.ext.end : undecodable filename
/tmp/tmpgudzgbaz/test2.foo : undecodable filename
/tmp/tmpgudzgbaz/test3 : undecodable filename
/tmp/tmpgudzgbaz/d2/test4.foo : undecodable filename
4 files, 0 OK, 4 file errors. 0.002 seconds, 0.0K/s
FAILED (16)
File "test.py", line 1890, in <module>
failed += all_tests()
File "test.py", line 1636, in all_tests
ren_test(fmt)
File "test.py", line 914, in ren_test
test_generic('%s -C -t %s' % (cmd, f), cfv_test)
File "test.py", line 287, in test_generic
test_log_results(cfvtest.cfvenv + cfvtest.cfvfn + ' ' + cmd, s, o, r, kw)
File "test.py", line 272, in test_log_results
test_log_finish(cmd, s, r, o, kw)
File "test.py", line 255, in test_log_finish
traceback_str = '\n'.join(traceback.format_stack())
[1] https://python-future.org/
[2] https://python-future.org/futurize.html
It's been quite tedious so far, and I reckon having it working with Python 3 is more than enough. Got most of the tests working now, just 32 left!
I'm not sure if I'll continue porting this however, the code doesn't look all that well written from reading it. Maybe I'm better off using other utilities.
But would be happy if you can continue where I left off.
Thanks for your effort! I am sad to hear that you stopped your porting work for now. However, I think you already got a lot further with your port, than my last try. I know the code base is old and not easy to port. Did you find any major show stopper that requires a lot of re-work?
I do not know about any similar tool with that many features, that's why I am trying to keep cfv alive. If you find some other tool with a similar feature-set, I am happy to hear about it myself.
I will give your branch a try, once time allows it.
If you decide to continue your work, you are welcome of course :)
Well the tool has good intentions but it just isn't that well written. This makes me rather nervous in using it. The main problem I saw was that it treated strings and bytes the same, and lots of functions were called assuming they were the same thing. Files were opened in both 'w'
and 'wb'
and passed to functions as if they were the same thing.
The torrent files were parsed in a rather non-documented way, with the torrent file specifying its own encoding. I somehow got those tests to pass.
I agree there aren't any similar tools, but there are individual tools for their own file formats, like cksfv
for cfv and md5sum
, sha1sum
for the hash files, aria2c
for torrent files, etc. I'm gonna just use the individual tools.
But yes I can continue a bit more today see if I can make more progress. I don't intend on using the tool though.
Alright got the tests down to 26 failing ones. The remaining ones all appear to be due to a similar reason. I'm not quite sure what the test does however:
>>> failed test: stdin/out of -ZNVRMUI --unquote=no --fixpaths="" --strippaths=0 --showpaths=auto-relative --progress=no --announceurl=url --noprivate_torrent -tbsdmd5 -C -f- with file data4
MD5 (data4) = 96d879de0782e286d4031a8de9e351f2
-: 1 files, 1 OK. 0.000 seconds, 19038674702.3K/s
- : undecodable filename
1 files, 0 OK, 1 file errors. 0.000 seconds, 0.0K/s
FAILED (3) ((0, 16))
Perhaps you could debug it with both Python 2 & 3 and see how it's failing?
I might stop working on it here on, since doubt I'm gonna use this in the future anyway.
Thanks for your hard work! I will look into it, when I have some free time.
Just a heads up - cfv is marked for autoremoval from testing on 2020-04-17
because of the python2 removal (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=936288).
I guess we can re-add the package later on unless the current state of the py3 port is complete enough?
Thanks for this heads up!
Sadly, it is unrealistic that there will be a working py3 version until 2020-04-17.
However, my plan is still to release a Python 3 version.
Is there a way to extend the deadline and keep the py2 version a bit longer in Debian?
cfv isn't going to be removed entirely from Debian by 2020-04-17, but only from testing, see https://en.wikipedia.org/wiki/Debian#Branches
As such. it won't be part of the next stable release unless it gets ported to Python 3. If a port is complete and upload to Debian by end of the year, that will still allow it to re-enter for the next release.
Sorry, don't think so. It would block removal of python2. Anyway, lets just ITP it once py3 version is functional enough. By "functional", I think we can disable certain functions if they are blocking the porting progress (i.e. perhaps fixing some uncommon checksum files can be postponed to later if there are py3 problems with those).
As cfv
is no longer present in Ubuntu 20.04 LTS (and I also removed pyhton2 for good, finally): Any recommendations for similar working alternatives?
@bentolor Some alternatives I use are md5sum
, sha1sum
, cksfv
. For torrents I use aria2c
to verify checksums.
@bentolor Some alternatives I use are
md5sum
,sha1sum
,cksfv
. For torrents I usearia2c
to verify checksums.
I tried cksfv
a few days ago but in checking my (very) old sfv files on CD and some DVD, it doesn't find the content of the subdirectories and it gives error even if the file exists and is intact.
md5sum
and sha1sum
in my case are not useful since I used a different hash algoritm.
@bentolor @stevenxxiu @Clover2k rhash with -rc
does more or less the same thing as cfv, performance is good and it supports most hash formats including bittorrent. It lacks cfv's -m
though.
Hi, just thought some background might be helpful, or at least interesting. cfv started back in the Python 1.5.2 days before unicode string was a thing. I was working on converting it to use the Python2 unicode support but it might have been in a weird state when I stopped. If I recall, the reason for passing around byte/unicode strings equivalently was to try to handle invalid filename encodings. At least in Python 2, I recall that if filenames didn't decode properly with the current locale, they would be returned as a byte string instead of a unicode string. I don't know how Python 3 handles this. That might also be the reasoning behind the non-standard torrent handling that was mentioned, I don't recall that.
Now, whether these attempts to handle invalid encodings was wise or not is another question...
I also apologize for the inscrutable tests.
@matt-mueller Thanks for sharing these insights. I would be happy if you could join the project here at Github. I think Github encourages people more to contribute to the project than Sourceforge does.
I still plan to have a working cfv version for Python 3 at some point. It is just not progressing that fast because at least for myself the pain of still having Python 2 around was not that big.
I tried the cfv-python3.zip file with code from 05-24-2020 (latest) with Python 3.8 on FreeBSD 12.2, and it appears that there are some new incompatibilities. Trying to use cfv as-is results in:
TypeError: a bytes-like object is required, not 'str'
I'm not really familiar with Python programming, but a wholesale replacement of open mode 'rb' with mode 'r' results in:
TypeError: can't concat str to bytes
Not that it would be a workable solution, since that would probably cause problems with newlines on at least some platforms.
Can someone suggest a path forward, or at least things for me to try? I don't think trying to futurize the code again would work as there were additional changes made after futurizing. I've looked at the alternatives and none of them seem appealing:
cksfv - doesn't create, only verifies
bsdsfv - limited to 1024 files for some reason
pure-sfv - doesn't seem to display progress information, even with -v
Sadly, Python 3 support is still in an incomplete state. I started the Python 3 port some time ago in the python3
branch. In the meantime @stevenxxiu continued the work in pull request #11. Some tests are still failing, so it is not ready for release yet. However, you could give it a try.
If you invest some time testing, I would appreciate it if you could report back your findings in pull request #11.
Also further contributions are welcome :)
I was able to build and install the code from there. A quick test of functionality seems to be working. What is the method to run the tests without installing external test scaffolding? A simple:
python3.8 setup.py install test
ends with:
running test
WARNING: Testing via this command is deprecated and will be removed in a future
version. Users looking for a generic test entry point independent of test runner
are encouraged to use tox.
I am very encouraged by the progress made by others so far and would like to help. Also, should the various changes made to the cfv-project/cfv master branch since the python3 fork be looked at and possibly cherry-picked for inclusion in the stevenxxiu/cfv python3 branch?
You can run the tests via python3 test/test.py
. You can specify a few options (see --help
) but most important right now are the tests that are run without options. You can also take a look at the Travis CI pipeline for a few hints: https://github.com/cfv-project/cfv/blob/master/.travis.yml
I took some time to work through some changes made by @stevenxxiu. I ended up cherry-picking some fixes and also implementing some fixes differently (see my recent push to python3 branch). I only worked through to commit b69e34a, so @stevenxxiu's branch still has a lot more fixes. However, I want to carefully review them, as a few fixes are suboptimal, because they do not fix the root-cause, which most of the time was introduced by my previous porting effort.
The changes from the master branch are currently not that important for the python3 branch, as they are only changing the CI pipeline from Travis CI to Github Actions. Of course, in the end we want to have those changes in the python3 version, but there is still a long way to go. In the python3 branch we are still struggling to have the test suite not crash... I am not even talking about having all tests pass.
cksfv - doesn't create, only verifies
cksfv
both creates and verifies. See https://man.archlinux.org/man/cksfv.1#EXAMPLES.
I made a lot of progress in the last days. All normal testcases pass now, at least on my Linux system. However, I haven't tried to use it in real-world scenarios yet. Also there seem to be some issues with the long tests.
@Terry-Kennedy I think it would be a good opportunity, if you could try the python3 branch on your system. Please also run the test suite with something like python3 test/test.py
and report back any issues, preferably in new Github issues.
I was finally able to get back to this today (a combination of work and vacation was interfering). It looks pretty good in summary:
tests finished: ok: 3167 failed: 1
which is a vast improvement over my previous testing (where the test scaffolding itself fails after reporting a bunch of individual failed tests).
The one failure in both "testing" and "testing without mmap" is:
failed test: /sysprog/terry/cfv-python3/test/cfv -ZNVRMUI --unquote=no --fixpaths="" --strippaths=0 --showpaths=autorelative --progress=no --announceurl=url --noprivate_torrent --strippaths=none -T -f teststrip-none.csv4
/data1 : file size does not match (13!=33)
teststrip-none.csv4: 1 files, 0 OK, 1 badsize. 0.000 seconds, 0.0K/s
FAILED (4)
File "test/test.py", line 1876, in
File "test/test.py", line 1719, in all_tests
File "test/test.py", line 269, in test_generic
File "test/test.py", line 254, in test_log_results
File "test/test.py", line 237, in test_log_finish
I also get a bunch of:
max open files is big (3771108) clipping to 4096. Use --long to try the real value
which I assume is just because I'm running this on an account with no limits at all.
This is with Python 3.8.11 on FreeBSD 12.2-STABLE r370177 if that matters.
Thanks to everyone for your hard work on this! With a little further testing we should be able to get this back into the FreeBSD ports tree.
I decided to create a FreeBSD port in case anyone wants to try this. This is not an official port although I plan on submitting it once we're all happy with with the state of the python3 branch.
@Terry-Kennedy Thank you for testing.
I think the failure is just a hiccup with the test suite on your system. Basically, the test suite assumes that /data1
does not exist. However, it seems to exist on your system, which leads to this unexpected error. Can you verify this?
The clipping of max open files is a feature of the test suite, it keeps the test duration and resource usage within bounds. I would suggest you to reduce the max open files limit to <=4096 before running the tests (on Linux you can do this using ulimit -n 4096
, not sure how to do it on FreeBSD). If the max open files limit is clipped, the manyfiles_test
cannot really detect issues with leaking file descriptors. Alternatively, you could add the --long
option to also enable longer and more I/O intensive tests, which also runs the manyfiles_test
with your real limit. With your current limit the test will create more than 3771108 files.
Thanks for creating the FreeBSD port. I would be glad, if you could submit it, when the time has come :)
Yes, I do have a partition named /data1 (I'm the guy with 512TB in his dining room 8-):
Filesystem Size Used Avail Capacity Mounted on
rz1:/storage/data 83T 22T 61T 27% /data1
If I temporarily unmount it (and remove the mountpoint) the test passes. May I suggest changing the "/data1" reference (and any other similar references) in the tests to something like "/ZxCvB" that people are less likely to run into? OTOH, maybe things are just more 'fun' here at the Edge Case Saloon. (cf. "On Testing" by Bill Sempf)
Doing "ulimit -n 4096" lets those other tests pass without warning, so we're at 100% on FreeBSD 12.2. I see that a subsequent commit appears to address this.
I'll be glad to submit the port. It would be best to wait until we have something where it makes sense to tag it as cfv-3_0_0 or similar - the port framework gets upset if the checksum of the tarball it downloads doesn't match what is coded in the port. In fact, if I re-run the port build now it fails because the python3 branch changed since Thursday.
I believe that a bunch of the other *BSDs base their ports on FreeBSD's, so once the port lands back in FreeBSD it should be picked up by the others.
For the /data1
issue, please see #26.
The commit regarding the max open file limit does not change the code behavior. It just fixes the issue in the CI pipeline by calling ulimit -n 4096
. So you still have to do it on your local system too. You will most likely see now additional warnings that I added regarding external testing tools. They are not mandatory, but it makes sense to test cfv created checksum files against external tools like sha256sums
or cksfv
. Previously, the tests were silently skipped, if the tools were not available.
I still want to do some more practical testing before releasing 3.0.0. However, I could release 3.0.0-beta1, if that helps.
What I am also unsure about, if I want to have Windows support in the 3.0.0 release, just delay it to some later release or drop it at all. Are there any Windows users using cfv?
@Terry-Kennedy I just tagged and released version 3.0.0.dev0. I hope this helps with the FreeBSD port.
Request to reinstate the FreeBSD port submitted as PR 257546:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257546
Committed to FreeBSD ports:
https://cgit.freebsd.org/ports/commit/?id=0377086a1f58526c3026ae470cd8191838b4c1f9
If anyone following this issue has contacts at any of the other *BSDs, feel free to ask them to pick this up from FreeBSD.
@Terry-Kennedy Thank you!
What's missing for releasing 3.0.0?
What's missing for releasing 3.0.0?
The only things missing for 3.0.0 is more testing, some minor code formatting changes (see #19) and fixing the PyPI project description formatting (https://pypi.org/project/cfv/3.0.0.dev0/).
I personally did not use the 3.0.0.dev0 pre-release in any real-world scenarios yet. Therefore, I am not confident enough to release 3.0.0.
If you want to give it a try, you can simply install it via pip. On Python 3, the 3.0.0.dev0 pre-release version is automatically selected, because no stable release is available yet. So you only need to run:
pip install cfv
Please report back any problems and also if it is just working for you :)
Version 3.0.0 is out now ๐ ๐ ๐ : https://github.com/cfv-project/cfv/releases/tag/v3.0.0
Thanks to everyone supporting the Python 3 port, especially @stevenxxiu!
@Terry-Kennedy This would be a good time to update the FreeBSD port :)
@alfs If you can help resurrecting the Debian package, please leave a comment here: #4
Closing here :)
Submitted: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267449
โฆ
-- Terri
Thank you!
If you plan to keep the FreeBSD port up-to-date, you can also provide us some install instructions for inclusion in the README.md.
I thought about something like this:
### From OS-specific Repositories
#### FreeBSD
To install from FreeBSD port (maintained by @Terry-Kennedy):
```sh
...install cfv...
```
Thank you @Terry-Kennedy! I have updated the README.md with your instructions. If you find any mistake or want to improve the wording, just leave a comment or directly open a pull request :)