UniversalDependencies/tools

Is conllu-stats.py Python 3 compatible?

Closed this issue · 8 comments

lauma commented

Because of validate.py, I updated my system to Python 3. Since then, conllu-stats.py fails with this error message:

  File "conllu-stats.py", line 132
    print json.dumps(d)
             ^
SyntaxError: invalid syntax

Does this mean conllu-stats.py still needs Python 2.7? Is there a way to avoid needing two different Python versions to make a UD release?

Does this mean conllu-stats.py still needs Python 2.7?

I suspect that is the case. Do you need the program to prepare your data for the release? I do not use it.

lauma commented

For the statistics in the README. Is there some other way to get them?

As part of the release process, I generate (or update) the file stats.xml in every treebank, so I do not think it is necessary to also have statistics in the README file. One shortcoming is that stats.xml currently does not break the numbers down by train, dev and test (which I think conllu-stats.py does), but I am planning to add that. The statistics in stats.xml are generated by conllu-stats.pl (a Perl program, also in the tools repository).
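The per-split breakdown mentioned above can be computed directly from the CoNLL-U files. The following Python 3 sketch is not conllu-stats.pl or conllu-stats.py; it is a hypothetical illustration that assumes the standard CoNLL-U conventions: comment lines start with `#`, each sentence ends with a blank line, an ID such as `1-2` marks a multiword (fused) token, and an ID such as `8.1` marks an empty node. The function names `conllu_counts` and `split_counts` and the split-name-to-text mapping are assumptions made for illustration.

```python
from collections import Counter


def conllu_counts(text):
    """Count sentences, surface tokens, syntactic words and fused tokens in CoNLL-U text."""
    counts = Counter()
    in_sentence = False
    covered_until = 0  # highest word ID covered by a multiword-token range in this sentence
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                counts["sentences"] += 1
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:
            # Multiword token: one surface token spanning several syntactic words.
            counts["fused"] += 1
            counts["tokens"] += 1
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:
            counts["empty"] += 1  # empty node: neither a surface token nor a word
        else:
            counts["words"] += 1
            # A word outside any multiword range is also its own surface token.
            if int(token_id) > covered_until:
                counts["tokens"] += 1
    if in_sentence:  # file not terminated by a blank line
        counts["sentences"] += 1
    return counts


def split_counts(split_texts):
    """Map each split name (e.g. 'train', 'dev', 'test') to its counts, plus a 'total' entry."""
    result = {name: conllu_counts(text) for name, text in split_texts.items()}
    total = Counter()
    for counts in result.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    result["total"] = total
    return result
```

In practice each text would be read with `open(path, encoding="utf-8").read()` from the treebank's train, dev and test files; the exact file names depend on the treebank and are not assumed here.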

lauma commented

I seem to remember that the release checklist once suggested using it, but it looks like that is no longer the case. Okay, if I don't need it for the README, then I just won't use it and thus won't need Python 2. Thanks!

You are right, it used to be on the checklist. I removed it some time ago because I realized it was no longer needed.

Not sure if it's used, but I'll upgrade it to Py3 one of these days. Should be simple.
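
The traceback in the first comment comes from Python 2's `print` statement (`print json.dumps(d)`), which is a syntax error in Python 3, where `print` is a function. A minimal sketch of the kind of change involved, assuming the rest of the script needs no other fixes; the helper name `emit_stats` and the sample dictionary are hypothetical and not taken from conllu-stats.py:

```python
import json


def emit_stats(d):
    """Serialize a statistics dictionary as JSON, print it, and return the string (hypothetical helper)."""
    line = json.dumps(d, sort_keys=True)
    # Python 2 only:        print json.dumps(d)
    # Python 3 (and 2.7 with print_function): print(json.dumps(d))
    print(line)
    return line


if __name__ == "__main__":
    emit_stats({"sentences": 1, "tokens": 3, "words": 5})
```

If the script must keep running under Python 2.7 as well, putting `from __future__ import print_function` as the first statement of the file makes `print(...)` behave the same way on both versions.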

@fginter, that would be nice. I would like to use the script to get statistics for the dev version of my treebank.

You can also use conllu-stats.pl, which produces much more information:

conllu-stats.pl *.conllu > stats.xml ; git diff stats.xml

In case you meant UD_Nheengatu-CompLin, I just ran the script, updated the statistics there (in the dev branch) and pushed the change to GitHub.