delph-in/pydelphin

mkprof not working on empty sentences with --delimiter="@"

Closed this issue · 6 comments

% delphin mkprof -v -v -i all.txt --delimiter "@" --relations ~/hpsg/logon/lingo/lkb/src/tsdb/skeletons/english/Relations --skeleton all
Traceback (most recent call last):
  File "/Users/ar/work/wn/venv/bin/delphin", line 8, in <module>
    sys.exit(main())
  File "/Users/ar/work/wn/venv/lib/python3.8/site-packages/delphin/main.py", line 42, in main
    args.func(args)
  File "/Users/ar/work/wn/venv/lib/python3.8/site-packages/delphin/cli/mkprof.py", line 38, in call_mkprof
    return mkprof(
  File "/Users/ar/work/wn/venv/lib/python3.8/site-packages/delphin/commands.py", line 360, in mkprof
    _mkprof_from_lines(
  File "/Users/ar/work/wn/venv/lib/python3.8/site-packages/delphin/commands.py", line 390, in _mkprof_from_lines
    tsdb.write(destination,
  File "/Users/ar/work/wn/venv/lib/python3.8/site-packages/delphin/tsdb.py", line 857, in write
    for record in records:
  File "/Users/ar/work/wn/venv/lib/python3.8/site-packages/delphin/commands.py", line 424, in _lines_to_records
    colmap['i-length'] = len(colmap['i-input'].split())
AttributeError: 'NoneType' object has no attribute 'split'

This is the file all.txt, I am trying to have the first column of the file mapped to the i-origin and the second column to the i-comment:

i-origin@i-comment@i-input
05882226-n-1@def@law stating that when two elements can combine to form more than one compound the amounts of one of them that combines with a fixed amount of the other will exhibit a simple multiple relation;
05883296-n-1@def@law stating that the entropy of a substance approaches zero as its temperature approaches absolute zero;
05883035-n-1@def@a law stating that mechanical work can be derived from a body only when that body interacts with another at a lower temperature; any spontaneous process results in an increase of entropy;
05880432-n-1@def@two laws governing electric networks in which steady currents flow: the sum of all the currents at a point is zero and the sum of the voltage gains and drops around any closed circuit is zero;
05883473-n-1@def@the law that if two bodies are in thermal equilibrium with a third body then the first two bodies are in thermal equilibrium with each other;
05882793-n-1@def@a law governing the relations between states of energy in a closed system;
05885622-n-1@def@one of three basic laws of classical mechanics;
05884433-n-1@def@one of two principles of heredity formulated by Gregor Mendel on the basis of his experiments with plants; the principles were limited and modified by subsequent genetic research;
05884736-n-1@def@members of a pair of homologous chromosomes separate during the formation of gametes and are distributed to different gametes so that every gamete receives only one member of the pair;

note also that verbose level is not increasing.

For logging verbosity: the -v option is shared by all commands, but for mkprof there are no logging calls in the command code nor in the underlying delphin.tsdb module, so the option doesn't do anything right now.

For the error you're seeing: I'm not able to reproduce it. Can you tell me what version you're running?

$ delphin mkprof -i all.txt --delimiter "@" --relations ~/grammars/erg-trunk/tsdb/skeletons/Relations --skeleton all
    9746 bytes	relations
    1614 bytes	item
$ delphin --version
delphin 1.5.1

I am running the last version. The problem is an empty i-input, but maybe we can try to have a better error handling. If you guide me suggesting how to add the loginng and better try/catch in the profile creation, I can try to make a PR.

Ok thanks, I see the error when an i-input field is empty.

If you guide me suggesting how to add the loginng and better try/catch in the profile creation, I can try to make a PR.

For logging, use the logger object. Avoid logging inside tight loops, but if you must, use logging.isEnabledFor(level) (see here and examples in code). I don't want to spam the logger, either, so only log things that aid in debugging.

For the error, rather than just try-excepting the offending line, I'd rather figure out why it's getting to that state and try to avoid that. Try-excepts tend to hide errors that I'd rather be aware of. In this case, when you use --delimiter="@", the splitter function (determined in _make_split() in delphin/commands.py) is tsdb.split(), which is defined to return None for empty columns. Since this is an expected (but overlooked) column value, it's probably best to just detect this special case and set i-length to 0.

I have released v1.6.0 already, so I've removed this issue from the milestone. I'm happy to put it in a 1.6.1 patch release.

As a hint, the problem is here:

if with_i_length and 'i-length' not in colmap and 'i-input' in colmap:
colmap['i-length'] = len(colmap['i-input'].split())

Line 424 assumes that colmap['i-input'] will be a string, but it's None when the column is empty, so this needs to be checked.