mgormley/concrete-chunklink

Concrete-python error on running the script

Hi @mgormley

I'm trying to run your tool but I get the following error:

$ python concrete_chunklink/add_chunks.py --chunklink scripts/chunklink_2-2-2000_for_conll.pl ~/data/ptb/ output/
INFO:root:Processing: /home/swabhas/data/ptb/WSJ_2412.MRG
Traceback (most recent call last):
  File "concrete_chunklink/add_chunks.py", line 190, in <module>
    main()
  File "concrete_chunklink/add_chunks.py", line 184, in main
    add_chunks_to_dir(in_path, out_path, chunklink, options.fail_on_error)
  File "concrete_chunklink/add_chunks.py", line 48, in add_chunks_to_dir
    add_chunks_to_file(in_file, out_file, chunklink, fail_on_error)
  File "concrete_chunklink/add_chunks.py", line 55, in add_chunks_to_file
    comm = read_communication_from_file(in_file)
  File "/home/swabhas/miniconda3/envs/py27/lib/python2.7/site-packages/concrete/util/file_io.py", line 103, in read_communication_from_file
    comm = read_thrift_from_file(Communication(), communication_filename)
  File "/home/swabhas/miniconda3/envs/py27/lib/python2.7/site-packages/concrete/util/file_io.py", line 86, in read_thrift_from_file
    protocol_factory=factory.protocolFactory)
  File "/home/swabhas/miniconda3/envs/py27/lib/python2.7/site-packages/thrift/TSerialization.py", line 37, in deserialize
    base.read(protocol)
  File "/home/swabhas/miniconda3/envs/py27/lib/python2.7/site-packages/concrete/communication/ttypes.py", line 288, in read
    iprot._fast_decode(self, iprot, (self.__class__, self.thrift_spec))
TypeError: struct field had wrong type: expected 12 but got 14

My input file is formatted as below:


( (S
    (SBAR-TMP
      (WHADVP-1 (WRB When) )
      (S
        (NP-SBJ (PRP it) )
        (VP (VBZ 's)
          (NP-PRD
            (NP (NN time) )
            (PP (IN for)
              (NP (PRP$ their) (JJ biannual) (NN powwow) )))
          (ADVP-TMP (-NONE- *T*-1) ))))
    (, ,)
    (NP-SBJ
      (NP (DT the) (NN nation) (POS 's) )
      (VBG manufacturing) (NNS titans) )
    (ADVP (RB typically) )
    (VP (VBP jet)
      (PRT (RP off) )
      (PP-DIR (TO to)
        (NP
          (NP (DT the) (JJ sunny) (NNS confines) )
          (PP (IN of)
            (NP
              (NP (NN resort) (NNS towns) )
              (PP (IN like)
                (NP
                  (NP (NNP Boca) (NNP Raton) )
                  (CC and)
                  (NP (NNP Hot) (NNP Springs) ))))))))
    (. .) ))
( (FRAG (RB Not)
    (NP-TMP (DT this) (NN year) )
    (. .) ))

Could you please help me understand what might be going on?

It turns out that running the Perl script directly works fine, for example:

cat WSJ_2101.MRG | perl chunklink_2-2-2000_for_conll.pl -N -ns | more
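
For anyone else reading the traceback: the numbers in "expected 12 but got 14" are Thrift wire-type codes, and decoding them can make the message easier to interpret. The sketch below maps the codes to their names as defined by TType in the Apache Thrift Python library. It also includes a rough check for whether a file looks like bracketed Penn Treebank text rather than a serialized Communication. The helper names (describe_type_mismatch, looks_like_treebank_text) are hypothetical and not part of concrete-python, and the idea that the reader is being handed raw .MRG text is only an assumption based on the sample input above, not something confirmed in this thread.

```python
# Thrift wire-type codes, as defined by TType in the Apache Thrift Python library.
THRIFT_TYPE_NAMES = {
    0: "STOP", 1: "VOID", 2: "BOOL", 3: "BYTE", 4: "DOUBLE",
    6: "I16", 8: "I32", 10: "I64", 11: "STRING",
    12: "STRUCT", 13: "MAP", 14: "SET", 15: "LIST",
}


def describe_type_mismatch(expected, got):
    """Hypothetical helper: render a Thrift type mismatch with readable names."""
    expected_name = THRIFT_TYPE_NAMES.get(expected, "UNKNOWN")
    got_name = THRIFT_TYPE_NAMES.get(got, "UNKNOWN")
    return "expected %s (%d) but got %s (%d)" % (expected_name, expected, got_name, got)


def looks_like_treebank_text(data):
    """Hypothetical heuristic: bracketed PTB trees start with '(' after whitespace.

    This is an assumption for illustration; it is not a reliable format detector.
    """
    return data.lstrip().startswith(b"(")


if __name__ == "__main__":
    # The codes from the traceback above.
    print(describe_type_mismatch(12, 14))
    # The first bytes of the sample input shown in this issue.
    print(looks_like_treebank_text(b"( (S\n    (SBAR-TMP"))
```

If the second check is true for the files in the input directory, it may be worth confirming whether add_chunks.py expects Concrete Communication files as input rather than raw .MRG files.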