glimmerphoenix/WikiDAT

redis.exceptions.DataError: Invalid input of type: 'NoneType'


First, thank you for your great work on this project; it has helped me a lot in writing my PhD dissertation (sociology).

When parsing the frwiki dump, I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/WikiDAT-0.1-py3.6.egg/wikidat/retrieval/processors.py", line 220, in run
    for item in target(self.items(), **self.kwargs):
  File "/usr/local/lib/python3.6/dist-packages/WikiDAT-0.1-py3.6.egg/wikidat/retrieval/revision.py", line 359, in revs_to_file
    redis_cache.hset(lang + ':userzero', int(rev['id']), username)
  File "/usr/local/lib/python3.6/dist-packages/redis-3.0.1-py3.6.egg/redis/client.py", line 2617, in hset
    return self.execute_command('HSET', name, key, value)
  File "/usr/local/lib/python3.6/dist-packages/redis-3.0.1-py3.6.egg/redis/client.py", line 754, in execute_command
    connection.send_command(*args)
  File "/usr/local/lib/python3.6/dist-packages/redis-3.0.1-py3.6.egg/redis/connection.py", line 619, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/usr/local/lib/python3.6/dist-packages/redis-3.0.1-py3.6.egg/redis/connection.py", line 659, in pack_command
    for arg in imap(self.encoder.encode, args):
  File "/usr/local/lib/python3.6/dist-packages/redis-3.0.1-py3.6.egg/redis/connection.py", line 124, in encode
    "byte, string or number first." % typename)
redis.exceptions.DataError: Invalid input of type: 'NoneType'. Convert to a byte, string or number first.

The exact file I am parsing is: http://dumps.wikimedia.org/frwiki/20181101/frwiki-20181101-pages-meta-history1.xml-p65189p81585.7z

The error no longer shows up after I comment out these lines in revision.py:

    if user == 0:
        user = -2  # Special value for case: (NULL, username)
        redis_cache.hset(lang + ':userzero', int(rev['id']), username)
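
Commenting the lines out also drops the cache entry for every revision, so here is a minimal sketch of a narrower workaround. It assumes the cause is that `username` is None for some revisions in this dump, which is what the traceback suggests but which I have not verified. The names `hset_skip_none` and `FakeCache` are hypothetical and are not part of WikiDAT or redis-py; `FakeCache` only stands in for the Redis client so the sketch runs without a server.

```python
def hset_skip_none(cache, name, key, value, placeholder=None):
    """Store value in the hash `name` under `key`, unless value is None.

    If value is None and no placeholder is given, nothing is written
    and False is returned. If a placeholder is given, it is stored
    instead of None. Returns True whenever something was written.
    """
    if value is None:
        if placeholder is None:
            return False  # skip the write instead of passing None on
        value = placeholder
    cache.hset(name, key, value)
    return True


class FakeCache:
    """Hypothetical stand-in for a Redis client, for demonstration only."""

    def __init__(self):
        self.data = {}

    def hset(self, name, key, value):
        # Reject None, as the client in the traceback above does.
        if value is None:
            raise TypeError("None is not accepted as a value")
        self.data.setdefault(name, {})[key] = value
        return 1


if __name__ == "__main__":
    cache = FakeCache()
    hset_skip_none(cache, "fr:userzero", 1, None)     # skipped
    hset_skip_none(cache, "fr:userzero", 2, "Alice")  # stored
    print(cache.data)
```

In revision.py this would presumably replace the direct `redis_cache.hset(...)` call, passing `redis_cache` as `cache`. Whether skipping the entry or storing a placeholder is the right behaviour depends on how the `userzero` hash is read later, which I do not know.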

Sorry for this question; this does not seem to be a big bug, but I'm quite a noob in Python :)