ToeBee/ChangesetMD

Error with replication branch


opening replication file at http://planet.osm.org/replication/changesets/001/507/867.osm.gz
Traceback (most recent call last):
  File "changesetmd.py", line 190, in <module>
    md.doReplication(conn)
  File "changesetmd.py", line 150, in doReplication
    self.parseFile(connection, self.fetchReplicationFile(currentSequence), True)
  File "changesetmd.py", line 71, in parseFile
    action, root = context.next()
  File "iterparse.pxi", line 208, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:131322)
lxml.etree.XMLSyntaxError: no element found

On the next run:

$ python changesetmd.py -d changesets -r
concurrent update in progress. Bailing out!

The contents of this replication diff are empty.

I guess there are two issues here. One is handling empty diffs better; the other is making sure the status is set to "not in progress" when exiting.
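For the empty-diff case, one option is to check the decompressed payload before handing it to the parser. This is only a minimal sketch: it assumes Python 3, and it assumes "empty" means the `.osm.gz` file decompresses to nothing but whitespace. `open_replication_diff` is a hypothetical helper, not something that exists in changesetmd.py.

```python
import gzip
import io

def open_replication_diff(raw_bytes):
    """Return a file-like object over the decompressed diff, or None if it is empty.

    Hypothetical helper: raw_bytes is the downloaded .osm.gz content.
    """
    data = gzip.decompress(raw_bytes)
    if not data.strip():
        # Nothing to parse; the caller can skip this sequence number
        # instead of letting the XML parser fail with "no element found".
        return None
    return io.BytesIO(data)

# Small demonstration with in-memory payloads.
empty = open_replication_diff(gzip.compress(b""))
non_empty = open_replication_diff(gzip.compress(b"<osm></osm>"))
```

A caller would then treat `None` as "no changesets in this diff" and still advance the sequence number, rather than aborting the run.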

I can see two ways to handle the second issue:

  1. Set the running flag
  2. Do update work
    2b. Catch any errors, and unset the running flag then exit
  3. Unset the running flag
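In Python, these steps map fairly directly onto `try`/`finally`. The sketch below is illustrative only: the `status` table, its `running` column, and the `run_update` and `is_running` names are assumptions, and an in-memory SQLite database stands in for the real one so the example can run on its own.

```python
import sqlite3

def run_update(conn, do_update_work):
    """Set the running flag, run the update, and always clear the flag.

    do_update_work is expected to commit its own transactions.
    """
    cur = conn.cursor()
    cur.execute("UPDATE status SET running = 1")      # 1. set the running flag
    conn.commit()
    try:
        do_update_work(conn)                          # 2. do update work
    finally:
        # 2b / 3. Runs on success and on any exception alike.
        conn.rollback()                               # drop uncommitted work
        cur.execute("UPDATE status SET running = 0")
        conn.commit()

def is_running(conn):
    return conn.execute("SELECT running FROM status").fetchone()[0] == 1

# Demo: the update fails, but the flag is cleared before the error propagates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (running INTEGER NOT NULL)")
conn.execute("INSERT INTO status VALUES (0)")
conn.commit()

def failing_work(conn):
    raise ValueError("no element found")  # stands in for XMLSyntaxError

try:
    run_update(conn, failing_work)
except ValueError:
    pass

flag_after_failure = is_running(conn)
```

The exception is not swallowed, so the traceback is still printed; only the flag cleanup is guaranteed. A process killed outright (for example with SIGKILL) never reaches the `finally` block, so the flag can still get stuck in that case.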

or

  1. Acquire an explicit lock on the status table
  2. begin transaction
  3. Set the running flag
  4. Commit. This keeps the lock
  5. For each diff
    1. download
    2. begin transaction
    3. insert new data
    4. update sequence in status table
    5. commit
  6. catch errors from the above
  7. begin transaction
  8. unset the running flag
  9. commit
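The transaction structure of this second approach could look roughly like the sketch below. It leaves out the table lock from step 1 and the download from step 5.1 (diffs are passed in already fetched), and it uses an in-memory SQLite database as a stand-in. The `status` and `changeset` tables, their columns, and the function names are all assumptions made for illustration.

```python
import sqlite3

def apply_diffs(conn, diffs, insert_changesets):
    """Apply (sequence, payload) diffs, one transaction per diff."""
    cur = conn.cursor()
    cur.execute("UPDATE status SET running = 1")
    conn.commit()                                     # steps 2-4
    try:
        for sequence, payload in diffs:               # step 5
            try:
                insert_changesets(cur, payload)       # 5.3 insert new data
                # 5.4 ("?" is SQLite's placeholder; psycopg2 would use "%s")
                cur.execute("UPDATE status SET last_sequence = ?", (sequence,))
                conn.commit()                         # 5.5
            except Exception:
                conn.rollback()   # data and sequence are rolled back together
                raise
    finally:
        cur.execute("UPDATE status SET running = 0")  # steps 7-9
        conn.commit()

def insert_changesets(cur, payload):
    cur.executemany("INSERT INTO changeset (id) VALUES (?)",
                    [(changeset_id,) for changeset_id in payload])

# Demo: the second diff contains a duplicate id and fails halfway through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (running INTEGER NOT NULL, "
             "last_sequence INTEGER NOT NULL)")
conn.execute("CREATE TABLE changeset (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO status VALUES (0, 0)")
conn.commit()

diffs = [(1, [10, 11]), (2, [12, 12]), (3, [13])]
try:
    apply_diffs(conn, diffs, insert_changesets)
except sqlite3.IntegrityError:
    pass

ids = [row[0] for row in conn.execute("SELECT id FROM changeset ORDER BY id")]
status_row = conn.execute("SELECT running, last_sequence FROM status").fetchone()
```

Because the sequence number is updated in the same transaction as the inserts, a failed diff leaves the stored sequence pointing at the last diff that was fully applied, so the next run can resume from there.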

Thinking about it, this could still leave the flag set if changesetmd crashes or is terminated after step 4 and before step 7, but that would take a crash hard enough that no exception is thrown for the handler to catch and unset the flag.

To completely avoid that, the best route is probably to get rid of the flag and use the explicit lock to indicate if it's running or not.
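A rough sketch of the lock-only idea, assuming the database is PostgreSQL and the connection is a DB-API connection such as one from psycopg2. In PostgreSQL a lock taken with `LOCK TABLE` is released when its transaction ends, so a lock that has to survive intermediate commits needs to be session-level; advisory locks (`pg_try_advisory_lock` / `pg_advisory_unlock`) behave that way, and the server releases them when the session ends, including after a hard crash of the client. `LOCK_KEY` and both function names are hypothetical.

```python
LOCK_KEY = 20150101  # hypothetical, application-chosen advisory lock key

def try_acquire_update_lock(conn, key=LOCK_KEY):
    """Return True if this session now holds the lock, False if another one does."""
    cur = conn.cursor()
    cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
    acquired = cur.fetchone()[0]
    conn.commit()  # session-level advisory locks are not released by commit
    return bool(acquired)

def run_exclusively(conn, do_update_work, key=LOCK_KEY):
    """Run do_update_work(conn) only if no other session holds the lock."""
    if not try_acquire_update_lock(conn, key):
        print("concurrent update in progress. Bailing out!")
        return False
    try:
        do_update_work(conn)
    finally:
        conn.rollback()  # drop uncommitted work before releasing the lock
        cur = conn.cursor()
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        conn.commit()
    return True
```

With this arrangement there is no flag row to get stuck: whether an update is running is determined entirely by whether some live session holds the lock.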