mochi-hpc/mobject

potential race problems in get_or_create_oid in mobject server


In GitLab by @shanedsnyder on Feb 20, 2019, 11:19

Before performing any operations within a write_op, mobject checks whether the oid exists by looking up the object name in the names->oids kv mapping. If the oid is not found, it is created through the following steps:

  1. Hashing the object_name to an oid, verifying the generated OID does not already exist in the kv mapping oids->names
  2. Putting the name->oid mapping in a kv
  3. Putting the oid->name mapping in a kv

A server receiving several write_ops that reference the same underlying object is susceptible to races here. E.g., two clients writing to an object that does not yet exist on a specific server will both attempt steps 1-3. If one client finishes step 3 while the other is still checking for potential hash collisions in step 1, the server ends up in an inconsistent state that is going to cause issues (namely, two different oids referring to the same underlying object).
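To make the interleaving concrete, here is a minimal Python sketch that replays it by hand. Everything in it is an assumption for illustration: the two dicts stand in for the names->oids and oids->names kvs, `zlib.crc32` stands in for whatever hash mobject uses, and resolving a collision by probing the next oid is a guess, not necessarily what `get_or_create_oid` does.

```python
import zlib

name_to_oid = {}  # stand-in for the names->oids kv
oid_to_name = {}  # stand-in for the oids->names kv

def lookup(name):
    """Existence check done before any write_op work."""
    return name_to_oid.get(name)

def pick_free_oid(name):
    """Step 1: hash the name, skipping oids already present in oids->names.
    Linear probing is an assumed collision policy."""
    oid = zlib.crc32(name.encode())
    while oid in oid_to_name:
        oid += 1
    return oid

def publish(name, oid):
    """Steps 2 and 3: store both mappings."""
    name_to_oid[name] = oid
    oid_to_name[oid] = name

def replay_race(name="obj"):
    # Both clients look the object up before either has created it.
    a_found = lookup(name)
    b_found = lookup(name)
    # Client A runs steps 1-3 to completion.
    oid_a = pick_free_oid(name)
    publish(name, oid_a)
    # Client B now runs step 1: A's oid looks like a hash collision,
    # so B picks a different oid and publishes it for the same name.
    oid_b = pick_free_oid(name)
    publish(name, oid_b)
    return a_found, b_found, oid_a, oid_b

a_found, b_found, oid_a, oid_b = replay_race()
owners = sorted(oid for oid, n in oid_to_name.items() if n == "obj")
print("oids claiming 'obj':", owners)
print("names->oids now points at:", name_to_oid["obj"])
```

After the replay, oids->names holds two entries for the same name, and names->oids points only at the second one, so anything client A wrote under its oid is no longer reachable by name.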

In GitLab by @shanedsnyder on Feb 21, 2019, 14:30

In 3743573 I committed a change to help avoid race conditions in this particular code path. Specifically, server read ULTs can detect when multiple operations are competing to create an object, and the operations that lose the race fall back to looking up the newly created object.
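For readers who have not looked at the commit, the general "detect the lost race, then fall back to the winner's object" pattern can be sketched as below. This is not a transcription of 3743573. It assumes the kv offers an atomic put-if-absent (the hypothetical `KV.put_if_absent` here, built on a Python lock), uses `zlib.crc32` as a stand-in hash, and treats an oid already owned by the same name as a lost race rather than a collision.

```python
import threading
import zlib

class KV:
    """Toy kv store; put_if_absent is an assumed primitive, not a known mobject API."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def put_if_absent(self, key, value):
        """Store value only if key is unset; return whatever is stored afterwards."""
        with self._lock:
            return self._data.setdefault(key, value)

    def snapshot(self):
        with self._lock:
            return dict(self._data)

name_to_oid = KV()
oid_to_name = KV()

def get_or_create_oid(name):
    oid = name_to_oid.get(name)
    if oid is not None:
        return oid
    oid = zlib.crc32(name.encode())
    while True:
        owner = oid_to_name.put_if_absent(oid, name)
        if owner == name:
            # Either we claimed the oid, or a competing creator of the same
            # object did. Both cases resolve to the same oid.
            name_to_oid.put(name, oid)  # idempotent for winners and losers
            return oid
        oid += 1  # genuine collision with a different name: try the next oid

def run_competing_creators(name="obj", n_threads=8):
    start = threading.Barrier(n_threads)
    results = []

    def worker():
        start.wait()  # release all creators at once to provoke the race
        results.append(get_or_create_oid(name))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_competing_creators()
owners = [oid for oid, n in oid_to_name.snapshot().items() if n == "obj"]
print("distinct oids returned:", len(set(results)))
```

The point of the sketch is that the claim on the oid is a single atomic step, so there is no window between checking for a collision and publishing the mapping.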

As @carns pointed out, we should probably find a way to manage consistency here other than the keyval: some sort of data structure that serializes operations going to the same OID but allows operations on different OIDs to complete in parallel.
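One possible shape for such a structure is a table of per-key locks, sketched below in Python as a rough illustration only. The names `KeyedLocks`, `apply_op`, and `op_counts` are hypothetical, `threading.Lock` stands in for whatever mutex the server's ULT runtime provides, and the sketch never evicts locks, which a real server would need to handle.

```python
import threading

class KeyedLocks:
    """Hands out one lock per key; the table itself is guarded by a short-lived mutex."""
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, key):
        with self._guard:
            lock = self._locks.get(key)
            if lock is None:
                lock = self._locks[key] = threading.Lock()
            return lock

locks = KeyedLocks()
op_counts = {}  # stand-in for per-object state touched by each operation

def apply_op(oid):
    # Operations on the same oid queue on one lock; operations on different
    # oids hold different locks and do not wait on each other.
    with locks.lock_for(oid):
        current = op_counts.get(oid, 0)  # read
        op_counts[oid] = current + 1     # write; only safe because of the lock

def worker(oid, n_ops):
    for _ in range(n_ops):
        apply_op(oid)

def run_demo(oids=("oid-a", "oid-b"), threads_per_oid=4, n_ops=1000):
    threads = [threading.Thread(target=worker, args=(oid, n_ops))
               for oid in oids for _ in range(threads_per_oid)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(op_counts)

final_counts = run_demo()
print(final_counts)
```

Note that this keys the lock on an identifier that is known up front. For object creation the oid is not known until the hash step has run, so the key would more likely have to be the object name, and collisions between different names would still need separate handling.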

Will leave this issue open as a reminder to investigate.