internetarchive/openlibrary

Remove simplejson dependency

Closed this issue · 12 comments

Subtask of #2852
See #4296, #4297, #4298, #4299

simplejson is no longer needed since the json module was added to Python. There are some concerns with performance on Python 2, so let's wait until we finish our Python 3 migration, and then drop this dependency.

Additional context

  • #1451 : Discussion of whether we want to abandon simplejson
  • Blocked on Python 3 migration

Stakeholders

@cclauss

Why close one issue to create another on the same topic?

The initial ticket was to decide whether we want to remove simplejson as a dependency. This one is to actually remove it. The person who takes on this issue doesn't have to spend time reading the discussion in the previous issue, or doing research about whether to remove it; then can just start editing the code. I'm trying to make life easier for the people that come after us (which might also be us!).

My approach would be to change the title but preserve the context.

The makes the context required reading for a future dev. The context is still preserved, but now it's just optional.

I think we can move on. Hopefully you heard the message that both @cclauss and I disagree with this approach.

Even though this issue is blocked, it needs an assignee. I'd recommend @cclauss or @salman-bhai since they're team leads for Python3. Let me know your thoughts

edit: typo

I actually don’t believe that this is blocked and I have not seen any modern benchmarks that say that simplejson is faster than the current Standard Library json.

This SO answer claims that simplejson is faster than Python 2.7 json for loading, but I'm not sure it's significant for the amount of JSON we deal with.

Given that we're only a couple of months from the retirement of Python 2.7 and we don't want to have to do all the work on Dec. 31, I'm happy to consider this unblocked.

Blocked state removed. @cclauss Are you willing to be assignee for this issue? Note, being assignee does not necessarily mean you are responsible for doing the work for this issue, just responsible for gathering pertinent information to start the work. From the Managed Label Wiki:

At submission and prior to triage, the assigned owner is not necessarily the person who will fix the issue (it is not necessarily even established, at that point, if or when the issue will be fixed at all), but rather they are the person who will do as much or as little as needed to handle the issue (asking questions, soliciting input, establishing and updating the priority, checking if it is a duplicate, etc).

Once an issue is labeled State: Work In Progress, the owner is the individual doing the work, or leading/coordinating the group that is doing the work.

@cclauss one thing I have found that is different between simplejson and json is regarding collections.namedtuple. If you see https://stackoverflow.com/a/36091411 and example is below, which seems the same between python2 and python3 when I test it.

    import collections, simplejson, json
    >>> TupleClass = collections.namedtuple("TupleClass", ("a", "b"))
    >>> value = TupleClass(1, 2)
    >>> json.dumps(value)
    '[1, 2]'
    >>> simplejson.dumps(value)
    '{"a": 1, "b": 2}'
    >>> simplejson.dumps(value, namedtuple_as_object=False)
    '[1, 2]'

I found the following places where namedtuples are used:

  • scripts/copydocs.py - doesn't seem affected because the namedtuple is not serialized with json
  • scripts/solr_builder/solr_builder_main.py which is also not serialized with json.

I don't see any namedtuple in infogami.

As far as I can tell, namedtuple will not be an issue in openlibrary from what I have seen so far.

@dherbst Awesome work! Only eight Python files mentioning simplejson remaining in openlibrary:

grep simplejson **/*.py > files.txt

with open("files.txt") as in_file:
    files = {line.split(":")[0] for line in in_file}
print("\n".join(sorted(files)))

Remove the scripts/20xx (#1449) and vendor/infogami (internetarchive/infogami#133) files:

  • openlibrary/utils/bulkimport.py #4463
  • openlibrary/accounts/model.py #4461
  • scripts/oclc_to_marc.py #4475 #4515
  • scripts/jsondump.py #4476
  • scripts/migrate_db.py #4477
  • scripts/copydocs.py #4478
  • openlibrary/utils/solr.py #4479
  • setup.py

Closing. Awesome work @dherbst! Once we have remove simplejson from Infogami, we can also remove it from our requirements*.txt files.