ga4gh/ga4gh-schemas

pip install ga4gh-schemas fails

kozbo opened this issue · 7 comments

kozbo commented

Steps to reproduce:

virtualenv schema-test; source schema-test/bin/activate
pip install ga4gh-schemas --ignore-installed --no-cache-dir

then you will get this error:

(jan17) KO-Mac:server kozbo$ pip install ga4gh-schemas --ignore-installed --no-cache-dir
Collecting ga4gh-schemas
  Downloading ga4gh_schemas-0.0.9.tar.gz (2.1MB)
    100% |████████████████████████████████| 2.1MB 2.4MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/xn/1bh5_q793113z57s4g9kr0wr0000gn/T/pip-build-vU6vnu/ga4gh-schemas/setup.py", line 37, in <module>
        process_schemas.main([PROTOCOL_VERSION, 'python'])
      File "scripts/process_schemas.py", line 222, in main
        pb.run(parsedArgs)
      File "scripts/process_schemas.py", line 207, in run
        protoc = self._getProtoc(destination_path)
      File "scripts/process_schemas.py", line 169, in _getProtoc
        protocs))
    Exception: Can't find a good protoc. Tried [u'/private/var/folders/xn/1bh5_q793113z57s4g9kr0wr0000gn/T/pip-build-vU6vnu/ga4gh-schemas/python/protobuf/src/protoc']

Note that the same error occurs for schemas version 0.6.0a9, but not version 0.0.8, which depended on the older version of protobuf. So the error seems to be connected with the dependency on the new protobuf package.

@dcolligan I am hoping that you can sort out this dependency problem. The release is blocked.

You do need to have the protocol buffers compiler installed if you want to generate your own bindings, but we shouldn't place that requirement on the release. When you do the release you can run protoc locally on your system and comment out the lines for process_schemas in setup.py.

kozbo commented

It makes sense to require developers who want to use the pip installable schemas to have protoc pre-installed (though we should inform them of this), but there are other pathways that try to install the schemas that do not require the protoc compiler. Document generation in the virtual environment provided through Read The Docs is one example (fails in the same way). Someone installing the ga4gh-client (which also fails now) library is another, perhaps more important example.

Works for me (since I have protoc installed on my system). Have we always required protoc for the install to work, or is this a recent development?

Yes, recent. I added that so that the compiled proto weren't checked into the tree, but a build result. When it comes to releases this isn't really very helpful, but when building from source it makes sense to have binary requirements. I believe it should be possible to use the Python protobuf module's internal API in place of protoc. https://github.com/google/protobuf/blob/master/python/google/protobuf/proto_builder.py

Here's someone asking if such a thing is possible:

http://stackoverflow.com/questions/35342328/dynamically-creating-python-class-from-a-protobuf-file-at-run-time

Looks like the answer is in theory yes, but to do so we need libraries that come with the protoc install, which is precisely what we can't depend upon during setup, hence this bug.

So, it looks like we have a couple options:

  1. Just require users of schemas and users of all downstream packages to have protoc installed (based on Kevin's comment above, I take it we're ruling that out)
  2. Check in pb2 files into schemas' python dir

Is there any reason we shouldn't pursue (2)? It will be an inconvenience during development to run an extra compile step every time the schemas are changed, but that's better than having the present issue. (And if one feels so inclined, one can simply hack one's setup.py to perform the current behavior in one's local schema branch to get the best of both worlds.)

There is the issue, if we pursue (2), of the pb2 files and proto files getting mismatched on check-ins to master (i.e. someone modifies the proto but doesn't compile the pb2s). We can either a) live with this or b) mandate parallel pb2 check ins by creating a test to compile the pb2s based on the checked-in protos and ensuring there are no discrepancies with the checked-in pb2 files.

Yeah, 2 (b) sounds good to me! To guarantee we can add a test that runs the process schemas to some temp location and compares the output to what is checked in.

I like the idea of setup.py failing gracefully when protoc isn't available so that the release and pb2 generation mechanism stay in one place, as opposed to separate scripts.

Closed with #781