quiltdata/t4

Package.load fails with InvalidLineError

evamaxfield opened this issue · 5 comments

OS: MacOS Mojave 10.14.4 (18E226)
Python Version: 3.7.3
t4 Version: 0.0.10

I ran a normal pkg.build(); the manifest was created and looks fine. When I tried to load the manifest, I got an InvalidLineError. Traceback:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/site-packages/jsonlines/jsonlines.py in read(self, type, allow_none, skip_empty)
    158         try:
--> 159             value = self._loads(line)
    160         except ValueError as orig_exc:

/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:

/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/json/decoder.py in decode(self, s, _w)
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()

/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

InvalidLineError                          Traceback (most recent call last)
<ipython-input-4-0c27a554d04e> in <module>
      1 import t4
----> 2 pkg = t4.Package.load("/allen/aics/modeling/t4/manifests/aics/pipeline_integrated_cell/2019-04-16T18:33:28.json")
      3 pkg

/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/site-packages/t4/packages.py in load(cls, readable_file)
    512         """
    513         reader = jsonlines.Reader(readable_file)
--> 514         meta = reader.read()
    515         meta.pop('top_hash', None)  # Obsolete as of PR #130
    516         pkg = cls()

/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/site-packages/jsonlines/jsonlines.py in read(self, type, allow_none, skip_empty)
    162                 "line contains invalid json: {}".format(orig_exc),
    163                 line, lineno)
--> 164             six.raise_from(exc, orig_exc)
    165 
    166         if value is None:

/usr/local/anaconda3/envs/aicsdatapackaging/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

InvalidLineError: line contains invalid json: Expecting value: line 1 column 1 (char 0) (line 1)

Possibly relevant: the path passed to load is a symlink to the actual manifest:
/allen/aics/modeling/t4/manifests/aics/pipeline_integrated_cell/2019-04-16T18:33:28.json -> /allen/aics/modeling/t4/manifests/aics/pipeline_integrated_cell/.quilt/packages/3ac72ee165e825341cba7803e474faaa0fb51c0a33f5cfd705dd0c153413483e

More context:
pkg.build() was run with these options:
hash = pkg.build(name=f"{self.package_owner}/{self.package_name}", registry=self.build_path)

self.package_owner = aics
self.package_name = pipeline_integrated_cell
self.build_path = Path(f"/allen/aics/modeling/t4/manifests/{self.PACKAGE_OWNER}/{self.PACKAGE_NAME}")

Tested Package.load() on non-symlink path, same error.

One quick suggestion: try Package.browse("{self.PACKAGE_OWNER}/{self.PACKAGE_NAME}") instead. Package.load() is a low-level API.

More info: https://quiltdocs.gitbook.io/t4/walkthrough/installing-a-package#browsing-a-package-manifest.

I was able to reproduce this error locally.

When you open a package by name via t4.Package.browse, it eventually executes the following method:

    @classmethod
    def _from_path(cls, uri):
        """ Takes a URI and returns a package loaded from that URI """
        src_url = urlparse(uri)
        if src_url.scheme == 'file':
            with open(parse_file_url(src_url)) as open_file:
                pkg = cls.load(open_file)
        elif src_url.scheme == 's3':
            body, _ = get_bytes(uri)
            pkg = cls.load(io.BytesIO(body))
        else:
            raise NotImplementedError
        return pkg

That method calls load to read the manifest. load constructs a jsonlines.Reader from the file handle it is given and reads the manifest line by line, so it expects an open file object rather than a path string. Either the docs are wrong, or the code changed without my noticing.
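A likely explanation for the exact error text in the traceback, assuming jsonlines.Reader simply iterates over whatever object it is given (an assumption about its internals, not something confirmed in this thread): iterating over a path string yields single characters, so the first "line" handed to the JSON decoder would be "/". The sketch below uses only the standard library and a placeholder path to show that this produces the same message.

```python
import json

# Placeholder path, used only for illustration.
manifest_path = "/path/to/manifest.json"

# Iterating over a str yields single characters, not lines of a file.
first_item = next(iter(manifest_path))

try:
    json.loads(first_item)
    error_message = None
except json.JSONDecodeError as exc:
    # "/" is not valid JSON, so decoding fails at the very first character.
    error_message = str(exc)

print(repr(first_item))
print(error_message)
```

If that assumption holds, it would also explain why the symlink made no difference: the file is never opened at all.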

So the fix is to pass an open file handle instead of a path:

Package.load(open("name_of_your_file", "r"))

I will update the docs accordingly.

Nice catch! I would update the docs to use a context manager rather than a bare open(), so the file is closed afterwards.

import t4
with open("/path/to/manifest.json", "r") as manifest_obj:
    pkg = t4.Package.load(manifest_obj)

pkg
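For anyone who keeps hitting the path-versus-handle mix-up, a small wrapper can accept either. This is only a sketch: load_from_path_or_file and read_first_json_line are hypothetical names, not part of t4, and the stand-in loader and the sample manifest lines are made up so the example runs without t4 installed.

```python
import io
import json
import os
import tempfile


def load_from_path_or_file(loader, source):
    """Call loader with an open text file, whether source is a path or a file object.

    Hypothetical helper, not part of t4.
    """
    if isinstance(source, (str, os.PathLike)):
        # Open the path ourselves and close it once the loader returns.
        with open(source, "r") as manifest_file:
            return loader(manifest_file)
    # Anything else is assumed to already be a readable file object.
    return loader(source)


def read_first_json_line(readable_file):
    """Stand-in loader: parse only the first line of a JSON-lines file."""
    return json.loads(readable_file.readline())


# Made-up manifest content, for demonstration only.
sample = '{"version": "v0"}\n{"logical_key": "a.txt"}\n'

# Case 1: an already-open file object.
meta_from_file = load_from_path_or_file(read_first_json_line, io.StringIO(sample))

# Case 2: a path on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "manifest.jsonl")
    with open(demo_path, "w") as out_file:
        out_file.write(sample)
    meta_from_path = load_from_path_or_file(read_first_json_line, demo_path)

print(meta_from_file, meta_from_path)
```

With t4 installed, the same helper would presumably be called as load_from_path_or_file(t4.Package.load, "/path/to/manifest.json").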

Feel free to close issue!