marketneutral/alphatools

Minimal Blaze Example Error

RaymondMcT opened this issue · 12 comments

Been stuck trying to get past this error. Possibly a timestamp formatting issue with a dependency version.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-ddd795584d2e> in <module>()
      2     p,
      3     pd.Timestamp('2016-01-05', tz='utc'),
----> 4     pd.Timestamp('2018-01-04', tz='utc')
      5 )

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/engine.py in run_pipeline(self, pipeline, start_date, end_date)
    309             dates,
    310             assets,
--> 311             initial_workspace,
    312         )
    313 

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/engine.py in compute_chunk(self, graph, dates, assets, initial_workspace)
    522                 loader = get_loader(term)
    523                 loaded = loader.load_adjusted_array(
--> 524                     to_load, mask_dates, assets, mask,
    525                 )
    526                 assert set(loaded) == set(to_load), (

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/core.py in load_adjusted_array(self, columns, dates, assets, mask)
    891             self.pool.imap_unordered(
    892                 partial(self._load_dataset, dates, assets, mask),
--> 893                 itervalues(groupby(getitem(self._table_expressions), columns)),
    894             ),
    895         )

~/dev/alphatools/lib/python3.5/site-packages/toolz/dicttoolz.py in merge(*dicts, **kwargs)
     36 
     37     rv = factory()
---> 38     for d in dicts:
     39         rv.update(d)
     40     return rv

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/core.py in _load_dataset(self, dates, assets, mask, columns)
    985                 assets,
    986                 columns,
--> 987                 all_rows,
    988             )
    989         else:

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.adjusted_arrays_from_rows_with_assets()

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.adjusted_arrays_from_rows_with_assets()

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.arrays_from_rows_with_assets()

~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.arrays_from_rows()

~/dev/alphatools/lib/python3.5/site-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
    387 
    388     if newtype.hasobject or oldtype.hasobject:
--> 389         raise TypeError("Cannot change data-type for object array.")
    390     return
    391 

TypeError: Cannot change data-type for object array.
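The last frame is NumPy refusing to reinterpret an object-dtype array as another dtype. A minimal sketch of that failure mode outside zipline, assuming (not confirmed by the traceback alone) that one of the date columns reached the Cython code as Python objects rather than datetime64 values:

```python
import datetime as dt

import numpy as np

# Hypothetical stand-in for a date column that arrived as Python objects.
obj_dates = np.array(
    [dt.datetime(2016, 1, 5), dt.datetime(2016, 1, 6)],
    dtype=object,
)

# Reinterpreting the buffer of an object array is rejected by NumPy.
try:
    obj_dates.view("int64")
    view_error = None
except TypeError as exc:
    view_error = exc

# Converting to a real datetime64 dtype first makes the same view legal.
typed_dates = obj_dates.astype("datetime64[ns]")
as_int = typed_dates.view("int64")

print(type(view_error).__name__, as_int.dtype)
```

If this is what is happening, the fix would be to get the column into datetime64[ns] before it reaches the loader's array code.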

The current list of dependencies that produces the error is here. I don't believe the sqlite3 version is the issue (tried 3.11, 3.18, and 3.24).

OK, I am actually having difficulty reproducing the pipeline-blaze-minimal notebook myself. The dshape is not sticking.

from datashape import dshape
import blaze as bz

ds_dshape = dshape("var*{asof_date: datetime, sid: int64, value: float64}")
expr = bz.Data(
    'sqlite:///temp.db::ds_table',
    dshape=ds_dshape
)
expr.dshape

gives dshape("var * {asof_date: ?date, sid: ?int32, value: ?float32}").

I'll file an issue with blaze and report back.
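For context on why date columns from SQLite are fragile: SQLite has no native datetime storage class, so values declared as DATETIME are stored and returned as text unless something parses them. The sketch below uses only sqlite3 and pandas, not blaze, and an in-memory table that mimics ds_table; whether this is what blaze's type inference trips over is an assumption, not something confirmed here.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for temp.db::ds_table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ds_table (asof_date DATETIME, sid INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO ds_table VALUES (?, ?, ?)",
    [("2016-01-05 00:00:00", 1, 0.5), ("2016-01-06 00:00:00", 1, 0.7)],
)

# Without parse_dates, the DATETIME column comes back as strings.
raw = pd.read_sql_query("SELECT * FROM ds_table", conn)
raw_is_datetime = pd.api.types.is_datetime64_any_dtype(raw["asof_date"])

# An explicit conversion yields a proper datetime64 column.
parsed = pd.to_datetime(raw["asof_date"])
parsed_is_datetime = pd.api.types.is_datetime64_any_dtype(parsed)

conn.close()
print(raw_is_datetime, parsed_is_datetime)
```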

I had that same problem using the newest version of blaze. It works (gets past that step) if you use the specific older version of blaze from quantopian:

https://github.com/quantopian/zipline/blob/master/etc/requirements_blaze.txt

Wow, that's pretty specific. Thanks @RaymondMcT .

Do you know the best way to get pip to see that file in my local zipline package? That is, is there some pip syntax that says "use the requirements_blaze.txt file in zipline/etc" without specifying the fully qualified path?

I'm unsure how to get it to interact with setup.py. Up to now I have just been installing the local zipline with pip3 install -e . followed by pip3 install -r etc/requirements_blaze.txt

Hey @RaymondMcT , thank you for that. After following the (updated) install process described in the README.md here, I can run the pipeline-blaze-minimal notebook. Note that I am using Python 2.7, as that's what the Quantopian Research environment uses. I see you are using 3.5, and that may be a source of the issue.

Excellent, at least we know something works all the way through. I looked through the install process described in the README. The dependencies I've been using are significantly different from what you've documented. It's going to take me some time to set up a new environment, but hopefully that will solve the issues. I'll keep you posted. Thank you for all the documentation!

Hey @RaymondMcT , check out my new branch called runtime_factors. This allows you to simply point to a data source (on disk, for example) and use it right away in Pipeline. See the README in that branch. It implements the BlazeLoader in the background and gets the data for you, with no special coding needed.

Edit: now merged to master.

I believe I have finally gotten it working using up-to-date dependencies. I had to modify a piece of zipline internal code, which I really didn't want to do, but at least I have a starting point. I should be able to investigate what's going on now and possibly get a pull request into Quantopian. Thank you very much for all your help!

Great news. You should be able to run_pipeline in a Jupyter NB without any change to the zipline code. However, if you want to run a backtest (i.e., run_algorithm) then, as per this issue, you won't be able to access any data via the BlazeLoader because it isn't registered by default.

@RaymondMcT , how did you change the code in zipline? I hit this issue when using Python v3.6.8.

@inevity Sorry for the delay, been sick. I'm unsure of the consequences of doing this, but it didn't appear to affect anything negatively.

diff --git a/zipline/pipeline/loaders/blaze/core.py b/zipline/pipeline/loaders/blaze/core.py
index e6c8b89d..7c700aab 100644
--- a/zipline/pipeline/loaders/blaze/core.py
+++ b/zipline/pipeline/loaders/blaze/core.py
@@ -971,6 +971,9 @@ class BlazeLoader(object):
         all_rows[TS_FIELD_NAME] = all_rows[TS_FIELD_NAME].astype(
             'datetime64[ns]',
         )
+        all_rows[AD_FIELD_NAME] = all_rows[AD_FIELD_NAME].astype(
+            'datetime64[ns]',
+        )
         all_rows.sort_values([TS_FIELD_NAME, AD_FIELD_NAME], inplace=True)
 
         if have_sids:
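What the patch does can be sketched on a plain DataFrame. This is an illustration rather than zipline's code: the column names assigned to TS_FIELD_NAME and AD_FIELD_NAME and the sample rows are assumptions made for the example, and both date columns deliberately start as object dtype.

```python
import datetime as dt

import pandas as pd

# Assumed values for zipline's field-name constants (illustrative).
TS_FIELD_NAME = "timestamp"
AD_FIELD_NAME = "asof_date"

# Hypothetical rows whose date columns arrived as Python date objects.
all_rows = pd.DataFrame({
    TS_FIELD_NAME: pd.Series(
        [dt.date(2016, 1, 6), dt.date(2016, 1, 5)], dtype=object
    ),
    AD_FIELD_NAME: pd.Series(
        [dt.date(2016, 1, 5), dt.date(2016, 1, 4)], dtype=object
    ),
    "sid": [1, 1],
    "value": [0.7, 0.5],
})

# Cast both columns, mirroring the existing cast plus the added one.
for col in (TS_FIELD_NAME, AD_FIELD_NAME):
    all_rows[col] = all_rows[col].astype("datetime64[ns]")

all_rows = all_rows.sort_values([TS_FIELD_NAME, AD_FIELD_NAME])

# With a real datetime64 dtype, an integer view no longer raises.
asof_ints = all_rows[AD_FIELD_NAME].values.view("int64")
print(all_rows.dtypes)
```

The upstream code already cast the timestamp column; the diff only extends the same treatment to the as-of date column.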

Oh, how did you trace the issue to here? Recently I used ipdb/pdb and found that the issue occurs in the .pyx Cython file, so I needed Cython debugging and got stuck installing gdb with Cython support on macOS. It wasted a lot of time.
Anyway, thank you!