Minimal Blaze Example Error
RaymondMcT opened this issue · 12 comments
I've been stuck trying to get past this error. It's possibly a timestamp-formatting issue tied to a dependency version.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-ddd795584d2e> in <module>()
2 p,
3 pd.Timestamp('2016-01-05', tz='utc'),
----> 4 pd.Timestamp('2018-01-04', tz='utc')
5 )
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/engine.py in run_pipeline(self, pipeline, start_date, end_date)
309 dates,
310 assets,
--> 311 initial_workspace,
312 )
313
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/engine.py in compute_chunk(self, graph, dates, assets, initial_workspace)
522 loader = get_loader(term)
523 loaded = loader.load_adjusted_array(
--> 524 to_load, mask_dates, assets, mask,
525 )
526 assert set(loaded) == set(to_load), (
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/core.py in load_adjusted_array(self, columns, dates, assets, mask)
891 self.pool.imap_unordered(
892 partial(self._load_dataset, dates, assets, mask),
--> 893 itervalues(groupby(getitem(self._table_expressions), columns)),
894 ),
895 )
~/dev/alphatools/lib/python3.5/site-packages/toolz/dicttoolz.py in merge(*dicts, **kwargs)
36
37 rv = factory()
---> 38 for d in dicts:
39 rv.update(d)
40 return rv
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/core.py in _load_dataset(self, dates, assets, mask, columns)
985 assets,
986 columns,
--> 987 all_rows,
988 )
989 else:
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.adjusted_arrays_from_rows_with_assets()
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.adjusted_arrays_from_rows_with_assets()
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.arrays_from_rows_with_assets()
~/dev/alphatools/lib/python3.5/site-packages/zipline/pipeline/loaders/blaze/_core.pyx in zipline.pipeline.loaders.blaze._core.arrays_from_rows()
~/dev/alphatools/lib/python3.5/site-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
387
388 if newtype.hasobject or oldtype.hasobject:
--> 389 raise TypeError("Cannot change data-type for object array.")
390 return
391
TypeError: Cannot change data-type for object array.
The current list of dependencies causing the error is here. I don't believe sqlite3 versioning is the issue (tried 3.11, 3.18, and 3.24).
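For reference, the final TypeError in the traceback can be reproduced in isolation with plain NumPy: when a date column comes back as an object-dtype array (e.g. strings out of SQLite), NumPy refuses to reinterpret (view) it as a fixed-width dtype, while an explicit astype conversion succeeds. A minimal sketch (the values are illustrative):

```python
import numpy as np

# Date values arriving as Python objects (e.g. strings from SQLite)
# land in an object-dtype array.
dates = np.array(['2016-01-05', '2016-01-06'], dtype=object)

# Reinterpreting the buffer in place is effectively what zipline's
# Cython loader attempts; NumPy forbids this for object arrays.
try:
    dates.view('datetime64[ns]')
except TypeError as err:
    print(err)  # the "Cannot change data-type" error from the traceback

# Converting (copying) instead of viewing works fine.
converted = dates.astype('datetime64[ns]')
print(converted.dtype)
```

This suggests the loader is being handed an object-dtype date column where it expects datetime64.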
OK, I am actually having difficulty reproducing the pipeline-blaze-minimal notebook myself. The dshape is not sticking:
ds_dshape = dshape("var*{asof_date: datetime, sid: int64, value: float64}")
expr = bz.Data(
'sqlite:///temp.db::ds_table',
dshape=ds_dshape
)
expr.dshape
gives dshape("var * {asof_date: ?date, sid: ?int32, value: ?float32}"). I'll file an issue with blaze and report back.
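For what it's worth, the override may be coming from schema discovery rather than from blaze ignoring the argument: the SQL backend introspects the declared column types from the database, and SQLite only keeps loose declared types with a handful of storage classes. A stdlib-only sketch (the table name and columns mirror the hypothetical ds_table above):

```python
import sqlite3

# In-memory stand-in for temp.db::ds_table.
con = sqlite3.connect(':memory:')
con.execute(
    "CREATE TABLE ds_table (asof_date DATETIME, sid INTEGER, value REAL)"
)
con.execute("INSERT INTO ds_table VALUES ('2016-01-05', 1, 3.14)")

# The declared types are all SQLite records; anything introspecting the
# schema sees these, not the dshape passed to bz.Data.
cols = {row[1]: row[2] for row in con.execute("PRAGMA table_info(ds_table)")}
print(cols)  # {'asof_date': 'DATETIME', 'sid': 'INTEGER', 'value': 'REAL'}
```

Note the dates are stored as TEXT under the hood, which is consistent with them surfacing later as object-dtype arrays.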
I had that same problem using the newest version of blaze. It works (gets past that step) if you use the specific older version of blaze from quantopian:
https://github.com/quantopian/zipline/blob/master/etc/requirements_blaze.txt
Wow, that's pretty specific. Thanks @RaymondMcT .
Do you know the best way to get pip to see that file in my local zipline package? I.e., is there some pip syntax that says "use the requirements_blaze.txt file in zipline/etc" without specifying the fully qualified path?
I'm unsure how to get it to interact with setup.py. Up to now I have just been installing the local zipline with pip3 install -e . followed by pip3 install -r etc/requirements_blaze.txt.
Hey @RaymondMcT, thank you for that. After following the (updated) install process described in the README.md here, I can run the pipeline-blaze-minimal notebook. Note that I am using Python 2.7, as that's what the Quantopian Research environment uses. I see you are using 3.5, and that may be a source of the issue.
Excellent, at least we know something works all the way through. I looked through the install process described in the README; the dependencies I've been using are significantly different from what you've documented. It's going to take me some time to set up a new environment, but hopefully that will solve the issues. I'll keep you posted. Thank you for all the documentation!
Hey @RaymondMcT, check out my new branch called runtime_factors. It allows you to simply point to a data source (on disk, for example) and use it right away in Pipeline. See the README in that branch. It implements the BlazeLoader in the background and fetches the data for you, with no special coding required.
Edit: now merged to master.
I believe I have finally gotten it working using up to date dependencies. I had to modify a piece of zipline internal code which I really didn't want to do, but at least I have a starting point. I should be able to investigate what's going on now and possibly get a pull request into Quantopian. Thank you very much for all your help!
Great news. You should be able to run_pipeline in a Jupyter notebook without any change to the zipline code. However, if you want to run a backtest (i.e., run_algorithm), then, as per this issue, you won't be able to access any data via the BlazeLoader because it isn't registered by default.
@RaymondMcT, how do you change the code in zipline? I hit this issue using Python 3.6.8.
@inevity Sorry for the delay, been sick. I'm unsure of the consequences of doing this, but it didn't appear to affect anything negatively.
diff --git a/zipline/pipeline/loaders/blaze/core.py b/zipline/pipeline/loaders/blaze/core.py
index e6c8b89d..7c700aab 100644
--- a/zipline/pipeline/loaders/blaze/core.py
+++ b/zipline/pipeline/loaders/blaze/core.py
@@ -971,6 +971,9 @@ class BlazeLoader(object):
all_rows[TS_FIELD_NAME] = all_rows[TS_FIELD_NAME].astype(
'datetime64[ns]',
)
+ all_rows[AD_FIELD_NAME] = all_rows[AD_FIELD_NAME].astype(
+ 'datetime64[ns]',
+ )
all_rows.sort_values([TS_FIELD_NAME, AD_FIELD_NAME], inplace=True)
if have_sids:
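The patch mirrors the coercion the loader already applies to the timestamp column just above it. In pandas terms the effect is the following (the column names match the TS/AD field names, but the frame itself is illustrative):

```python
import pandas as pd

# Object-dtype date columns, as a SQLite-backed blaze query returns them.
all_rows = pd.DataFrame({
    'timestamp': ['2016-01-05', '2016-01-06'],
    'asof_date': ['2016-01-05', '2016-01-06'],
})
assert all_rows['asof_date'].dtype == object

# Mirror of the patch: coerce both date columns to datetime64[ns] before
# the Cython loader tries to view them as fixed-width arrays.
for col in ('timestamp', 'asof_date'):
    all_rows[col] = all_rows[col].astype('datetime64[ns]')

print(all_rows.dtypes)
```

With both columns already datetime64[ns], the object-array view in _core.pyx no longer trips the TypeError.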
Oh, how did you trace the issue to there? I recently used ipdb/pdb and found the issue occurs in a .pyx Cython file, so I needed Cython debugging and got hung up installing gdb and the Cython debug tooling on macOS. Wasted too much time.
Anyway, thank you!