installation on google colab, self.numt assert
Opened this issue · 2 comments
Hi @laudv
Thank you for your paper and for sharing your project; it presents a very interesting approach.
I've successfully built and installed it on Google Colab. You can find it here:
https://colab.research.google.com/drive/1lz6ps34TWMsVm07cnW3EC--pIPTY4PgT?usp=sharing
I'm relatively new to Rust, so I'm not entirely sure if I've followed all the necessary steps correctly. During the build process, I encountered several warnings like the one below:
warning: unused import: `BitsliceLayout`
--> src/count_and_sum.rs:11:23
|
11 | use crate::bitslice::{BitsliceLayout, BitsliceWithLayout};
| ^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
but your example seems to run smoothly.
- I've attempted to test BitBoost with the Numerai dataset, as it appears to be a perfect fit (features take only the values 0, 1, 2, 3, 4). The training phase runs fine, but during prediction I hit an assertion error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-20-7aef979a8edf> in <cell line: 1>()
----> 1 validation[f"prediction_{target}"] = model2.predict(validation[features].to_numpy())
1 frames
/content/bitboost/python/bitboost/sklearn.py in predict(self, X)
68 check_is_fitted(self, "_is_fitted")
69
---> 70 self._bitboost.set_data(X)
71 return self._bitboost.predict()
72
/content/bitboost/python/bitboost/bitboost.py in set_data(self, data, cat_features)
126 self._check()
127 assert isinstance(data, np.ndarray)
--> 128 assert data.dtype == self.numt
129 assert data.shape[1] == self._nfeatures
130
AssertionError:
I'm not sure what's causing it or how to resolve the issue.
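For reference, the failing check in `bitboost.py` compares the input array's dtype against the wrapper's expected numeric type, `self.numt`. Below is a minimal standalone reproduction of that check using plain numpy; `np.float32` here is only a stand-in for `numt`, since the actual dtype depends on how BitBoost was built:

```python
import numpy as np

# Stand-in for BitBoostRegressor.numt; the real value depends on the build.
numt = np.float32

X = np.random.rand(4, 3)   # numpy defaults to float64
print(X.dtype == numt)     # False -> this comparison is what trips the assert

X32 = X.astype(numt)       # cast to the expected dtype
print(X32.dtype == numt)   # True -> the assert would now pass
```

So any array produced by `to_numpy()` on a float64 DataFrame will fail this check unless it is cast first.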
- In LightGBM, I'm using the following parameters:
model = lgb.LGBMRegressor(
n_estimators=100, # If you want to use a larger model we've found 20_000 trees to be better
learning_rate=0.01, # and a learning rate of 0.001
max_depth=5, # and max_depth=6
num_leaves=2**5-1, # and num_leaves of 2**6-1
colsample_bytree=0.1
)
To ensure an "apples-to-apples" comparison, how should I configure BitBoost?
It would be great if BitBoost could achieve similar accuracy in a fraction of the time :)
Best regards,
Marek
Hi Marek,
Thanks for the clear overview and thanks for taking interest in BitBoost.
First a disclaimer. I made BitBoost about 4-5 years ago now. As is often the case with research code, unfortunately, BitBoost is not a nicely packaged, finished product. How exactly do you want to use it? Is it for research purposes, or do you want to use it in another way? I would not recommend using BitBoost in a production environment.
If you are trying BitBoost to experiment with the bit-level optimizations, then I'm very happy to provide help where necessary.
- The unused import warnings are resolved by removing the unused imports.
- If the dataset contains many low-cardinality categorical features, then yes, it is a good fit. The error seems to indicate that your evaluation data has the wrong dtype. Try `.astype(BitBoostRegressor.numt)` on your numpy array.
- You should also use feature subsampling in BitBoost (use the `feature_fraction` parameter). This corresponds to the `colsample_bytree` parameter of LightGBM.
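To make the comparison concrete, the LightGBM settings above could be translated along these lines. Only the `feature_fraction` / `colsample_bytree` correspondence is confirmed in this thread; the other BitBoost parameter names (`niterations`, `learning_rate`, `max_tree_depth`) are assumptions about which knobs play the same role and may differ in the actual API:

```python
# LightGBM configuration from the question above.
lgbm_params = {
    "n_estimators": 100,
    "learning_rate": 0.01,
    "max_depth": 5,
    "colsample_bytree": 0.1,
}

# Hypothetical BitBoost equivalent; all names except feature_fraction
# are assumed, not taken from the BitBoost documentation.
bitboost_params = {
    "niterations": lgbm_params["n_estimators"],           # assumed name
    "learning_rate": lgbm_params["learning_rate"],        # assumed name
    "max_tree_depth": lgbm_params["max_depth"],           # assumed name
    "feature_fraction": lgbm_params["colsample_bytree"],  # confirmed mapping
}

print(bitboost_params)
```

Whatever the exact names, the important part for a fair comparison is matching the tree count, depth, shrinkage, and the per-tree feature subsampling rate.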
Best, Laurens
Hi Laurens,
Indeed, .astype(BitBoostRegressor.numt) solves the issue :)
My main goal is to have a faster and more memory-efficient library than LightGBM for datasets like Numerai, and BitBoost seems to be a good starting point for experiments.
The first idea that comes to my mind is to replace ctypes float with bfloat16 (https://crates.io/keywords/bfloat16). I'll try to investigate it.
Best regards,
Marek