bashtage/linearmodels

MemoryError with IV2SLS


I'm trying to run a 2SLS to estimate price elasticity with IV2SLS. This is what my data looks like:
| ln_q | ln_p | .... weather variables ... | ... instruments ... |... user id dummies ...|

All data is np.float32. My data array is approx. (200000, 20000), which is about 16 GB.
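The size estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a dense array with the approximate shape quoted above; `array_nbytes` is a hypothetical helper, not part of NumPy or linearmodels.

```python
import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed for a dense array of the given shape and dtype."""
    # Use int64 for the element count so large shapes do not overflow.
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize


approx_shape = (200_000, 20_000)  # approximate shape quoted in the post

size_f32 = array_nbytes(approx_shape, np.float32)
size_f64 = array_nbytes(approx_shape, np.float64)

# GB uses powers of 10, GiB uses powers of 2 (the unit in NumPy's error text).
print(f"float32: {size_f32 / 1e9:.1f} GB ({size_f32 / 2**30:.1f} GiB)")
print(f"float64: {size_f64 / 1e9:.1f} GB ({size_f64 / 2**30:.1f} GiB)")
```

The same data held as float64 takes exactly twice the space, so any float64 copy of an array this size is itself tens of gigabytes.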

Using linearmodels' IV2SLS, I set up my model like this:

dependent = ln_q
exog = weather variables + user id dummies
endog = ln_p
instruments = instruments
results = IV2SLS(dependent, exog, endog, instruments).fit()

When running with the full dataset I consistently get the error:
Unable to allocate 27.8GiB of memory to an array with shape (202507, 18450) and data type float64
It looks like this line is the culprit:
self._wz = self._z * w
which is where the weights are applied.
I'm running 64-bit Python on a machine with 128 GB of RAM. I've tried to circumvent this issue by passing my own weights:
results = IV2SLS(dependent, exog, endog, instruments, weights=np.ones(dependent.shape, dtype=np.float32)).fit()
but I still get the same MemoryError, even though I explicitly pass weights of data type float32.
32 GB of RAM just to multiply by an array of 1s when weights=None seems like an awful lot of memory for leaving the input values essentially unchanged. Further, why is the data being recast to float64 when all my other data is float32 and I explicitly pass float32 weights?
Why is an array of ~16 GB using >100 GB of RAM in this process? What can I do to get this regression to run?
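The dtype part of the question can be reproduced in plain NumPy on a small array. The sketch below is not linearmodels code. It only shows NumPy's promotion rule for a product like `z * w`, using a small random stand-in for the real matrix. Whether IV2SLS converts its inputs or weights to float64 internally is an assumption that would need to be checked against the library source.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small float32 stand-in for the real matrix (assumption: same dtype as the data).
z = rng.standard_normal((1_000, 50)).astype(np.float32)

w32 = np.ones((z.shape[0], 1), dtype=np.float32)
w64 = np.ones((z.shape[0], 1), dtype=np.float64)

wz32 = z * w32  # float32 * float32 stays float32
wz64 = z * w64  # any float64 operand promotes the product to float64

print(wz32.dtype, wz32.nbytes)
print(wz64.dtype, wz64.nbytes)

# Even with weights of 1, the product is a new array, not a view of z.
print(np.shares_memory(z, wz32))

# Size of the array named in the error message, in GiB.
reported = 202_507 * 18_450 * np.dtype(np.float64).itemsize
print(f"{reported / 2**30:.1f} GiB")
```

If the float64 result appears even with float32 weights, as reported, then at least one operand has presumably already been converted to float64 before the multiplication. Each such conversion or product allocates another full-size array, which is one plausible way a ~16 GB input could grow to several times its size.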