ageron/handson-ml2

[BUG] [Chapter2] ValueError when calculating the correlations in corr_matrix = housing.corr()

tobihh12 opened this issue · 4 comments

The following ValueError occurs when calling

corr_matrix = housing.corr()

ValueError: could not convert string to float: 'INLAND'

The error comes from the non-numeric "ocean_proximity" column: its string values (such as 'INLAND') cannot be converted to float.

For me, the solution was to change the two lines where the correlations are calculated to:

corr_matrix = housing.corr(numeric_only=True)

According to:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html

The default value of numeric_only for DataFrame.corr() changed to False as of pandas 2.0.0.
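
As a minimal sketch of the fix described above, the snippet below builds a tiny stand-in for the housing DataFrame (the rows are made up for illustration and are not the real dataset) and computes the correlations with numeric_only=True. It assumes pandas 1.5 or later, where DataFrame.corr() accepts the numeric_only parameter.

```python
import pandas as pd

# Tiny illustrative stand-in for the housing data (values are made up).
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8],
    "median_house_value": [452600.0, 358500.0, 342200.0, 269700.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# numeric_only=True tells corr() to skip non-numeric columns such as
# ocean_proximity instead of trying to convert them to float.
corr_matrix = housing.corr(numeric_only=True)

# Same follow-up step as in the chapter: rank features by correlation
# with the target column.
print(corr_matrix["median_house_value"].sort_values(ascending=False))
```

With this change the string column is simply left out of the result, so corr_matrix only contains the numeric columns.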

I ran into the same problem.
As you said, numeric_only=True should be added.

Changed in version 2.0.0: The default value of numeric_only is now False.

Here is the traceback I got:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[36], line 1
----> 1 corr_matrix = housing.corr()

File c:\Users\hansu\anaconda3\envs\handson-ml2\Lib\site-packages\pandas\core\frame.py:10054, in DataFrame.corr(self, method, min_periods, numeric_only)
  10052 cols = data.columns
  10053 idx = cols.copy()
> 10054 mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)
  10056 if method == "pearson":
  10057     correl = libalgos.nancorr(mat, minp=min_periods)

File c:\Users\hansu\anaconda3\envs\handson-ml2\Lib\site-packages\pandas\core\frame.py:1838, in DataFrame.to_numpy(self, dtype, copy, na_value)
   1836 if dtype is not None:
   1837     dtype = np.dtype(dtype)
-> 1838 result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)
   1839 if result.dtype is not dtype:
   1840     result = np.array(result, dtype=dtype, copy=False)

File c:\Users\hansu\anaconda3\envs\handson-ml2\Lib\site-packages\pandas\core\internals\managers.py:1732, in BlockManager.as_array(self, dtype, copy, na_value)
   1730         arr.flags.writeable = False
   1731 else:
-> 1732     arr = self._interleave(dtype=dtype, na_value=na_value)
   1733     # The underlying data was copied within _interleave, so no need
   1734     # to further copy if copy=True or setting na_value
...
-> 1794     result[rl.indexer] = arr
   1795     itemmask[rl.indexer] = 1
   1797 if not itemmask.all():

ValueError: could not convert string to float: 'INLAND'

Same here.

In newer versions of pandas (2.0.0 and later), numeric_only defaults to False.
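
Since the behavior depends on the installed pandas version, one possible alternative (not proposed in this thread, so treat it as an assumption) is to select the numeric columns explicitly with select_dtypes() before calling corr(). This sketch uses a small made-up DataFrame in place of the real housing data and should give the same result regardless of the numeric_only default.

```python
import pandas as pd

# Tiny illustrative stand-in for the housing data (values are made up).
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8],
    "median_house_value": [452600.0, 358500.0, 342200.0, 269700.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

# Keep only numeric columns, so corr() never sees the string values
# in ocean_proximity, whatever the numeric_only default is.
numeric_housing = housing.select_dtypes(include="number")
corr_matrix = numeric_housing.corr()

print(corr_matrix["median_house_value"].sort_values(ascending=False))
```

This avoids passing numeric_only at all, which may help when the same notebook has to run on pandas versions older than 1.5, where that parameter is not available on DataFrame.corr().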