BUG: Integrated pandas can't Read CSV while latest pandas can
charliedream1 opened this issue · 1 comments
charliedream1 commented
Describe the bug
- Problem 1: With a 25 GB CSV file, the latest pandas loads it properly, but "import xorbits.pandas as pd" cannot; Xorbits raises an EOF error.
- Problem 2: A DataFrame loaded with the latest pandas can't be passed to the dedup function (from xorbits.experimental import dedup).
- Problem 3: The dedup function can't handle very long strings, e.g. lengths between 4,000 and 100,000; it fails with the error "too many open files".
To Reproduce
To help us to reproduce this bug, please provide information below:
- Your Python version: 3.10
- The version of Xorbits you use: 0.6.3
- Versions of crucial packages, such as numpy, scipy and pandas: numpy 1.26.0, scipy 1.11.3, pandas 2.1.1
- Full stack of the error.
- Minimized code to reproduce the error.
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Add any other context about the problem here.
codingl2k1 commented
Problem 1: Is your CSV file located on local disk or is it remote (accessed by a URL)?
Problem 2: Are you loading the CSV with pandas and then constructing a Xorbits DataFrame from the pandas DataFrame? If so, it could be an out-of-memory crash, because the full data will be serialized to the worker.
Problem 3: The "too many open files" error can be fixed by raising the open-file limit with ulimit.
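For Problem 3, the limit can also be raised from inside Python instead of running `ulimit -n` in the shell. The following is only a minimal sketch using the standard-library `resource` module (Unix only); the helper name `raise_open_file_limit` and the target of 65536 are illustrative assumptions, and the thread does not confirm that this alone resolves the dedup error.

```python
import resource


def raise_open_file_limit(target=65536):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    The soft limit can never exceed the hard limit, so `target` is capped.
    Returns the (soft, hard) limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Cap the request at the hard limit unless the hard limit is unlimited.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Only raise the limit; never lower an already larger soft limit.
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some systems reject large values; keep the current limit.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_open_file_limit())
```

A limit set this way applies to the current process and to child processes started afterwards, so the call would need to run before Xorbits starts its workers.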