pandas fixed width file support
johnayoub opened this issue · 8 comments
Koalas is now officially part of Apache Spark, so let's file an issue there. From a cursory look, it seems we could implement this by (1) distributing the input StringIO, (2) reading any file from the distributed file source.
cc @xinrong-databricks and @itholic since you guys are triaging the issues.
Thanks @HyukjinKwon. Let me know if you need me to open an issue there.
@johnayoub Sure, can you open an issue in the Apache Spark JIRA?
@itholic @HyukjinKwon Any update on this, and when can I expect it to be included in Koalas?
Hi, @johnayoub
Unfortunately, we have no clear plan to add read_fwf yet (at the earliest, it would be available in Spark 3.3 or later).
In any case, it will be added to PySpark first, and to Koalas only after that.
(So we recommend using PySpark rather than Koalas, since Koalas is now in maintenance mode.)
FYI, you can easily convert your Koalas code to PySpark with a single-line change, as below:
# import databricks.koalas as ks
import pyspark.pandas as ks
By the way, in case you want to read files over http, that will take longer, since PySpark doesn't support reading from such file sources yet. Refer to #1219 for more detail about http support.
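Until read_fwf is available, one possible stopgap is to read the file as plain text lines and slice each line by character offsets yourself. The sketch below is only an illustration of that slicing step, not how read_fwf is or will be implemented in PySpark. The helper name parse_fwf_line, the column layout in COLSPECS, and the sample rows are all made up for the example; the (start, end) offsets follow the same half-open convention as the colspecs argument of pandas.read_fwf.

```python
from typing import List, Sequence, Tuple


def parse_fwf_line(line: str, colspecs: Sequence[Tuple[int, int]]) -> List[str]:
    """Slice one fixed-width line into whitespace-stripped fields.

    Each (start, end) pair is a half-open character range. Lines shorter
    than a range simply yield a shorter (possibly empty) field.
    """
    return [line[start:end].strip() for start, end in colspecs]


# Hypothetical layout: name in columns 0-9, age in 10-12, city in 13-22.
COLSPECS = [(0, 10), (10, 13), (13, 23)]

# Sample lines are built with ljust/rjust so the padding is explicit.
lines = [
    "Alice".ljust(10) + "30".rjust(3) + "London".ljust(10),
    "Bob".ljust(10) + "25".rjust(3) + "Paris",  # last field left unpadded
]

rows = [parse_fwf_line(line, COLSPECS) for line in lines]
for row in rows:
    print(row)
```

On a cluster, the same idea can probably be expressed without a Python helper by loading the file with spark.read.text, which yields one row per line in a column named value, and then extracting each field with pyspark.sql.functions.substring (note that substring positions are 1-based). Treat this as an untested suggestion rather than a recommended replacement for read_fwf.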