input_file_name returns empty string
btelle opened this issue · 6 comments
Hello,
I'm trying to create a DataFrame from an Excel doc and then append a column with the input file's name as returned by input_file_name. Instead of returning the file path, input_file_name returns an empty string.
Test code
from pyspark.sql.functions import input_file_name
df = (sql_context.read.format("com.crealytics.spark.excel")
.option('sheetName', "Sheet1")
.option('useHeader', True)
.load("/home/btelle/test.xlsx")
)
df = df.withColumn('file_name', input_file_name())
df.select('file_name').show(1)
Expected result
+--------------------+
| file_name|
+--------------------+
|file:///home/btel...|
+--------------------+
Actual result
+---------+
|file_name|
+---------+
| |
+---------+
Does the same code work with other DataFrames?
I've tested the same code with multiple xls and xlsx files and get the same empty string every time. Other formats like csv produce the expected result.
TBH I don't have any idea how input_file_name is supposed to work. Can you dig out the corresponding documentation and maybe some code examples what one has to do to make it work?
Maybe you can find the corresponding code in the CSV package.
Closing this due to inactivity.
Hi, I face the same issue: input_file_name() returns an empty string when used with this package. Here is an example of how it is supposed to work:
file_read = spark.read.option("header","true").csv(blob_location)
file_loc = file_read.withColumn("file_loc",input_file_name())
display(file_loc)
Result:

col1  | col2 | file_loc
merin | 25   | wasbs://temp-blob@xxxxxxx.blob.core.windows.net/folder/file.csv

So this gives the full path of the file that was processed. But when using spark-excel, the file path comes back empty.
See above:
TBH I don't have any idea how input_file_name is supposed to work. Can you dig out the corresponding documentation and maybe some code examples what one has to do to make it work?
Maybe you can find the corresponding code in the CSV package.