Gmousse/dataframe-js

fromCSV not seeming to work with local files on node

Closed this issue · 3 comments

Hi,
I needed more performant handling of large CSV files for a project I'm working on, so I'm giving dataframe-js a spin, but I can't seem to even load my CSV files. I'm just trying to load a local CSV file, but I keep getting an error that seems to suggest I need to be running a server to serve the files. That seems kind of ridiculous, and it's also not what the docs say. Here's some output:

> fs.existsSync('./raw_data/pro_record_db.csv')
true
> var df; DataFrame.fromCSV('./raw_data/pro_record_db.csv').then(function(theDf) { df = theDf });
Promise { <pending> }
> (node:54415) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 6): FileNotFoundError: ./raw_data/pro_record_db.csv not found. You maybe use a wrong path or url. Be sure you use absolute path, relative one being not supported.

Then, looking at the source of addFileProtocol, I changed my path to not have a ./ at the beginning, but no luck:

> var df; DataFrame.fromCSV('~/code/bettor/js/raw_data/pro_record_db.csv').then(function(theDf) { df = theDf });
Promise { <pending> }
> df.show()
| Error:... |
------------
|     at... |
|     at... |
|     at... |
'| Error:... |\n------------\n|     at... |\n|     at... |\n|     at... |'
> df.listColumns()
[ 'Error: connect ECONNREFUSED 127.0.0.1:80' ]

The same thing happens when I use a fully absolute path like 'Users/blakewest/code/project1/raw_data/pro_record_db.csv'.
Clearly it's trying to load the file from a server, but it's a local file. How do I get it to load the local file correctly? Thanks! - Blake

Hi @blakewest,
Thanks for reporting this issue.

In order to handle both HTTP URLs and system paths I used d3 (which is good at handling CSV, JSON, ...), but it's a little tricky and has some limitations (it uses a browser-equivalent file system).

Indeed, "~/" (shorthand for "/home/youruser/") is not supported by d3 (nor, I guess, by the nodejs fs module; the ~ is normally expanded by your shell, not by the filesystem).
However, an absolute path should work.

I have tested this in my nodejs CLI (node v8.5.0, Manjaro Linux 4.13.3-2-MANJARO):

var DataFrame = require('dataframe-js').DataFrame;

DataFrame.fromCSV('./titanic.csv').then(df => df.show())
// Throws an error, as expected in the current version:
// FileNotFoundError: ./titanic.csv not found. You maybe use a wrong path or url. Be sure you use absolute path, relative one being not supported.

DataFrame.fromCSV('/home/mousse/Desktop/tmp/titanic.csv').then(df => df.show())
// Works as expected when given the absolute path.

But I admit that using d3 to load files with nodejs is a bit tricky (and it probably doesn't handle the different OS types). I will try to detect when dataframe-js is used in nodejs (and when the path is not an HTTP URL) so that it can use the fs module instead of d3.

Can you give me your OS specifications so I can test my hotfix?

Stay tuned.

@blakewest uh, I have looked at your absolute path "Users/blakewest/code/project1/raw_data/pro_record_db.csv", but it doesn't look like an absolute path (an absolute path starts from the root dir: \ on Windows or / on a Unix system). If you are on Linux, use pwd (or even $PWD) to get the absolute path to your current directory.

Can you check this point ?

@Gmousse you're right. I had actually run pwd, but forgot to put the / at the beginning. When I did that, the file did load, though it took a really long time. Do you have tools for dealing with large CSVs? It's not clear from the docs what kind of performance or space savings you would get from using dataframe-js. I assume immutability helps here, but I'm curious if you have benchmarks or rules of thumb for what data sizes the library can handle, or how to handle larger data sizes in node. Thanks! - Blake