Update to_parquet/pyarrow tests
Closed this issue · 12 comments
Setting up the latest (master branch) version of arrow/pyarrow locally will be a bit
tricky for this issue, but it should be easy other than that:
Please add a comment to the original issue (also here) to claim it if you plan to work on it.
I'd like to work on this
@datapythonista Since there are two things that need to be done (update the tests and docs), should these be done in two separate branches or just one?
@datapythonista also, I've set up the latest version of arrow/pyarrow locally, but I'm having trouble telling my conda environment to use it. I've been searching for docs/tutorials that might help me with this, but to no avail. Do you have any pointers regarding this? Thank you!
@galuhsahid Try giving the pyarrow documentation a read. It has information regarding the development settings via conda.
@TanyaaCJain Did you mean this doc? I've followed through & built pyarrow in its own environment successfully. I should've been clearer - I meant I had trouble telling my pandas-dev conda environment to refer to the pyarrow I've just built, so that my tests in pandas would pass.
@galuhsahid, if I'm understanding correctly what you mean, I think using the PYTHONPATH
environment variable is a good option. In the terminal run:
$ PYTHONPATH=/path/where/pyarrow/module/is/ python
And your Python interpreter should be able to import your pyarrow from master. You can also list several paths, separated by colons: PYTHONPATH=/path/to/pyarrow:/path/to/pandas
Let me know if this is not what you needed.
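To confirm that the PYTHONPATH setting took effect, it can help to ask Python where it would load each package from. The following is only a minimal sketch: `locate_module` is a hypothetical helper name, not part of pandas or pyarrow, and it relies solely on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package this is the path to its __init__.py.
    return spec.origin


if __name__ == "__main__":
    # Show which copies of the packages the interpreter would pick up.
    for name in ("pyarrow", "pandas"):
        print(name, "->", locate_module(name))
    # Entries from PYTHONPATH appear near the front of sys.path.
    print(sys.path[:3])
```

If the printed path for pyarrow points into your arrow checkout rather than into the conda environment's site-packages, the interpreter is picking up the build from master.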
> Since there are two things that need to be done (update the tests and docs), should these be done in two separate branches or just one?
It's fine to do it in a single branch/PR (since they are related)
> I've followed through & built pyarrow in its own environment successfully
You will need to install pandas and pyarrow in the same environment (technically it might be possible to point to an install in a different environment, but it is not something I would recommend).
What I did locally was to install pandas master in my arrow-dev environment as well (for the arrow/parquet tests, you don't need all the optional dependencies that are installed in the pandas-dev environment).
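Once both packages live in the same environment, a quick round trip through Parquet is one way to check that pandas can actually use that pyarrow. This is a rough sketch rather than one of the pandas tests: `parquet_roundtrip_ok` is a hypothetical helper, and it returns None instead of failing when either package cannot be found.

```python
import importlib.util
import os
import tempfile


def parquet_roundtrip_ok():
    """Write and re-read a tiny DataFrame with the pyarrow engine.

    Returns True/False for the comparison, or None when pandas or
    pyarrow is not importable in this environment.
    """
    for name in ("pandas", "pyarrow"):
        if importlib.util.find_spec(name) is None:
            return None

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.parquet")
        df.to_parquet(path, engine="pyarrow")
        result = pd.read_parquet(path, engine="pyarrow")
    # equals() compares values, dtypes, and index.
    return df.equals(result)


if __name__ == "__main__":
    print("round trip ok:", parquet_roundtrip_ok())
```

Running this inside the arrow-dev environment exercises the same `to_parquet` / `read_parquet` path that the tests in this issue cover.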
@galuhsahid What about trying code from this doc? Somehow, the html version does not have the same code as that in the "Developing with conda - Environment Setup and Build" topic in the pdf version. The code in here does what @jorisvandenbossche is talking about.
@datapythonista @jorisvandenbossche @TanyaaCJain I ended up following @jorisvandenbossche's approach which works for me. Thanks a lot for the help everyone! I'll make sure to add this to our learning points as well.
@galuhsahid would you like to try to fix https://issues.apache.org/jira/browse/ARROW-6302 as well?
It's in C++, so that's quite different from contributing to pandas, but if you are interested (certainly don't feel obliged!), I can help a bit where needed.
@jorisvandenbossche Sure, I'd like to fix that as well. I'll ask you if I get stuck on something (if you don't mind)
Great ;) Don't hesitate to ask any questions! (I am no C++ expert, but I am starting to get a bit familiar with the Arrow codebase)