jfloff/pywFM

Python3?

yitang opened this issue · 17 comments

Hi,

Does pywFM support Python3? I tried to run the example and got this error

model = fm.run(features[:5], target[:5], features[5:], target[5:])
/bin/sh: /usr/local/lib/python3.5/site-packages/pywFM/libfm/bin/libFM: cannot execute binary file
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-85e95dc4c2fd> in <module>()
----> 1 model = fm.run(features[:5], target[:5], features[5:], target[5:])

/usr/local/lib/python3.5/site-packages/pywFM/__init__.py in run(self, x_train, y_train, x_test, y_test)
    184         # parses rlog into
    185         import pandas as pd
--> 186         rlog = pd.read_csv(rlog_path, sep='\t')
    187
    188         # removes temporary output file after using

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    496                     skip_blank_lines=skip_blank_lines)
    497
--> 498         return _read(filepath_or_buffer, kwds)
    499
    500     parser_f.__name__ = name

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    273
    274     # Create the parser.
--> 275     parser = TextFileReader(filepath_or_buffer, **kwds)
    276
    277     if (nrows is not None) and (chunksize is not None):

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    588             self.options['has_index_names'] = kwds['has_index_names']
    589
--> 590         self._make_engine(self.engine)
    591
    592     def _get_options_with_defaults(self, engine):

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    729     def _make_engine(self, engine='c'):
    730         if engine == 'c':
--> 731             self._engine = CParserWrapper(self.f, **self.options)
    732         else:
    733             if engine == 'python':

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1101         kwds['allow_leading_cols'] = self.index_col is not False
   1102
-> 1103         self._reader = _parser.TextReader(src, **kwds)
   1104
   1105         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)()

ValueError: No columns to parse from file

Hello,

Since in my current project I'm trapped with python 2.7 , I haven't had time to update this package to Python3. Will add this to the todo list! Feel free to PR this change as well, it shouldn't be that much work!

Thanks for the feedback.

EDIT: I was answering @chezou comment which he deleted.

My intention was not to hardcode the path. In setup.py I clone libFM repo, on that line I was just getting the path of that clone. Are you saying you had to mode the libFM executable manually?

I'm not very experienced with creating python packages so I struggled a bit with this, but within the dockerfiles I've provided, everything works fine.

After posting my post, I found cloning libFM.
I tried to install and uninstall several times with Python 3.5, I found sometime libFM is not executable even if I tried to run libFM on shell script.

Anyway, I didn't except cloning and I think using docker is overspec for pywFM.

Thank you for the comment! Have you installed libFM manually? https://github.com/srendle/libfm Does the make all succeeds?

Regarding docker, the files are there just to get a dev environment asap. It's not mandatory :)

Yes, I've installed libFM manually, and I can use pyFM at all :)

Perhaps its best if I remove the clone from the package and add manual instructions to install libFM. Then I check for a env variable with the path for the libFM executable (e.g. LIBFM_PATH)

Do you feel its better this way?

I've separated libFM install from pywFM package. It is now a separated installation step (instructions here).

@chezou could you please test this change? (don't forget to install the latest version 0.9.0, use pip --no-cache-dir install pywFM)

It works. But there are many I/O overheads reading/writing tmp files, I expected to use shared library of libfm.

I also like the shared library idea, but I think that would require me to digg into libFM a bit more, and at this moment I don't have much time. When I built this I was looking at the quickest way to put libFM up and running. I will add this change to the TODO, next time I get a free time window, I'll dig this up a bit.

Meanwhile, even though this does have high I/O overhead, in my current experiments (dealing with Yelp dataset) I haven't had any trouble getting good performance out of this.

@chezou Btw, could you also confirm that this is python3 compatible?

Yes, it is works with Python 3.5! Of course you should fix document such as parentheses of print().

Finally, I realized it is very hard to execute without system function because libfm doesn't prepare shared library. But this changes makes good for windows user with compiled official binary :)

Great point!

I've updated the README and the simple example with print(). I think I can close this issue.

Thank you for the input @chezou !

Thanks.

Hi,

I tried to use pywFM on Python3.5.2 (ubuntu16.04), and I could use pywFM after little modified not only print() but also byte/str type conversion.

Particularly, in __init__.py, I changed following,

for print(),
print(os.stat(rlog_fd.name).st_size)

for byte/str type conversion,
if c_githash.decode("utf-8") != GITHASH:
preds = [float(p) for p in out_fd.read().decode("utf-8").split('\n') if p]
for line in model_fd.read().decode("utf-8").splitlines():

just for information.

Would you mind doing a PR with this change?

Thank you for the feedback!

Thank you for reply and nice wrapper!
I will do PR. Because I'm new to github, it may take time :-)

That's alright! I'll wait for it and review it with care!