Python3?
yitang opened this issue · 17 comments
Hi,
Does pywFM support Python3? I tried to run the example and got this error
model = fm.run(features[:5], target[:5], features[5:], target[5:])
/bin/sh: /usr/local/lib/python3.5/site-packages/pywFM/libfm/bin/libFM: cannot execute binary file
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-85e95dc4c2fd> in <module>()
----> 1 model = fm.run(features[:5], target[:5], features[5:], target[5:])
/usr/local/lib/python3.5/site-packages/pywFM/__init__.py in run(self, x_train, y_train, x_test, y_test)
184 # parses rlog into
185 import pandas as pd
--> 186 rlog = pd.read_csv(rlog_path, sep='\t')
187
188 # removes temporary output file after using
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
496 skip_blank_lines=skip_blank_lines)
497
--> 498 return _read(filepath_or_buffer, kwds)
499
500 parser_f.__name__ = name
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
273
274 # Create the parser.
--> 275 parser = TextFileReader(filepath_or_buffer, **kwds)
276
277 if (nrows is not None) and (chunksize is not None):
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
588 self.options['has_index_names'] = kwds['has_index_names']
589
--> 590 self._make_engine(self.engine)
591
592 def _get_options_with_defaults(self, engine):
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
729 def _make_engine(self, engine='c'):
730 if engine == 'c':
--> 731 self._engine = CParserWrapper(self.f, **self.options)
732 else:
733 if engine == 'python':
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1101 kwds['allow_leading_cols'] = self.index_col is not False
1102
-> 1103 self._reader = _parser.TextReader(src, **kwds)
1104
1105 # XXX
pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)()
ValueError: No columns to parse from file
Hello,
Since in my current project I'm trapped with python 2.7 , I haven't had time to update this package to Python3. Will add this to the todo list! Feel free to PR this change as well, it shouldn't be that much work!
Thanks for the feedback.
EDIT: I was answering @chezou comment which he deleted.
My intention was not to hardcode the path. In setup.py I clone libFM repo, on that line I was just getting the path of that clone. Are you saying you had to mode the libFM executable manually?
I'm not very experienced with creating python packages so I struggled a bit with this, but within the dockerfiles I've provided, everything works fine.
After posting my post, I found cloning libFM.
I tried to install and uninstall several times with Python 3.5, I found sometime libFM is not executable even if I tried to run libFM on shell script.
Anyway, I didn't except cloning and I think using docker is overspec for pywFM.
Thank you for the comment! Have you installed libFM manually? https://github.com/srendle/libfm Does the make all
succeeds?
Regarding docker, the files are there just to get a dev environment asap. It's not mandatory :)
Yes, I've installed libFM manually, and I can use pyFM at all :)
Perhaps its best if I remove the clone from the package and add manual instructions to install libFM. Then I check for a env variable with the path for the libFM executable (e.g. LIBFM_PATH
)
Do you feel its better this way?
It works. But there are many I/O overheads reading/writing tmp files, I expected to use shared library of libfm.
I also like the shared library idea, but I think that would require me to digg into libFM a bit more, and at this moment I don't have much time. When I built this I was looking at the quickest way to put libFM up and running. I will add this change to the TODO, next time I get a free time window, I'll dig this up a bit.
Meanwhile, even though this does have high I/O overhead, in my current experiments (dealing with Yelp dataset) I haven't had any trouble getting good performance out of this.
Yes, it is works with Python 3.5! Of course you should fix document such as parentheses of print()
.
Finally, I realized it is very hard to execute without system function because libfm doesn't prepare shared library. But this changes makes good for windows user with compiled official binary :)
Great point!
I've updated the README and the simple example with print()
. I think I can close this issue.
Thank you for the input @chezou !
Thanks.
Hi,
I tried to use pywFM on Python3.5.2 (ubuntu16.04), and I could use pywFM after little modified not only print()
but also byte/str type conversion.
Particularly, in __init__.py
, I changed following,
for print()
,
print(os.stat(rlog_fd.name).st_size)
for byte/str type conversion,
if c_githash.decode("utf-8") != GITHASH:
preds = [float(p) for p in out_fd.read().decode("utf-8").split('\n') if p]
for line in model_fd.read().decode("utf-8").splitlines():
just for information.
Would you mind doing a PR with this change?
Thank you for the feedback!
Thank you for reply and nice wrapper!
I will do PR. Because I'm new to github, it may take time :-)
That's alright! I'll wait for it and review it with care!