Direct inference on pandas dataframe
Opened this issue · 2 comments
rbhatia46 commented
Hi,
I see that to make a new inference every time, I have to save a separate CSV and then load it by providing its path to `dm.data.process_unlabeled`.
Is there a way to pass a pandas DataFrame directly to this function and perform inference without creating a new CSV?
rbhatia46 commented
@sidharthms could you please assist with this?
etiennekintzler commented
Hey @rbhatia46,
It's possible to handle a `pandas.DataFrame` by modifying `MatchingDataset.__init__` and `process_unlabeled` (I've tried it on a fork of the project). Should I make a PR, @sidharthms, or is that out of scope?
To make it work without changing the source code, you could also use a temporary file:
```python
import os
import tempfile

import pandas as pd
import deepmatcher as dm

def run_prediction(df, model, **kwargs):
    # Write the DataFrame to a temporary CSV so it can be passed to
    # dm.data.process_unlabeled, then delete the file afterwards.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'w') as tmp:
            df.to_csv(tmp, index=False)
        unlabeled = dm.data.process_unlabeled(path=path, trained_model=model)
        predictions = model.run_prediction(unlabeled, **kwargs)
    finally:
        os.remove(path)
    return predictions
```
Then:

```python
model = dm.MatchingModel()
model.load_state('path/to/model.pth')

df = pd.DataFrame({
    "id": [0], "left_name": ["surname"], "right_name": ["name surname"]
})
run_prediction(df, model, output_attributes=True)
```