anhaidgroup/deepmatcher

Direct inference on pandas dataframe


Hi,
I see that to run a new inference each time, I have to save a separate CSV and then load it by providing the path to dm.data.process_unlabeled.

Is there a way to pass a pandas DataFrame directly to this function and perform inference without creating a new CSV?

@sidharthms could you please assist with this?

Hey @rbhatia46

It's possible to handle a pandas.DataFrame by modifying MatchingDataset.__init__ and process_unlabeled (I've tried it on a fork of the project). Should I make a PR @sidharthms, or is it out of scope?

To make it work without changing the source code, you could also use a temporary file:

import os
import tempfile

import pandas as pd
import deepmatcher as dm

def run_prediction(df, model, **kwargs):
    # Write the DataFrame to a temporary CSV, since
    # dm.data.process_unlabeled expects a file path.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'w') as tmp:
            df.to_csv(tmp, index=False)
        unlabeled = dm.data.process_unlabeled(path=path, trained_model=model)
        predictions = model.run_prediction(unlabeled, **kwargs)
    finally:
        # Always remove the temporary file; the return stays outside the
        # finally block so that any exception raised above still propagates.
        os.remove(path)
    return predictions

Then

model = dm.MatchingModel()
model.load_state('path/to/model.pth')
df = pd.DataFrame({
    "id": [0], "left_name": ["surname"], "right_name": ["name surname"]
})

predictions = run_prediction(df, model, output_attributes=True)
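
The result is a regular pandas DataFrame with a match_score column (plus the input attributes, since output_attributes=True was passed), so you can filter it directly. A minimal sketch, using an arbitrary 0.5 cutoff (not a deepmatcher default):

# Keep only the pairs the model scores as likely matches.
matches = predictions[predictions['match_score'] > 0.5]
print(matches)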