koaning/embetter

Issue with OpenAI Encoder

iamyihwa opened this issue · 2 comments

Hello @koaning, thanks for the great package!

I was trying out the OpenAI embedding.

I got an error at first, which I fixed by importing OpenAIEncoder instead of CohereEncoder on line 6 (from embetter.external import OpenAIEncoder).

However, I still get an error saying that openai is not defined.
I did assign openai.api_key in the code, but not the organization, since the OpenAI page didn't give me one.

Code

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

from embetter.grab import ColumnGrabber
from embetter.external import OpenAIEncoder

import openai

# You must run this first!
# openai.organization = OPENAI_ORG
openai.api_key = 'MY_OWN_KEY'

# Let's suppose this is the input dataframe
dataf = pd.DataFrame({
    "text": ["positive sentiment", "super negative"],
    "label_col": ["pos", "neg"]
})

# This pipeline grabs the `text` column from a dataframe
# which then gets fed into OpenAI's endpoint
text_emb_pipeline = make_pipeline(
    ColumnGrabber("text"),
    OpenAIEncoder()
)
X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])

# This pipeline can also be trained to make predictions, using
# the embedded features.
text_clf_pipeline = make_pipeline(
    text_emb_pipeline,
    LogisticRegression()
)

# Prediction example
text_clf_pipeline.fit(dataf, dataf['label_col']).predict(dataf)

Error message:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
~\AppData\Local\Temp\1\ipykernel_24268\2450760316.py in <module>
     10     OpenAIEncoder()
     11 )
---> 12 X = text_emb_pipeline.fit_transform(dataf, dataf['label_col'])
     13 
     14 # This pipeline can also be trained to make predictions, using

~\Miniconda3\envs\SemanticMatching\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
    432             fit_params_last_step = fit_params_steps[self.steps[-1][0]]
    433             if hasattr(last_step, "fit_transform"):
--> 434                 return last_step.fit_transform(Xt, y, **fit_params_last_step)
    435             else:
    436                 return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)

~\Miniconda3\envs\SemanticMatching\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    853         else:
    854             # fit method of arity 2 (supervised transformation)
--> 855             return self.fit(X, y, **fit_params).transform(X)
    856 
    857 

~\Miniconda3\envs\SemanticMatching\lib\site-packages\embetter\external\_openai.py in transform(self, X, y)
     79         result = []
     80         for b in _batch(X, self.batch_size):
---> 81             resp = openai.Embedding.create(input=X, model=self.model)  # fmt: off
     82             result.extend([_["embedding"] for _ in resp["data"]])
     83         return np.array(result)

NameError: name 'openai' is not defined

Ah. It seems that openai must be imported in that file in order for it to work? That somewhat surprises me but I'll dive in and have a look.
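
For anyone wondering why `import openai` in the notebook did not help: a function looks up global names in the module where it was defined, not in the caller's namespace. Below is a minimal sketch of that behaviour. It uses a hypothetical in-memory module called `fake_library` and the standard `json` module as a stand-in for `openai`, so none of this is embetter's actual code.

```python
import types

# Hypothetical stand-in for a library file that uses `json`
# without importing it (like `openai` in the traceback above).
library_source = '''
def encode(data):
    return json.dumps(data)
'''

# Build an in-memory module so that `encode` gets its own globals.
fake_library = types.ModuleType("fake_library")
exec(library_source, fake_library.__dict__)

# Importing in the caller's namespace, as the notebook does.
import json

try:
    fake_library.encode({"a": 1})
    error_message = None
except NameError as exc:
    # The caller's import is invisible to the library function.
    error_message = str(exc)

print(error_message)

# Once the name exists in the library module's own globals,
# the same call succeeds.
fake_library.json = json
encoded = fake_library.encode({"a": 1})
print(encoded)
```

The durable fix is therefore an import inside the library file itself, rather than anything in user code.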

Thanks @koaning! Works perfectly!