JuliaML/OpenAI.jl

Not possible to send a POST request with more than 268 strings to create_embeddings()?

atantos opened this issue · 2 comments

Hi there.

Is there a time or string-count limit on POST requests? In R I can embed 1000 strings from the overview column in a single request, but with create_embeddings() I cannot send more than 268 right now. Is this an issue with the package, or am I missing something?

Thanks!

using CSV, DataFrames, Downloads, OpenAI
horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);

r = create_embeddings(
    ENV["OPENAI_API_KEY"],
    horror_movies.overview[1:268],
    "text-embedding-ada-002"
)

Here is the error message I get:

{
  "error": {
    "message": "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

This happens even though horror_movies.overview is a string vector.
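The claim that `overview` is a plain string vector is easy to check. This is a minimal sketch that assumes `missing` or blank entries are what the API rejects as an invalid `$.input`; that is a guess, not something confirmed in this issue. CSV.jl reads empty fields as `missing` by default, so the column's `eltype` may be `Union{Missing, String}`. The hypothetical helper `bad_input_indices` lists the positions of such entries.

```julia
# Hypothetical diagnostic helper (not part of OpenAI.jl): return the indices of
# entries that are `missing` or blank. Treating those as invalid embedding
# inputs is an assumption, not something confirmed in this issue.
function bad_input_indices(texts::AbstractVector)
    # `||` short-circuits, so `strip` is only called on non-missing entries
    return findall(t -> ismissing(t) || isempty(strip(t)), texts)
end
```

For example, `eltype(horror_movies.overview)` shows whether the column admits `missing`. `bad_input_indices(horror_movies.overview[1:1000])` would show whether the first bad index falls just after 268.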

UPDATE I: After trying different vector sizes, it seems there is no hard upper bound on the number of strings. I just managed to embed 700 strings with horror_movies.overview[1:700]. Is there something we as users should know, or is it simply luck related to the server's traffic limits?
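If request size is the concern, one common workaround is to send the inputs in smaller batches. This is a minimal sketch. `embed_in_batches` and the default `batch_size = 100` are hypothetical choices, not documented limits of OpenAI.jl or the OpenAI API. The `AbstractVector{<:AbstractString}` signature deliberately rejects vectors whose element type still allows `missing`, so you need to convert first, e.g. with `collect(skipmissing(horror_movies.overview))`.

```julia
# Hypothetical helper (not part of OpenAI.jl): call `embed` once per
# consecutive chunk of at most `batch_size` strings and collect the results.
# `embed` is any function that accepts a Vector of strings.
function embed_in_batches(embed, texts::AbstractVector{<:AbstractString};
                          batch_size::Integer = 100)
    batch_size > 0 || throw(ArgumentError("batch_size must be positive"))
    first_i, last_i = firstindex(texts), lastindex(texts)
    # Index ranges of length batch_size, the last one clipped at last_i
    ranges = [i:min(i + batch_size - 1, last_i) for i in first_i:batch_size:last_i]
    # texts[r] copies each chunk into its own Vector before calling `embed`
    return [embed(texts[r]) for r in ranges]
end
```

Here is a usage example that reuses the call from the original post: `embed_in_batches(b -> create_embeddings(ENV["OPENAI_API_KEY"], b, "text-embedding-ada-002"), texts)` returns one response per batch. Passing the API call in as a function keeps the batching logic testable without a network connection.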
UPDATE II: In R, however, the following code by Julia Silge works every time for all 1000 overview texts:

library(tidyverse)

set.seed(123)
horror_movies <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv') %>%
  filter(!is.na(overview), original_language == "en") %>%
  slice_sample(n = 1000)

library(httr)
embeddings_url <- "https://api.openai.com/v1/embeddings"
auth <- add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
body <- list(model = "text-embedding-ada-002", input = horror_movies$overview)

resp <- POST(
  embeddings_url,
  auth,
  body = body,
  encode = "json"
)

embeddings <- content(resp, as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("data", "embedding")
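Before sending anything, the R code above drops `NA` overviews and keeps only English-language films. The Julia call earlier in this issue passes the column unfiltered. Below is a minimal sketch of the same preprocessing in Julia. It works on plain column vectors and uses only the `Random` standard library. `sample_overviews` is a hypothetical name, and seeding with 123 will not reproduce the sample that R's `set.seed(123)` draws.

```julia
using Random

# Hypothetical helper mirroring the R pipeline:
#   filter(!is.na(overview), original_language == "en") %>% slice_sample(n = 1000)
function sample_overviews(overview::AbstractVector, language::AbstractVector;
                          n::Integer = 1000, rng::AbstractRNG = MersenneTwister(123))
    length(overview) == length(language) ||
        throw(DimensionMismatch("overview and language must have the same length"))
    # isequal treats a missing language as "not English" instead of propagating missing
    keep = [!ismissing(o) && isequal(l, "en") for (o, l) in zip(overview, language)]
    pool = String.(overview[keep])
    # Sample without replacement, like slice_sample(); returns fewer than n if the pool is small
    idx = randperm(rng, length(pool))[1:min(n, length(pool))]
    return pool[idx]
end
```

For example, `texts = sample_overviews(horror_movies.overview, horror_movies.original_language)` gives up to 1000 non-missing English overviews as plain strings. These can then be passed to `create_embeddings` as in the original post.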

Please check out the answer here.

Closing, as this is resolved in the Discourse post linked above.