Not possible to create a POST request with more than 268 strings for create_embeddings()?
atantos opened this issue · 2 comments
Hi there.
It seems there is a time or string limit on POST requests. Although I am able to make a single request for 1000 strings from the overview column in R, I cannot send more than 268 right now. Is this an issue with the package, or am I missing something?
Thanks!
using CSV, DataFrames, Downloads, OpenAI
horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);
r = create_embeddings(
    ENV["OPENAI_API_KEY"],
    horror_movies.overview[1:268],
    "text-embedding-ada-002"
)
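A workaround, regardless of where the limit comes from, is to split the input vector into smaller batches and send one request per batch. A minimal sketch of the batching helper (the batch size and the helper name `chunk` are my own choices, not part of OpenAI.jl):

```julia
# Split a vector into consecutive chunks of at most n elements,
# preserving order. Each chunk can then be sent to create_embeddings()
# in its own request.
chunk(v, n) = [v[i:min(i + n - 1, end)] for i in 1:n:length(v)]

chunk(collect(1:7), 3)  # -> [[1, 2, 3], [4, 5, 6], [7]]
```

You would then loop over `chunk(horror_movies.overview[1:1000], 100)`, call `create_embeddings` on each batch, and concatenate the embeddings from each response.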
Here is the error message I get:
{
"error": {
"message": "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
This happens even though horror_movies.overview is a string vector.
UPDATE I: I tried different vector sizes and it seems there is no hard upper bound on the vector length. I just managed to send 700 strings of horror_movies.overview with horror_movies.overview[1:700]. Is there something we as users should know, or is it simply random luck related to traffic limits on their server?
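One plausible explanation for the apparently random failures, assuming CSV.read leaves `missing` for empty overview cells: whether a given slice succeeds depends not on its length but on whether it happens to contain a missing or blank entry, which the API rejects as invalid input. A toy check (with the real data you would run the predicate on `horror_movies.overview`):

```julia
# Positions that are `missing` or blank would make the whole request
# invalid; any slice containing such a position fails.
overview = ["A haunted house.", missing, "  ", "A cursed doll."]
bad = findall(x -> ismissing(x) || isempty(strip(x)), overview)
# bad == [2, 3]
```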
UPDATE II: However, in R, the following code written by Julia Silge works every single time for all 1000 overview texts:
library(tidyverse)
set.seed(123)
horror_movies <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv') %>%
  filter(!is.na(overview), original_language == "en") %>%
  slice_sample(n = 1000)
library(httr)
embeddings_url <- "https://api.openai.com/v1/embeddings"
auth <- add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
body <- list(model = "text-embedding-ada-002", input = horror_movies$overview)
resp <- POST(
  embeddings_url,
  auth,
  body = body,
  encode = "json"
)
embeddings <- content(resp, as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("data", "embedding")
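Note that the R pipeline drops rows with NA overviews (and keeps only English-language films) before sending anything, which may be why it never hits the invalid-input error. A rough Julia equivalent of that filtering step, sketched on a toy frame standing in for `horror_movies` (column names as in the CSV above):

```julia
using DataFrames

# Toy frame; the real one comes from CSV.read as shown earlier.
df = DataFrame(overview = ["ok", missing, "fine"],
               original_language = ["en", "en", "fr"])

# Mirror the R filter: drop missing overviews, keep English rows.
clean = filter(r -> !ismissing(r.overview) && r.original_language == "en", df)
# clean.overview == ["ok"]
```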
Closing as this is resolved in the discourse post linked above.