Update API endpoints
svilupp opened this issue · 7 comments
It would be excellent to add the newly available API endpoints and capabilities that have been announced today, eg,
[ ] Assistants (incl. Code Interpreter): https://platform.openai.com/docs/assistants/overview
[x] Vision (with B64 encoding of images): https://platform.openai.com/docs/guides/vision
[ ] Image generation (with DALL-E 3): https://platform.openai.com/docs/guides/images/usage?context=node
[ ] Audio (TTS/STT): https://platform.openai.com/docs/guides/text-to-speech
[ ] Moderations: https://platform.openai.com/docs/guides/moderation/quickstart
[ ] Files upload: https://platform.openai.com/docs/api-reference/files
[ ] Finetuning: https://platform.openai.com/docs/guides/fine-tuning
Extending current API calls, eg,
[x] JSON mode
[x] Seed
[ ] Logprobs -- not available on the API as of 11th Nov
[ ] Function calling (available through the current interface but we should add some examples here)
Ideally, we would also add some simple examples to the docs.
There are many cool capabilities (eg, function calling, logit-bias-classifiers, ...) that are easy to do with the existing APIs, but hard to access for beginners.
I'm happy to take a stab at it over the weekend if no one is interested!
I can plink away this weekend as well. Maybe we can create a branch for every item listed to avoid stepping on each other's toes.
Btw I’ll tackle gpt-4 vision first!
Also, I forgot to add the finetuning API - I’ll add that to the list.
A few updates from my end: the functionality I've tried so far does not require any changes to `create_chat` (unless we want to add nicer interfaces).
Vision comprehension (not image generation)
using OpenAI, Base64
# Read the raw bytes and encode them (avoids leaving an open file handle behind)
base64_image = read("julia.png") |> base64encode
hist = [
Dict("role" => "user",
"content" => [
Dict("type" => "text", "text" => "Describe the provided image"),
Dict("type" => "image_url",
"image_url" => "data:image/png;base64,$(base64_image)")]),
]
resp = create_chat(api_key, "gpt-4-vision-preview", hist; max_tokens = 300)
resp.response[:choices][1][:message][:content]
# "This is the logo for the programming language Julia. The logo consists of the word \"julia\" written in lowercase black letters. Above the 'i' in \"julia,\" there are four colored circles in blue, green, red, and purple, arranged in a slight upward arc from left to right. The typeface is sans-serif and modern-looking, and the circles give the logo a playful and distinctive appearance. Julia is known as a high-level, high-performance programming language for technical computing."
Notice the array in the "content" key.
We could add a kwarg `image_file`, but it would require a new dependency on Base64, and it would get complicated if multiple messages are provided (ie, where would we inject the image_url?).
I propose no action here (but some examples in the docs would probably be useful for newcomers!)
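For the docs, a small user-side helper might be enough instead of a new kwarg. The following is only a sketch: `image_message` is a hypothetical name, not part of OpenAI.jl, and the default `mime` type is an assumption. It builds one user message in the same shape as the example above.

```julia
using Base64  # standard library

# Hypothetical helper (not part of OpenAI.jl): build one user message that
# pairs a text prompt with a local image encoded as a base64 data URL.
function image_message(text::AbstractString, path::AbstractString;
        mime::AbstractString = "image/png")
    encoded = base64encode(read(path))  # read the raw bytes, then encode them
    return Dict("role" => "user",
        "content" => [
            Dict("type" => "text", "text" => text),
            Dict("type" => "image_url",
                "image_url" => "data:$(mime);base64,$(encoded)"),
        ])
end
```

Usage would then be `hist = [image_message("Describe the provided image", "julia.png")]`, passed to `create_chat` exactly as above. With several messages, the caller decides where the image message goes, which sidesteps the injection question.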
JSON mode
Nice utility to enforce JSON output format. You can use the prompt to ask for the exact keys/information you need.
Notice the "JSON" reference in the system message. That's a requirement.
using OpenAI: JSON3
resp = create_chat(api_key,
"gpt-4-1106-preview",
[
Dict("role" => "system",
"content" => "You are a helpful assistant designed to output JSON."),
Dict("role" => "user", "content" => "Who won the world series in 2020?")];
max_tokens = 300,
response_format = Dict("type" => "json_object"))
resp.response[:choices][1][:message][:content] |> JSON3.pretty
{
"World_Series_Winner_2020": {
"Team": "Los Angeles Dodgers",
"League": "Major League Baseball",
"Opponent": "Tampa Bay Rays",
"Result": "Dodgers won 4-2 in a best-of-seven series"
}
}
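To use the result programmatically rather than just print it, the content string can be parsed. This is a sketch that assumes JSON3 is available in the environment; the string is copied from the output above rather than fetched from the API. Note that JSON mode only guarantees valid JSON, so the key names are whatever the model chose unless the prompt pins them down.

```julia
using JSON3  # assumption: JSON3 is installed (OpenAI.jl depends on it)

# Content string as shown in the output above (copied literally for illustration).
content = """
{
  "World_Series_Winner_2020": {
    "Team": "Los Angeles Dodgers",
    "League": "Major League Baseball",
    "Opponent": "Tampa Bay Rays",
    "Result": "Dodgers won 4-2 in a best-of-seven series"
  }
}
"""

parsed = JSON3.read(content)               # JSON3.Object with Symbol keys
winner = parsed[:World_Series_Winner_2020]
team = winner[:Team]
opponent = winner[:Opponent]
```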
Seed model parameter
The seed parameter should enhance reproducibility (it's not perfect, but it goes much further than just `temperature=0`).
Key output to look for here: `resp.response[:system_fingerprint]`, which should confirm the deterministically sampled output (if it's the same across calls), but there is still some randomness due to hardware etc.
resp = create_chat(api_key,
"gpt-4-1106-preview",
[Dict("role" => "user", "content" => "say what you're thinking about right now!")];
seed = 123,
temperature = 0)
resp.response[:choices][1][:message][:content]
# "As an AI, I don't have personal thoughts or feelings, but I'm programmed to assist you with any questions or tasks you have in mind. If you're curious about a specific topic or need help with something, feel free to ask!"
resp.response[:system_fingerprint]
# fp_a24b4d720c
# Run it repeatedly and you'll get the same.
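One way to check this over several calls is sketched below. `same_across_calls` is a hypothetical helper, not part of OpenAI.jl; it takes a zero-argument function that returns the response object and compares fingerprints and contents.

```julia
# Hypothetical helper: call `f` n times and report whether the fingerprint and
# the message content were identical across all calls.
function same_across_calls(f; n = 3)
    responses = [f() for _ in 1:n]
    fingerprints = unique(r[:system_fingerprint] for r in responses)
    contents = unique(r[:choices][1][:message][:content] for r in responses)
    return (same_fingerprint = length(fingerprints) == 1,
        same_content = length(contents) == 1)
end
```

For example, `same_across_calls(() -> create_chat(api_key, "gpt-4-1106-preview", messages; seed = 123, temperature = 0).response)`, where `messages` is the message vector from the call above. If the fingerprint differs between calls, differing content is to be expected.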
Function calling
Ask for structured output that matches your required schema (an advantage over "JSON mode"), at the cost of a little extra latency. It's super useful for data extraction - I use it often for mini-extraction tasks.
(I'm working on a wrapper that would automatically produce the function signature from an output struct provided by the user)
Note: Supported already by the previous GPT 3.5 Turbo and GPT 4, but not available in the GPT4V model.
model = "gpt-4-1106-preview"
functions = [
Dict("name" => "get_current_weather",
"description" => "Get the current weather in a given location",
"parameters" => Dict("type" => "object",
"properties" => Dict("location" => Dict("type" => "string",
"description" => "The city and state, e.g. San Francisco, CA"),
"unit" => Dict("type" => "string",
"enum" => ["celsius", "fahrenheit"])),
"required" => ["location"])),
]
resp = create_chat(api_key, model,
[Dict("role" => "user", "content" => "What is the weather like in Boston in Celsius?")];
functions, function_call = "auto")
# Always defaults to: function_call="auto", function_call="get_current_weather" would enforce the JSON output regardless of the "fit" of the inputs
# When you ask for function_call="auto", function _might_ be called, you can see the finish reason
resp.response[:choices][begin][:finish_reason] # "function_call"
resp.response[:choices][begin][:message][:content] # nothing -> because output was a function call
resp.response[:choices][begin][:message][:function_call] # what "function" from the options provided was called
# {
# "name":"get_current_weather",
# "arguments":"{\"location\":\"Boston, MA\",\"unit\":\"celsius\"}",
# }
To get the arguments only (useful for extraction of data):
resp.response[:choices][begin][:message][:function_call][:arguments] |> JSON3.pretty
# {
# "location": "Boston, MA",
# "unit": "celsius"
# }
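If the goal is to actually run the function the model picked, the name and arguments can be dispatched to local code. This is a sketch under assumptions: `get_current_weather` here is a placeholder implementation, `run_function_call` and `available_functions` are hypothetical names, and the `call` value mimics the shape shown above instead of coming from a live response.

```julia
using JSON3  # assumption: JSON3 is installed (OpenAI.jl depends on it)

# Placeholder implementation; a real one would query a weather service.
get_current_weather(location; unit = "fahrenheit") =
    "Weather for $(location) in $(unit): (placeholder)"

# Map the function names the model may return to local Julia functions.
available_functions = Dict("get_current_weather" => get_current_weather)

function run_function_call(function_call)
    f = available_functions[function_call[:name]]
    args = JSON3.read(function_call[:arguments])  # arguments arrive as a JSON string
    # "unit" is optional in the schema, so fall back to a default if it is missing
    return f(args[:location]; unit = get(args, :unit, "fahrenheit"))
end

# Same shape as resp.response[:choices][begin][:message][:function_call] above.
call = Dict(:name => "get_current_weather",
    :arguments => "{\"location\":\"Boston, MA\",\"unit\":\"celsius\"}")
result = run_function_call(call)
```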
Logprobs model parameter
Adding `logprobs=5` should give you 5 tokens with the highest logprob (5 is the maximum allowed, I guess to prevent distillation?).
However, it is not available yet (11th Nov).
EDIT: The specs for function calling have changed! Now it should be referred to as tools.
In addition, the vision API also had a few minor changes.
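A rough sketch of what the change means for the function calling example: in the `tools` format each function definition is wrapped in an entry with `"type" => "function"`. `to_tools` is a hypothetical helper, and whether `create_chat` forwards `tools` / `tool_choice` the same way it forwards `functions` is an assumption that should be checked.

```julia
# Hypothetical helper: wrap legacy `functions` entries in the newer `tools` format.
to_tools(functions) = [Dict("type" => "function", "function" => f) for f in functions]

functions = [
    Dict("name" => "get_current_weather",
        "description" => "Get the current weather in a given location",
        "parameters" => Dict("type" => "object",
            "properties" => Dict("location" => Dict("type" => "string")),
            "required" => ["location"])),
]
tools = to_tools(functions)

# Assumption: create_chat passes these keyword arguments through unchanged:
# resp = create_chat(api_key, model, messages; tools, tool_choice = "auto")
# The call details would then sit under [:message][:tool_calls]
# instead of [:message][:function_call].
```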
Catching up after having no time the past week..
As far as JSON mode goes, what do you think of this? It seems to be at least tangentially related.
BTW, I'm all ears (and approvals 😆 ) on improvements, nicer interfaces, etc. from power users like yourself. I favor a laissez-faire approach in these initial stages, so feel free to open PRs and we can discuss.
I’ve commented on the above mentioned thread. I personally know very little about the streaming use case - everything I do doesn’t need it. I’m familiar with it only for the ChatGPT-like interfaces.
As far as the nicer interfaces go, I am not sure I have any ideas. I think what you’ve done is actually great and allows everyone who wants to build on top of OpenAI API to get started.
Based on my experience and talking to a few people, I think the issue now is awareness and how-tos. IMO, we need to focus on surfacing practical applications and lowering the barrier to entry for first time users, eg, examples, blogs, and potentially also some downstream libraries that can be very opinionated and focus on specific tasks.
That’s why I have bundled up a bunch of scripts I had and wrapped them in a library: PromptingTools.jl. The hope is to abstract and re-use prompts and be backend-agnostic (eg, switch between the OpenAI API and Mistral depending on your needs/tasks). I’m targeting the daily “mini-tasks” that we all have and don’t enjoy.
I have a few more PRs locally, but then I hope to switch over to churning out a bunch of how-to guides. The coverage of GenAI applications in Julia is sooooo poor right now.
You’ll see that some of the Issues I opened today reflect the above beliefs.
I’m keen to add the above APIs, but I don’t have any mini-tasks that need them right now, so I might not get to it for 1-2 weeks.
Agreed on documentation and lowering the barrier to entry; I updated the readme yesterday with an example of overriding the base URL.
I'm starting on the Files item above, then Finetuning because the latter is dependent on the former.