22-hours/cabrita

Translation script stops after a few minutes

Opened this issue · 7 comments

After 1-4%, the translation script stops updating the progress numbers and looks like this:

[screenshot: grafik]

I experimented with lower numbers of parallel calls, but nothing really seems to help.

Any ideas? Could this be related to the rate limits? Is it just a visual issue?

I have not used this script, but OpenAI has been having issues with its API service for the last few days, including today.

Did you manage to solve this problem?

No, right now I'm waiting until AlpacaDataCleaned progresses further; then I'll try again.

Aciid commented

Exceptions blocking task execution are the most probable cause of this issue. The rate limits are per account and typically quite small, 90,000 tokens/minute. Start with 10 threads. I had this problem myself and traced it.

You can add simple exception handling to translate_text. I'm not sure whether rate-limited requests are billed by OpenAI's API, so I suggest testing the translation script with a smaller chunk, 0-100 for example.

Here is what you need: catch openai.error.RateLimitError.

def translate_text(value):
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": f"Translate the following text to Finnish: '{value}'"},
                ],
            max_tokens=1024,
            temperature=0,
            )
        return response.choices[0]["message"]["content"].strip()
    except openai.error.RateLimitError as e:
        print("Rate limit error: ", e)
        # execution falls through here, implicitly returning None

This will return None instead of the translated string, so you may prefer rewriting the code to retry the request, or to skip appending the whole prompt if any of its fields is None.
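For the skip option, here is a minimal sketch. It assumes the translate_text above and an alpaca-style list of dicts named data; translate_item is my naming, not from the script:

def translate_item(item):
    # translate every field; translate_text returns None on a rate limit
    translated = {key: translate_text(value) if value else "" for key, value in item.items()}
    # drop the whole prompt if any field failed to translate
    if any(v is None for v in translated.values()):
        return None
    return translated

translated_data = [t for t in map(translate_item, data) if t is not None]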

But please first check OpenAI's billing terms for how failed requests are handled before adding retries; otherwise this may become very expensive.


Retrying the request with backoff on exception, as per OpenAI's official documentation, is what I've implemented and am using now.

import backoff
...
@backoff.on_exception(backoff.expo, openai.error.RateLimitError)
def translate_text(value):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": f"Translate the following text to Finnish: '{value}'"},
            ],
        max_tokens=1024,
        temperature=0,
        )
    return response.choices[0]["message"]["content"].strip()


I'm being rate-limited so much that even using only 10 threads it takes ~5 minutes to translate 350 complete instruction prompts with system, user and output. Insane.
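As a rough sanity check (assuming on the order of 1,200 tokens per request plus completion, an estimate rather than a measured figure): 350 prompts in ~5 minutes is ~70 prompts/minute, i.e. ~84,000 tokens/minute, which sits right at the 90,000 tokens/minute account limit mentioned above. So the rate limiter, not the thread count, is the bottleneck.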

Br,

Thanks for the analysis, very interesting. Maybe just paying for the Google Translate or DeepL API instead could be a better option?

Aciid commented

Hi, I'm unfamiliar with these other translation services, but I'll surely look them up. If any of them have better default rate limits, they may be worth it, provided they are priced accordingly. For me, 1,000 alpaca-lora cleaned prompts cost around $0.75-1 with OpenAI while debugging this code, but the rate limit is killing the productivity that threading should otherwise provide.

I created a whole rewrite of the code that now translates the prompt object's string values, but not its keys, and is tuned for performance. It is still slow, 100-150 prompts/minute, due to spurious rate limits from OpenAI ("model busy"/"rate limit reached") and the resulting exponential backoff.


import json
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

import backoff
import openai

# Replace 'your-key-here' with your actual API key
openai.api_key = "your-key-here"


def backoff_hdlr(details):
    print(
        "Backing off {wait:0.1f} seconds after {tries} tries "
        "calling function {target} with args {args} and kwargs "
        "{kwargs}".format(**details)
    )


@backoff.on_exception(
    backoff.expo, openai.error.RateLimitError, max_tries=50, on_backoff=backoff_hdlr
)
def translate_text(value):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Translate the following JSON objects string values into Finnish, do not translate string "
                           "keys. No english string values are accepted.",
            },
            {
                "role": "user",
                "content": f"{json.dumps(value)}'",
            },
        ],
        max_tokens=1024,
        temperature=0,
    )
    return response.choices[0]["message"]["content"].strip()


def translate_item(item):
    translated_item = translate_text(item)
    return translated_item


# Maximum number of parallel requests
MAX_PARALLEL_REQUESTS = 25

# Load the input JSON (the cleaned Alpaca dataset)
with open("../data/alpaca_data_cleaned_archive.json", "r") as f:
    data = json.load(f)

lock = Lock()
start = 0
end = 51785
translated_data = []
data = data[start:end]
tasks_completed = 0


# simple progress indicator callback function
def progress_indicator(future):
    global lock, tasks_completed
    # obtain the lock
    with lock:
        try:
            translated_data.append(json.loads(future.result()))
            # update the counter
            tasks_completed += 1
            # report progress
            print(
                f"{tasks_completed}/{len(futures)} completed, {len(futures) - tasks_completed} remain."
            )

            # if tasks_completed is divisible by 1000, save a snapshot
            if tasks_completed % 1000 == 0:
                # Save the translated data so far to a snapshot JSON file
                with open(f"translated_data_up_to_{start}_to_{end}.json", "w") as f:
                    json.dump(translated_data, f, ensure_ascii=False, indent=4)

                print(
                    f"Translation snapshot saved in 'translated_data_up_to_{start}_to_{end}.json'"
                )
        except Exception as e:
            # e.g. the model's reply was not valid JSON
            print(f"exception: {e}")


with ThreadPoolExecutor(max_workers=MAX_PARALLEL_REQUESTS) as executor:
    futures = {executor.submit(translate_item, item): item for item in data}

    for future in futures:
        future.add_done_callback(progress_indicator)
# leaving the with block waits for all futures (and their callbacks) to finish

# Finally save the complete translated data
with open(f"translated_data_up_to_{start}_to_{end}.json", "w") as f:
    json.dump(translated_data, f, ensure_ascii=False, indent=4)

print(
    f"Translation complete. The translated data is saved in 'translated_data_up_to_{start}_to_{end}.json'"
)
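To smoke-test the script on a small chunk first, as suggested earlier in the thread, only the slice bounds need changing:

start = 0
end = 100  # instead of the full 0-51785 range used above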

Aciid commented

DeepL API https://www.deepl.com/pro#developer

  • Pay as you go (€20.00 per 1,000,000 characters)

--
➜ wc -c alpaca_data_cleaned_archive.json
22,680,910 alpaca_data_cleaned_archive.json

This is actually much more expensive: 22,680,910 characters at €20.00 per 1,000,000 characters comes to roughly €454, versus around $40-50 with OpenAI at the $0.75-1 per 1,000 prompts mentioned above :D
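For anyone who still wants to compare, here is a minimal sketch using the official deepl Python package (pip install deepl); the auth key and the small test slice are placeholders, and I have not run this against the full dataset:

import json

import deepl

translator = deepl.Translator("your-deepl-auth-key")  # placeholder key

with open("../data/alpaca_data_cleaned_archive.json", "r") as f:
    data = json.load(f)

# Translate only the string values and keep the keys, same idea as the
# OpenAI rewrite above; DeepL bills per source character.
translated_data = []
for item in data[:100]:  # small test slice
    translated_data.append(
        {
            key: translator.translate_text(value, target_lang="FI").text if value else ""
            for key, value in item.items()
        }
    )

# Check consumed characters against the account limit
usage = translator.get_usage()
print(f"Characters used: {usage.character.count}/{usage.character.limit}")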