hsiehjackson/RULER

No generated output and JSON serialization error when calling the LLM directly in VLLMClient

yaswanth-iitkgp opened this issue · 2 comments

Description:

I was unable to run the vLLM server on my machine, so I modified the VLLMClient class in the RULER/scripts/pred/client_wrappers.py file. Below is the modified class:

class VLLMClient(Client):
    def _single_call(
        self,
        prompts,
        tokens_to_generate,
        temperature,
        top_p,
        top_k,
        random_seed,
        stop: List[str],
    ):
        # NOTE: left over from the original server-based client; this
        # request dict is never used below.
        request = {
            "prompt": prompts[0],
            "max_tokens": tokens_to_generate,
            "temperature": temperature,
            "top_k": top_k,
            "top_p": top_p,
            "stop": stop
        }

        from vllm import LLM, SamplingParams
        # Create a sampling params object.
        sampling_params = SamplingParams(temperature=0.0, top_p=0.95)  # hardcoded on purpose; the sampling arguments passed in above are ignored

        # Create an LLM.
        llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", gpu_memory_utilization=0.6)
        # Generate outputs containing the prompt, generated text, and other information.
        outputs = llm.generate(prompts, sampling_params)
        # Print the outputs.
        for output in outputs:
            prompt = output.prompt
            generated_text = output.outputs[0].text if output.outputs else None
            print(f"Generated text: {generated_text!r}")
        # Returns raw vLLM RequestOutput objects rather than strings.
        return outputs

When I run the command bash run.sh llama3 synthetic, I do not see any generated text and encounter the following error:

Generated text: ''
Traceback (most recent call last):
  File "/RULER/scripts/pred/call_api.py", line 280, in <module>
    main()
  File "/RULER/scripts/pred/call_api.py", line 276, in main
    fout.write(json.dumps(outputs_parallel[computed_idx]) + '\n')
  File "/anaconda3/envs/ruler/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/anaconda3/envs/ruler/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/anaconda3/envs/ruler/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/anaconda3/envs/ruler/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type RequestOutput is not JSON serializable

I am trying to get the scores for Llama3-8B first, and then I want to test the Llama3-8B 1M-context model. Could you please help me resolve this issue?

Hi @yaswanth-iitkgp, can you make your _single_call function return a list of strings (e.g., [generated_text])?
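For reference, a minimal sketch of that change, assuming the vLLM offline API (vllm.LLM / SamplingParams) used in the snippet above. The model name and gpu_memory_utilization value are carried over from that snippet; the LLM is built once and reused instead of being re-created on every call, the caller's sampling arguments are forwarded instead of hardcoded, and the return value is a list of strings so the json.dumps call in call_api.py can serialize it:

from typing import List

from vllm import LLM, SamplingParams

class VLLMClient(Client):
    _llm = None  # shared across calls so the model weights are loaded only once

    def _single_call(
        self,
        prompts,
        tokens_to_generate,
        temperature,
        top_p,
        top_k,
        random_seed,
        stop: List[str],
    ):
        # Build the model lazily on first use; re-creating LLM inside every
        # call would reload the weights each time.
        if VLLMClient._llm is None:
            VLLMClient._llm = LLM(
                model="meta-llama/Meta-Llama-3-8B-Instruct",
                gpu_memory_utilization=0.6,
            )
        # Forward the caller's sampling arguments instead of hardcoding them.
        sampling_params = SamplingParams(
            temperature=temperature,
            top_p=top_p,
            top_k=top_k,
            max_tokens=tokens_to_generate,
            stop=stop,
            seed=random_seed,  # per-request seed; supported in recent vLLM releases
        )
        outputs = VLLMClient._llm.generate(prompts, sampling_params)
        # Return plain strings, not RequestOutput objects, so the result is
        # JSON serializable downstream.
        return [o.outputs[0].text if o.outputs else "" for o in outputs]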

Hi @hsiehjackson,
Thanks for the help! I was able to get the model working by just using the HF model directly for now.
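For anyone landing here later, a minimal sketch of that HF route using plain transformers (a generic snippet, not RULER's own HuggingFace client; the prompt and generation settings are placeholders, and the model name matches the one used above):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto"
)

prompt = "Hello, how are you?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))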