vocodedev/vocode-core

[Bug]: Function calls do not work for passing parameters or returning results

rjheeta opened this issue · 3 comments

Brief Description

Parameters are not being passed to functions (Actions), nor are functions (Actions) returning results.

I've attached a custom (and very simple) action called GetCompanyDirectory that is just supposed to return static JSON.

The action contains print statements that show the parameters it receives and the result just before it is returned. The parameters are not printed, which suggests they are not being passed into the function correctly. Similarly, while the result is printed here, it does not appear to be returned to the caller correctly.

from typing import Optional, Type
from pydantic import BaseModel, Field
from vocode.streaming.action.base_action import BaseAction
from vocode.streaming.models.actions import (
    ActionConfig,
    ActionInput,
    ActionOutput,
    ActionType,
)


class GetCompanyDirectoryActionConfig(ActionConfig, type=ActionType.GET_COMPANY_DIRECTORY):
    pass


class GetCompanyDirectoryParameters(BaseModel):
    first_name: str = Field(..., description="First name of the user")
    last_name: Optional[str] = Field(..., description="Last name of the user")


class GetCompanyDirectoryResponse(BaseModel):
    company_directory : list[dict]


class GetCompanyDirectory(
    BaseAction[
        GetCompanyDirectoryActionConfig,
        GetCompanyDirectoryParameters,
        GetCompanyDirectoryResponse
    ]
):
    description: str = (
        "Returns a list of users in the company directory "
        "(first name, last name, and phone number)."
    )
    parameters_type: Type[GetCompanyDirectoryParameters] = GetCompanyDirectoryParameters
    response_type: Type[GetCompanyDirectoryResponse] = GetCompanyDirectoryResponse
    
    def lookup_user(self):
        return [
            {'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'},
            {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'},
            {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}
        ]

    async def run(
        self, action_input: ActionInput[GetCompanyDirectoryParameters]
    ) -> ActionOutput[GetCompanyDirectoryResponse]:
        """
        Returns a list of users in the company directory (first name, last name, and phone number)
        Pass parameters as a pipe-separated list like <first_name>|<last_name>
        """

        # Note: We're not actually doing anything with the params here. This is
        # just a proof of concept to show params are not being passed
        print('****** Lookup User ******')
        print('Parameters passed:', action_input.params)

        result = self.lookup_user()

        print('Result:', result)

        return ActionOutput(
            action_type=self.action_config.type,
            response=GetCompanyDirectoryResponse(
                company_directory=result
            ),
        )

LLM

GPT-4

Transcription Services

Deepgram

Synthesis Services

Eleven Labs

Telephony Services

Twilio

Conversation Type and Platform

Real-time streaming / Twilio

Steps to Reproduce

  1. Copy the code above and place it into /streaming/action/get_company_directory.py
  2. Modify action/factory.py to include an elif clause for GetCompanyDirectoryActionConfig
from vocode.streaming.action.base_action import BaseAction
from vocode.streaming.action.nylas_send_email import (
    NylasSendEmail,
    NylasSendEmailActionConfig,
)
from vocode.streaming.action.get_company_directory import (
    GetCompanyDirectory,
    GetCompanyDirectoryActionConfig,
)
from vocode.streaming.models.actions import ActionConfig
from vocode.streaming.action.transfer_call import TransferCall, TransferCallActionConfig

class ActionFactory:
    def create_action(self, action_config: ActionConfig) -> BaseAction:
        if isinstance(action_config, NylasSendEmailActionConfig):
            return NylasSendEmail(action_config, should_respond=True)
        elif isinstance(action_config, GetCompanyDirectoryActionConfig):
            return GetCompanyDirectory(action_config, should_respond=True)
        elif isinstance(action_config, TransferCallActionConfig):
            return TransferCall(action_config)
        else:
            raise Exception("Invalid action type")
  3. Modify models/actions.py to include GET_COMPANY_DIRECTORY
# Rest of code here...

class ActionType(str, Enum):
    BASE = "action_base"
    NYLAS_SEND_EMAIL = "action_nylas_send_email"
    TRANSFER_CALL = "action_transfer_call"
    GET_COMPANY_DIRECTORY = "action_get_company_directory"

# Rest of code here...
  4. In your main app file, include your action in your call
telephony_server = TelephonyServer(
    base_url=BASE_URL,
    config_manager=config_manager,
    inbound_call_configs=[
        TwilioInboundCallConfig(
            url="/vocode",
            agent_config=ChatGPTAgentConfig(
                initial_message=BaseMessage(text=prompt),
                end_conversation_on_goodbye=False,
                send_filler_audio=FillerAudioConfig(silence_threshold_seconds=0.5),
                prompt_preamble=preamble,
                temperature=0,
                model_name="gpt-4",
                actions=[
                    GetCompanyDirectoryActionConfig(),
                ]
            ),
            synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
                voice_id=voice_id,
                api_key="xxx"
            ),
            twilio_config=TwilioConfig(
                account_sid="xxx",
                auth_token="xxx",
            )
        )
    ],
    logger=logger,
)

Expected Behavior

It should correctly pass parameters into the function, and the function should return the JSON so that the main caller can parse that JSON and give the correct phone number.

Screenshots

See the output log below. I'd like to draw your attention to a few key items:

  1. I provide the employee's first & last name
  2. The "Parameters passed: " print statement does not print anything, telling me that the parameters were not passed to the function
  3. While the Action body prints the JSON, I suspect it's not correctly being returned either (much like parameters are not passed into it) because the main LLM states it cannot find the user.

Note that I am using the latest version (git clone of the main branch)

INFO:     Started server process [96703]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
INFO:     35.174.106.67:0 - "POST /vocode HTTP/1.1" 200 OK
INFO:     ('3.85.62.132', 0) - "WebSocket /connect_call/s9VwDrnr5ZHjxG1bP5g-5Q" [accepted]
DEBUG:__main__:Phone WS connection opened for chat s9VwDrnr5ZHjxG1bP5g-5Q
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Trying to attach WS to outbound call
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Attached WS to outbound call
INFO:     connection open
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Media WS: Received event 'start': {"event":"start","sequenceNumber":"1","start":{"accountSid":"REDACTED","streamSid":"MZe87d0cc727d8a13f8a718e173865ad71","callSid":"CA764f2327e7f6dd93d43e76af8dfa8347","tracks":["inbound"],"mediaFormat":{"encoding":"audio/x-mulaw","sampleRate":8000,"channels":1},"customParameters":{}},"streamSid":"MZe87d0cc727d8a13f8a718e173865ad71"}
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Filling 1008 bytes of silence
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 4121
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Hi, how can I direct your call?
INFO:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Ignoring empty transcription
INFO:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Ignoring empty transcription
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Got transcription:  Hi. Can I speak to James Smith, please?, confidence: 0.9992042
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Human started speaking
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Responding to transcription
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sending filler audio
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] No filler audio available for synthesizer
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
****** Lookup User ******
Parameters passed: 
Result: [{'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'}, {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'}, {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}]
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Responding to transcription
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sending filler audio
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] No filler audio available for synthesizer
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 2 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 3 with size 3795
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Sure, let me check the company directory for James Smith's phone number.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Synthesizing speech for message
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 6897
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Just a moment.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 2 with size 7615
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Let me check the company directory for James Smith's phone number.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 2 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 3 with size 6302
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: I'm sorry, but I couldn't find anyone by the name of James Smith in our directory.
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 0 with size 8000
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Sent chunk 1 with size 7883
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Message sent: Is there anyone else you would like to speak to?
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Media WS: Received event 'stop': {"event":"stop","sequenceNumber":"1256","streamSid":"MZe87d0cc727d8a13f8a718e173865ad71","stop":{"accountSid":"REDACTED","callSid":"CA764f2327e7f6dd93d43e76af8dfa8347"}}
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Stopping...
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating check_for_idle Task
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Tearing down synthesizer
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating agent
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating output device
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating speech transcriber
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating transcriptions worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating final transcriptions worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating synthesis results worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating filler audio worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating actions worker
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Successfully terminated
DEBUG:__main__:Phone WS connection closed for chat s9VwDrnr5ZHjxG1bP5g-5Q
DEBUG:__main__:[s9VwDrnr5ZHjxG1bP5g-5Q] Terminating Deepgram transcriber sender
INFO:     connection closed
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [96703]
^C

Here's an experiment I tried.

Modify the process method in streaming/action/worker.py to include some print statements about the return value of running an action's run method:

    async def process(self, item: InterruptibleEvent[ActionInput]):
        action_input = item.payload
        action = self.action_factory.create_action(action_input.action_config)
        action.attach_conversation_state_manager(self.conversation_state_manager)
        action_output = await action.run(action_input)

        # **** Start of my additions ****
        print(f'Type of action is: {type(action)}') 
        print(f'Action output: {action_output}') 

        # This should print GetCompanyDirectoryResponse but it prints BaseModel
        print(f'Action output response type: {type(action_output.response)}') 

        # This should print the JSON, but it doesn't
        print(f'Action output response: {action_output.response}') 
        # **** End of my additions ****

        self.produce_interruptible_event_nonblocking(
            ActionResultAgentInput(
                conversation_id=action_input.conversation_id,
                action_input=action_input,
                action_output=action_output,
                vonage_uuid=action_input.vonage_uuid
                if isinstance(action_input, VonagePhoneCallActionInput)
                else None,
                twilio_sid=action_input.twilio_sid
                if isinstance(action_input, TwilioPhoneCallActionInput)
                else None,
                is_quiet=action.quiet,
            )
        )

Here are the relevant parts of the log with these print statements in place:

...
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Message sent: Hello, thank you for calling Acme. How can I help you?
INFO:__main__:[oyOPrK-HyOSAl665kX40eA] Ignoring empty transcription
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Got transcription:  Yeah. Can I speak to Mike., confidence: 0.9946289
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Human started speaking
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Responding to transcription
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sending filler audio
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] No filler audio available for synthesizer
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Synthesizing speech for message
****** Lookup User ******
Parameters passed: 
Result: [{'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'}, {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'}, {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}]
Type of action is: <class 'vocode.streaming.action.get_company_directory.GetCompanyDirectory'>
Action output: action_type='action_get_company_directory' response=BaseModel()
Action output response type: <class 'pydantic.v1.main.BaseModel'>
Action output response:
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Responding to transcription
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sending filler audio
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] No filler audio available for synthesizer
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Synthesizing speech for message
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 0 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Synthesizing speech for message
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 1 with size 6471
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Message sent: Sure, let me find Mike for you.
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 0 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 1 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 2 with size 7537
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Message sent: I'm sorry, but there are multiple employees named Mike.
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 0 with size 8000
DEBUG:__main__:[oyOPrK-HyOSAl665kX40eA] Sent chunk 1 with size 8000
^CINFO:     Shutting down

Or more specifically

****** Lookup User ******
Parameters passed: 
Result: [{'first_name': 'John', 'last_name': 'Doe', 'phone_number': '555-888-9999'}, {'first_name': 'James', 'last_name': 'Smith', 'phone_number': '444-111-8888'}, {'first_name': 'David', 'last_name': 'Thomas', 'phone_number': '333-222-1234'}]
Type of action is: <class 'vocode.streaming.action.get_company_directory.GetCompanyDirectory'>
Action output: action_type='action_get_company_directory' response=BaseModel()
Action output response type: <class 'pydantic.v1.main.BaseModel'>
Action output response:

A few things to point out

  1. For the line printing Action output response:, I expected it to print the JSON, but it prints nothing. This explains why the LLM does not get the data.
  2. For the line printing Action output response type:, I am not sure why the type is printed as pydantic.v1.main.BaseModel. Shouldn't it be vocode.streaming.action.get_company_directory.GetCompanyDirectoryResponse?

I am not sure whether these observations help, but I thought I would share them.

More investigations

I added some debug statements to the create_action_input method in streaming/action/base_action.py. They show that the type and contents of params do not persist after the ActionInput object is created.

    def create_action_input(
        self,
        conversation_id: str,
        params: Dict[str, Any],
        user_message_tracker: Optional[asyncio.Event] = None,
    ) -> ActionInput[ParametersType]:
        if "user_message" in params:
            del params["user_message"]

        print(f'** PRE Type of transformed_params: {type(self.parameters_type(**params))}')
        print(f'** PRE Contents of transformed_params: {self.parameters_type(**params)}')

        result = ActionInput(
            action_config=self.action_config,
            conversation_id=conversation_id,
            params=self.parameters_type(**params),
            user_message_tracker=user_message_tracker,
        )
        
        print(f'** POST params content: {result.params}')
        print(f'** POST params content type: {type(result.params)}')
        print(f'** POST ActionInput full object: {result}')
        return result

Below is the relevant output

Note: I am using a different (simpler) action class here which is just passing a phone number to a function to send an SMS through Twilio. I can share the class if you need. However, the idea I want to draw focus to is how the parameters & parameter types seemingly change after creating the ActionInput object.

** Before create_action_input. params: {'recipient_phone': '+15555551234', 'user_message': 'Alright, I will page them for you.'}
** PRE Type of transformed_params: <class 'vocode.streaming.action.twilio_send_sms.TwilioSendSmsParameters'>
** PRE Contents of transformed_params: recipient_phone='+15555551234'
** POST params content: 
** POST params content type: <class 'pydantic.v1.main.BaseModel'>
** POST ActionInput full object: action_config=TwilioSendSmsActionConfig() conversation_id='xsM_yF4O5DpiSchPzyAtCw' params=BaseModel() user_message_tracker=<asyncio.locks.Event object at 0x17ef06e90 [unset]>

Why is it (correctly) of type vocode.streaming.action.twilio_send_sms.TwilioSendSmsParameters before the call, and then changes to pydantic.v1.main.BaseModel after the call?
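
One way to picture what the log above shows, as a simplified sketch rather than pydantic's actual code: assume the validating base class keeps a value untouched only if it is already an instance of that base class, and otherwise rebuilds it from the value's fields, discarding any field it does not declare. `V1BaseModel`, `ForeignModel`, `GoodParams`, and `coerce` are hypothetical stand-ins invented for this illustration; none of them are part of vocode or pydantic.

```python
class V1BaseModel:
    """Hypothetical stand-in for the base class the framework validates against."""

    fields = ()  # names of declared fields; the bare base class declares none

    def __init__(self, **data):
        # Keys that are not declared fields are silently ignored.
        for name in self.fields:
            setattr(self, name, data[name])

    def __repr__(self):
        body = ", ".join(f"{n}={getattr(self, n)!r}" for n in self.fields)
        return f"{type(self).__name__}({body})"


class ForeignModel:
    """Hypothetical stand-in for a model built on an unrelated base class."""

    def __init__(self, **data):
        self._data = dict(data)

    def __iter__(self):
        # Yield (field, value) pairs so that dict(instance) works.
        return iter(self._data.items())


def coerce(value, target=V1BaseModel):
    """Keep instances of target as they are; rebuild anything else as target."""
    if isinstance(value, target):
        return value
    return target(**dict(value))


class GoodParams(V1BaseModel):
    fields = ("recipient_phone",)


good = coerce(GoodParams(recipient_phone="+15555551234"))
bad = coerce(ForeignModel(recipient_phone="+15555551234"))

print(repr(good))  # subclass instance survives with its field
print(repr(bad))   # foreign instance is rebuilt as an empty base model
```

Under this assumption, a parameters object that does not descend from the expected base class would come out as an empty base model, which matches the `params=BaseModel()` seen in the POST lines.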

The only way I could get the parameters to persist was to modify the ActionInput class (see streaming/models/actions.py) to include an __init__() method that explicitly resets params after super().__init__() is called. See here:

class ActionInput(BaseModel, Generic[ParametersType]):
    action_config: ActionConfig
    conversation_id: str
    params: ParametersType
    user_message_tracker: Optional[asyncio.Event] = None

    def __init__(self, **data):
        params_data = data.get('params')
        super().__init__(**data) # <-- This is what's resetting the params...
        self.params = params_data

    class Config:
        arbitrary_types_allowed = True

(Note: I'm not suggesting this is correct; just sharing data)

Closing issue. Sorry, this was my bad. I was using an out-of-date example. The fix was to import from pydantic.v1 instead of pydantic, so the following change resolved it:

from pydantic.v1 import BaseModel, Field
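
To catch this kind of mismatch earlier, a small check along these lines could report which pydantic namespace a model class descends from. This is only a sketch: `pydantic_namespace` is a hypothetical helper, and it assumes an install where the v1 classes live under `pydantic.v1` (as the `pydantic.v1.main.BaseModel` in the logs above indicates). The fake base classes exist only so the example runs without pydantic installed.

```python
from typing import Optional


def pydantic_namespace(cls: type) -> Optional[str]:
    """Return 'pydantic.v1', 'pydantic', or None, based on the modules in cls's MRO."""
    for base in cls.__mro__:
        module = getattr(base, "__module__", "") or ""
        # Check the more specific prefix first, since 'pydantic.v1.main'
        # also starts with 'pydantic.'.
        if module == "pydantic.v1" or module.startswith("pydantic.v1."):
            return "pydantic.v1"
        if module == "pydantic" or module.startswith("pydantic."):
            return "pydantic"
    return None


# Hypothetical stand-ins for the two BaseModel classes; only __module__ matters here.
FakeV1BaseModel = type("BaseModel", (), {"__module__": "pydantic.v1.main"})
FakeV2BaseModel = type("BaseModel", (), {"__module__": "pydantic.main"})


class ParamsFromV1(FakeV1BaseModel):
    pass


class ParamsFromV2(FakeV2BaseModel):
    pass


print(pydantic_namespace(ParamsFromV1))
print(pydantic_namespace(ParamsFromV2))
```

With real classes, one would pass the action's parameters and response models (for example GetCompanyDirectoryParameters) and compare the result with what the framework's own ActionInput reports.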