UKGovernmentBEIS/inspect_ai

[Bug] Tool use broken with Google models

Closed this issue · 9 comments

Using a Google model with tool use raises:

TypeError: Tool.__init__() takes 1 positional argument but 2 were given

at:

# src/inspect_ai/model/_providers/google.py
def chat_tools(tools: list[ToolInfo]) -> list[Tool]:
    declarations = [
        FunctionDeclaration(
            name=tool.name,
            description=tool.description,
            parameters=schema_from_param(tool.parameters),
        )
        for tool in tools
    ]
    return [Tool(declarations)]

When I look at the Google provider module, indeed it seems like many of the calls to objects imported from Google use incompatible signatures. They are highlighted by my IDE. mypy doesn't raise anything because these imports have # type: ignore.

Here #696 is one failing test (marked as xfail) for the chat_tools function, but the problem is broader.

I am not happy about the type: ignore but I wasn't able to get types working w/ the google library (looks like there is an issue filed for this here: google-gemini/generative-ai-python#264).

I wonder if the signatures changed with a recent package update? All of my tests pass with:

pytest -k google_tools --runapi
pytest -k google_tool_types --runapi

This is with google-generativeai version 0.8.1 from Sept. 12th. Their release notes for 0.8.2 don't mention any breaking changes -- what version are you running with?

0.8.3 is failing for me. So they put out a breaking upgrade going from 0.8.1 -> 0.8.3? So much for semantic versioning :-(

@tadamcz I addressed the breaking change I observed in google-generativeai 0.8.3 by just changing the Tool invocation. You mentioned though that there are many incorrect invocations? Let me know if you think there are other issues (the Google tool calling tests are now all passing).

I wonder if the signatures changed with a recent package update? [...] 0.8.3 is failing for me. So they put out a breaking upgrade going from 0.8.1 -> 0.8.3? So much for semantic versioning :-(

I suspected it might be a package version problem. But I don't think it's entirely fair to blame the lack of semantic versioning on Google's part here, when inspect_ai doesn't specify any version at all for this dependency, or almost any of its dependencies. Google could have put this out as a major version change, and nothing in your code would have caught that either, right?

In my view, the way this repo handles its dependencies is a major weak point. :) Currently, it doesn't implement best practices that, in my neck of the woods, have been standard for at least 5 years. If you're open to it, I think it would be great to switch to poetry or another modern dependency resolver. I'm happy to handle making the change (including necessary changes to CI workflows etc). But for such a change to the core tooling I'd want your blessing before I make PR.

[EDIT: opened a separate issue for this: https://github.com//issues/699]

You mentioned though that there are many incorrect invocations? Let me know if you think there are other issues (the Google tool calling tests are now all passing).

I'm not sure how best to share these since mypy doesn't raise anything. I'm seeing them in my IDE. These IDE annotations are not always correct, but they were correct in the case of the Tool invocation, so that is some indication that at least some of them may be genuine problems.

Here are a couple of screenshots:

image image

In my view, the way this repo handles its dependencies is a major weak point. :) Currently, it doesn't implement best practices that, in my neck of the woods, have been standard for at least 5 years. If you're open to it, I think it would be great to switch to poetry or another modern dependency resolver. I'm happy to handle making the change (including necessary changes to CI workflows etc). But for such a change to the core tooling I'd want your blessing before I make PR.

Wrote a longer reply in that issue but we are going to avoid upper-bounding like the plauge (as this practice if widely applied will being the entire ecosystem down!). More discussion of this here: https://iscinumpy.dev/post/bound-version-constraints. Packages IMO aren't the right place for narrow version constraints but rather deployments are. Packages need to play well with the requirements of other packages and the solvers that adjudicate all of this -- for us that means lower-bounding only when we have evidence we need to and almost never upper bounding.

I'm not sure how best to share these since mypy doesn't raise anything. I'm seeing them in my IDE. These IDE annotations are not always correct, but they were correct in the case of the Tool invocation, so that is some indication that at least some of them may be genuine problems.

I'm not sure what to make of these type errors. A lot of them seem to convinced that everything should be a dict even though the docs indicate otherwise. Part of the problem may be that since Google hasn't published types there is no good guidance for type checkers and they can easily get confused.

I can say that even w/o types we did look carefully at the underlying source code to ensure we were calling things correctly and that as far as I know we are using the right types in all cases (note that all of our tool calling tests pass and they do AFAIK exercise all of the code).

Resolved with #697

note that all of our tool calling tests pass and they do AFAIK exercise all of the code

I hadn't realised this when I wrote the above. In that case I'm happy to consider this resolved. The remaining IDE annotations were probably just wrong.