Some InferenceClient tasks missing `parameters` argument, inconsistent with task specifications
hanouticelina opened this issue · 7 comments
Some task methods in `huggingface_hub.InferenceClient` do not include a `parameters` argument to allow passing additional inference parameters. The affected tasks are: `audio-classification`, `automatic-speech-recognition`, `fill-mask`, `image-classification`, `image-segmentation`, `object-detection`, `question-answering`, `table-question-answering`, `text-classification`, `token-classification`, and `translation`.
This inconsistency makes the implementation not fully aligned with the task specs here and the documentation here.
Taking the example of `text-classification`:

With Python `requests`:
```python
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": f"Bearer {os.environ.get('HF_TOKEN')}"}


def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query(
    {
        "inputs": "I like you. I love you",
        "parameters": {"top_k": 1},
    }
)
print(output)
# [{'label': 'POSITIVE', 'score': 0.9998738765716553}]
```
With `InferenceClient`:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()
output = client.text_classification(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    text="I like you. I love you",
    parameters={"top_k": 1},
)
print(output)
```

```
output = client.text_classification(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: InferenceClient.text_classification() got an unexpected keyword argument 'parameters'
```
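In the meantime, a possible workaround (a sketch only, assuming the generic `InferenceClient.post()` helper is available in the installed `huggingface_hub` version) is to send the payload manually so that `parameters` reaches the API:

```python
import json

from huggingface_hub import InferenceClient

client = InferenceClient()
# Workaround sketch: bypass text_classification() and post the raw payload,
# so that "parameters" can be forwarded to the Inference API.
response = client.post(
    json={"inputs": "I like you. I love you", "parameters": {"top_k": 1}},
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    task="text-classification",
)
print(json.loads(response))
```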
Yes, that is an identified problem. For context, when we first implemented the `InferenceClient`, we didn't have that much harmonization yet (the specs didn't exist). To limit future maintenance, we decided not to add them for a lot of tasks (always better to do less and avoid future breaking changes). That's why the tasks you've listed above don't have parameters.
That being said, if we start adding parameters, we most likely would add a `parameters` parameter to the method signatures but individual parameters instead (e.g. `client.text_classification(..., top_k=1)`).
EDIT: another thing to take into consideration is that those tasks are much, much less used than the text generation or image generation ones, hence the reduced scope.
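For illustration only, here is a rough sketch of what the individual-parameter style could look like (a hypothetical `SketchClient`, not the actual `InferenceClient` implementation; the parameter names are taken from the text-classification spec):

```python
from typing import Any, Dict, List, Optional

import requests


class SketchClient:
    """Hypothetical sketch of the individual-parameter style (not the real InferenceClient)."""

    def __init__(self, token: Optional[str] = None):
        self.headers = {"Authorization": f"Bearer {token}"} if token else {}

    def text_classification(
        self,
        text: str,
        *,
        model: str,
        top_k: Optional[int] = None,
        function_to_apply: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        # Each spec parameter is an explicit keyword argument, forwarded under "parameters".
        parameters: Dict[str, Any] = {}
        if top_k is not None:
            parameters["top_k"] = top_k
        if function_to_apply is not None:
            parameters["function_to_apply"] = function_to_apply
        payload = {"inputs": text, "parameters": parameters}
        url = f"https://api-inference.huggingface.co/models/{model}"
        response = requests.post(url, headers=self.headers, json=payload)
        return response.json()
```

With this style, `top_k` and `function_to_apply` show up in IDE autocompletion, which is the main argument for explicit keyword arguments over a single `parameters` dict.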
Agree, I too would prefer adding a `parameters` argument instead of individual parameters, at least for those "less popular" tasks. Even though those tasks have low usage, I think it's still important to have the client aligned with the Inference API documentation.
Sorry sorry, my bad! I meant we most likely *wouldn't* add a `parameters` parameter to the method signatures, but individual parameters instead.
The problem with a `parameters` parameter is that it doesn't play well with annotations/autocomplete. Even if we start using `typing.TypedDict` as OAI does, it's still not as natively supported as kwargs.
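For comparison, the `typing.TypedDict` approach mentioned above could look roughly like this (a hypothetical sketch; the `TextClassificationParametersDict` name is illustrative and does not exist in `huggingface_hub`):

```python
from typing import List, Optional, TypedDict


class TextClassificationParametersDict(TypedDict, total=False):
    """Hypothetical TypedDict mirroring the text-classification spec parameters."""

    top_k: int
    function_to_apply: str


def text_classification(
    text: str, parameters: Optional[TextClassificationParametersDict] = None
) -> List[dict]:
    ...


# Callers still pass a plain dict, so key autocompletion is weaker than with explicit kwargs:
text_classification("I like you. I love you", parameters={"top_k": 1})
```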
> I think it's still important to have the client aligned with the Inference API documentation.

Makes sense now that we have proper documentation!
Since we are talking about ~10 tasks that follow exactly the same pattern (i.e. `inputs` + `parameters`), I think we should think of a semi-automated way to do that as well. This is how I would picture it (taking the `text_classification` example):

- take the `TextClassificationParameters` class that is auto-generated based on the specs
- take the `InferenceClient.text_classification` method that is for now entirely manually written
- check if a parameter is not listed in the `text_classification` method; if a parameter is missing:
  - add it to the method signature with the same type annotation as the dataclass (for instance `function_to_apply: Optional["ClassificationOutputTransform"] = None`)
  - add the attribute docstring (for instance `"""When specified, limits the output to the top K most probable classes."""`) to the method docstring inside the `args` section
  - make sure the imports are correct (typically, `from ... import ClassificationOutputTransform`)

With such a script, we would be sure all those "less used" methods are still well maintained according to the specs. In the past, I've created a few scripts to auto-generate things (see the `./utils` folder), always based on static analysis, i.e. only working on the module text. For such a script here, I think we'll need to work at the `ast` level.
It's usually good for such a script to have a "check" mode (fails if something has to be changed) and a "format" mode (modifies the file in place). Once we have it, we can integrate it into the makefile / CI. Let me know what you think :)
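To make the idea concrete, here is a rough sketch of what the "check" mode could look like (the dataclass and the client source are stand-ins defined inline; in the real script they would be the spec-generated `TextClassificationParameters` and the client module read from disk, and the "format" mode that rewrites the signature is not shown):

```python
import ast
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TextClassificationParameters:
    """Stand-in for the spec-generated parameters dataclass."""

    function_to_apply: Optional[str] = None
    top_k: Optional[int] = None


# Stand-in for the manually written client module (would be read from disk).
CLIENT_SOURCE = '''
class InferenceClient:
    def text_classification(self, text, *, model=None, top_k=None):
        ...
'''


def missing_parameters(source: str, method_name: str, params_cls) -> List[str]:
    """Return dataclass fields that are absent from the method signature."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == method_name:
            present = {a.arg for a in node.args.args + node.args.kwonlyargs}
            return [f.name for f in fields(params_cls) if f.name not in present]
    raise ValueError(f"Method {method_name!r} not found")


if __name__ == "__main__":
    missing = missing_parameters(CLIENT_SOURCE, "text_classification", TextClassificationParameters)
    if missing:
        # "check" mode: fail if the signature is out of sync with the specs.
        raise SystemExit(f"text_classification is missing parameters: {missing}")
    print("text_classification signature is up to date.")
```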
> Sorry sorry, my bad! I meant we most likely wouldn't add a `parameters` parameter to the method signatures, but individual parameters instead.
>
> The problem with a `parameters` parameter is that it doesn't play well with annotations/autocomplete. Even if we start using `typing.TypedDict` as OAI does, it's still not as natively supported as kwargs.

Yes, I was actually thinking about adding `parameters` to avoid modifying the client code if there are future changes in the parameters on the API side. But it makes sense to add individual parameters; I don't think there will be many changes, so let's do that.
> Since we are talking about ~10 tasks that follow exactly the same pattern (i.e. `inputs` + `parameters`), I think we should think of a semi-automated way to do that as well. [...] It's usually good for such a script to have a "check" mode (fails if something has to be changed) and a "format" mode (modifies the file in place). Once we have it, we can integrate it into the makefile / CI. Let me know what you think :)
Yes, I like the idea, it will be cool to have this in the makefile/CI.
I think the script should:

- Focus solely on adding new missing parameters.
- Not handle parameter renaming or deprecation. For example, if `function_to_apply` is renamed to `fn_to_apply`, we'd still need to manually keep the old name and deprecate it (see the sketch below). The idea is really to keep the script simple, with not a lot of maintenance to do.
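For illustration, that manual handling could look roughly like this (a hypothetical sketch; `fn_to_apply` is only the example rename from above, not an actual change):

```python
import warnings
from typing import List, Optional


def text_classification(
    text: str,
    *,
    fn_to_apply: Optional[str] = None,
    function_to_apply: Optional[str] = None,  # deprecated alias kept for backward compatibility
) -> List[dict]:
    if function_to_apply is not None:
        # Keep the old keyword working while steering users toward the new one.
        warnings.warn(
            "`function_to_apply` is deprecated, use `fn_to_apply` instead.",
            FutureWarning,
        )
        if fn_to_apply is None:
            fn_to_apply = function_to_apply
    ...
```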
I will open a separate issue for this later.
> Not handle parameter renaming or deprecation. For example, if `function_to_apply` is renamed to `fn_to_apply`, we'd still need to manually keep the old name and deprecate it. The idea is really to keep the script simple, with not a lot of maintenance to do.

Agreed! And if it's too annoying for future maintenance, we can extend the script later.

> I will open a separate issue for this later.

As you wish, it's also fine to start a PR directly or to rename this issue.