aws/sagemaker-huggingface-inference-toolkit

ARCHITECTURES_2_TASK is limiting the tasks able to be deployed with HF DLC


Issue:

Customer is deploying HF ID: BAAI/bge-m3 with Task: sentence-similarity using:

model_builder = ModelBuilder(
    model="BAAI/bge-m3",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    model_path=path, #local path where artifacts will be saved
    mode=Mode.LOCAL_CONTAINER,
    env_vars={
        "HF_TASK": "sentence-similarity"
    }
)

model_builder.deploy()

But they are getting the following error from within the HF DLC:

ModelBuilder: DEBUG:     2024-02-13 00:05:57,567 [INFO ] W-BAAI__bge-m3-58-stdout 
com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ValueError: Task couldn't be inferenced from XLMRobertaModel.Inference Toolkit 
can only inference tasks from architectures ending with ['TapasForQuestionAnswering', 'ForQuestionAnswering', 
'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 
'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', 'T5WithLMHeadModel'].Use env
`HF_TASK` to define your task.

Root Cause:

ARCHITECTURES_2_TASK mapping is too constraining and does not include all admissible pipeline tasks:

As far as I can see, all we do is pass the task to get_pipeline from within handler_service.py:

hf_pipeline = get_pipeline(task=os.environ["HF_TASK"], model_dir=model_dir, device=self.device)
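
Under the hood this is, roughly, a thin wrapper around transformers.pipeline. A simplified sketch of the pass-through (not the toolkit's exact code; model_dir and device are filled in by the serving container):

import os
from transformers import pipeline

# Simplified view of what the handler does with HF_TASK (sketch only).
model_dir = "/opt/ml/model"   # populated by SageMaker with the model artifacts
device = -1                   # CPU; the real handler selects the device
hf_pipeline = pipeline(task=os.environ["HF_TASK"], model=model_dir, device=device)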

Shouldn't the logic be the following?

if "HF_TASK" provided -> set task to "HF_TASK"

else:
    fetch architecture from config.json
    if architecture is not in ARCHITECTURES_2_TASK -> throw error
    set task to mapped task

This way we allow for a best-effort deployment, i.e. a pass-through of HF_TASK to get_pipeline(). If that fails, then so be it; we should propagate the right error message to the customer, stating that we tried our best and that xyz went wrong.
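
A minimal sketch of that fallback, assuming a hypothetical infer_task helper (the function name and the config.json parsing are illustrative, not the toolkit's current implementation; ARCHITECTURES_2_TASK is the existing map):

import json
import os

def infer_task(model_dir: str) -> str:
    # 1. If HF_TASK is provided, pass it straight through to get_pipeline.
    task = os.environ.get("HF_TASK")
    if task:
        return task

    # 2. Otherwise fall back to the architecture -> task mapping.
    with open(os.path.join(model_dir, "config.json")) as f:
        architectures = json.load(f).get("architectures", [])

    for architecture in architectures:
        for suffix, mapped_task in ARCHITECTURES_2_TASK.items():
            if architecture.endswith(suffix):
                return mapped_task

    # 3. Only error out when neither route yields a task.
    raise ValueError(
        f"Task couldn't be inferred from {architectures}. "
        "Use env `HF_TASK` to define your task."
    )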

Hello,
the task sentence-similarity is sentence-transformers specific, which is not yet supported.

@philschmid Is there a reason why we do not support all the tasks?

Ack, so will this eventually be supported via get_pipeline? Or will it only be limited to:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-m3")

We might work on that in the future, but for now the simplest option is to create an inference.py; see here: https://www.philschmid.de/custom-inference-huggingface-sagemaker
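
For sentence similarity, such an inference.py could look roughly like the sketch below; it assumes the toolkit's model_fn/predict_fn override hooks and that sentence-transformers is available in the container (e.g. via a requirements.txt next to the script):

# code/inference.py
from sentence_transformers import SentenceTransformer, util

def model_fn(model_dir):
    # Load the sentence-transformers model; the model id is hard-coded here for illustration.
    return SentenceTransformer("BAAI/bge-m3")

def predict_fn(data, model):
    # Expects {"inputs": ["sentence A", "sentence B"]} and returns a cosine similarity score.
    sentences = data["inputs"]
    embedding_1 = model.encode(sentences[0], convert_to_tensor=True)
    embedding_2 = model.encode(sentences[1], convert_to_tensor=True)
    return {"score": float(util.pytorch_cos_sim(embedding_1, embedding_2))}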

@philschmid there are a number of multimodal tasks missing here (object detection, text to speech, audio classification etc.)

In addition, the text2text-generation task provided in the map doesn't have schemas in hf/tasks:

huggingface/huggingface.js@b70ac6c

Is there a way to extend this map further for new use cases? This would also save customers the effort of creating an inference.py for these formats.

@philschmid there are a number of multimodal tasks missing here (object detection, text to speech, audio classification etc.)

Where is "here"?

Is there a way to extend this map further for new use cases? This would also save customers the effort of creating an inference.py for these formats.

What map?

We're basically trying to extend inference capabilities by storing/generating additional metadata for a larger set of task types. Today we use the pipeline tag/task as a core filter of the selection logic; this was under the assumption that the pySDK would be compatible with all transformers task types. The explicit types defined in ARCHITECTURES_2_TASK are blocking us currently, and my questions above are about possibilities to extend it without using the inference.py route. There is a proposed fix in this issue, but given the lack of transformers support, I am not sure whether making the fix will help.

@philschmid there are a number of multimodal tasks missing here (object detection, text to speech, audio classification etc.)

Where is "here"?
ARCHITECTURES_2_TASK

ARCHITECTURES_2_TASK = {
    "TapasForQuestionAnswering": "table-question-answering",
    "ForQuestionAnswering": "question-answering",
    "ForTokenClassification": "token-classification",
    "ForSequenceClassification": "text-classification",
    "ForMultipleChoice": "multiple-choice",
    "ForMaskedLM": "fill-mask",
    "ForCausalLM": "text-generation",
    "ForConditionalGeneration": "text2text-generation",
    "MTModel": "text2text-generation",
    "EncoderDecoderModel": "text2text-generation",
    # Model specific task for backward comp
    "GPT2LMHeadModel": "text-generation",
    "T5WithLMHeadModel": "text2text-generation",
}

Is there a way to extend this map further for new use cases? This would also save customers the effort of creating an inference.py for these formats.

What map?
ARCHITECTURES_2_TASK

ARCHITECTURES_2_TASK is not actively used, since it is no longer always possible to derive the task from a model: the same "architecture" can have multiple tasks. That's why we always (in every example, guide, doc, etc.) set the TASK.
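
For example (an illustration of the point, not toolkit code): a single T5ForConditionalGeneration checkpoint can back several pipelines, so the task has to be chosen explicitly:

from transformers import pipeline

# t5-small uses the T5ForConditionalGeneration architecture, yet it can serve
# multiple tasks; the architecture alone does not tell you which one to run.
summarizer = pipeline("summarization", model="t5-small")
translator = pipeline("translation_en_to_de", model="t5-small")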

@samruds To address this gap, we should provide InferenceSpec support for the HF DLC in ModelBuilder. This way a customer can provide custom inference script logic.

The workaround today is to treat this as a BYOC ModelBuilder scenario, doing something like this:

class MySentenceTransformerModel(InferenceSpec):
    def load(self, model_dir: str):
        from sentence_transformers import SentenceTransformer
        # Load the sentence-transformers model directly; model_dir is unused here.
        model = SentenceTransformer("BAAI/bge-m3")
        return model

    def invoke(self, data: object, model: object):
        # util must be imported here as well; importing it only inside load()
        # would leave it undefined in this method.
        from sentence_transformers import util

        sentences = data["inputs"]

        embedding_1 = model.encode(sentences[0], convert_to_tensor=True)
        embedding_2 = model.encode(sentences[1], convert_to_tensor=True)

        similarity_score = util.pytorch_cos_sim(embedding_1, embedding_2)

        return {"score": similarity_score.numpy()}

sample_input = {
    "inputs": ["I'm happy", "I'm full of happiness"]
}

sample_output = {
    "score": [0.999]
}

mb = ModelBuilder(
    inference_spec=MySentenceTransformerModel(),
    schema_builder=SchemaBuilder(sample_input=sample_input, sample_output=sample_output),
    model_path="/home/ec2-user/SageMaker/test_dir",
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04"
)

my_model = mb.build(mode=Mode.LOCAL_CONTAINER)
my_model.deploy()

So error handling on our end should be:

In ModelBuilder

  • If TASK cannot be inferred or is not supported -> please provide InferenceSpec

There are two things here

  1. Tasks being restricted by the old implementation - a fix is out for that.
  2. Some transformers don't have tasks that have been exposed via a pipeline tag yet - the option you provided is one solution (another could be to expose them as pipeline tags on a release cycle). I think we would need a list of libraries that need custom inference to prioritize such an implementation. If custom inference is needed, is this the long-term plan for such tasks? This looks to be more of a problem of supporting tasks at an extremely nascent stage of development. Maybe @philschmid can weigh in more on the long-term release plan.

Customer has two options here:

  1. Set the HF task via environment variables.
  2. Provide an inference.py for non-task models like sentence similarity.