NVIDIA/TensorRT-LLM

[RFC] Feedback collection about TensorRT-LLM 1.0 Release Planning and API Compatibility Commitment


Dear community,

Since TensorRT-LLM's initial GitHub release in October 2023, we have made substantial progress over the past eighteen months. We extend our sincere gratitude for the community's exceptional support and valuable feedback, which have been instrumental in advancing the framework.

With the recent introduction of the PyTorch backend in the 0.17 release and our adoption of a GitHub-first development process, we are now preparing for the 1.0 release. This milestone will formalize our commitment to API backward compatibility. Previously, we intentionally delayed declaring 1.0 status given the rapidly evolving nature of this domain, as we wanted to avoid premature commitments that might require breaking changes.

Proposed Scope for 1.0 API Compatibility:

To ensure enforceable backward compatibility for TensorRT-LLM 1.0 APIs, we have implemented an automated protection mechanism. Specifically:

  • The reference_committed directory contains APIs with formal backward compatibility commitments.
  • The reference directory includes additional APIs currently under automated protection but not yet formally committed. Based on community feedback, we will progressively migrate select APIs from reference to reference_committed.
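To illustrate what an automated API-protection mechanism of this kind can enforce, here is a minimal, hypothetical sketch of a signature-snapshot check; the names `snapshot` and `check_compatible` are illustrative and are not TensorRT-LLM's actual implementation. The idea is that a committed API's signature is recorded once (e.g. under a reference directory) and any later drift fails the check:

```python
# Hypothetical sketch of a signature-snapshot compatibility check.
# A committed API's signature is recorded once; later diffs fail the test.
import inspect


def snapshot(func):
    """Record a stable string form of a callable's signature."""
    return str(inspect.signature(func))


def check_compatible(func, committed_snapshot):
    """Raise if the current signature drifts from the committed one."""
    current = snapshot(func)
    if current != committed_snapshot:
        raise AssertionError(
            f"API changed: committed {committed_snapshot!r}, now {current!r}"
        )


# Example: protect a hypothetical generate() API.
def generate(prompt, max_tokens=32):
    return prompt[:max_tokens]


committed = snapshot(generate)          # stored alongside the reference files
check_compatible(generate, committed)   # passes while the signature is stable
```

Running such a check in CI is what makes a compatibility commitment enforceable rather than aspirational: a signature change to a committed API cannot merge without an explicit update to the reference snapshot.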

We welcome your input on two key matters:

  • Feedback regarding the proposed API compatibility coverage

  • Specific feature requests for inclusion in the TensorRT-LLM 1.0 release

Your insights will directly influence our roadmap prioritization.

Best regards,
The TensorRT-LLM Engineering Team

Specific feature requests for inclusion in the TensorRT-LLM 1.0 release

Ability to disable the cyclic KV cache so that models requiring sliding windows on some layers are compatible with block reuse :)

In a duplex conversation, each position may contain a token from the user and a token from the robot. User tokens are generated by the client, while robot tokens are generated by the LLM. We need an input_generator and a combiner to produce the next token. We hope TensorRT-LLM can support this elegantly.

Hi @juney-nvidia
Will you also commit to backward compatibility for the C++ executor API?
Best

Hi, I noticed that the tag v1.1.0rc2 has been released. When will the v1.0.0 tag be formally released?

Hi,
can someone explain whether engine building is no longer possible in TensorRT-LLM 1.0?