Project Roadmap

Question

Project Roadmap

tgaddair opened this issue a year ago · 33 comments

Answer 1 · 2023-11-22T22:45:03.000Z

is AWQ supported?

Answer 2 · 2023-11-22T22:57:22.000Z

Hey @RileyCodes, not yet, will add that to the roadmap!

Answer 3 · 2023-11-23T15:53:14.000Z

does we have tested bitsandbytes Quantization ?

Answer 4 · 2023-11-23T20:22:51.000Z

Hey @abhibst, I've done some basic sanity checks on it, but haven't tested it very thoroughly. Please feel free to report any issues you encounter and I'll take a look!

Answer 5 · 2023-11-23T21:44:58.000Z

Sure Thanks for confirming

Answer 6 · 2023-11-29T20:48:29.000Z

How would you go about adding this in Stable Diffusion? I am really interested in experimenting with that.

Answer 7 · 2023-11-29T22:16:03.000Z

Hey @sansavision, at a high level it would look a lot like the LoRA pipeline used in Diffusers: https://github.com/huggingface/api-inference-community/blob/main/docker_images/diffusers/app/pipelines/text_to_image.py#L25

A v0 shouldn't be too bad, we would basically just run a single forward pass to generate the image and perform postprocessing (as part of the existing Prefill step) and short-circuit the Decode step.

Answer 8 · 2023-12-03T21:38:00.000Z

If no one has started I will start working on awq tomorrow

Answer 9 · 2023-12-03T22:14:21.000Z

Nice! Thanks @flozi00, that would be awesome!

Answer 10 · 2023-12-06T12:33:00.000Z

Any plans to support vision transformers from huggingface / timm? A lot of potential use cases there for deploying many classifiers. If not what would that entail? Would be open to contributing if possible.

Answer 11 · 2023-12-06T17:49:06.000Z

Hey @SamGalanakis, great suggestion! The plan at the moment is to start by supporting text classifiers. Once that framework is in place for that, it should be hopefully relatively straightforward to support image classifiers as well. Happy to start a thread on Discord to discuss!

Answer 12 · 2023-12-06T18:17:45.000Z

Whisper would be also very cool 😄

Answer 13 · 2023-12-06T18:25:26.000Z

@tgaddair Ok clear, joined the discord will look out for it!

Answer 14 · 2023-12-15T07:51:56.000Z

Hi, @tgaddair , could I know how long it will take to support the stable diffusion model?

Answer 15 · 2023-12-16T21:19:15.000Z

Hey @Hap-Zhang, the plan at the moment is to add it after we add support for embedding generation and text classification. Both of those are planned for January 2024, so in the next month.

Answer 16 · 2023-12-18T01:51:50.000Z

@tgaddair Okay, got it. Thank you very much for your efforts. Stay tuned for it.

Answer 17 · 2024-01-08T16:10:49.000Z

If we could have OpenAI compatible endpoints that would be great too. So we can use this as drop in replacement for OpenAI models :)

Answer 18 · 2024-01-08T17:19:43.000Z

Hey @AdithyanI, yes, this should be coming this week or next! See #145 to follow progress.

Answer 19 · 2024-01-08T22:36:26.000Z

@tgaddair oh wow that would be awesome! Thank you so much for the work here.
If you need someone to test it out; let me know. Happy to test it out.

Is the discord still open for others to join :) ?
I followed the link of the repo, and it says it is expired.

Answer 20 · 2024-01-09T22:06:20.000Z

@AdithyanI this should be landing some time today :)

#170

Answer 21 · 2024-01-09T22:07:03.000Z

Hey @AdithyanI, the Discord should be available. Are you using this link?

https://discord.gg/CBgdrGnZjy

Answer 22 · 2024-01-11T07:54:22.000Z

@tgaddair I asked for outlines repo authors to add support to this : dottxt-ai/outlines#523
Then it would be great to have text guided generation :)

I don't know how hard is it to integrate that here.
Do you folks know if this is something that can be supported by LORAX?

Answer 23 · 2024-01-12T05:22:20.000Z

Thanks for starting the Outlines thread @AdithyanI! Looks like the maintainer created an issue #176. Excited to explore this integration!

Answer 24 · 2024-02-20T21:52:49.000Z

Would it be possible to add in context length-scaling methods like Self-Extend , Rope scaling, and/or yarn scaling? I know that llama.cpp has a good implementation of these in their server, and self-extend in particular is much more stable than rope or yarn. Having long context or doing context enhancement is super important for RAG applications.

Answer 25 · 2024-02-26T18:42:57.000Z

About the supported models, could you consider the ChatGLM3 ? @tgaddair

Answer 26 · 2024-03-10T17:22:09.000Z

LongLoRA

It seems that LongLoRA proposed shifted short attention is compatible with Flash-Attention, and not required during inference (ref: https://huggingface.co/Yukang/Llama-2-13b-longlora-8k#highlights), if that is true, could you share what's the planed support in LoRAX inference side? thanks @tgaddair

Answer 27 · 2024-03-17T15:05:21.000Z

Do you plan on supporting AQLM to setve LoRa of Mixtral Instruct with Lorax?

Answer 28 · 2024-03-17T20:37:58.000Z

Hey @thincal, the last thing we need to support LongLoRA, if I remember correctly, is #231 which @geoffreyangus is planning to pick up next week.

@remiconnesson, we have PR #233 from @flozi00 for AQLM. It's pretty close to landing, but just needs a little additional work to finish it up. If no one else picks it up, I can probably take a look in the next week or two.

Answer 29 · 2024-04-01T17:07:31.000Z

Are T5 based models on the Roadmap?

Answer 30 · 2024-04-01T21:27:34.000Z

@tgaddair

@remiconnesson, we have PR #233 from @flozi00 for AQLM. It's pretty close to landing, but just needs a little additional work to finish it up. If no one else picks it up, I can probably take a look in the next week or two.

Hello :) How far do you think we are for this PR to be merged? :)

Answer 31 · 2024-04-03T16:50:20.000Z

Hey @remiconnesson, will probably be the next thing I take a look at after wrapping up speculative decoding this week.

@amir-in-a-cynch we can definitely add T5 to the roadmap!

Answer 32 · 2024-04-22T14:46:57.000Z

Hello, will you integrate / merge / migrate to the latest hugging face text-generation-inference as it is back now with Apache 2.0 license?

Answer 33 · 2024-08-09T17:45:27.000Z

Is there an expected release date for v0.11?

Project Roadmap

v0.10

v0.11

Previous Releases

v0.9

Backlog

Models

Adapters

Throughput / Latency

Quantization

Usability