EthicalML/awesome-production-machine-learning

New section on Generative AI

axsaucedo opened this issue · 16 comments

With the growing pace of Generative AI models and tools, I am wondering whether there is space to add a section on Generative AI to this list. It would be great to use this issue to brainstorm what it could look like, and to identify whether we can find more than 5 relevant examples for production ML in the context of Generative AI. Dalle-Flow is already a good example of a framework that could be used as an example: https://github.com/jina-ai/dalle-flow/.

@zhimin-z it would be great to get your thoughts on this one

Thanks for the invitation, @axsaucedo. Generative AI is definitely an awesome domain to consider for our list.

  • Dalle-Flow is a Human-in-the-Loop workflow for creating HD images from text, which relates to both the industry-strength NLP and CV sections in our list.
  • Also, I am thinking about adding Dalle-Flow (Human-in-the-Loop workflow) to the Data (generated text/images) Pipeline section. What do you think, @axsaucedo?

Yes, absolutely. I do think that would be the best-suited area at this stage, but I would like to see if we can find a list of tools like Dalle-Flow for various Generative AI use cases, namely to see whether these could all fit under their own "Industrial Generative AI" section, or whether we would just add them to the respective existing ones.

Also, I wonder if we could add commercial tools like jasper.ai, digitalhumans and alexsei (as production-level Generative AI platforms). These are becoming super popular and impactful these days. @axsaucedo

Reference: https://www.analyticsinsight.net/top-10-generative-ai-companies-in-2023/

Also, regarding the pull request on generated data serving tools such as CLIP-as-service, where shall I put it in the list? @axsaucedo

At this stage I would be keen to prioritise OSS tools in this issue; once we have explored these, we could have a look at commercial tools.

Here is another project that seems quite promising https://github.com/LAION-AI/Open-Assistant

image

An exclusive Generative AI section seems to touch too many tools (~100) spanning multiple domains. I am wondering if it is better to split the toolchain into their respective functional sections (as we do right now).

What do you think? @axsaucedo

Interesting. I searched the Internet and found that two similar lists for generative AI already exist:

One more list:
https://github.com/meetpateltech/AI-Infinity

Maybe we can select only open source tools from those?

Is there a standard for when we treat prompt engineering as a section of its own? @axsaucedo

The more the field of prompt engineering is defined, the less I see it as relevant to this production list. I agree it's an important domain, but it sits too high up at the user-interaction level to be relevant for this list, so I will close #424, as most of these are very high-level tools for managing "text templates", which I don't see as relevant.

I would still be keen to continue exploring whether Generative AI tools can fall into a separate theme. One area that I see as potentially relevant is what I am currently referring to as "agent-chain architecture frameworks", which provide the infrastructure and tooling to augment LLMs through agents, chains, etc. The primary example of this is of course https://github.com/hwchase17/langchain (https://www.youtube.com/watch?v=nMniwlGyX-c). I would be open to exploring what a list of this "agent-chain architecture tooling" could look like, but I also want to be careful, as I am conscious that some tools can present themselves as tooling infra when they are really just "good-looking" front-end interfaces to LLMs.

There is a core question: generative AI spans many areas such as NLP, CV, RL, etc. How can we distinguish one from another? If we do not set a standard for what counts as generative AI compared to the other ML-specific domains, then it is hard to categorize tools.

Another concern is that "generative AI" is an umbrella term commonly used in everyday life rather than in academia or industry. Scientists or ML engineers tell others they specialize in NLP, RL, or CV, but we seldom hear them say things like "I am a specialist in generative AI." "Generative AI" is a very broad area that touches many aspects of AI; almost all tools in our list potentially fall into it, which would make the categorization unnecessary.

What do you make of the following graph? I mean, prompt tuning is inseparable from the deployment of LLMs in many, many cases. LLM companies have the budget to hire prompt engineers.
image
image