mikeczech/sw-yx-ai-notes

notes for my AI studies, writing, and product brainstorming

Language: JavaScript · License: MIT

AI Notes

Notes on the AI state of the art, with a focus on generative and large language models. These are the "raw materials" for the https://lspace.swyx.io/ newsletter.

This repo used to be called https://github.com/sw-yx/prompt-eng, but was renamed because Prompt Engineering is Overhyped.

This README is a high-level overview of the space; most updates land in the other markdown files in this repo:

IMAGE_GEN.md - the most developed file, with notes focused heavily on Stable Diffusion and some on Midjourney and DALL-E.
TEXT.md - text generation, mostly with GPT-3
CODE.md - codegen models, like Copilot
stubs - very small/lightweight proto pages
- AGENTS.md - tracking "agentic AI"
- AUDIO.md - tracking audio (transcription + generation)

Table of Contents

Motivational Use Cases
Top AI Reads
Communities
People
Misc
Quotes, Reality & Demotivation
Legal, Ethics, and Privacy

Motivational Use Cases

images
video
- img2img of famous movie scenes (La La Land)
  - img2img transforming an actor with EbSynth + koe_recast
- virtual fashion (karenxcheng)
- seamless tiling images
- evolution of scenes (xander)
- outpainting https://twitter.com/orbamsterdam/status/1568200010747068417?s=21&t=rliacnWOIjJMiS37s8qCCw
- webUI img2img collaboration https://twitter.com/_akhaliq/status/1563582621757898752
- image to video with rotation https://twitter.com/TomLikesRobots/status/1571096804539912192
- "prompt paint" https://twitter.com/1littlecoder/status/1572573152974372864
- audio2video animation of your face https://twitter.com/siavashg/status/1597588865665363969
- physical toys to 3d model + animation https://twitter.com/sergeyglkn/status/1587430510988611584
- music videos
  - Video Killed The Radio Star (Colab): uses OpenAI's Whisper speech-to-text to turn a YouTube video into a Stable Diffusion animation prompted by the video's lyrics
  - Stable Diffusion Videos generates videos by interpolating between prompts and audio
- direct text2video project
text-to-3d https://twitter.com/_akhaliq/status/1575541930905243652
- https://dreamfusion3d.github.io/
- open source impl: https://github.com/ashawkey/stable-dreamfusion
- demo https://twitter.com/_akhaliq/status/1578035919403503616
text products
Jasper
gpt3 email https://github.com/sw-yx/gpt3-email
gpt3() in google sheet 2020, 2022 - sheet
https://www.summari.com/ Summari helps busy people read more
sequoia market map https://twitter.com/sonyatweetybird/status/1584580362339962880
base10 market map https://twitter.com/letsenhance_io/status/1594826383305449491
game assets
- emad thread https://twitter.com/EMostaque/status/1591436813750906882
- scenario.gg https://twitter.com/emmanuel_2m/status/1593356241283125251
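Stable Diffusion Videos (listed above) makes animations by interpolating between prompts. A technique commonly used in such tools for moving between two noise latents is spherical linear interpolation (slerp). The sketch below is a minimal pure-Python illustration, assuming latents are flat lists of floats; `slerp` and `interpolation_frames` are hypothetical helpers, not that project's code.

```python
import math
from typing import List

Vector = List[float]


def slerp(t: float, v0: Vector, v1: Vector, dot_threshold: float = 0.9995) -> Vector:
    """Spherical linear interpolation between two non-zero vectors (hypothetical helper).

    t=0 returns v0 and t=1 returns v1. Falls back to plain linear interpolation
    when the vectors are nearly (anti)parallel, where slerp is numerically unstable.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    if abs(dot) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)  # angle between v0 and v1
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def interpolation_frames(v0: Vector, v1: Vector, num_frames: int) -> List[Vector]:
    """Evenly spaced slerp steps from v0 to v1, one latent per video frame."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    return [slerp(i / (num_frames - 1), v0, v1) for i in range(num_frames)]
```

A commonly cited reason for slerp over straight linear interpolation: high-dimensional Gaussian noise sits near a sphere shell, and linear midpoints have a smaller norm than either endpoint, which can wash out the intermediate frames. Each frame's latent would then be denoised separately by the diffusion model.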

Top AI Reads

The more advanced GPT-3 reads have been split out to https://github.com/sw-yx/prompt-eng/blob/main/GPT.md

https://www.gwern.net/GPT-3#prompts-as-programming
https://learnprompting.org/
Beginner
- OpenAI prompt tutorial https://beta.openai.com/docs/quickstart/add-some-examples
- Google LaMDA intro https://aitestkitchen.withgoogle.com/how-lamda-works
- Humanloop prompt engineering 101 https://website-olo3k29b2-humanloopml.vercel.app/blog/prompt-engineering-101
- DALL-E 2 prompt writing book http://dallery.gallery/wp-content/uploads/2022/07/The-DALL%C2%B7E-2-prompt-book-v1.02.pdf
- https://medium.com/nerd-for-tech/prompt-engineering-the-career-of-future-2fb93f90f117
- https://wiki.installgentoo.com/wiki/Stable_Diffusion overview
- https://www.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/
- https://mpost.io/best-100-stable-diffusion-prompts-the-most-beautiful-ai-text-to-image-prompts/
- https://andymatuschak.org/prompts/
- for nontechnical
Intermediate
- DALL-E 2 asset generation + inpainting https://twitter.com/aifunhouse/status/1576202480936886273?s=20&t=5EXa1uYDPVa2SjZM-SxhCQ
- suhail journey https://twitter.com/Suhail/status/1541276314485018625?s=20&t=X2MVKQKhDR28iz3VZEEO8w
- composable diffusion - "AND" instead of "and" https://twitter.com/TomLikesRobots/status/1580293860902985728
- img2img https://andys.page/posts/how-to-draw/
- quest for photorealism https://www.reddit.com/r/StableDiffusion/comments/x9zmjd/quest_for_ultimate_photorealism_part_2_colors/
  - https://medium.com/merzazine/prompt-design-for-dall-e-photorealism-emulating-reality-6f478df6f186
- settings tweaking https://www.reddit.com/r/StableDiffusion/comments/x3k79h/the_feeling_of_discovery_sd_is_like_a_great_proc/
  - seed selection https://www.reddit.com/r/StableDiffusion/comments/x8szj9/tutorial_seed_selection_and_the_impact_on_your/
  - minor parameter difference study (steps, clamp_max, ETA, cutn_batches, etc.) https://twitter.com/KyrickYoung/status/1500196286930292742
- Generative AI: Autocomplete for everything https://noahpinion.substack.com/p/generative-ai-autocomplete-for-everything?sd=pf
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources: a good paper on the development history of the GPT family of models and how their capabilities emerged
Advanced
- Transformers from scratch https://e2eml.school/transformers.html
- karpathy on transformers
  - https://twitter.com/karpathy/status/1582807367988654081
- 137 emergent abilities of large language models
  - Emergent few-shot prompted tasks: BIG-Bench and MMLU benchmarks
  - Emergent prompting strategies
- Eugene Yan explanation of the Text to Image stack https://eugeneyan.com/writing/text-to-image/
- VQGAN/CLIP https://minimaxir.com/2021/08/vqgan-clip/
- 10 years of Image generation history https://zentralwerkstatt.org/blog/ten-years-of-image-synthesis
- Vision Transformers (ViT) Explained https://www.pinecone.io/learn/vision-transformers/
- negative prompting https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/
VQGAN+CLIP Keyword Modifier Comparison https://creator.nightcafe.studio/vqgan-clip-keyword-modifier-comparison - "We compared 126 keyword modifiers with the same prompt and initial image. These are the results."
- https://creator.nightcafe.studio/collection/8dMYgKm1eVXG7z9pV23W
Google released PartiPrompts as a benchmark: https://parti.research.google/ "PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects."
Video tutorials
- Pixel art https://www.youtube.com/watch?v=UvJkQPtr-8s&feature=youtu.be
Misc
- StabilityAI CIO perspective https://danieljeffries.substack.com/p/the-turning-point-for-truly-open?sd=pf
- https://github.com/awesome-stable-diffusion/awesome-stable-diffusion
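Two of the reads above, composable diffusion's "AND" and negative prompting, both come down to how the sampler mixes noise predictions at each denoising step in classifier-free guidance. The sketch below is a minimal illustration, assuming the model has already returned one noise prediction per prompt as a flat list of floats; `guided_noise` and the list representation are hypothetical, and real UIs do this on tensors inside the sampler loop.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def guided_noise(base: Vector, conds: Sequence[Tuple[Vector, float]]) -> Vector:
    """Combine noise predictions classifier-free-guidance style (hypothetical helper).

    base:  prediction for the empty prompt, or for the negative prompt when one is
           given (the negative prompt takes the place of the unconditional branch).
    conds: (prediction, weight) pairs, one per sub-prompt joined with "AND".
    Returns base + sum_i weight_i * (prediction_i - base).
    """
    out = list(base)
    for pred, weight in conds:
        # Push the result away from `base` and toward this sub-prompt.
        for j, (p, b) in enumerate(zip(pred, base)):
            out[j] += weight * (p - b)
    return out
```

With a single sub-prompt and weight s this reduces to standard classifier-free guidance, base + s * (cond - base); with several sub-prompts each one contributes its own guidance direction, which is what distinguishes "AND" from simply writing "and" inside one prompt.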

Communities

StableDiffusion Discord https://discord.com/invite/stablediffusion
https://reddit.com/r/stableDiffusion
Akhaliq Discord: https://discord.gg/nYqfg4gnBt
Deforum Discord https://discord.gg/upmXXsrwZc
Lexica Discord https://discord.com/invite/bMHBjJ9wRh
Perplexity Discord https://discord.com/invite/kWJZsxPDuX
Midjourney's Discord
- how to use Midjourney v4 https://twitter.com/fabianstelzer/status/1588856386540417024?s=20&t=PlgLuGAEEds9HwfegVRrpg
https://stablehorde.net/

People

This list will be out of date but will get you started. My live list of people to follow is at: https://twitter.com/i/lists/1585430245762441216

Misc

Whisper
- https://huggingface.co/spaces/sensahin/YouWhisper YouWhisper converts YouTube videos to text using openai/whisper.
- https://twitter.com/jeffistyping/status/1573145140205846528 YouTube Whisperer
- multilingual subtitles https://twitter.com/1littlecoder/status/1573030143848722433
- video subtitles https://twitter.com/m1guelpf/status/1574929980207034375
- combining Whisper with Stable Diffusion https://twitter.com/fffiloni/status/1573733520765247488/photo/1
- known problems https://twitter.com/lunixbochs/status/1574848899897884672 (edge case with catastrophic failures)
textually guided audio https://twitter.com/FelixKreuk/status/1575846953333579776
Codegen
- CodeGeeX https://twitter.com/thukeg/status/1572218413694726144
- https://github.com/salesforce/CodeGen https://joel.tools/codegen/
pdf to structured data https://www.impira.com/blog/hey-machine-whats-my-invoice-total
text to Human Motion diffusion https://twitter.com/GuyTvt/status/1577947409551851520
- abs: https://arxiv.org/abs/2209.14916
- project page: https://guytevet.github.io/mdm-page/

Quotes, Reality & Demotivation

Narrow, tedious domain use cases https://twitter.com/WillManidis/status/1584900092615528448 and https://twitter.com/WillManidis/status/1584900100480192516
antihype https://twitter.com/alexandr_wang/status/1573302977418387457
prompt eng memes
- https://twitter.com/_jasonwei/status/1516844920367054848
things Stable Diffusion struggles with https://opguides.info/posts/aiartpanic/
New Google
- https://twitter.com/alexandr_wang/status/1585022891594510336
New Powerpoint
via emad
Appending prompts by default in UI
DALL-E: https://twitter.com/levelsio/status/1588588688115912705?s=20&t=0ojpGmH9k6MiEDyVG2I6gg

Legal, Ethics, and Privacy

NSFW filter https://vickiboykis.com/2022/11/18/some-notes-on-the-stable-diffusion-safety-filter/
On "AI Art Panic" https://opguides.info/posts/aiartpanic/
Yannic Kilcher influencing OPENRAIL-M https://www.youtube.com/watch?v=W5M-dvzpzSQ
art schools accepting AI art https://twitter.com/DaveRogenmoser/status/1597746558145265664