This is a very rough version of a scrappy production version I use personally. Your mileage may vary!
The idea is that when a new video is created, its info (the description) is sent to this server, which builds an image from elements (background images, logos, etc.) and returns it in the GET response. I use this with n8n (self-hosted), but you can use it however you need.
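
As a rough illustration of the request side, here is a minimal sketch of how a caller might build the GET URL. The base URL and the parameter names (`description`, `background`, `logo`) are assumptions made up for this example, not this server's actual API, so check the code for the real ones.

```typescript
// Hypothetical request shape: these field names are illustrative only.
interface ThumbnailRequest {
  description: string; // the video description sent to the server
  background?: string; // assumed: identifier of a background image
  logo?: string;       // assumed: identifier of a logo
}

// Build a GET URL with URL-encoded query parameters.
function buildThumbnailUrl(baseUrl: string, req: ThumbnailRequest): string {
  const params: string[] = [
    `description=${encodeURIComponent(req.description)}`,
  ];
  if (req.background) {
    params.push(`background=${encodeURIComponent(req.background)}`);
  }
  if (req.logo) {
    params.push(`logo=${encodeURIComponent(req.logo)}`);
  }
  return `${baseUrl}?${params.join("&")}`;
}

// Example: a locally running server (the address is a placeholder).
const exampleUrl = buildThumbnailUrl("http://localhost:3000/api", {
  description: "My new video",
});
console.log(exampleUrl);
```

In n8n, a URL like this could go into an HTTP Request node with the response treated as a file, so the returned image is available to later steps in the workflow.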
Very much based on this: https://github.com/vercel/og-image