/Content.Pipeline.Doc

General docs for the content pipeline

Security

  1. HTTPS
  2. JWT tokens (see the sketch after this list)
  3. APIM in front of the APIs, with subscription keys, throttling, etc. It is not a WAF, though
  4. VNET to expose only the endpoints we want to the public
  5. Backup needs to be applied for the database --> point-in-time restore (PiTR) is enabled by default
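A minimal sketch of the JWT validation in the web APIs, assuming ASP.NET Core with the Microsoft.AspNetCore.Authentication.JwtBearer package; the authority and audience values are placeholders:

    using Microsoft.AspNetCore.Authentication.JwtBearer;

    var builder = WebApplication.CreateBuilder(args);

    // Validate incoming JWT bearer tokens against the identity provider.
    builder.Services
        .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
        .AddJwtBearer(options =>
        {
            options.Authority = "https://login.microsoftonline.com/<tenant-id>/v2.0"; // placeholder
            options.Audience = "api://content-pipeline";                              // placeholder
        });
    builder.Services.AddAuthorization();

    var app = builder.Build();
    app.UseAuthentication();
    app.UseAuthorization();

    // Every endpoint requires a valid token.
    app.MapGet("/metadata", () => Results.Ok()).RequireAuthorization();
    app.Run();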

What needs to be done

  1. Focus on RBAC on all resources
  2. KeyVault for both DevOps and the runtime - this is a must before going fully into production (a sketch follows this list)
  3. Set up a good backup policy on the database and APIM (I think the APIM policies can be hosted in a git repo)
  4. Swagger should be behind some kind of login
  5. APIM should be in internal mode with an application gateway in front, so the application gateway guards the public surface with a WAF (OWASP protection included). This implies a bigger setup, but provides more security, because we minimize the attack surface and gain more control over the request/response flow
  6. Some NSGs could be implemented, so the deny/allow rules don't need to be applied on every web API
  7. Azure Functions should be in the same VNET as the APIs, but that requires a premium plan, which I can't afford to run in the days up to the presentation
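As a sketch of point 2, the runtime side can pull secrets straight into configuration at startup, assuming the Azure.Identity and Azure.Extensions.AspNetCore.Configuration.Secrets packages; the vault name is a placeholder:

    using Azure.Identity;

    var builder = WebApplication.CreateBuilder(args);

    // Load all secrets from Key Vault into IConfiguration at startup.
    // Managed identity is used, so no credentials live in app settings.
    builder.Configuration.AddAzureKeyVault(
        new Uri("https://<vault-name>.vault.azure.net/"), // placeholder vault
        new DefaultAzureCredential());

    var app = builder.Build();
    // Secrets are now plain configuration values: a secret named
    // "ServiceBus--ConnectionString" shows up as "ServiceBus:ConnectionString".
    app.Run();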

Dataflow

Either:

  1. All metadata is always requested through the fetch API --> generates a lot of load
  2. All metadata is pushed over Event Grid to the sites, which store it locally (not a good idea - if so, we should switch to a service bus)
  3. On every update, each site is notified, fetches the data, and stores it locally to serve to the end user

I prefer number 3 (sketched after this list) because:

  1. It scales well with new sites and new types of metadata
  2. It doesn't overload the pipeline with a lot of GETs
  3. It can be reused in other systems as well, without needing to know much about how many new clients are onboarded
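A minimal sketch of option 3, assuming a Service Bus topic carries the "metadata updated" notification (using Azure.Messaging.ServiceBus); FetchApiClient and LocalMetadataStore are hypothetical stand-ins for the real components:

    using Azure.Messaging.ServiceBus;

    var client = new ServiceBusClient("<connection-string>"); // placeholder
    var processor = client.CreateProcessor("metadata-updated", "site-a-subscription");

    processor.ProcessMessageAsync += async args =>
    {
        // The event only carries the id; the site fetches the full
        // metadata itself and stores it locally for end users.
        var metadataId = args.Message.Body.ToString();
        var metadata = await FetchApiClient.GetMetadataAsync(metadataId); // hypothetical
        await LocalMetadataStore.UpsertAsync(metadata);                   // hypothetical
        await args.CompleteMessageAsync(args.Message);
    };
    processor.ProcessErrorAsync += args =>
    {
        // Service Bus retries and dead-letters failing messages for us.
        Console.WriteLine(args.Exception);
        return Task.CompletedTask;
    };

    await processor.StartProcessingAsync();

This is also why the pattern reuses well: onboarding a new site is just a new subscription on the topic, with no change to the pipeline itself.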

Assumptions

  • I assume that the events from the sites are ordered by date. That is, a delete at time 1 doesn't arrive before an update at time 0 (a guard for this is sketched after this list)
  • I assume that the CMS and video can be on the same VNET as the APIM
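The first assumption can also be made cheap to drop: a last-writer-wins guard on the event timestamp makes out-of-order deletes and updates harmless. A sketch, with MetadataEvent and MetadataStore as hypothetical types:

    using System;
    using System.Collections.Generic;

    // Hypothetical event shape: id, event time, and whether it is a delete.
    public record MetadataEvent(string Id, DateTimeOffset Timestamp, bool IsDelete);

    public class MetadataStore
    {
        private readonly Dictionary<string, MetadataEvent> _latest = new();

        // Last-writer-wins: ignore any event older than what we already hold,
        // so a late update can never resurrect an already-deleted record.
        public void Apply(MetadataEvent evt)
        {
            if (_latest.TryGetValue(evt.Id, out var current) &&
                current.Timestamp >= evt.Timestamp)
            {
                return; // stale event, drop it
            }
            _latest[evt.Id] = evt;
        }
    }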

What is missing in general

  1. Deploy to staging slots
  2. Cache hasn't been implemented (a sketch follows this list)
  3. Code reuse through NuGet packages hasn't been added to the stack. There are some candidates: the service bus client in the web API, and maybe the models used to move messages through the system
  4. Better Service Bus topic and queue creation logic (it should reside in a resource of its own)
  5. Docker as a runtime, so it is easier to switch to k8s or other orchestrators
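For point 2, a small in-memory cache in front of the local metadata store would likely be enough to start with. A sketch using Microsoft.Extensions.Caching.Memory, with LocalMetadataStore and the Metadata shape again hypothetical stand-ins:

    using Microsoft.Extensions.Caching.Memory;

    var cache = new MemoryCache(new MemoryCacheOptions());

    // Serve metadata from memory; fall back to the local store on a miss.
    // A short TTL keeps a site from serving stale data long after an update.
    async Task<Metadata?> GetMetadataAsync(string id) =>
        await cache.GetOrCreateAsync(id, async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            return await LocalMetadataStore.GetAsync(id); // hypothetical store
        });

    public record Metadata(string Id, string Payload); // hypothetical shape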

On the drawing board: Going global

The solution is designed around one region. If we need to go multi-region, we need to focus on:

  1. Apply the "follow the sun" principle if Azure SQL is chosen as the database (Cosmos scales globally by default, but is quite expensive)
  2. Set up APIM for multiple regions
  3. Add a Traffic Manager to always route traffic to the closest fetch and creator API
  4. Find a good geo-disaster recovery strategy for Service Bus

A secondary and simpler solution would be to ensure that we can always read from the secondary, if Azure SQL is chosen (sketched after the bullets below). If a disaster occurs, we manually switch the creator API over to a new region, so the new region starts importing. This could potentially be done more or less automatically, but we would need to build some custom switch logic for this to happen, and it might not be the best choice: 9 times out of 10 it is best to let the switch be carried out by a human being, because a lot of external factors could lead to extra problems:

  • If the switch is made to region B, but Microsoft has reported that region B is soon going to fail as well, it is better to switch to region C - and no code can ever guess this

  • Maybe a global APIM can do this out of the box, so we can have multiple import locations. We would still need to apply a good "follow the sun" strategy
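For the simpler secondary-read setup above, an Azure SQL failover group already exposes a read-only listener; the fetch API just has to ask for it in the connection string. A sketch with Microsoft.Data.SqlClient, where the server and database names are placeholders:

    using Microsoft.Data.SqlClient;

    // The failover group's read-only listener routes this connection to a
    // readable secondary, so reads keep working while the primary is down.
    var readOnly = new SqlConnectionStringBuilder
    {
        DataSource = "content-pipeline-fog.secondary.database.windows.net", // placeholder
        InitialCatalog = "metadata",                                        // placeholder
        ApplicationIntent = ApplicationIntent.ReadOnly,
        Authentication = SqlAuthenticationMethod.ActiveDirectoryManagedIdentity
    };

    await using var connection = new SqlConnection(readOnly.ConnectionString);
    await connection.OpenAsync();

Only the manual creator-API switch needs the custom logic discussed above; the read path comes more or less for free.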