microsoft/soundscape

Azure cost/service breakdown

jchudge opened this issue · 7 comments

Are you able to provide a breakdown of the Azure costs at a more granular level? For example, which components/services are used and their approximate costs...

The challenges to doing so are as follows:

  • There are various rates for individual resources depending on the chosen region
  • Our internal cost structure was different
  • Internally, we could not use certain cost-saving optimizations, so as not to compete for resources with external customers
  • The way we deployed the production service reflected the balance our team chose relative to cost management vs. staffing vs. overall team funding. In other words, it should be possible to do better.
  • All the costs come from things that consume storage or CPU or both. Most everything else is negligible.

I think the following does at least outline some basics in terms of resources:

  • Any solution is going to need a PostGIS database with roughly half a terabyte to a terabyte of storage for the ingested OSM database
  • You need an app service, Kubernetes, a service running in a pod, a server function/lambda or something similar to take an HTTPS REST query and translate it into a SQL query (see tilefunc.sql) that you can issue to the database mentioned earlier.
  • If you're indexing OSM every week, you'll need transient CPU/memory/storage resources to index the OSM data. I'll look up what we last used. Maybe some cost could be saved here using transient resources.
  • Everything else is just glue stuff to support the above and shouldn't be costly.
  • Unstated here is anything you do for fault tolerance, e.g. running multiple resources, having capacity for failover, etc.
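
To make the "REST query to SQL query" step above more concrete, here is a minimal sketch of how a tile request could be turned into a bounding-box query against PostGIS. It is not what tilefunc.sql does; the `osm_features` table, its `geom` and `osm_id` columns, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration. The tile math is the standard slippy-map (Web Mercator) tile scheme.

```python
import math


def tile_bounds(zoom: int, x: int, y: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) in WGS84 degrees for a slippy-map tile."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level

    def lon(col: int) -> float:
        return col / n * 360.0 - 180.0

    def lat(row: int) -> float:
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    # Tile rows grow southward, so row y is the north edge and y + 1 the south edge.
    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def tile_query(zoom: int, x: int, y: int) -> tuple[str, tuple[float, float, float, float]]:
    """Build a parameterized SQL query selecting features that intersect a tile.

    The table and column names are hypothetical, not taken from the Soundscape schema.
    """
    west, south, east, north = tile_bounds(zoom, x, y)
    sql = (
        "SELECT osm_id, ST_AsGeoJSON(geom) "
        "FROM osm_features "  # hypothetical table name
        "WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
    )
    return sql, (west, south, east, north)
```

A handler behind an HTTPS endpoint would parse `zoom/x/y` from the request path, call `tile_query`, and pass the SQL and parameters to the database driver.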

Can you provide an update on what resources you used to index OSM, as well as how you chose what tiles to cache?

I'm hoping something additional will be open sourced for the former. The services indexed the entire world map into a Postgres database and then queried it when given requests for tiles. There was no tile cache on the services. The client did cache tiles, though. With this setup, actual database I/O utilization from queries was low.
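
As an illustration of the client-side caching mentioned above, here is a minimal sketch of a small least-recently-used tile cache. It is not the Soundscape client's implementation; the `TileCache` class, the `fetch` callback, and the `max_tiles` limit are all hypothetical names chosen for the example.

```python
from collections import OrderedDict
from typing import Callable

TileKey = tuple[int, int, int]  # (zoom, x, y)


class TileCache:
    """Keep the most recently used tiles in memory; fetch on a miss."""

    def __init__(self, fetch: Callable[[int, int, int], bytes], max_tiles: int = 256):
        self._fetch = fetch          # hypothetical function that calls the tile service
        self._max_tiles = max_tiles  # assumed capacity, tune for the device
        self._tiles: OrderedDict[TileKey, bytes] = OrderedDict()

    def get(self, zoom: int, x: int, y: int) -> bytes:
        key = (zoom, x, y)
        if key in self._tiles:
            # Cache hit: mark as most recently used, no network request.
            self._tiles.move_to_end(key)
            return self._tiles[key]
        data = self._fetch(zoom, x, y)
        self._tiles[key] = data
        if len(self._tiles) > self._max_tiles:
            # Evict the least recently used tile.
            self._tiles.popitem(last=False)
        return data
```

Because repeated requests for nearby tiles are served from the device, only cache misses reach the service, which is consistent with the low database utilization described above.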

Could one conceivably re-architect the app such that it wouldn't need its own cloud resources? I'm imagining an app that made requests to an OSM API as needed (not attempting to bulk-process the whole world), doing processing and caching directly on the device. I understand this would be a pretty fundamental change, but as an open-source project it might make more sense to push the compute workload to the edge. Unless I'm missing some reason why this wouldn't work, e.g. that there's no such OSM API?

You'd find that OSM itself has fairly minimal service capacity and employs throttling and request limits to manage their costs. Their resources are intended for experimentation and some individual use, not an uncapped number of users. That's the reason why there are all these guides to standing up minimal tile services or Overpass (an odd query engine they have) -- OSM considers their core asset to be the database and not the services they run.

We've regularly seen college projects, etc., hit those limits and have to instantiate some service of their own or use someone else's.

From a purely technical standpoint, without substantive change you'd still need a tile server returning GeoJSON (I think those exist), or you'd need to get GeoJSON out of Overpass. You'd also have to do more filtering on the client side, as some of it is done during map ingestion today. Additionally, the synthesized data points like entrances and some intersection data would now be entirely a client responsibility.

Additionally, you could definitely rearchitect things so that the ingestion activities are done in a batch style and thus have a transient cost. We intended to go down that path and rely on auto-scaling, but ran into various limitations of the systems at the time. If you do that, then you're paying for the transient ingestion cost plus the actual serving cost, which should be fairly mild.