privacysandbox/aggregation-service

Aggregation service setup notes, snags & suggestions.


Hello All!

Having spent the past few days trying to get the Aggregation Service live, I have been jotting down various questions, suggestions, and bugs which I think could make a great addition to the documentation and workflow.

Full architecture diagram

For teams that already use Terraform this may not be necessary, but we do not use Terraform and essentially followed the instructions verbatim to get all the resources built. I have since had to traverse the GCP console to work out what the scripts actually created. A high-level overview diagram with the main data flows, table names, etc. would be extremely useful.

Resource naming

Similar to the point above, the Terraform scripts are spread over many files, so it is not clear exactly what will be created. It would be great to have a single config file listing the names of all the resources, as they are very obscure in the context of our overall infra. For example, prod-jobmd is the name of a newly created Cloud Spanner instance, which is pretty unhelpful. At the very least everything should be prefixed with aggregation-service, or better yet, users should be able to set the prefix transparently as a first step.

Resource costs

It would be good to understand the cost of the full setup at idle, and perhaps to have suggested development and staging setups that minimise costs, for example by using more serverless infra.

Cloud Function / Run

I would suggest dropping Cloud Functions and migrating fully to Cloud Run. The docs seem to use the two interchangeably, and although they sort of are interchangeable (gen2 functions are powered by Cloud Run), I think this causes extra confusion. There is also a small typo on the endpoint:

This is the value in the docs

https://<environment>-<region>-frontend-service-<cloud-funtion-id>-uc.a.run.app/v1alpha/createJob

But -uc. was -ew. in my case, so this does not seem to be a value that can be hardcoded in the docs in this manner.
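Rather than hardcoding -uc., the region suffix could be derived from the deployment region. A minimal sketch: the abbreviation map below is an assumption based on Cloud Run's legacy URL scheme (uc for us-central1, ew for europe-west1, the two suffixes mentioned here) and only covers those regions; the reliable way to get the real URL is `gcloud run services describe <service> --format 'value(status.url)'`.

```python
# Assumed mapping from GCP region to the Cloud Run legacy URL suffix.
# Only the two regions seen so far are included; extend as needed.
REGION_SUFFIX = {
    "us-central1": "uc",
    "europe-west1": "ew",
}

def create_job_url(environment: str, region: str, service_id: str) -> str:
    """Build the v1alpha createJob endpoint for a given deployment,
    following the URL shape quoted from the docs above."""
    suffix = REGION_SUFFIX[region]
    return (
        f"https://{environment}-{region}-frontend-service-"
        f"{service_id}-{suffix}.a.run.app/v1alpha/createJob"
    )

print(create_job_url("prod", "europe-west1", "abc123"))
```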

Known errors and solutions

Running a job stores a nice error in the DB, which is awesome! But even with this nice error, it would be great to have a document listing common errors and their solutions. For example, my latest error is:

{
  "errorSummary": {
    "errorCounts": [
      {
        "category": "DECRYPTION_KEY_NOT_FOUND",
        "count": "445",
        "description": "Could not find decryption key on private key endpoint."
      },
      {
        "category": "NUM_REPORTS_WITH_ERRORS",
        "count": "445",
        "description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
      }
    ]
  },
  "finishedAt": "2024-04-30T13:17:24.233681575Z",
  "returnCode": "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD",
  "returnMessage": "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."
}

This is very clear, but it still leaves me no path to rectify the issue other than troubling people over email or in this repo :)
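In the meantime, a small sketch for anyone triaging many jobs: it pulls the per-category error counts out of a job result body like the one above (field names are taken directly from that payload; no other structure is assumed).

```python
import json

def error_counts(result_body: str) -> dict:
    """Parse a job result JSON string and return a mapping of
    error category -> count, using the errorSummary.errorCounts
    structure shown in the error payload above."""
    summary = json.loads(result_body).get("errorSummary", {})
    return {e["category"]: int(e["count"]) for e in summary.get("errorCounts", [])}

# Example with a trimmed-down copy of the payload above:
body = '{"errorSummary": {"errorCounts": [{"category": "DECRYPTION_KEY_NOT_FOUND", "count": "445"}]}}'
print(error_counts(body))  # {'DECRYPTION_KEY_NOT_FOUND': 445}
```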

Some missing configuration

This was addressed in #48 but needs to be added to the repo.

Show data conversion flows

There are quite a few flows in which data must be converted from one format to another, for example a hashed string into a byte array. While it is possible to figure this out from disparate pieces of information available in the repository, it would be very useful to have a few examples for various platforms, e.g.:

-- Convert hashes to domain avro for processing.
CAST(FROM_HEX(SUBSTR(reports.hashed_key, 3)) AS BYTES) AS bucket
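For anyone preparing domain files outside of SQL, a Python sketch of the same conversion. It assumes, as the SQL above does, that the hashed key is a hex string with a two-character 0x prefix (SUBSTR(…, 3) drops the first two characters):

```python
def hashed_key_to_bucket(hashed_key: str) -> bytes:
    """Convert a '0x'-prefixed hex hash into the raw bytes expected by
    the domain avro's bucket field, mirroring the SQL above: strip the
    first two characters, then decode the remainder as hex."""
    return bytes.fromhex(hashed_key[2:])

print(hashed_key_to_bucket("0x1a2b"))  # b'\x1a+'
```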

I hope you do not mind if I keep updating this issue as I hopefully near completion of getting the service up!

All the best!
D

Hi Dennis,

Thank you for sharing this detailed feedback. We have shared it internally with our team.

For #48, we have addressed the issue and it should now be reflected in the repository.

For the DECRYPTION_KEY_NOT_FOUND error, this is noted. We are working on documenting these errors and will work with both the Attribution Reporting API and Private Aggregation teams to ensure this is documented.

For the data conversion flows, we are working to update our documentation on creating the avro files.

Hi again Dennis,

For resource costs, we are unable to include specific costs in our documentation, as pricing is managed by the cloud providers and can change. However, we have documented sizing guidance on how adtechs can calculate their costs based on their aggregation strategy.