privacysandbox/aggregation-service

Error using LocalTestingTool_2.0.0.jar with sampledata

Closed this issue · 7 comments

I am trying to follow the instructions in "Testing locally using Local Testing Tool", but when I run the following command with the files in sampledata:

java -jar LocalTestingTool_2.0.0.jar \
--input_data_avro_file sampledata/output_debug_reports.avro \
--domain_avro_file sampledata/output_domain.avro \
--output_directory .

I get the error below:

2023-10-31 12:21:57:506 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Aggregation worker started
2023-10-31 12:21:57:545 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Item pulled
2023-10-31 12:21:57:555 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards detected by blob storage client: [output_debug_reports.avro]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_debug_reports.avro}}]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards detected by blob storage client: [output_domain.avro]
2023-10-31 12:21:57:567 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_domain.avro}}]
2023-10-31 12:21:57:575 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Job parameters didn't have a report error threshold configured. Taking the default percentage value 10.000000
return_code: "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD"
return_message: "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."
error_summary {
  error_counts {
    category: "REQUIRED_SHAREDINFO_FIELD_INVALID"
    count: 1
    description: "One or more required SharedInfo fields are empty or invalid."
  }
  error_counts {
    category: "NUM_REPORTS_WITH_ERRORS"
    count: 1
    description: "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
  }
}
finished_at {
  seconds: 1698780117
  nanos: 679576000
}

CustomMetric{nameSpace=scp/worker, name=WorkerJobCompletion, value=1.0, unit=Count, labels={Type=Success}}
2023-10-31 12:21:57:732 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - No job pulled.
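
The `REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD` result above is consistent with a simple percentage check against the default threshold of 10% that the log mentions. The following is a minimal sketch of that arithmetic, not the service's actual code: the function name, the strict `>` comparison, and the assumed total of one report in the sample file are all assumptions made for illustration.

```python
def exceeds_error_threshold(num_reports_with_errors: int,
                            total_reports: int,
                            threshold_percent: float = 10.0) -> bool:
    """Return True if the share of errored reports is above the threshold.

    Hypothetical helper: the real worker may compute this differently.
    """
    if total_reports <= 0:
        raise ValueError("total_reports must be positive")
    error_percent = 100.0 * num_reports_with_errors / total_reports
    return error_percent > threshold_percent


# NUM_REPORTS_WITH_ERRORS was 1 in the log; a total of 1 report is assumed here.
print(exceeds_error_threshold(1, 1))    # True  (100% > 10%)
print(exceeds_error_threshold(1, 100))  # False (1% <= 10%)
```

Under these assumptions, a single invalid report in a very small input file is enough to fail the whole job.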

jen6 commented

I have the same issue.

❯ java -jar LocalTestingTool_${VERSION}.jar \
--input_data_avro_file ~/Downloads/output_debug_reports.avro \
--domain_avro_file ~/Downloads/output_domain\ \(2\).avro \
--output_directory .
2023-11-01 12:23:57:021 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Aggregation worker started
2023-11-01 12:23:57:114 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Item pulled
2023-11-01 12:23:57:154 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards detected by blob storage client: [output_debug_reports.avro]
2023-11-01 12:23:57:216 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/me/Downloads, key=output_debug_reports.avro}}]
2023-11-01 12:23:57:219 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards detected by blob storage client: [output_domain (2).avro]
2023-11-01 12:23:57:221 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/me/Downloads, key=output_domain (2).avro}}]
2023-11-01 12:23:57:240 +0900 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Job parameters didn't have a report error threshold configured. Taking the default percentage value 10.000000
return_code: "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD"
return_message: "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."
error_summary {
  error_counts {
    category: "REQUIRED_SHAREDINFO_FIELD_INVALID"
    count: 1
    description: "One or more required SharedInfo fields are empty or invalid."
  }
  error_counts {
    category: "NUM_REPORTS_WITH_ERRORS"
    count: 1
    description: "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
  }
}
finished_at {
  seconds: 1698809037
  nanos: 449472000
}

Hi @JonathanAquino-NextRoll,

Thanks! We're looking into the files. It looks like there is a problem with output_debug_reports.avro. We will update you once we have more information.
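
For readers who want to check a report themselves, the `REQUIRED_SHAREDINFO_FIELD_INVALID` category says that one or more required SharedInfo fields are empty or invalid. The sketch below shows one way such a check could look once a report's `shared_info` JSON string has been extracted from the Avro file. The list of required field names and the sample string are illustrative assumptions, not taken from this thread or from the service's validation code.

```python
import json

# Illustrative field names only; the service's real required list may differ.
ASSUMED_REQUIRED_FIELDS = ("api", "report_id", "reporting_origin",
                           "scheduled_report_time", "version")


def find_invalid_fields(shared_info_json, required=ASSUMED_REQUIRED_FIELDS):
    """Return the required fields that are missing or empty in shared_info."""
    shared_info = json.loads(shared_info_json)
    return [name for name in required
            if shared_info.get(name) in (None, "")]


# Made-up shared_info string with an empty report_id and no version field.
sample = json.dumps({
    "api": "attribution-reporting",
    "report_id": "",
    "reporting_origin": "https://adtech.example",
    "scheduled_report_time": "1698780000",
})
print(find_invalid_fields(sample))  # ['report_id', 'version']
```

An empty list would mean every assumed field is present and non-empty; it would not prove that the values are valid in the way the aggregation worker requires.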

jen6 commented

Hi @maybellineboon. Is there any update?

Hi @JonathanAquino-NextRoll and @jen6,

Our engineers are still working on this and will publish the new Avro file soon. In the meantime, please use the attached Avro file to test the LocalTestingTool.
output_debug_reports.avro.zip

Thanks!
Maybelline

Yes, that file works - thank you

Sample data have been updated.