Logs not directed to the correct place
srossross opened this issue · 2 comments
srossross commented
I have a bunch of ParDo(DoFn)s in my Beam pipeline whose logs are not being set up correctly in Dataflow. Everything works fine when I test locally, but not when I run on Dataflow with streaming.
My pipeline is created like this:
import apache_beam as beam

class SplitTime(beam.DoFn):
    def __init__(self, minutes=60):
        self.minutes = minutes

    def process(self, element):
        ....
        ....

activity_chunks = activities | 'Split Into 15 minute Chunks' >> beam.ParDo(SplitTime(minutes=15))
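For reference, and assuming the usual setup where a DoFn logs through Python's standard logging module, the calls whose output I would expect to see in the Dataflow UI look roughly like this (the DoFn and message here are illustrative, not the real code):

import logging

import apache_beam as beam


class LoggingSketch(beam.DoFn):
    # Illustrative DoFn only; the real DoFns do more work in process().
    def process(self, element):
        # Standard-library logging calls made on the workers should show up in
        # Stackdriver under logs/dataflow.googleapis.com%2Fworker and,
        # ideally, be attributed to the emitting step in the Dataflow UI.
        logging.info('processing element: %s', element)
        yield element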
I'm creating a template like this:
python functions/beam2.py --runner DataflowRunner --project my-great-project --staging_location gs://test-bucket/stage --temp_location gs://test-bucket/temp --setup_file functions/setup.py --template_location gs://test-bucket/templates/27/activity5
and running like this:
gcloud dataflow jobs run template-27-5 --gcs-location=gs://test-bucket/templates/27/activity5
None of the logs are showing up in the Dataflow UI. From Stackdriver I can see logs like these:
2019-07-23T15:56:50.955574989Z No unique name set for transform generatedPtransform-2494
2019-07-23T15:56:50.972083091Z No unique name set for transform generatedPtransform-2492
2019-07-23T15:56:50.980607986Z No unique name set for transform generatedPtransform-2482
How do I enforce a unique name for a transform?
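As far as I understand, in the Python SDK a transform's name comes from the label applied with the >> operator, so unique names can be assigned explicitly like this (a sketch, not my actual pipeline code):

# Each label before '>>' becomes that transform's name in the pipeline graph,
# so every application of the same DoFn needs its own distinct label.
chunks_15 = activities | 'Split Into 15 Minute Chunks' >> beam.ParDo(SplitTime(minutes=15))
chunks_60 = activities | 'Split Into 60 Minute Chunks' >> beam.ParDo(SplitTime(minutes=60))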
srossross commented
Here is the metadata from a log entry that I know was emitted in that step:
{
  insertId: "5616980600106424110:3873:0:60547"
  jsonPayload: {…}
  labels: {…}
  logName: "projects/my-great-project/logs/dataflow.googleapis.com%2Fworker"
  receiveTimestamp: "2019-07-23T15:41:24.085630441Z"
  resource: {
    labels: {
      job_id: "2019-07-23_08_37_19-14491780021717194051"
      job_name: "template-27-5"
      project_id: "my-great-project"
      region: "us-central1"
      step_id: ""
    }
    type: "dataflow_step"
  }
  severity: "ERROR"
  timestamp: "2019-07-23T15:41:06.450673103Z"
}
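One way to pull these entries directly from Stackdriver is to filter on the resource labels shown above (a command-line sketch using the IDs from this job; adjust the project and job_id as needed):

gcloud logging read \
  'resource.type="dataflow_step" AND resource.labels.job_id="2019-07-23_08_37_19-14491780021717194051"' \
  --project my-great-project --limit 20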
aaltay commented
The same issue is also reported in https://issues.apache.org/jira/browse/BEAM-7934. I will close this one in order to track the issue in a single place.