GoogleCloudPlatform/DataflowPythonSDK

Cloud Dataflow - Subprocess Error

SouthernYoda opened this issue · 1 comment

I am following the Quickstart using Python. I can execute the command locally, but when I try to push the pipeline to run in the cloud I get a subprocess error. I can see that Apache Beam was downloaded correctly into the specified tmp directory.

Versions

Windows: 10
Python: 3.8.5
apache-beam: 2.27.0

Command

python -m apache_beam.examples.wordcount `
--input gs://dataflow-samples/shakespeare/kinglear.txt `
--output gs://<project>/outputs `
--runner DataflowRunner `
--temp_location gs://<project>/tmp/ `
--region us-central1 `
--project <project>
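
For reference, the same submission can also be expressed programmatically, which should hit the same SDK-staging step. A minimal sketch, using the placeholder <project> values from the command above:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Same options as the CLI invocation above; <project> is a placeholder.
options = PipelineOptions(
    runner="DataflowRunner",
    project="<project>",
    region="us-central1",
    temp_location="gs://<project>/tmp/",
)

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://dataflow-samples/shakespeare/kinglear.txt")
     | "Split" >> beam.FlatMap(lambda line: line.split())
     | "Count" >> beam.combiners.Count.PerElement()
     | "Format" >> beam.MapTuple(lambda word, n: "%s: %d" % (word, n))
     | "Write" >> beam.io.WriteToText("gs://<project>/outputs"))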

Log Output:

INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:apache_beam.runners.portability.stager:Downloading source distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['C:\\Users\\Adam\\miniconda3\\envs\\GCP-DataFlow-3.8\\python.exe', '-m', 'pip', 'download', '--dest', 'C:\\Users\\Adam\\AppData\\Local\\Temp\\tmpswl98o48', 'apache-beam==2.27.0', '--no-deps', '--no-binary', ':all:']
Traceback (most recent call last):
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\utils\processes.py", line 91, in check_output
    out = subprocess.check_output(*args, **kwargs)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\\Users\\Adam\\miniconda3\\envs\\GCP-DataFlow-3.8\\python.exe', '-m', 'pip', 'download', '--dest', 'C:\\Users\\Adam\\AppData\\Local\\Temp\\tmpswl98o48', 'apache-beam==2.27.0', '--no-deps', '--no-binary', ':all:']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\examples\wordcount.py", line 99, in <module>
    run()
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\examples\wordcount.py", line 94, in run
    output | 'Write' >> WriteToText(known_args.output)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\pipeline.py", line 582, in __exit__
    self.result = self.run()
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\pipeline.py", line 529, in run
    return Pipeline.from_runner_api(
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\pipeline.py", line 561, in run
    return self.runner.run_pipeline(self, self._options)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 504, in run_pipeline
    artifacts=environments.python_sdk_dependencies(options)))
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\transforms\environments.py", line 738, in python_sdk_dependencies
    staged_name in stager.Stager.create_job_resources(
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\runners\portability\stager.py", line 223, in create_job_resources
    Stager._create_beam_sdk(sdk_remote_location, temp_dir))
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\runners\portability\stager.py", line 636, in _create_beam_sdk
    sdk_local_file = Stager._download_pypi_sdk_package(temp_dir)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\runners\portability\stager.py", line 744, in _download_pypi_sdk_package
    processes.check_output(cmd_args)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\utils\processes.py", line 96, in check_output
    raise RuntimeError( \
RuntimeError: Full traceback: Traceback (most recent call last):
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\site-packages\apache_beam\utils\processes.py", line 91, in check_output
    out = subprocess.check_output(*args, **kwargs)
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\lib\subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['C:\\Users\\Adam\\miniconda3\\envs\\GCP-DataFlow-3.8\\python.exe', '-m', 'pip', 'download', '--dest', 'C:\\Users\\Adam\\AppData\\Local\\Temp\\tmpswl98o48', 'apache-beam==2.27.0', '--no-deps', '--no-binary', ':all:']' returned non-zero exit status 1.

 Pip install failed for package: apache-beam==2.27.0
 Output from execution of subprocess: b''
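
The b'' above is only the subprocess's stdout; pip appears to write its error to stderr, which is not captured here. To surface pip's actual failure, one can rerun the exact command from the log with stderr merged into stdout. A diagnostic sketch (the --dest directory is a stand-in, since the original tmp path no longer exists):

import subprocess

# Exact command Beam executed, copied from the log above.
cmd = [
    r"C:\Users\Adam\miniconda3\envs\GCP-DataFlow-3.8\python.exe",
    "-m", "pip", "download",
    "--dest", r"C:\Users\Adam\AppData\Local\Temp\beam-sdk-debug",  # stand-in dest dir
    "apache-beam==2.27.0", "--no-deps", "--no-binary", ":all:",
]
try:
    print(subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode())
except subprocess.CalledProcessError as e:
    # With stderr merged in, pip's real failure reason should appear here.
    print(e.output.decode())

If the PyPI download step itself keeps failing, the --sdk_location pipeline option can point at an already-downloaded apache-beam source package, which bypasses this staging step.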

We moved to Apache Beam!

Google Cloud Dataflow for Python is now the Apache Beam Python SDK, and code development has moved to the Apache Beam repo.

If you want to contribute to the project (please do!), use this Apache Beam contributor's guide. Closing out this issue accordingly.