cf-platform-eng/concourse-pypi-resource

When 'repository.authenticate' is set to 'always' credentials are leaked in the logs

ostefano opened this issue · 1 comments

Hi,

When repository.authenticate is set to always, username and password are used both at upload and download time.
When this happens the credentials are leaked in the logs when uploading the package (put)

Finding package to upload
Glob output/* matched files: ['/tmp/build/put/output/package-0.0.7-py3-none-any.whl', '/tmp/build/put/output/package-0.0.7.tar.gz']
Check that the package name = input name
Uploading /tmp/build/put/output/package-0.0.7.tar.gz version 0.0.7
Uploading distributions to https://username:password@repo_url.com/
Uploading package-0.0.7.tar.gz
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.0/168.0 kB • 00:00 • 84.9 MB/s

Note that the leakage does not happen at download time (get), even though username and password are still required for the download to succeed.

Output directory: /tmp/build/get
Try 1 of 3 to call function `download_version` from module `pypi_resource.in_`
Downloading https://repo_url.com/package/package-0.0.7.tar.gz (164 kB)

I believe a way to fix this issue would be to sanitize the url in the upload_package function here:

def upload_package(pkgpath, input):
    repocfg = input['source']['repository']
    twine_cmd = [sys.executable, '-m', 'twine', 'upload']

    url, unused_hostname = pipio.get_pypi_url(input, 'out')

    username = repocfg.get('username', os.getenv('TWINE_USERNAME'))
    password = repocfg.get('password', os.getenv('TWINE_PASSWORD'))
    if not (username and password):
        raise KeyError("username and password required to upload")

    twine_cmd.append(pkgpath)

    env = os.environ.copy()
    env.update({
        'TWINE_USERNAME': username,
        'TWINE_PASSWORD': password,
        'TWINE_REPOSITORY_URL': url
    })

    try:
        subprocess.run(
            twine_cmd,
            stdout=sys.stderr.fileno(),
            check=True,
            env=env
        )
    except subprocess.CalledProcessError as e:
        raise SystemExit(e.returncode)

The reasoning is that the username and password are removed from the URL, but this should work because twine receives username and password from the environmental variables as well.

Note that the line leaking the credentials is emitted by twine (https://github.com/pypa/twine), see commands.upload:

repository_url = cast(str, upload_settings.repository_config["repository"])
print(f"Uploading distributions to {repository_url}")

However, upload_settings.repository_config["repository"] is initialized as follows (settings.py):

self.repository_config = utils.get_repository_from_config(
    self.config_file,
    repository_name,
    repository_url,
)
self.repository_config["repository"] = utils.normalize_repository_url(
    cast(str, self.repository_config["repository"]),
)

and from utils.py

def get_repository_from_config(
    config_file: str,
    repository: str,
    repository_url: Optional[str] = None,
) -> RepositoryConfig:
    """Get repository config command-line values or the .pypirc file."""
    # Prefer CLI `repository_url` over `repository` or .pypirc
    if repository_url:
        _validate_repository_url(repository_url)
        return {
            "repository": repository_url,
            "username": None,
            "password": None,
        }

    try:
        return get_config(config_file)[repository]
    except OSError as exc:
        raise exceptions.InvalidConfiguration(str(exc))
    except KeyError:
        raise exceptions.InvalidConfiguration(
            f"Missing '{repository}' section from {config_file}.\n"
            f"More info: https://packaging.python.org/specifications/pypirc/ "
        )

Meaning that username and password are managed separately (and thus not leaked) when they are passed as environmental variables (they are actually managed by the auth.Resolver object).