jfrog/terraform-provider-artifactory

Add Environment Filter to artifactory_repositories Data Source

Opened this issue · 15 comments

Is your feature request related to a problem? Please describe.
I'm writing custom granular platform permissions and using an artifactory_repository data source to pull in a list of repositories I can then prune. I have implemented a custom environment with remote repositories exempt from our Xray policies specifically for security exceptions that I do not want to be included in the platform permission. The only solution to prune these repos from the repository list seems to be a semantic naming convention.

Describe the solution you'd like
I would like the artifactory_repository data sources to support environments specifically and, more broadly, all "compile-time" (plan-known) repository attributes as filter criteria. Accepting lists as inputs would also be a nice QoL improvement, e.g. environments = ["DEV", "PROD"] would yield all repos that are in either DEV or PROD.

Describe alternatives you've considered
As stated earlier, I've devised a semantic naming convention, which gets the job done but is implicit and not self-documenting. The solution is a for expression with an if !endswith(repo.key, "-deferral") filter. This works but would require extra intermediate lists if I wished to prune the repo list by additional criteria.

@eelginUPS Thanks for the suggestion. However, the Artifactory API currently does not support environment as a query parameter. The API response does not include environment either, so the provider can't filter after retrieving the data.

This can probably be implemented by using the project provider, since environment is a project-related concept and not an artifact concept, although I defer to @alexhung on this since I am just a lurker and not a maintainer.

So, if you have datasource_project and then apply a filter there, you could get all repos from all projects with a filter of DEV. The implementation, though, would be in the project provider and not the artifactory provider.

Thoughts @alexhung ?

I guess there is some confusion around how the providers are separated, as it seems inconsistent. project_environment and project_key are both repository properties, yet project_key is technically deprecated and using the project_repository resource is recommended instead. However, the project_environment attribute only exists on artifactory_repository, not on the project_repository resource, and depends on the repository already being assigned to a project.

As an end user, it seems that the artifactory_repository data source should have most/all artifactory_repository attributes available to be used as a filter.

Also, if you want a project provider with a project_repository resource that applies project attributes to existing repositories, it should be used to apply all project attributes. That would mean following suit with project_key: migrating project_environment to be set in project_repository and heavily discouraging use of the property in artifactory_repository.

EDIT: I'm reading through some of the API docs and realize I may have to raise this issue up a level of abstraction. Luckily this isn't a bottleneck for my work, though.

Your confusion is understandable. I originally prototyped the implementation for the project provider. The concept of an environment only exists in the scope and context of a project. If you don't have project support in your version of Artifactory, no amount of cleverness is going to get you all the 'DEV' repos.

@eelginUPS Your confusions/concerns are noted. The providers are constrained by the JFrog REST API's capabilities. You're correct that environments should be a filter for querying repos, but the Get Repositories API doesn't currently support that.

FYI @chb0github Environments are no longer project-only. Environments can also exist on a Global level: https://jfrog.com/help/r/jfrog-rest-apis/environments

And neither AQL nor the GraphQL API has an environment field to filter on. So we need the API support added first.

In theory, we could fetch all repository info (based on the existing filters), then fetch the configuration for each repo in the list separately, then filter the results on the provider side.

This is obviously not a scalable solution. Just putting it out there as a thought experiment.

@eelginUPS I encourage you to reach out to your JFrog contact and make a feature request. I'll do the same internally.

@alexhung Thanks for the update. Like I said: I defer to you. So, this can be implemented, as you mentioned @alexhung, by a repository-wide search. Using goroutines to parallelize it would definitely help with performance.

As an aside @eelginUPS - I put together this script that demonstrates it:

curl -snLf https://myartifactory/artifactory/api/repositories |
    jq -re '.[].key' |
    xargs printf 'https://myartifactory/artifactory/api/repositories/%s\n' |
    xargs -n 10 -P 10 curl -snLf |
    jq -sre 'map(select(.environments? | contains(["DEV"])))'

This will run up to 10 curl processes in parallel, each fetching 10 repository configs per invocation.

Depending on how you want to scale this (client side vs. server side), client-side parallelism is an option.
Since this request comes long after the provider has matured, it doesn't seem like a feature likely to be abused.

Would a brute-force implementation be a welcome contribution @alexhung ? Using goroutines?

@chb0github We need a way to control the goroutines' API request rate, so as not to trip the rate limiter on the platform.

Combined with appropriate warnings in the documentation, I think this is doable.

The ideal solution is still for the API to include environments as a filter so the load is handled server side 😄

Agreed @alexhung , but you know how long that could take 😁

Let me investigate if there is any way to do the equivalent of a thread pool in Go

@eelginUPS what sort of syntax/HCL are you thinking of? Sketch out what you had in mind

@alexhung can you supply a sketch of the syntax you/he had in mind?

For the record, the above is able to fetch and filter, through brute force, 154 repositories in under 2 seconds:

time curl -snLf https://myartifactory/artifactory/api/repositories | jq -re '.[].key' | xargs printf 'https://myartifactory/artifactory/api/repositories/%s\n' | xargs -n 10 -P 10 curl -snlf | jq -sre 'map(select(.environments | contains(["DEV"])))' > /dev/null
curl -snLf https://myartifactory/artifactory/api/repositories  0.02s user 0.01s system 3% cpu 0.928 total
jq -re '.[].key'  0.00s user 0.00s system 0% cpu 0.928 total
xargs printf   0.00s user 0.00s system 0% cpu 0.940 total
xargs -n 10 -P 10 curl -snlf  0.18s user 0.18s system 18% cpu 1.935 total
jq -sre 'map(select(.environments | contains(["DEV"])))' > /dev/null  0.02s user 0.01s system 1% cpu 1.935 total

@alexhung how would the current -parallelism flag in tf assist or hinder this?

can you supply a sketch of the syntax you/he had in mind?

@chb0github The data source already has 2 attributes for filtering (repository_type and package_type). Adding a new attribute environments (type schema.SetAttribute) would be consistent.
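On the provider side, the set semantics of such an environments attribute could be sketched as a plain filter helper. The repo struct and field names here are illustrative, not the provider's actual types:

```go
package main

import "fmt"

// repo mirrors the subset of the repository configuration payload we
// care about for filtering.
type repo struct {
	Key          string
	Environments []string
}

// filterByEnvironments keeps repos that belong to at least one of the
// requested environments, matching the proposed set semantics
// (environments = ["DEV", "PROD"] means DEV or PROD).
func filterByEnvironments(repos []repo, wanted []string) []repo {
	want := make(map[string]bool, len(wanted))
	for _, e := range wanted {
		want[e] = true
	}
	var out []repo
	for _, r := range repos {
		for _, e := range r.Environments {
			if want[e] {
				out = append(out, r)
				break
			}
		}
	}
	return out
}

func main() {
	repos := []repo{
		{Key: "libs-release", Environments: []string{"PROD"}},
		{Key: "libs-dev", Environments: []string{"DEV"}},
		{Key: "sec-exceptions", Environments: []string{"XRAY-EXEMPT"}},
	}
	for _, r := range filterByEnvironments(repos, []string{"DEV", "PROD"}) {
		fmt.Println(r.Key) // prints libs-release then libs-dev
	}
}
```

With an empty set treated as "no filter", this composes naturally with the existing repository_type and package_type attributes.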

how would the current -parallelism flag in tf assist or hinder this?

@chb0github For scale testing, I think we need to ensure the instance is healthy even with 5-10k repositories (not uncommon in some of our larger customer installations). Local installation masks a fair amount of network traffic latency, etc. so testing on a SaaS/cloud instance is needed.

-parallelism doesn't apply here, I think. https://developer.hashicorp.com/terraform/cli/commands/apply#parallelism-n It limits how many nodes Terraform walks concurrently in its dependency graph, not what happens inside a single provider operation.