ansible-community/ara

Add parameter to ignore file path patterns to avoid saving matched files

Closed this issue · 19 comments

The callback will retrieve files that are loaded by Ansible throughout the execution of the playbook. This could contain files that users might not want to save, either because they are sensitive or are simply not relevant.

The implementation would be somewhat similar to that we already do with ignored_arguments and ignored_facts for ignoring specified playbook parameters and host facts.

I was looking at something similar. We are using a "site-configuration" that holds specific private variables. I would ideally be able to ignore that file during the upload.

I can take a crack at implementing the feature.

@LaurentDumont it would be nice to implement this indeed. Happy to review patches !

Patterning off of ignored_arguments and ignored_facts should work.. there's:

I've started a quick fix for our use case - https://review.opendev.org/#/c/703276/

  • It doesn't allow for pattern matching but our current need is limited to ignoring specific unique file names.
  • A large majority of the Ansible files are probably going to be named main.yml. It could be a good idea to allow ignoring based on file path (or a keyword within the file path).
  • Zuul failed once but the two errors we're somewhat nebulous and not related to my changes. I've re-run the CI and all tests passed.

Let me know if it's an okay start! I've tested locally by modifying the callback directly and I can see the file content being sanitized.

@LaurentDumont thanks for the patch !

A large majority of the Ansible files are probably going to be named main.yml. It could be a good idea to allow ignoring based on file path (or a keyword within the file path).

Yes, I think what we actually want are patterns instead of absolute file names because in the end, absolute file names can also be specified as patterns.
For example, a pattern of main.yml would match all the main.yml files but main.y[a]?ml would also match main.yaml.

If the users aren't interested in saving group_vars, they could specify a group_vars pattern which would prevent saving any file path where "group_vars" is present.

There are some approaches for matching strings against a list of regexes on stack overflow that could be worth taking a look into.

Zuul failed once but the two errors we're somewhat nebulous and not related to my changes. I've re-run the CI and all tests passed.

It happens ¯\(ツ)

@LaurentDumont I was thinking we'd probably want to have some minimal testing to demonstrate that we really aren't saving files that are filtered.
I have been tinkering with an approach and was wondering what you thought about it ?

It looks like this in practice and the patch for the implementation is here.

I think users could come up with their own use cases for it but I'd use it for integration testing ara itself.

For example, we could have a test where ARA would normally include a "secrets.yml" file, assert that it's there but actually empty:

- name: Retrieve the current playbook
    ara_playbook:
    register: playbook_query

- name: Recover secret file from ARA
  vars:
    playbook_id: "{{ playbook_query.playbook.id | string }}"
    query: "?playbook={{ playbook_id }}&path=secrets.yml"
  set_fact:
    file: "{{ lookup('ara', '/api/v1/files/' + query) }}"

# TODO: This wouldn't actually work because the file list view doesn't have the content detail
- name: Assert file
  assert:
    that:
      - file.content == "Not saved by ARA as configured by 'ignored_files'"

I like that! That could be a good test to have for the feature. I'll work on integrating the regex matching for the full file path. It's going to be more flexible than just the file name.

Before I forget, what is the right way to actually let the callback output the debug? I'm seeing self.log.debug("v2_playbook_on_start") in the callback but using -v on the playbook doesn't seem to let it output.

Nevermind!

Looks like adding logging.basicConfig(level=logging.DEBUG) to the ara_default.py works to let the debug stuff output.

Before I forget, what is the right way to actually let the callback output the debug? I'm seeing self.log.debug("v2_playbook_on_start") in the callback but using -v on the playbook doesn't seem to let it output.

Nevermind!

Looks like adding logging.basicConfig(level=logging.DEBUG) to the ara_default.py works to let the debug stuff output.

export ARA_LOG_LEVEL=DEBUG should do it.

edit: There's also export ARA_DEBUG=true for the Django bits.

So, I've been playing with this for a little bit and I'm a bit stuck. It looks like the behavior with escaping string/special characters and dealing with full path (with multiple "/") gets quirky and messy quickly.

I was looking at the behavior of getting the variables for the ansible.cfg file and I just wanted to confirm that the following was expected.

This is when I'm printing the value of the variable itself.

DEBUG:ara.plugins.callback.default:Show ignored_arguments
['\'["extra_vars"', '"vault_password_files"]\'']

DEBUG:ara.plugins.callback.default:Show ignored_facts
['\'["ansible_env"', '"ansible_all_ipv4_addresses"]\'']

DEBUG:ara.plugins.callback.default:Show ignored_files
['\'["test_ignored_file_1"', '"test_ignored_file_2"]\'']

#Print each value of the list ignored_files
'["test_ignored_file_1"
"test_ignored_file_2"]'

It looks like the "final" value includes the brackets and the single quotes. It also escapes the outer single quote. I am missing something in the way the list is built from the .ini value?

It looks like this

ignored_facts = '["ansible_env", "ansible_all_ipv4_addresses"]'
ignored_arguments = '["extra_vars", "vault_password_files"]'
ignored_files = test_ignored_file_1, test_ignored_file_2, *.test.*

Also works and outputs the variables without all the extra stuff around.

Are you looking at the values before or after the get_option ? Ansible should parse an "ini-style" list into a proper python list we can work with. Are you printing them as strings ? Using json.dumps ? Something else ?

That said, specifying lists in ini-style format (or in a format appropriate for an environment variable export) is not really great from a UX perspective -- case in point: the issue you had raised about allowed_hosts.

I need to think a little bit about this.

I'm just dumping the values after the get_option. For the escape stuff, it looks like you need repr to get printed as the raw value. Printing with a loop seems to work correctly without repr...

self.log.debug("Show ignored_arguments")
print(self.ignored_arguments)

self.log.debug("Show ignored_facts")
print(repr(self.ignored_facts))

self.log.debug("Show ignored_files")
print(repr(self.ignored_files))

for x in self.ignored_files:
    print(x)
for x in self.ignored_arguments:
    print(x)

Output

DEBUG:ara.plugins.callback.default:Show ignored_arguments
['\'["extra_vars"', '"vault_password_files"]\'']

DEBUG:ara.plugins.callback.default:Show ignored_facts
['\'["ansible_env"', '"ansible_all_ipv4_addresses"]\'']

DEBUG:ara.plugins.callback.default:Show ignored_files
['["test_ignored_file_1"', '"test_ignored_file_2"]']

["test_ignored_file_1"
"test_ignored_file_2"]

'["extra_vars"
"vault_password_files"]'

Maybe limiting to the file name would be a good start. I'm looking at our playbooks and I don't see a case where the regex would win over just the string matching technique.

@LaurentDumont you're right, maybe we don't need regexes. I added a comment in the patch, let me know what you think ?

@LaurentDumont you're right, maybe we don't need regexes. I added a comment in the patch, let me know what you think ?

That's good for me! I've added a patch. It works from what I've been able to test. I'll see about the integration tests as well.

That said, I'm still unsure about the proper format in ansible.cfg. The only way I could make it work is by using the following format. I've also noticed that that the ignored_facts didn't seem to work on my side.

[ara]
api_client = http
api_server = http://127.0.0.1:8000
ignored_facts = '["ansible_env", "ansible_all_ipv4_addresses"]'
ignored_arguments = '["extra_vars", "vault_password_files"]'
ignored_files = test_ignored_file_1, test_ignored_file_2, create_domains.yml

I'm going to try a fresh install somewhere just to make it isn't something local on my side.

@LaurentDumont the list format stuff is weird and confusing. Sorry about that.
There is also a distinction between how dynaconf parses things for the API server and how Ansible parses things for the callback plugin.

I've bumped into this when implementing a default label feature and ended up doing it this way:

# ara_playbook_labels can be supplied as a list inside a playbook
# but it might also be specified as a comma separated string when
# using extra-vars
if isinstance(play_vars["ara_playbook_labels"], list):
labels.extend(play_vars["ara_playbook_labels"])
elif isinstance(play_vars["ara_playbook_labels"], str):
labels.extend(play_vars["ara_playbook_labels"].split(","))
else:
raise TypeError("ara_playbook_labels must be a list or a comma-separated string")

We should probably go back and do something similar for the other Ansible callback settings.

Before I forget, what is the right way to actually let the callback output the debug? I'm seeing self.log.debug("v2_playbook_on_start") in the callback but using -v on the playbook doesn't seem to let it output.

Nevermind!

Looks like adding logging.basicConfig(level=logging.DEBUG) to the ara_default.py works to let the debug stuff output.

btw I realized that this is an issue and I've created one for it: #112

@LaurentDumont the list format stuff is weird and confusing. Sorry about that.
There is also a distinction between how dynaconf parses things for the API server and how Ansible parses things for the callback plugin.

I've bumped into this when implementing a default label feature and ended up doing it this way:

# ara_playbook_labels can be supplied as a list inside a playbook
# but it might also be specified as a comma separated string when
# using extra-vars
if isinstance(play_vars["ara_playbook_labels"], list):
labels.extend(play_vars["ara_playbook_labels"])
elif isinstance(play_vars["ara_playbook_labels"], str):
labels.extend(play_vars["ara_playbook_labels"].split(","))
else:
raise TypeError("ara_playbook_labels must be a list or a comma-separated string")

We should probably go back and do something similar for the other Ansible callback settings.

It could be good. I've been testing with the following format and it seems to work for ENV variables and the ansible.cfg .ini method.

With the latest master, if I use the following syntax.

[ara]
api_client = http
api_server = http://ara.coldnorthadmin.com:8000
default_labels = '["dev","deploy"]'
ignored_facts = '["ansible_env", "ansible_all_ipv4_addresses"]'
ignored_arguments = '["extra_vars", "vault_password_files"]'

I end up with

image

And the ignored_facts and ignored_arguments do not match anything and the output is saved on the api-server.

That said, if I use the following syntax for all the arguments, it works as either the .ini or the ENV variables.

#ini
ignored_facts = ansible_env,ansi
ble_all_ipv4_addresses
ignored_arguments = extra_vars,vault_password_files
ignored_files = install_prometheus_server.yml

#ENV
export ARA_IGNORED_FACTS=ansible_env,ansible_all_ipv4_addresses
export ARA_IGNORED_ARGUMENTS=extra_vars,vault_password_files
export ARA_IGNORED_FILES=install_prometheus_server.yml

It seems a bit more clean than whatever is being parsed by Dynaconf.

@LaurentDumont yeah, in your case default_labels was parsed as a string which was then split into a list delimited by , so you ended up square brackets and double quotes.

There must have been a good reason why we documented the format with square brackets and double quotes (format consistent with dynaconf?) but I can't think of anything right now. If it works without, we should definitely document it that way.

I sent a patch to update the docs: https://review.opendev.org/705115

Do you think your patch for the file filtering feature is good to go ? We can add docs and tests in different patches, it's fine.

@LaurentDumont yeah, in your case default_labels was parsed as a string which was then split into a list delimited by , so you ended up square brackets and double quotes.

There must have been a good reason why we documented the format with square brackets and double quotes (format consistent with dynaconf?) but I can't think of anything right now. If it works without, we should definitely document it that way.

I sent a patch to update the docs: https://review.opendev.org/705115

Do you think your patch for the file filtering feature is good to go ? We can add docs and tests in different patches, it's fine.

Sounds good to me. Ready to merge!

Do you think your patch for the file filtering feature is good to go ? We can add docs and tests in different patches, it's fine.

Sounds good to me. Ready to merge!

Done! Merged as 23aaffb

Thanks a lot for your work on this ! It'll land in ARA 1.4 🎉