wwkimball/yamlpath

yaml-set: Allow array.append using index [+] or [*]

azrdev opened this issue · 4 comments

Is your feature request related to a problem? Please describe.
Having read #56 I see the complexity of modifying a nested array/hash data structure, and why it's left to yaml-merge.
However, I humbly think some simple use cases could be handled by yaml-set instead of requiring a full document to be built for a merge, e.g. appending a new element to an existing array (without knowing how many elements it already contains), and possibly creating the array to append to as well.

Describe the solution you'd like
Indexing [-1] yields the last element, as expected, so it could be used to create a new array with the given element, but not to append to an existing one.
AIUI, the index [*] is not yet used and could serve this purpose; alternatively, [+] could.

Describe alternatives you've considered
The current way is to build a full yaml document, and then yaml-merge the two. According to the docs, this has the drawback of stripping all comments and empty lines.

Additional context
yamlpath version 3.6.1

This may be a failing in the documentation. yaml-merge does not require a "full yaml document". The following is the intended experience for this specific use-case (append arbitrary elements to an Array):

$ yaml-merge --version
yaml-merge 3.6.1

$ cat lhs-array.yaml 
---
an_array:
  - alpha
  - beta
  - charlie

$ echo delta | yaml-merge --mergeat=/an_array lhs-array.yaml -
---
an_array:
  - alpha
  - beta
  - charlie
  - delta

As you can see, the new element was appended to the end of the target Array. The trick is to set --mergeat (-m) to whatever Array you wish to append to.

Using yaml-merge to append data or otherwise merge two documents together -- whole or fragments -- grants other benefits well beyond the capabilities of yaml-set. Consider the case of wishing for only unique data to be appended to an Array. The yaml-merge tool handles cases like this as well as vastly more complex cases:

$ yaml-merge --version
yaml-merge 3.6.1

$ cat lhs-dupes.yaml 
---
another_array:
  - uno
  - dos
  - tres

$ echo -e "- quatro\n- dos" | yaml-merge --arrays=unique --mergeat=/another_array lhs-dupes.yaml -
---
another_array:
  - uno
  - dos
  - tres
  - quatro

Note that the presence of --arrays=unique prevented the duplicate dos entry from being appended.
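
To make the difference between the two merges above concrete, here is a minimal plain-Python sketch of the append behaviour. It models only the results shown in the examples. The names append_elements and unique are hypothetical, chosen for illustration, and are not part of yamlpath; the real tool works on parsed YAML nodes rather than plain lists.

```python
from typing import Any, List


def append_elements(lhs: List[Any], rhs: List[Any], unique: bool = False) -> List[Any]:
    """Return a new list with the rhs elements appended to lhs.

    Hypothetical helper: it reproduces the output shown above and is
    not yaml-merge's actual implementation.
    """
    merged = list(lhs)  # copy so the caller's list is left untouched
    for element in rhs:
        # With unique=True, skip anything already present in the result
        if unique and element in merged:
            continue
        merged.append(element)
    return merged


if __name__ == "__main__":
    print(append_elements(["alpha", "beta", "charlie"], ["delta"]))
    print(append_elements(["uno", "dos", "tres"], ["quatro", "dos"], unique=True))
```

With unique left at its default, the duplicate would simply be appended a second time, which is the plain append shown in the first example.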

There is another crucial reason to consider all document append operations as a "merge" rather than a "set": YAML supports Anchors and Aliases, which must be handled very carefully so as to avoid document corruption. This is very complex and is one of the core reasons why yaml-merge exists apart from the trivial yaml-set tool. In fact, yaml-merge can handle Anchors and Aliases as inputs directly from the command-line. Consider the following:

$ yaml-merge --version
yaml-merge 3.6.1

$ cat lhs-complex.yaml 
---
aliases:
  - &a_value alpha
complex_array_1:
  - *a_value
complex_array_2:
  - ichi
  - *a_value
  - ni

$ echo "&b_value bravo" | yaml-merge --mergeat='/*' lhs-complex.yaml -
---
aliases:
  - &a_value alpha
  - &b_value bravo
complex_array_1:
  - *a_value
  - *b_value
complex_array_2:
  - ichi
  - *a_value
  - ni
  - *b_value

In this contrived example, the user practices "One Version of the Truth" and needs to add the same scalar value to multiple Arrays at the same time. To do so, an Anchor is defined and applied to multiple target Arrays via its Aliases, all with a single command. Beyond this, any change to /aliases[&b_value] is automatically applied everywhere "*b_value" exists.

The yaml-set tool exists to trivially change the value of pre-existing data elements. While it is capable of generating just enough document structure to create a novel value at a previously non-existent YAML Path, it is not intended to handle the complexities of careful document merging.

In another light, adding a new YAML Path segment like [+] or [*] must be considered in the greater context of what such a segment would mean to other use-cases for YAML Paths. What should the yaml-get command do when it receives a YAML Path like "some.path.to[+]"?

I hope this information helps solve your use-case to your satisfaction. If, however, you still strongly feel that yaml-set should allow appending arbitrary elements to the end of Arrays, I'd like to hear your additional thoughts.

Thanks for your extensive and quick reply!

The --mergeat trick is nice, and makes this workaround more feasible.
I'm mostly worried about a merge stripping all comments and (whitespace) formatting, since that round-trip capability is one of the selling points of yamlpath.

My use case is a Python application which modifies an Ansible inventory (host_vars/$host.yml) and adds any number of variables specified on the command line. By utilizing yamlpath as a library I just pass the varname/path to set_value and have it do all the heavy lifting -- except for appending to arrays.

> In another light, adding a new YAML Path segment like [+] or [*] must be considered in the greater context of what such a segment would mean to other use-cases for YAML Paths. What should the yaml-get command do when it receives a YAML Path like "some.path.to[+]"?

Indeed, this would open a class of YAML Paths which are valid only for modification, not for querying, so they would need to raise an exception if used in a query. I'd understand if you were reluctant to add that possibility to yamlpath.
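
A toy illustration of that asymmetry, in plain Python over nested dicts and lists: the proposed append segment is accepted when setting a value and rejected when querying. None of this is yamlpath code. The names toy_get, toy_set, APPEND and AppendOnlyPathError are invented for the sketch, and paths are pre-split lists rather than real YAML Path strings.

```python
from typing import Any, List, Union

APPEND = "+"  # hypothetical append-only segment, as proposed in this issue

Segment = Union[str, int]


class AppendOnlyPathError(ValueError):
    """Raised when an append-only segment is used where it has no meaning."""


def toy_get(data: Any, path: List[Segment]) -> Any:
    """Follow path through nested dicts/lists and return the value found."""
    node = data
    for segment in path:
        if segment == APPEND:
            # There is no existing element for "+" to refer to
            raise AppendOnlyPathError("'+' is only valid when setting a value")
        node = node[segment]
    return node


def toy_set(data: Any, path: List[Segment], value: Any) -> None:
    """Set value at path; a trailing '+' appends to the target list."""
    *parents, last = path
    if APPEND in parents:
        raise AppendOnlyPathError("'+' is only supported as the final segment")
    node = toy_get(data, parents)
    if last == APPEND:
        node.append(value)
    else:
        node[last] = value


if __name__ == "__main__":
    doc = {"an_array": ["alpha", "beta", "charlie"]}
    toy_set(doc, ["an_array", APPEND], "delta")
    print(toy_get(doc, ["an_array", -1]))
```

The same path therefore succeeds in toy_set and fails in toy_get, which is the set-only class of paths described above.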

> Thanks for your extensive and quick reply!
>
> The --mergeat trick is nice, and makes this workaround more feasible.

I wouldn't think of this as a "workaround"; it is by deliberate design, necessitated by the inherent complexities of YAML's Anchor/Alias and Merge Key features. Whereas yaml-set is designed for atomic, trivial, scalar operations, yaml-merge is vastly more robust. To wit, yaml-merge can be used instead of yaml-set in most use-cases, though its relatively greater capabilities come with a burden of more granular configuration.

> I'm mostly worried about a merge stripping all comments and (whitespace) formatting, since that round-trip capability is one of the selling points of yamlpath.

You may have missed the --preserve-lhs-comments (-l) option to yaml-merge, or the equivalent Boolean preserve_lhs_comments property on the args object you can pass to MergerConfig when using the library. It is briefly discussed in the documentation and preserves all original comments in the left-most document. All right-hand-side comments are still discarded due to comment-handling limitations of ruamel.yaml, which yamlpath is based upon. Let me know if you need some sample code to set this up.
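
For a Python application that would rather shell out than use the library, the flags discussed in this thread could be combined roughly as follows. This is a minimal sketch under stated assumptions: build_merge_command and append_to_array are hypothetical helper names, yaml-merge is assumed to be on PATH, the new elements are passed on STDIN as in the examples above, and the merged document is read back from STDOUT.

```python
import subprocess
from typing import List


def build_merge_command(lhs_file: str, merge_at: str,
                        unique: bool = False,
                        preserve_lhs_comments: bool = True) -> List[str]:
    """Assemble a yaml-merge argument list from flags named in this thread."""
    command = ["yaml-merge", f"--mergeat={merge_at}"]
    if unique:
        command.append("--arrays=unique")
    if preserve_lhs_comments:
        command.append("--preserve-lhs-comments")
    # "-" makes yaml-merge read the right-hand document from STDIN
    command += [lhs_file, "-"]
    return command


def append_to_array(lhs_file: str, merge_at: str, rhs_yaml: str, **options) -> str:
    """Run yaml-merge and return the merged document as text.

    Assumes yaml-merge is installed and prints its result to STDOUT.
    """
    result = subprocess.run(
        build_merge_command(lhs_file, merge_at, **options),
        input=rhs_yaml, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A call such as append_to_array("lhs-array.yaml", "/an_array", "delta\n") would then correspond to the first shell example, with left-hand comments preserved.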

> My use case is a Python application which modifies an Ansible inventory (host_vars/$host.yml) and adds any number of variables specified on the command line. By utilizing yamlpath as a library I just pass the varname/path to set_value and have it do all the heavy lifting -- except for appending to arrays.

I have received user stories from people using this project with everything from Ansible to CloudFormation to Cloudify, Puppet, and others. I'm very happy this project is useful to so many people, including you. To this end, I enjoy discussions such as this, ever expanding and refining the usefulness of this project.

> > In another light, adding a new YAML Path segment like [+] or [*] must be considered in the greater context of what such a segment would mean to other use-cases for YAML Paths. What should the yaml-get command do when it receives a YAML Path like "some.path.to[+]"?
>
> Indeed, this would open a class of YAML Paths which are valid only for modification, not for querying, so they would need to raise an exception if used in a query. I'd understand if you were reluctant to add that possibility to yamlpath.

I'm no fan of adding a formal YAML Path segment which is only useful to one particular use-case. Everywhere possible, I try hard to only add segments which are applicable to all get/set/merge/delete operations.

I'm closing this issue as resolved by way of illustrating the by-design solution to this need.