wwkimball/yamlpath

Feature: Enable getting just the names of keys, disregarding any child nodes

AndydeCleyre opened this issue · 9 comments

I can't tell if I'm failing to find this in the docs, or if it's missing from the docs, or if it's not currently possible. Honestly, I can't for the life of me get an intuitive handle on the syntaxes, but I'm always trying.

Is your feature request related to a problem? Please describe.

With a doc like this:

svcs:
  coolserver:
    enabled: true
    exec: ./coolserver.py
  logsender:
    enabled: false
    exec: remote_syslog -D

I would like to build, in the shell, an array of two strings like (coolserver logsender).

Describe the solution you'd like

An easily found example of such an operation in the wiki.

I expect this is possible with yaml-get, via something vaguely similar to

$ svcs=($(yaml-get -p 'svcs[. =~ /.*/][parent()]' /path/to/yml))

Describe alternatives you've considered

Using dasel, this can be done with

$ svcs=($(dasel -m -f /path/to/yml svcs.-))

It's interesting that you're asking this within barely two hours of me wondering how to solve the very same query. I've never heard of dasel, nor have I ever seen any query syntax which would yield only the parent's key of a complex data structure (discarding all child nodes). In its present design, YAML Path selects and returns entire nodes. So, when a matched node has children, those children are selected right along with their matched parent. This is almost always the desired outcome.

Until today.

I model YAML Path's syntax after that of Hiera and XPath. It so happens that I'm presently adding some XPath-like Search Keywords because #107 presented a compelling use-case for has_child(NAME). I've wanted to add parent([STEPS]) for quite a while and have taken this opportunity to do so along with max(NAME) and min(NAME). For interest, I was updating Search Expressions when you posted this issue. Near the bottom of the page, I had just asked myself, "How would I return just the product names from this result?" right as your request came in. Serendipity?

I've never seen the - symbol used in Hiera or XPath. Do any other tools also use it, or is this novel to dasel?

I also am not familiar with that syntax, but just found it here after exploring with yaml-get for a while.

Looking at what jq offers, I see some keys keyword, but it looks like it only shows top-level keys, and I'm ignorant of more advanced usage.

It looks like XPath uses a keyword for exactly this, name(). Since I'm presently adding a whole new capability around keywords, I'll add [name()] to the 3.5.0 project list to have parity with XPath. I'm on the fence about adding a - segment type for the same (plus list indexes). I'm just not convinced that - is sufficiently representative of what it does and I cannot think of any use-case for dumping the array index of list items without their respective values.

Thanks!!

Sorry to offer a suboptimal alternative, but it has occurred to me that version 3.5.0 is a ways off and you probably need a solution sooner rather than later. If you'd like a temporary way to get the selected key names, you could abuse a yaml-set | yaml-diff chain, like this:

$ cat products.yaml
---
products_hash:
  doodad:
    availability:
      start:
        date: 2020-10-10
        time: 08:00
      stop:
        date: 2020-10-29
        time: 17:00
    dimensions:
      width: 5
      height: 5
      depth: 5
      weight: 10
  doohickey:
    availability:
      start:
        date: 2020-08-01
        time: 10:00
      stop:
        date: 2020-09-25
        time: 10:00
    dimensions:
      width: 1
      height: 2
      depth: 3
      weight: 4
  widget:
    availability:
      start:
        date: 2020-01-01
        time: 12:00
      stop:
        date: 2020-01-01
        time: 16:00
    dimensions:
      width: 9
      height: 10
      depth: 1
      weight: 4

$ cat products.yaml | yaml-set --change='products_hash.*.REMOVE' --value='' | yaml-diff products.yaml - | grep '^a\s' | cut -d. -f2
doodad
doohickey
widget

What I'm doing above is adding an arbitrary, unique child key (REMOVE) to all of the nodes I want the names of. I then use yaml-diff to show what changed from the original file, grep to keep only the lines reporting an added node, and cut to extract the second dot-separated field of each path, which is the key name I actually want. With the desired data being printed out one name per line like this, capturing it into an array in a shell script is trivial.
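As a minimal sketch of that last step, assuming bash: the hypothetical list_product_names function below is only a stand-in that prints the same three names as the chain above, so the snippet runs without yamlpath installed. In a real script you would put the actual pipeline in its place.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the yaml-set | yaml-diff | grep | cut chain:
# it prints the same key names, one per line.
list_product_names() {
  printf '%s\n' doodad doohickey widget
}

# Read one key name per line into an array. Reading line by line, rather
# than relying on word splitting, keeps any name containing spaces intact.
names=()
while IFS= read -r name; do
  names+=("$name")
done < <(list_product_names)

# Show what was captured.
echo "count: ${#names[@]}"
for name in "${names[@]}"; do
  echo "product: $name"
done
```

The simpler names=($(...)) form used earlier in this thread also works as long as no key name contains whitespace or glob characters.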

I hope this chain is helpful until yamlpath directly supports this capability with the upcoming [name()] keyword search!

I'd like to share that I've just added an initial implementation of the new [name()] keyword and it is working in early tests. Using your data, here's how it looks:

$ cat services.yaml 
svcs:
  coolserver:
    enabled: true
    exec: ./coolserver.py
  logsender:
    enabled: false
    exec: remote_syslog -D

$ yaml-get --query='svcs.*[name()]' services.yaml 
coolserver
logsender

Thanks very much! The upcoming keyword search syntax is looking good.

From the original doc:

svcs:
  coolserver:
    enabled: true
    exec: ./coolserver.py
  logsender:
    enabled: false
    exec: remote_syslog -D

How would one construct a same-structure doc with the disabled services filtered out? Targeting this result:

svcs:
  coolserver:
    enabled: true
    exec: ./coolserver.py

Something like the following? But I know I don't even have the yaml-get part right:

% yaml-merge -m svcs =(<<<"{'svcs': {}}") =(yaml-get -p 'svcs[*.enabled == true]') vars.yml

The yaml-set command has a --delete option which, combined with the upcoming [parent()] keyword, can remove the unwanted nodes. If you want to write the result to a different file from the source as part of the operation, use the stream mode, like this:

$ cat services.yaml | yaml-set --change='svcs.*[enabled=False][parent()]' --delete
---
svcs:
  coolserver:
    enabled: true
    exec: ./coolserver.py

Tacking a file redirect at the end would generate the new file, like: cat services.yaml | yaml-set --change='svcs.*[enabled=False][parent()]' --delete >/your/new/file.yaml

I have published version 3.5.0, which includes this new capability.