dfinke/ImportExcel

Support for adding new columns with Export-Excel -Append

nogaff opened this issue ยท 8 comments

I think it would be really useful if you could do something like this:

[pscustomobject]@{Foo = 1; Baz = 2} | Export-Excel foobar.xlsx -Append
[pscustomobject]@{Foo = 3; Bar = 4; Baz = 5} | Export-Excel foobar.xlsx -Append
[pscustomobject]@{Foo = 6; Bar = 7} | Export-Excel foobar.xlsx -Append

and end up with the following output in foobar.xlsx:

Foo Baz Bar
1 2
3 5 4
6 7

This would allow a script to easily gather data in a spreadsheet over time, even when the column names are variable and/or not known in advance.

Unless I've missed something, it seems that currently, only columns that already exist in the very first row can be appended to, so all you get is this:

Foo Baz
1 2
3 5
6

which means that the only way to achieve the desired result is to inefficiently import the entire spreadsheet every time you want to append a row, then manipulate the column names as required and write the whole thing back out to file.

If appending new columns were to be supported, I imagine it would need to be made optional via an extra switch, e.g. Export-Excel foobar.xlsx -Append -IncludeNewColumns or something to that effect.

That is the way the object flow engine works. Said another way, that's the way the cookie crumbles ๐Ÿ˜†

Your example shows 3 records, what if there were 10K. Plus, keeping track of properties per record, and they are in different orders.

Not gonna do it.

Image

@nogaff there is a rescue trick.
Use Select-Object Foo,Baz,Bar before Export-Excel to align input object structure

@scriptingstudio to do that you'd need to know all the possible column names in advance, but as mentioned in my OP, I was really thinking of scenarios where that isn't the case.

Here's another dumb example of what I was getting at. Imagine that for some reason you wanted to record the number of instances of running processes over time, like this:

for ($i = 1; $i -le 10; $i++) {
  $processCounts = @{}
  Get-Process | Group-Object ProcessName | ForEach-Object {
    $processCounts[$_.Name] = $_.Count
  }
  [PSCustomObject]$processCounts | Export-Excel process_counts.xlsx -Append
  Start-Sleep -Seconds 2
}

Any new process names that appear after the first iteration will not be included in the .xlsx file, and you have no idea what those new process names might be before you start, so you can't use Select-Object to solve the problem.

Not being familiar with the inner workings of the .xlsx format and the ImportExcel module, I thought it might be fairly trivial for Export-Excel -Append to take each new row of input and compare it to the first row of the file, adding new column names to the first row where necessary, before appending the new row to the end of the file, without having to touch any other rows in-between, but if @dfinke says it's not that simple then so be it!

anything can be coded. this feels like a can of worms that keeps on giving ๐Ÿ™‚.

i prefer to shape the data and then export it to Excel.

if a robust approach to create that data structure can be done, then it is simple to export it.

I believe that Python Pandas dataframes can handle this, needs a fact check. They take the approach of shaping the data and then you can export to any format.

Should be easy to check if it works that way and then think about how to port it to PowerShell.

Another idea I was thinking about was the option -noHeader that is available on Export-Excel. One of the ways, if I remember correctly, was to get the Excel column names as header (A, B, C, ....).

If Import-Excel supports something similar, one could shape the object by using the Excel column names before writing the data back to Excel.

Might be off here... but I remember a feature like this from my previous tests.

Upon Further Review

Needs to be fact, perf and boundary checked.

$data = $(
    [pscustomobject]@{Foo = 1; Baz = 2 }
    [pscustomobject]@{Foo = 3; Bar = 4; Baz = 5 }
    [pscustomobject]@{Foo = 6; Bar = 7 }
)

$pn = [System.Collections.Generic.HashSet[string]]::new()

foreach ($row in $data) {
    foreach ($key in $row.PSObject.Properties.Name) {
        $pn.Add($key) | Out-Null
    }
}

$data | Select-Object @($pn)

Image

$null = $pn.Add($key) is MUCH faster and even a bit faster using $pn = [ordered]@{}

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.