tidyverse/googlesheets4

Cells spanning multiple columns or rows

Opened this issue · 5 comments

We would like to access information about the cells spanning multiple columns or rows. From what I understand, this is implemented by reading the merges field from the Sheet object (https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets). I don't think this information is currently imported.

I might be interested in submitting a PR for this feature, maybe as an attribute or additional column (n_cols/n_rows) on the tibble returned by read_cells?

Yes, this is something I would like to deal with in googlesheets4, but have not gotten around to it.

Yes, range_read_cells() is definitely a major player here.

Another relevant function is new_googlesheets4_spreadsheet(). I suspect it will need to learn about merged cells.

As for a PR, I should pre-emptively state that this might be a feature that is big enough that I would want to do it myself. So you should just enter into the effort knowing that it might not get merged.

But a working PR is always useful. It would give you the capability, sooner rather than later, and would certainly surface the relevant questions and tricky bits. So if you want to have a go, you should (with the caveat above).

Thanks, that sounds fair. I just had a brief look and I think that the basic functionality can be achieved fairly easily via vctrs::vec_locate_matches. I'll try to sketch the basic implementation later when I have some time, then you can still decide how you want to proceed.

I think one of the harder questions is deciding how to expose merged cells in the user-facing functions. So deciding how the new awareness of merged cells shows up in the user interface.

Attached is a basic patch for reading merged cell ranges in range_read_cells(). This adds two columns: n_cols and n_rows to the resulting cell tibble. I haven't touched location formatting or the documentation for this patch. If you are interested, I can make this into a full PR, with documentation updates and tests.

I think one of the harder questions is deciding how to expose merged cells in the user-facing functions. So deciding how the new awareness of merged cells shows up in the user interface.

Yes, absolutely. To be honest, I am unsure whether this should even be exposed at the higher interface level. If one wants to do it properly, this would need to be supported in the upstream tibbles/pillar, which means a massive overhaul across the entire tidyverse. Not to mention more than a few tough choices, as merged cells don't play well with the basic design of R data frames. In fact, I doubt that this kind of effort will pay off anyway, as any design is likely to introduce tradeoffs that will harm practical use cases. It's probably best to accept that spreadsheets and R use different data models, provide the basic mapping interface for the common case (as is done now), and give the advanced user the space to do advanced stuff if they need to. People who read complex sheets will need to process them anyway before they are useable in R (for example, in our project we also need to consider cell color and notes).

Thus, for googlesheets4, I would suggest to start by exposing merged cells on a lower API level. The default behavior which puts a value in the starting cell and leaves the remaining cells as NA is already sane. For advanced use range_read_cells() is already a great API entry. Probably would be also interesting to have a symmetrical range_write_cells() that uploads a range of cells (using some format as what the read cells returns). In fact, I think my team will need this kind of functionality anyway, so I'll have to implement it this way or another.

Looking forward to hearing your thoughts!

merged_cells.patch

Thanks for starting the experiments! Can you make this as a pull request (as opposed to a patch file)? That's a lot more natural for my workflow.