Duplicate columns in sp_readListData

Question

Duplicate columns in sp_readListData

Closed this issue 4 years ago · 9 comments

I have been testing your (very helpful) package to retrive data from a sharepoint. Every step to connect, retrieve Lists, access metadata from a given list is working well. But I am having an issue when retrieving list data through the sp_readListData.

As it is difficult to make a good reprex, I will describe a simplified version of my list : 3 columns (object name, object value, property_type), 343 rows.

I first try to read the data using the command

liste_data <- sp_readListData(con = con_sharepoint, listID = liste_meatadata$id)

This returns an error :

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
De plus : Warning message:

 Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed

I tried with the additionnal option expand = TRUE. And I get a result.

liste_data <- sp_readListData(con = con_sharepoint, listID = liste_metadata$id, expand = TRUE)

The liste_data data.frame has 343 observations as expected. But too many columns :

object name appears 3 times (object.name, object.name.1, object.name.2) as well as object value,
property_type is not duplicated,
The first 200 rows of object.name are consistent while object.name.1 and object.name.2 are filled with NA. From row 201 to 300, object.name and object.name.2 are filled with NA while object.name.1 contains valus. Behavior is the same for object value columns. Data in property_type are relevant on the other hand.

This looks, from what I understand, as a combined problem with :

List paging,
Column renaming to remove special characters,
Data joins.

Data paging is set to 100 on my Sharepoint. So I do not understand why I seems to be working for the first 200 rows and then not.

Answer 1 · 2020-11-23T08:34:19.000Z

Hi @pwaeckerle

I'm sorry to hear that. Unfortunately I have no SharePoint license anymore and therefore cannot reproduce your issue.
My guess would be however that there are issues with the spaces in your column names. Could you check if that fixes your problem?

Answer 2 · 2020-11-24T09:30:54.000Z

Hi @LukasK13 ,

Thank you for your reply. I made a try with a simple list containing > 200 elements and two colums : "id" and "data".

I get that return from sp_readListData :

   title data Pièces.jointes Pièces.jointes.1 Pièces.jointes.2 Pièces.jointes.3
1 ligne1    1          False             <NA>             <NA>             <NA>
2 ligne2    2          False             <NA>             <NA>             <NA>
3 ligne3    3          False             <NA>             <NA>             <NA>
4 ligne4    4          False             <NA>             <NA>             <NA>
5 ligne5    5          False             <NA>             <NA>             <NA>
6 ligne6    6          False             <NA>             <NA>             <NA>

So, I have an additionnal column Pièces jointes (attached in French) that contains duplicates due to the spaces in the name. But the data column, having no spaces, are fine. I tried then to filter as you propose using dplyr verbs :

essai_data <- sp_list(con = connection_sharepoint, listName = "essai_sharepointr") %>% 
  sp_select("title", "data") %>% 
  sp_collect()

This gets better but this time I have duplicate columns names ...

> head(essai_data)
   title data  title  title  title
1 ligne1    1 ligne1 ligne1 ligne1
2 ligne2    2 ligne2 ligne2 ligne2
3 ligne3    3 ligne3 ligne3 ligne3
4 ligne4    4 ligne4 ligne4 ligne4
5 ligne5    5 ligne5 ligne5 ligne5
6 ligne6    6 ligne6 ligne6 ligne6

It might be related to the fact that I edited the default key column in sharepoint. He might not like it !

Answer 3 · 2020-11-24T17:39:11.000Z

This behavior might result from multiple title columns in sharepoint. I know this sounds strange, but sharepoint has several layers of internal column names which are translated into user clear text names. Especially renaming columns can lead to a situation with multiple columns having the same name. Unfortunately you have no chance to detect this issue via the webinterface, you will have to use sharepoint designer for this.

Another thing to note ist that sharepoint sometimes creates multiple title columns of which you will only see one in the webinterface. I decided to ignore the title column for all applications for this reason.

Answer 4 · 2020-11-26T17:41:29.000Z

Il tried to follow the core of your sp_readListData with my fields containing spaces.

I stumbled on the code on lines 107 - 109 in sharepoint_list_get.R

        data = if (nrow(data) == 0) data_temp else rbind(
          data.frame(c(data, sapply(colnames(data_temp)[!colnames(data_temp) %in% colnames(data)], function(x) NA))),
          data.frame(c(data_temp, sapply(colnames(data)[!colnames(data) %in% colnames(data_temp)], function(x) NA)))

You retrieve elements 100 at a time in data_temp and the pass it to data. The passing is either an equality when data is empty or a rbind.
This gives the following behaviour during the first two loops :

Loop 1.
- data_temp contains 100 rows, colnames contains spaces.
- data is empty before the instruction. After it, he is exactly the same than data_temp (logically)
Loop 2.
- data_temp has new values, but still with colnames containing spaces.
- data is unmodified before the if compared to previous content. After the if colnames are modified (spaces replaced by ".")

This seems related to the data.frame which automatically replaces spaces :

> data.frame(`cols with spaces` = 1:5)
  cols.with.spaces
1                1
2                2
3                3
4                4
5                5

I tried to put a data = data.frame(data_temp) during first loop with no succes. @LukasK13 , do you see a solution ?

Answer 5 · 2020-11-26T17:51:39.000Z

@LukasK13 I tried (again). Here is a proposal :

 data = if (nrow(data) == 0) data.frame(data_temp) else rbind(
          data.frame(c(data, sapply(colnames(data_temp)[!colnames(data.frame(data_temp)) %in% colnames(data)], function(x) NA))),
          data.frame(c(data_temp, sapply(colnames(data)[!colnames(data) %in% colnames(data.frame(data_temp))], function(x) NA)))

Answer 6 · 2020-11-26T18:00:15.000Z

Thank you for digging into this. I just added your proposal with a slight modification: instead of colnames(data.frame(data_temp)) we can simply call make.names() on colnames(data_temp). Could you check if it works now by installing the latest version?

Answer 7 · 2020-11-27T07:10:57.000Z

Tested and approved. It might even be improved to return column names from your columnNames list. I tried an addition of the following just before return(data) :

colnames(data) <- columnNames[make.names(columnNames) %in% colnames(data)]

Answer 8 · 2020-11-27T07:44:03.000Z

Added. Please be aware that even though spaces in column names are now fully supported by sharepointr, this isn't considered to be best practice.

Answer 9 · 2020-11-27T07:47:23.000Z

Thanks.
I'm fully aware of that. But this sharepoint is inherited from previous "work" and I try to change it's structure step by step.