zdavatz/spreadsheet

Complex Excel Structure -> Cells location jumbled upon saving

sergio-rivas opened this issue · 13 comments

When parsing a complex structure excel file (i.e. lots of merges, etc), then modifying a few specific cells, the output file changes the cell locations of many unrelated cells around the sheet (both row/column position are being shifted for other cells).

Is there anything I should provide to help you be able to replicate the issue?

what software created the original file?

The file was created with Windows version of Microsoft Excel, it's a .xls file too. I think it might have to do with the fact that the file has many use of merged cells.

I tried two simple tests to create an easy replication:

  1. open the file and use spreadsheet to write to new file -> no problem
  2. open the file and use spreadsheet to edit only one cell -> issue replicated

Would I be able to privately send the XLS file so you can confirm the issue?
I don't mind helping out by sponsoring the bugfix too. Let me know how much.

Please try to open and save the file with LibreOffice and then try to edit it again using spreadsheet. Does that work?

After saving w/ LibreOffice, now spreadsheet gem cannot read the excel.
Error:

Spreadsheet::Errors::UnknownCodepage: Unknown Codepage 0xac8d
from .../ruby/gems/2.6.0/gems/spreadsheet-1.2.7/lib/spreadsheet/excel/reader.rb:75:in `block in encoding'

Did you save the file as UTF-8? Which version of LibreOffice and which OS are you using?

LibreOffice 7.1.0.3, Mac OSX.

There is no option to choose UTF-8 when saving as xls. The resulting file can be opened in LibreOffice, but is not readable by spreadsheet gem. Can I email you the original file and you try to open/save with your libre office version, then read it in spreadsheet gem? This way we can eliminate the possibility of environmental factors?

Screen Shot 2021-03-02 at 1 23 08 AM

Please share the Ruby script with which you want to open the file. Also look at this commit here: https://github.com/zdavatz/spreadsheet/blob/master/History.md#111--03012016 where we had a similar error. Maybe you can send a Pull Request. This is the commit: e59b337

I tried digging into the code myself, but was unable to get anywhere. Is there a way to know exactly what encoding LibreOffice is using when saving?

I tried setting the unknown codepage to MACROMAN, but would get invalid byte sequence errors, I tried UTF-8, and didn't get invalid byte sequence errors, but in BIFF8, it kept encountering a lot of errors due to nil values here and there.

For me the question is which language/character uses that "strange" characterset in your XLS?

It's all alphanumeric characters. Language is set to English (US), I changed all items to font "Times New Roman"

I even tried running a macros to enforce all alphanumeric characters only.

REM  *****  BASIC  *****
Function AlphaNumericOnly(strSource As String) As String
    Dim i As Integer
    Dim strResult As String

    For i = 1 To Len(strSource)
        Select Case Asc(Mid(strSource, i, 1))
            Case 32 To 32, 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
                strResult = strResult & Mid(strSource, i, 1)
        End Select
    Next
    AlphaNumericOnly = strResult
End Function

Function CleanSheet(sheetName As String) As Boolean
	Dim sh As Object, z As Object, c As Object
    Dim qCells As Object, enuCells As Object
    sh = ThisComponent.Sheets.getByName(sheetName)
    z = sh.getCellRangeByName("A1:U300")
    qCells = z.queryContentCells(-1)
    enuCells = qCells.Cells.createEnumeration
    Do While enuCells.hasMoreElements
      c = enuCells.nextElement
      If Left(c.Formula, 1) = "=" Then 
		
      Else
		c.string = AlphaNumericOnly(c.String)
	  End If  
    Loop
    CleanSheet = true
End Function


Sub CleanAll()
	CleanSheet("GRID")
	CleanSheet("RESULTS")
	CleanSheet("PICTURES")
End Sub

Which Windows Version and which MS Excel Version created the orginial file? Can these people try to create the same file using LibreOffice on Windows?

After some messing around with different settings, the following combination fixed my issue:

  1. Uncheck "Protect Sheet"
  2. Uncheck "Protect Spreadsheet Structure"

Screen Shot 2021-03-13 at 1 43 52 PM

Ok, good to know, thank you for reporting!