Improve mandatoryCell behavior for opened sheets
virtual-machinist opened this issue · 12 comments
The mandatoryCell
feature works rather strangely:
- does detect missing cells for XLS files
- does not detect missing cells for XLSX files
- does not detect blank cells (i.e. without data but with changed format) for XLSX sheets (i.e. when already opened as a
XSSFSheet
instance) - does detect missing cells for XLSX sheets
- does not detect blank cells for XLS files
The first three cases are documented, i.e. I shouldn't expect the parser to work correctly for XLSX format, however the last two aren't.
From layman's point of view there isn't too much difference between blank and missing cells, meaning nobody will re-examine if the contents of the cell were cleared together with or without formatting. Thus, it makes sense to expect the same com.poiji.exception.PoijiMultiRowException
for both blank and missing cells in an XLS workbook that the feature applies to.
Seeing that XSSFSheet
is actually parsed the same way as HSSFSheet
using com.poiji.bind.mapping.SheetUnmarshaller
we can at least try to fix/improve behavior for opened sheets. XLSX files and streams use a different unmarshaller and mandatoryCell
check isn't done there in any way, i.e. adding this feature there may be out of scope for this issue.
I've added test cases to my missing-cell-ignored branch. The excel files basically have the same input as the constructed sheet in com.poiji.deserialize.MandatoryCellsExceptionTest#createDummyExcel
. The old XLS format was derived from the XLSX using the ssconvert
utility.
I believe the place we should be looking at is somewhere near com.poiji.bind.mapping.HSSFUnmarshaller#constructTypeValue
:
Cell cell = currentRow.getCell(annotationDetail.getColumn());
if (cell != null) {
...
} else if (annotationDetail.isMandatoryCell()) {
// this is never reached, since the cell is not null
throw new PoijiRowSpecificException(annotationDetail.getColumnName(), field.getName(), currentRow.getRowNum());
}
I suggest adding a check if the cell is blank but mandatory before casting the value using com.poiji.config.Casting#castValue
and assigning it to the target field or doing the casting and then checking if the result is null
.
At the moment I have a workaround using the latter option via an overridden DefaultCasting
that checks if the target field has an annotation with mandatoryCell
set to true
, but it feels clunky and won't work without .preferNullOverDefault(true)
.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi @virtual-machinist !
Sorry I got busy with other tasks. I'm open to any suggestion based on your findings for this issue.
@ozlerhakan same here.
One thing I found was that doing cell.getCellType() != CellType.BLANK
check in addition to cell != null
check pretty much solves the problem of detecting blank mandatory cells. But haven't had the time to test if this breaks anything else.
@ozlerhakan , I've opened a PR with this change, and also added a test case based on the previous one where the cell was simply null
. But I can also merge my test cases from the missing-cell-ignored branch. There only one fails that is related to XLSX file parsing that is not done using the same code, as described in the docs.
However, the test modifies XLSX files every time it's run (i.e. I see the file has been changed in git log after running tests). Can this be a problem?
Thanks @virtual-machinist !
However, the test modifies XLSX files every time it's run (i.e. I see the file has been changed in git log after running tests). Can this be a problem?
I'd say that we shouldn't modify an excel, do you think you can find a fix for that?
@ozlerhakan , had a look at the problem. Apparently I forgot that I have to open the workbook in read-only mode for the tests. This was no issue when I used an InputStream
instead of a File
, as I don't have file-based I/O.
Also while investigating this problem I saw a better workaround for the whole issue by using the org.apache.poi.ss.usermodel.Workbook#setMissingCellPolicy
. If I set the missing cell policy to Row.MissingCellPolicy.RETURN_BLANK_AS_NULL
, then no modification of com.poiji.bind.mapping.HSSFUnmarshaller#constructTypeValue
is needed.
This however cannot be done if using the com.poiji.bind.Poiji#fromExcel(java.io.File, java.lang.Class<T>, com.poiji.option.PoijiOptions)
or com.poiji.bind.Poiji#fromExcel(java.io.InputStream, com.poiji.exception.PoijiExcelType, java.lang.Class<T>, com.poiji.option.PoijiOptions)
methods for XLS files, as there's no way to specify this cell policy.
The missing cell policy option looks better to me, I think we should document this for opened sheets (i.e. when using com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class<T>, com.poiji.option.PoijiOptions)
) and modify com.poiji.bind.mapping.PoijiWorkBook#workbook
appropriately.
OR expose org.apache.poi.ss.usermodel.Row.MissingCellPolicy
setting in com.poiji.option.PoijiOptions
. But this won't work too good when opening sheets, as I may have set it already in the workbook.
What do you think?
This was no issue when I used an InputStream instead of a File, as I don't have file-based I/O.
Cool!
The missing cell policy option looks better to me, I think we should document this for opened sheets (i.e. when using com.poiji.bind.Poiji#fromExcel(org.apache.poi.ss.usermodel.Sheet, java.lang.Class, com.poiji.option.PoijiOptions)) and modify com.poiji.bind.mapping.PoijiWorkBook#workbook appropriately.
There are indeed by design different constraints between XLS and XLSX approaches. Probably, this is also one of the challenges among them. I'd choose the first option where we can have support this by default and document it explicitly.
Did a second attempt at this. Added test cases for both XLS stream and file, and also one for opened XLSX sheet, since mandatory cell check works there as well.
Thanks @virtual-machinist , I'll plan to merge it with a new release.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hey @virtual-machinist , I've just merged the changes, thanks a lot!