Ordered Output of Split Pages
Awesome module which I have used to sort through large PDF files at incredible speeds.
First time posting anything on GitHub, so I hope this is acceptable.
The only issue I have is when splitting documents with a large number of pages. The [CustomSplitter] class names each output file after its page number without any padding, so an alphabetical listing puts page 10 before page 2 (1, 10, 100, 2, ...). This makes it hard to read through the split files in order.
I suggest padding the number in the file name with leading zeros. I have modified the [CustomSplitter] class to do this with the code below:
class CustomSplitter : iText.Kernel.Utils.PdfSplitter {
    [int] $_order
    [string] $_destinationFolder
    [string] $_outputName

    CustomSplitter([iText.Kernel.Pdf.PdfDocument] $pdfDocument, [string] $destinationFolder, [string] $OutputName) : base($pdfDocument) {
        $this._destinationFolder = $destinationFolder
        $this._order = 1
        $this._outputName = $OutputName
    }

    [iText.Kernel.Pdf.PdfWriter] GetNextPdfWriter([iText.Kernel.Utils.PageRange] $documentPageRange) {
        # Zero-pad the counter to at least four digits so the files sort in page order
        $Name = -join ($this._outputName, $this._order.ToString("D4"), ".pdf")
        $Path = [IO.Path]::Combine($this._destinationFolder, $Name)
        $this._order++
        return [iText.Kernel.Pdf.PdfWriter]::new($Path)
    }
}
"$this._order = 1" starts the numbering at page 1.
"$this._order.ToString("D4")" pads the number to at least four digits, so the names sort correctly for files of up to 9999 pages, which shouldn't push the limits too often.
"$this._order++" increments to the next page number.
Ideally, if I had time, I would expand this to read the total number of pages before splitting and work out how many leading zeros are required, so that the naming convention adapts to the document being split.
Tested and working with module versions 0.0.10 and 0.0.17.
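To illustrate why the padding matters, here is a small sketch that sorts unpadded and padded names side by side. The base name 'OutputDocument' and the page list are made up for the example and are not taken from the module.

```powershell
# Hypothetical base name and page numbers, for illustration only.
$BaseName = 'OutputDocument'
$Pages = 1, 2, 10, 11, 100

# Unpadded names: a string sort compares character by character,
# so every name starting with "1" comes before the name for page 2.
$Unpadded = $Pages | ForEach-Object { -join ($BaseName, $_, '.pdf') } | Sort-Object

# Padded names: "D4" pads each number to at least four digits,
# so the string sort order matches the page order.
$Padded = $Pages | ForEach-Object { -join ($BaseName, $_.ToString('D4'), '.pdf') } | Sort-Object

'Unpadded:'; $Unpadded
'Padded:'; $Padded
```

In the unpadded listing the file for page 2 ends up last, while the padded listing ends with the file for page 100.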
Thanks again for the module.
This seems like a nice idea. Using Get-PDFDetails one could get the number of pages and, based on that, add leading zeros to make the naming convention nice and pretty.
$NumberOfPages = 10000
$number = 100
([string]$number).PadLeft($NumberOfPages.ToString().length,'0')
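The PadLeft idea above could be wrapped in a small helper along these lines. This is only a sketch: the function name Get-PaddedPageName and its parameters are hypothetical and do not exist in the module, and the page count is assumed to be known already (for example from Get-PDFDetails).

```powershell
# Hypothetical helper, not part of PSWritePDF.
function Get-PaddedPageName {
    param(
        [string] $BaseName,      # prefix for the output file
        [int] $PageNumber,       # page being written
        [int] $NumberOfPages     # total pages in the source document
    )
    # Width is the number of digits in the total page count, e.g. 250 -> 3
    $Width = $NumberOfPages.ToString().Length
    $Padded = ([string] $PageNumber).PadLeft($Width, '0')
    return -join ($BaseName, $Padded, '.pdf')
}

Get-PaddedPageName -BaseName 'Report' -PageNumber 7 -NumberOfPages 250
```

With a total of 250 pages the width is three digits, so page 7 becomes Report007.pdf.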
@TheOwl57 would you consider making a PR?
Sorry, I am very new to GitHub and still trying to figure it out, but yes, I would be happy to create a PR. I have gone further and have some ideas on how to work out the padding on the fly. Something like:
$Reader = [iText.Kernel.Pdf.PdfReader]::new($File)
$Document = [iText.Kernel.Pdf.PdfDocument]::new($Reader)
$PDFLength = $Document.GetNumberOfPages().ToString().Length
$Document.Close() # release the file handle once the page count is known
The easiest way to manage a PR is to follow what I've written in #12 and do it from the GitHub GUI.
However, I would encourage you to learn GitHub a bit, as it will come in handy in the future. Let me know whether you are able to make that PR.
This is what I use (PSWritePDF.psm1)
class CustomSplitter : iText.Kernel.Utils.PdfSplitter {
    [int] $_order
    [string] $_destinationFolder
    [string] $_outputName
    [string] $_Mask

    CustomSplitter([iText.Kernel.Pdf.PdfDocument] $pdfDocument, [string] $destinationFolder, [string] $OutputName) : base($pdfDocument) {
        $this._destinationFolder = $destinationFolder
        $this._order = 1 # start at 1 instead of 0
        $this._outputName = $OutputName
        # One "0" per digit of the page count, e.g. "000" for a 250-page document
        $this._Mask = ("0" * ($pdfDocument.GetNumberOfPages()).ToString().Length)
    }

    [iText.Kernel.Pdf.PdfWriter] GetNextPdfWriter([iText.Kernel.Utils.PageRange] $documentPageRange) {
        $Name = -join ($this._outputName, $this._order.ToString($this._Mask), ".pdf")
        $this._order++
        $Path = [IO.Path]::Combine($this._destinationFolder, $Name)
        return [iText.Kernel.Pdf.PdfWriter]::new($Path)