Text in PDF recognized as gibberish in any PDFium viewer due to invalid bfrange definitions in ToUnicodeMap

Bug Report

Description of the problem

Lines 269 to 271 in 485b7e6

    
    1 beginbfrange
    <0000> <${toHex(entries.length - 1)}> [${entries.join(' ')}]
    endbfrange

Currently, our code writes the whole ToUnicodeMap as a single bfrange entry that spans every glyph. PDFium-based viewers (and possibly others) then produce an invalid text mapping.

https://source.chromium.org/chromium/_/pdfium/pdfium.git/+/master:core/fpdfapi/font/cpdf_tounicodemap.cpp;l=171-172;drc=61bda438f9071586c92f8f626c29021524a8d0b0

    uint32_t lowcode = lowcode_opt.value();
    uint32_t highcode = (lowcode & 0xffffff00) | (highcode_opt.value() & 0xff);
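
To make the effect of those two PDFium lines concrete, here is a minimal JavaScript sketch of the same arithmetic. The function names are hypothetical and exist only for illustration. The handling of an inverted range (mapping nothing) is an assumption, not something taken from the PDFium source.

```javascript
// Emulates the two PDFium lines above with unsigned 32-bit arithmetic.
function pdfiumEffectiveHighCode(lowcode, highcode) {
  // PDFium keeps the upper bytes of lowcode and only the lowest byte of highcode.
  // ">>> 0" converts the result back to an unsigned 32-bit value, like uint32_t.
  return ((lowcode & 0xffffff00) | (highcode & 0xff)) >>> 0;
}

// Number of source codes that would be mapped for one bfrange entry.
// Assumption: a range whose effective high code is below lowcode maps nothing.
function pdfiumMappedCount(lowcode, highcode) {
  const effectiveHigh = pdfiumEffectiveHighCode(lowcode, highcode);
  return effectiveHigh >= lowcode ? effectiveHigh - lowcode + 1 : 0;
}

// With 258 glyphs, the generated entry is <0000> <0101>.
console.log(pdfiumEffectiveHighCode(0x0000, 0x0101)); // 1, i.e. the range becomes <0000>..<0001>
console.log(pdfiumMappedCount(0x0000, 0x0101)); // 2
```

Under these assumptions, a range written as `<0000> <0101>` is read as `<0000> <0001>`, so only two glyphs are mapped.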

Related Chromium bug: https://bugs.chromium.org/p/pdfium/issues/detail?id=1339#c1

The PDF spec doesn't give too much detail about beginbfrange. I looked around and found the doc below. Based on section 1.4.1 in that doc, the <19ff><1a00><63cf> beginbfrange entry is illegal. The first byte values should be the same for the two source range values in the entry.
https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/5411.ToUnicode.pdf

That link has since been moved or removed. I found another copy at http://www.audentia-gestion.fr/ADOBE/5411.ToUnicode.pdf
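
The rule quoted above can be sketched as a small check. This assumes 2-byte source codes given as hex strings without angle brackets. `isLegalBfrange` is a hypothetical helper, not part of pdfkit or PDFium.

```javascript
// Returns true when both ends of a bfrange share the same first (high) byte.
// Assumption: source codes are exactly 2 bytes wide.
function isLegalBfrange(lowHex, highHex) {
  const low = parseInt(lowHex, 16);
  const high = parseInt(highHex, 16);
  // Compare the high bytes and make sure the range is not inverted.
  return (low >> 8) === (high >> 8) && low <= high;
}

console.log(isLegalBfrange('19ff', '1a00')); // false: first bytes 0x19 and 0x1a differ
console.log(isLegalBfrange('0000', '0101')); // false: the range pdfkit writes for 258 glyphs
console.log(isLegalBfrange('0000', '00ff')); // true
```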

Screenshots

Google Chrome 122.0.6261.69 Linux x86_64
Chromium 122.0.6261.69 (Official Build) Arch Linux (64-bit)
WPS Office for Linux 11.1.0.11698
Firefox (pdf.js) - CORRECT
Adobe Acrobat Reader 2023.008.20533 64-bit on Windows 11 - CORRECT

Code sample

https://replit.com/@orzFly/pdfkit-tounicode?v=1
test.pdf

I used 258 glyphs in the document, so only the first two glyphs (258 % 256 = 2) are correct: the text starts with "AB" as expected, and everything after that is wrong.

Your environment

pdfkit version: 0.12.3, or master
Node version: 12.22.9
Browser version:
- Google Chrome 122.0.6261.69 Linux x86_64
- WPS Office for Linux 11.1.0.11698
- Chromium 122.0.6261.69 (Official Build) Arch Linux (64-bit)
Operating System: Linux x86_64

I have a possible fix and will send a pull request later. However, I am not sure how to add a unit test for this particular behavior.
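
One way such a fix could look is to split the entries into chunks of 256, so that both ends of every bfrange share the same high byte. This is only a sketch under that assumption, not necessarily what the pull request will contain. `toHex4` and `buildBfranges` are hypothetical names, and pdfkit's own `toHex` may behave differently.

```javascript
// Hypothetical helper: 4-digit lowercase hex.
function toHex4(n) {
  return n.toString(16).padStart(4, '0');
}

// Emits one bfrange per chunk of up to 256 entries, so that the low and
// high source codes of each range always share the same first byte.
function buildBfranges(entries, chunkSize = 256) {
  if (entries.length === 0) return '';
  const ranges = [];
  for (let i = 0; i < entries.length; i += chunkSize) {
    const chunk = entries.slice(i, i + chunkSize);
    const low = toHex4(i);
    const high = toHex4(i + chunk.length - 1);
    ranges.push(`<${low}> <${high}> [${chunk.join(' ')}]`);
  }
  return `${ranges.length} beginbfrange\n${ranges.join('\n')}\nendbfrange`;
}

// 258 made-up entries, as in the test document.
const sampleEntries = Array.from({ length: 258 }, (_, i) => `<${toHex4(0x41 + i)}>`);
const sampleLines = buildBfranges(sampleEntries).split('\n');
console.log(sampleLines[0]); // 2 beginbfrange
console.log(sampleLines[2].slice(0, 13)); // <0100> <0101>
```

A unit test could follow the same idea: generate more than 256 entries and assert that every emitted range has the same high byte at both ends.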
