bieniu/brother

encoding issue with Dutch status

Closed this issue · 23 comments

I get: 'status': 'stap. kopieín:01'. This should be 'status': 'stap. kopieën:01'. After some digging, it turns out the encoding the printer uses is hp_roman8. The chardet call (visible in the diff below) detects the string as ISO-8859-9 which is wrong (but I can see how it could not do any better).

This diff makes things work for me:

diff --git a/brother/__init__.py b/brother/__init__.py
index 8fc37b3..ee1932c 100644
--- a/brother/__init__.py
+++ b/brother/__init__.py
@@ -65,9 +65,10 @@ class Brother:  # pylint:disable=too-many-instance-attributes
         try:
             self.firmware = raw_data[OIDS[ATTR_FIRMWARE]]
             data[ATTR_FIRMWARE] = self.firmware
-            code_page = chardet.detect(raw_data[OIDS[ATTR_STATUS]].encode("latin1"))[
-                "encoding"
-            ]
+            code_page = 'hp_roman8'
+            # code_page = chardet.detect(raw_data[OIDS[ATTR_STATUS]].encode("latin1"))[
+            #     "encoding"
+            # ]
             # chardet detects Polish as ISO-8859-1 but Polish should use ISO-8859-2
             if code_page == "ISO-8859-1":
                 data[ATTR_STATUS] = (

but of course that is not a patch that would be good for everybody. I'm unsure what the right fix is; I understand from the existing code that some printers in fact use ISO-8859-1? I'm also confused about the .encode('latin1') step: why is that happening?
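On the .encode('latin1') question, one plausible reading is that the step only turns the received str back into the printer's original bytes so that a real code page can be applied afterwards. The sketch below illustrates that round trip for the single byte 0xCD. It assumes the SNMP layer hands octets over as a str with one character per byte; the variable names are illustrative and not taken from the brother package.

```python
# One status byte as sent by the printer; 0xCD is the byte in question.
wire_bytes = bytes([0xCD])

# Assumption: the SNMP layer returns octets as a str with one character
# per byte, which is equivalent to a latin1 decode.
as_received = wire_bytes.decode("latin1")  # 'Í' (U+00CD)

# latin1 maps code points 0-255 straight onto bytes 0-255, so encoding
# with it recovers the original octets without interpreting them.
recovered = as_received.encode("latin1")

# Only now is the printer's actual code page applied.
decoded = recovered.decode("hp_roman8")

print(repr(as_received), recovered, repr(decoded))
```

Under that assumption the encode step loses no information; the wrong character only appears if the recovered bytes are then decoded with the wrong code page.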

And, unrelated, why is the status string lowercased?

Full status output from unpatched code:

Data available: True
Model: DCP-7070DW
Firmware: U1307022128VER.J
Status: stap. kopieín:01
Serial no: E69767E2N930019
Sensors data: {'model': 'DCP-7070DW', 'serial': 'E69767E2N930019', 'firmware': 'U1307022128VER.J', 'status': 'stap. kopieín:01', 'page_counter': 2652, 'uptime': 346, 'drum_status': 1, 'drum_counter': 1603, 'drum_remaining_life': 88, 'black_toner_status': 1, 'black_toner_remaining': 72, 'drum_remaining_pages': 10397}

Are you using HA? Could you turn on debug logging for the brother package and post the RAW data with this status here?
You have to add this to your configuration.yaml file:

logger:
  default: warning
  logs:
    brother: debug

I have this for you right now, does it help? iso.3.6.1.2.1.43.16.5.1.2.1.1 = Hex-STRING: 53 74 61 70 2E 20 4B 6F 70 69 65 CD 6E 3A 30 31

I found the issue while using HA but the output in my first post is from the script in the README in this repository.

I need the raw output from the printer to try to fix the issue. The status uses OID 1.3.6.1.4.1.2435.2.3.9.4.2.1.5.4.5.2.0.

$ snmpwalk -OT -v 1 -c public brother.local 1.3.6.1.4.1.2435.2.3.9.4.2.1.5.4.5.2.0
SNMPv2-SMI::enterprises.2435.2.3.9.4.2.1.5.4.5.2.0 = Hex-STRING: 53 74 61 70 2E 20 4B 6F 70 69 65 CD 6E 3A 30 31   [Stap. Kopie.n:01]
2020-03-31 14:50:21 DEBUG (MainThread) [brother] RAW data: {'1.3.6.1.4.1.2435.2.3.9.4.2.1.5.5.10.0': ['00010400000a5c'], '1.3.6.1.4.1.2435.2.3.9.4.2.1.5.5.17.0': 'U1307022128VER.J', '1.3.6.1.4.1.2435.2.3.9.4.2.1.5.5.8.0': ['63010400000001', '11010400000643', '41010400002260', '31010400000001', '6f010400001c20'], '1.3.6.1.4.1.2435.2.3.9.1.1.7.0': 'MFG:Brother;CMD:PJL,PCL,PCLXL;MDL:DCP-7070DW;CLS:PRINTER;CID:Brother Laser Type1;', '1.3.6.1.4.1.2435.2.3.9.4.2.1.5.5.11.0': ['8201040000289d'], '1.3.6.1.2.1.43.10.2.1.4.1.1': '2652', '1.3.6.1.4.1.2435.2.3.9.4.2.1.5.5.1.0': 'E69767E2N930019', '1.3.6.1.4.1.2435.2.3.9.4.2.1.5.4.5.2.0': 'Stap. KopieÍn:01', '1.3.6.1.2.1.1.3.0': '2987802561'}
2020-03-31 14:50:21 DEBUG (MainThread) [brother] Data: {'status': 'stap. kopieín:01', 'page_counter': 2652, 'uptime': 346, 'drum_status': 1, 'drum_counter': 1603, 'drum_remaining_life': 88, 'black_toner_status': 1, 'black_toner_remaining': 72, 'drum_remaining_pages': 10397}
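The hex dump from snmpwalk makes it easy to check the candidate encodings by hand. The following is a small sketch using only the Python standard library; the codec names are Python's own aliases, and the lowercasing mirrors what the Data line shows.

```python
# Status bytes exactly as reported by snmpwalk above.
hex_dump = "53 74 61 70 2E 20 4B 6F 70 69 65 CD 6E 3A 30 31"
status_bytes = bytes.fromhex(hex_dump)

# Decode the same bytes with each candidate code page, then lowercase
# the result the way the library output does.
results = {
    codec: status_bytes.decode(codec).lower()
    for codec in ("latin1", "iso8859_9", "hp_roman8")
}

for codec, text in results.items():
    print(f"{codec:10} {text}")
```

Only hp_roman8 maps byte 0xCD to ë. Both latin1 and ISO-8859-9 map it to Í, which becomes í after lowercasing and so produces the 'kopieín' seen in the report.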

.1.3.6.1.4.1.2699.1.2.1.1.1.0 = STRING: "nl-NL"

http://oid-info.com/get/1.3.6.1.4.1.2699.1.2.1.1.1

This natural language tag is necessary for support of correct glyph selection for text display

Sounds like we should let the encoding depend on that field?

Or http://oid-info.com/get/1.3.6.1.2.1.43.7.1.1.4 - where we can let the printer in fact tell us the charset: .1.3.6.1.2.1.43.7.1.1.4.1.1 = INTEGER: 2004 which https://www.iana.org/assignments/ianacharset-mib/ianacharset-mib tells me is csHPRoman8(2004).

(If you think any of my suggestions are a good idea, I'm happy to also try implementing something)

I will look at this after work. Of course PRs are most welcome ;)

I think I have a universal solution. Could you test it? You have to install the brother package from the status-encoding branch:

pip uninstall brother
pip install --upgrade git+https://github.com/bieniu/brother@status-encoding
$ git checkout status-encoding
Branch 'status-encoding' set up to track remote branch 'status-encoding' from 'origin'.
Switched to a new branch 'status-encoding'
$ .venv/bin/python setup.py develop
...
$ .venv/bin/python run.py 
Data available: True
Model: DCP-7070DW
Firmware: U1307022128VER.J
Status: stap. kopieën:01
Serial no: E69767E2N930019
Sensors data: {'model': 'DCP-7070DW', 'serial': 'E69767E2N930019', 'firmware': 'U1307022128VER.J', 'status': 'stap. kopieën:01', 'page_counter': 2652, 'uptime': 346, 'drum_status': 1, 'drum_counter': 1603, 'drum_remaining_life': 88, 'black_toner_status': 1, 'black_toner_remaining': 72, 'drum_remaining_pages': 10397}

works!

+CHARSET_MAP = {"5": "latin2", "2004": "roman8", "8": "cyrillic", "12": "latin5"}, excellent choice, if I do say so myself :D

By the way, it is unclear to me if 1.3.6.1.2.1.43.7.1.1.4.1.1 is a reliable static index, or if one of the 1s at the end should officially come from some OID nearby.
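For readers following along, here is a rough sketch of how a map like the CHARSET_MAP quoted above could be applied. The function name decode_status and the latin1 fallback are hypothetical and not taken from the status-encoding branch; the map values are all valid Python codec aliases.

```python
# Map quoted in this thread: IANA charset MIB number -> Python codec alias.
CHARSET_MAP = {"5": "latin2", "2004": "roman8", "8": "cyrillic", "12": "latin5"}


def decode_status(raw_status: str, charset_code: str, fallback: str = "latin1") -> str:
    """Hypothetical helper: re-decode a status str using the printer's charset.

    raw_status is assumed to hold one character per raw byte, as in the
    RAW data log; charset_code is the value read from
    1.3.6.1.2.1.43.7.1.1.4.1.1, as a string.
    """
    codec = CHARSET_MAP.get(charset_code, fallback)
    # Recover the original bytes, then decode them with the reported charset.
    return raw_status.encode("latin1").decode(codec, errors="replace").lower()


print(decode_status("Stap. Kopie\u00cdn:01", "2004"))
```

An unknown charset number falls back to latin1 here, which simply returns the string unchanged apart from lowercasing; a real implementation may prefer a different default.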

Great. Thanks for the help. The data from 1.3.6.1.2.1.43.7.1.1.4.1.1 is very helpful.

I don't know why the charset data is in 1.3.6.1.2.1.43.7.1.1.4.1.1 but not in 1.3.6.1.2.1.43.7.1.1.4. Brother often changes OIDs in their devices.

I have the same problem with Spanish accents. Do you want me to try the status-encoding branch? Do I only need to run the two commands over SSH in hass.io?

pip3 uninstall brother
pip3 install git+https://github.com/bieniu/brother@status-encoding
curl https://raw.githubusercontent.com/bieniu/brother/status-encoding/example.py -o example.py
python3 example.py <PRINTER_IP>

The first command shows a warning because brother is not installed.
The last one gives me the same error: module brother not installed.
Until now I've only used the integration that comes with HA directly.

I think I need to install some dependencies, but I'm on mobile. Tomorrow I will try with the computer.

I tested today in an Ubuntu VM and got what seems to be a correct response:

miguel@ubuntuvm:~$ python3 example.py 192.168.100.10
Data available: True
Model: DCP-L2530DW
Firmware: Q1911142035
Status: cambie tóner
Serial no: E78277C8N724306
Sensors data: {'model': 'DCP-L2530DW', 'serial': 'E78277C8N724306', 'firmware': 'Q1911142035', 'status': 'cambie tóner', 'uptime': 0, 'page_counter': 574, 'duplex_unit_pages_counter': 330, 'drum_status': 1, 'drum_remaining_life': 96, 'drum_counter': 574, 'black_toner_status': 5, 'black_toner_remaining': 0, 'black_toner': 0, 'drum_remaining_pages': 11426}

The status is correct here (cambie tóner), whereas in Home Assistant it appears as cambie tæner. Does this mean your new code fixes the issue?
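The 'tæner' symptom is consistent with the same root cause as the Dutch case. The sketch below assumes this printer also sends HP Roman-8, which the thread does not confirm for the DCP-L2530DW, and reproduces the garbled text by decoding those bytes as latin1.

```python
expected = "cambie tóner"

# Assumption: the printer encodes its status text as HP Roman-8.
wire_bytes = expected.encode("hp_roman8")

# Reading those bytes as latin1 and lowercasing, as the unpatched
# code path effectively does, garbles the accented letter.
misread = wire_bytes.decode("latin1").lower()

# Decoding with the matching code page restores the text.
fixed = wire_bytes.decode("hp_roman8").lower()

print(misread)
print(fixed)
```

In HP Roman-8 the letter ó is byte 0xC6, which latin1 reads as Æ; lowercasing then gives æ.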

Great, today I will release the package and prepare a PR to the HA repository.

Thank you!