Crash when submitting öäü special characters in an HTML form input
maxi07 opened this issue · 3 comments
Describe the bug
I am running a sample app that reads user input from a textbox. The HTML page displays a textbox and a submit button. When the button is pressed, the textbox content is printed to the console as expected. If the textbox contains special characters such as äöü, the urldecode_bytes function crashes with a UnicodeDecodeError.
To Reproduce
<!DOCTYPE html>
<html>
<head>
<title>Microdot Example Page</title>
</head>
<body>
<div>
<h1>Microdot Example Page</h1>
<p>Hello from Microdot!</p>
<p><a href="/shutdown">Click to shutdown the server</a></p>
<form action="/" method="post">
<input name="test-input" id="test-input"/>
<button name="test-button" id="test-button" type="submit">Submit</button>
</form>
</div>
</body>
</html>
@app.post('/')
async def hello2(request):
print(request.form['test-input'])
return htmldoc, 200, {'Content-Type': 'text/html'}
Expected behavior
I expect Microdot to decode special characters as well.
Additional context
You are doing amazing work!
Stack trace
File "C:\Users\max\source\repos\microdot-test\Microdot\microdot.py", line 466, in form
self._form = self._parse_urlencoded(self.body)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\max\source\repos\microdot-test\Microdot\microdot.py", line 413, in _parse_urlencoded
data[urldecode_bytes(k)] = urldecode_bytes(v)
^^^^^^^^^^^^^^^^^^
File "C:\Users\max\source\repos\microdot-test\Microdot\microdot.py", line 93, in urldecode_bytes
return b''.join(result).decode('utf-8')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 4: invalid start byte
What browser is this? Internet Explorer by any chance?
Haha, no - I am going with Microsoft Edge (Windows 11, Version 113.0.1774.35) and Safari for iOS 16.5.
If the textbox contains "test", it works fine, and the urlencoded body is b'test-input=test&test-button='.
If the textbox contains "test ä", the program crashes with the error above. The urlencoded body is then b'test-input=test+%E4&test-button='.
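The failing body can be reproduced with Python's standard library. %E4 is the single byte for ä in Latin-1 (windows-1252 uses the same byte), whereas UTF-8 encodes ä as the two bytes %C3%A4; the 0xf6 in the traceback is ö in the same single-byte encoding. The sketch below assumes the browser used such a legacy encoding because the page declared no charset, and it only imitates the UTF-8 decode step seen in the traceback; it is not Microdot's actual code.

```python
from urllib.parse import urlencode, unquote_to_bytes

fields = {'test-input': 'test ä', 'test-button': ''}

# Assumption: the page had no charset declaration, so the browser encoded the
# form in a single-byte legacy encoding; Latin-1 stands in for it here.
body = urlencode(fields, encoding='latin-1')
print(body)  # test-input=test+%E4&test-button=

# Take the encoded value of test-input, turn '+' into a space and percent-decode it.
encoded_value = body.split('&')[0].partition('=')[2]
raw = unquote_to_bytes(encoded_value.replace('+', ' '))
print(raw)  # b'test \xe4'

# Reading these bytes as UTF-8 (the .decode('utf-8') call in the traceback)
# fails: 0xE4 starts a multi-byte UTF-8 sequence that never completes.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc)

# The same bytes are valid Latin-1, which points to an encoding mismatch.
print(raw.decode('latin-1'))  # test ä
```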
@maxi07 Okay, here is the solution:
<!DOCTYPE html>
<html>
<head>
<title>Microdot Example Page</title>
<meta charset="UTF-8"> <!-- add this line -->
</head>
<body>
<div>
<h1>Microdot Example Page</h1>
<p>Hello from Microdot!</p>
<p><a href="/shutdown">Click to shutdown the server</a></p>
<form action="/" method="post">
<input name="test-input" id="test-input"/>
<button name="test-button" id="test-button" type="submit">Submit</button>
</form>
</div>
</body>
</html>
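This tells the browser that the page is encoded in UTF-8. Form submissions use the page's character encoding by default, so the browser then sends ä as %C3%A4 instead of %E4, and Microdot's UTF-8 decoding succeeds.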
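To check the effect of the charset line without a browser, the sketch below builds the body a UTF-8 page would send and parses it with a small decoder. The helper decode_value and the parsing loop are hypothetical illustrations of urlencoded parsing, not Microdot's API; they only mirror the percent-decode-then-UTF-8 step seen in the traceback.

```python
from urllib.parse import urlencode, unquote_to_bytes

def decode_value(encoded: str) -> str:
    # Hypothetical helper: turn '+' into spaces, percent-decode to bytes,
    # then read the bytes as UTF-8.
    return unquote_to_bytes(encoded.replace('+', ' ')).decode('utf-8')

fields = {'test-input': 'test ä', 'test-button': ''}
body = urlencode(fields)  # encoding defaults to UTF-8
print(body)  # test-input=test+%C3%A4&test-button=

# Split the body into key=value pairs and decode both sides of each pair.
parsed = {}
for pair in body.split('&'):
    key, _, value = pair.partition('=')
    parsed[decode_value(key)] = decode_value(value)
print(parsed)  # {'test-input': 'test ä', 'test-button': ''}
```

With UTF-8 on both ends, ä travels as the two bytes 0xC3 0xA4 and round-trips cleanly, which is why the fix belongs in the page rather than in the decoder.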