miguelgrinberg/microdot

long/complex URLs cause recursion limit

Closed this issue · 9 comments

Describe the bug
first: microdot is amazing! :)

this url works: http://192.168...../api/v1/onoffswitch/PicoBuzzer/../PicoBuzzer~PicoCh1~PicoCh2~PicoCh3~PicoCh4~PicoCh5~PicoCh6~PicoCh7~PicoCh8~PicoGP2~PicoGP3~PicoGP4~PicoGP5~PicoGP7/off

this url and longer don't work http://192.168...../api/v1/onoffswitch/PicoBuzzer/../PicoBuzzer~PicoCh1~PicoCh2~PicoCh3~PicoCh4~PicoCh5~PicoCh6~PicoCh7~PicoCh8~PicoGP2~PicoGP3~PicoGP4~PicoGP5~PicoGP7~PicoGP8/off

also I noticed that if you allow e.g. "|" (pipe char), properly in route RegExp, it won't match, if sent URI encoded (e.g. %7C), but allowing "%" (and A-Z 0-9) also causes the request to never work

To Reproduce
Steps to reproduce the behavior:

  1. @app.route("/api/v1/onoffswitch/re:[A-Za-z0-9\\-\\.\\_\\~]+:switchName/", methods=["GET"])
  2. call above URL

Expected behavior
working

Additional context
Task exception wasn't retrieved
future: coro= <generator object 'serve' at 20039db0>
Traceback (most recent call last):
File "asyncio/core.py", line 1, in run_until_complete
File "microdot/microdot.py", line 1183, in serve
File "microdot/microdot.py", line 1293, in handle_request
File "microdot/microdot.py", line 1317, in dispatch_request
File "microdot/microdot.py", line 1266, in find_route
File "microdot/microdot.py", line 837, in match
RuntimeError: maximum recursion depth exceeded

I assume you are doing this on MicroPython hardware? I don't have a problem with your URL definition on the UNIX MicroPython, so I suspect this is just a limitation of your device's small stack size.

yes, it's an RPi Pico W (RP2040). there is ~50% (~80kb) free according to gc.mem_free().
is there any way to trace/debug this and either fix/improve it or find out that it is definitely not possible? I don't quite understand how that can't be enough for this simple string. also the actual route failing (with /off) is

@app.route(
    "/api/v1/onoffswitch/<re:[A-Za-z0-9\\-\\.\\_\\~\\|]+:switchName>/<re:on|off|toggle:action>/",
    methods=["GET", "PUT"],
)

but, the route without action has the same problem.
would it use much less stack if i don't use regexp and just capture the whole paths and then validate myself?

i just played around and changed it to a non regexp route

@app.route("/api/v1/onoffswitch/<switchName>", methods=["GET"])

but it never reaches my code with still the same error :(
I thought it was a RegExp issue, but it seems the URL is just too long before even getting that far?

here are all routes in my entire code of 328 lines total, if that helps to debug:

  140,2: @app.route("/api/v1/system/info", methods=["GET"])
  141,2: @app.route("/api/v1/system/info/", methods=["GET"])
  153,2: @app.route("/api/v1/system/status", methods=["GET"])
  154,2: @app.route("/api/v1/system/status/", methods=["GET"])
  189,2: @app.route("/api/v1/onoffswitch", methods=["GET"])
  190,2: @app.route("/api/v1/onoffswitch/", methods=["GET"])
  191,2: @app.route("/api/v1/onoffswitch/ALL", methods=["GET"])
  192,2: @app.route("/api/v1/onoffswitch/ALL/", methods=["GET"])
  210,2: @app.route("/api/v1/onoffswitch/<switchName>", methods=["GET"])
  211,2: @app.route("/api/v1/onoffswitch/<switchName>/", methods=["GET"])
  223,2: @app.route("/api/v1/onoffswitch/<switchName>/<action>", methods=["GET", "PUT"])
  227,2: @app.route("/api/v1/onoffswitch/<switchName>/<action>/", methods=["GET", "PUT"])
  253,2: @app.route("/api/v1/ws")
  298,2: @app.route("/favicon.ico", methods=["GET"])
  303,2: @app.route("/favicon.webp", methods=["GET"])
  308,2: @app.route("/", methods=["GET"])

Okay. Yeah, it looks like even the simpler regexes that I'm using for regular path components end up consuming the entire stack, given a sufficiently long URL. I did not realize the regexp module is so expensive to run, I need to see if I can replace it with a simpler parser.

cool, thanks! :)

first draft/attempt without RegExp - but now I get

MemoryError: memory allocation failed, allocating 2344 bytes

on startup already :(

edit: OK, turns out micropython loads all uploaded .py files, even the ones not imported/used, so I removed the unpatched microdot.py but now I get an IndexOutOfRange error in match() - I'm not good enough with Python :(

class URLPattern:
    def __init__(self, url_pattern):
        self.url_pattern = url_pattern
        self.segments = url_pattern.strip("/").split("/")
        self.args = []
        for segment in self.segments:
            if segment.startswith("<") and segment.endswith(">"):
                segment = segment[1:-1]
                if ":" in segment:
                    type_, name = segment.split(":", 1)
                else:
                    type_ = "string"
                    name = segment
                self.args.append({"type": type_, "name": name})

    def match(self, path):
        path_segments = path.strip("/").split("/")
        if len(path_segments) != len(self.segments):
            return
        args = {}
        for i, segment in enumerate(self.segments):
            if segment.startswith("<") and segment.endswith(">"):
                name = self.args[i]["name"]
                type_ = self.args[i]["type"]
                value = path_segments[i]
                if type_ == "int":
                    try:
                        value = int(value)
                    except ValueError:
                        return
                elif type_ == "path":
                    value = "/".join(path_segments[i:])
                    args[name] = value
                    return args
                args[name] = value
            elif segment != path_segments[i]:
                return
        return args

@sblive Give the update microblog.py on the main branch a try and let me know if it works any better. This version will only use regexes if you add a path or re segment to your URL pattern. For any others, the parsing is done without regexes and without recursion.

yes it works, even with longer URLs, thanks! :)