long/complex URLs cause recursion limit

Question

long/complex URLs cause recursion limit

Closed this issue 5 months ago · 9 comments

Describe the bug
first: microdot is amazing! :)

this url works: http://192.168...../api/v1/onoffswitch/PicoBuzzer/../PicoBuzzer~PicoCh1~PicoCh2~PicoCh3~PicoCh4~PicoCh5~PicoCh6~PicoCh7~PicoCh8~PicoGP2~PicoGP3~PicoGP4~PicoGP5~PicoGP7/off

this url and longer don't work http://192.168...../api/v1/onoffswitch/PicoBuzzer/../PicoBuzzer~PicoCh1~PicoCh2~PicoCh3~PicoCh4~PicoCh5~PicoCh6~PicoCh7~PicoCh8~PicoGP2~PicoGP3~PicoGP4~PicoGP5~PicoGP7~PicoGP8/off

also I noticed that if you allow e.g. "|" (pipe char), properly in route RegExp, it won't match, if sent URI encoded (e.g. %7C), but allowing "%" (and A-Z 0-9) also causes the request to never work

To Reproduce
Steps to reproduce the behavior:

@app.route("/api/v1/onoffswitch/re:[A-Za-z0-9\\-\\.\\_\\~]+:switchName/", methods=["GET"])
call above URL

Expected behavior
working

Additional context
Task exception wasn't retrieved
future: coro= <generator object 'serve' at 20039db0>
Traceback (most recent call last):
File "asyncio/core.py", line 1, in run_until_complete
File "microdot/microdot.py", line 1183, in serve
File "microdot/microdot.py", line 1293, in handle_request
File "microdot/microdot.py", line 1317, in dispatch_request
File "microdot/microdot.py", line 1266, in find_route
File "microdot/microdot.py", line 837, in match
RuntimeError: maximum recursion depth exceeded

Answer 1 · 2024-02-17T19:30:21.000Z

I assume you are doing this on MicroPython hardware? I don't have a problem with your URL definition on the UNIX MicroPython, so I suspect this is just a limitation of your device's small stack size.

Answer 2 · 2024-02-17T19:39:36.000Z

yes, it's an RPi Pico W (RP2040). there is ~50% (~80kb) free according to gc.mem_free().
is there any way to trace/debug this and either fix/improve it or find out that it is definitely not possible? I don't quite understand how that can't be enough for this simple string. also the actual route failing (with /off) is

@app.route(
    "/api/v1/onoffswitch/<re:[A-Za-z0-9\\-\\.\\_\\~\\|]+:switchName>/<re:on|off|toggle:action>/",
    methods=["GET", "PUT"],
)

but, the route without action has the same problem.
would it use much less stack if i don't use regexp and just capture the whole paths and then validate myself?

Answer 3 · 2024-02-17T19:52:48.000Z

i just played around and changed it to a non regexp route

@app.route("/api/v1/onoffswitch/<switchName>", methods=["GET"])

but it never reaches my code with still the same error :(
I thought it was a RegExp issue, but it seems the URL is just too long before even getting that far?

Answer 4 · 2024-02-17T20:15:55.000Z

here are all routes in my entire code of 328 lines total, if that helps to debug:

  140,2: @app.route("/api/v1/system/info", methods=["GET"])
  141,2: @app.route("/api/v1/system/info/", methods=["GET"])
  153,2: @app.route("/api/v1/system/status", methods=["GET"])
  154,2: @app.route("/api/v1/system/status/", methods=["GET"])
  189,2: @app.route("/api/v1/onoffswitch", methods=["GET"])
  190,2: @app.route("/api/v1/onoffswitch/", methods=["GET"])
  191,2: @app.route("/api/v1/onoffswitch/ALL", methods=["GET"])
  192,2: @app.route("/api/v1/onoffswitch/ALL/", methods=["GET"])
  210,2: @app.route("/api/v1/onoffswitch/<switchName>", methods=["GET"])
  211,2: @app.route("/api/v1/onoffswitch/<switchName>/", methods=["GET"])
  223,2: @app.route("/api/v1/onoffswitch/<switchName>/<action>", methods=["GET", "PUT"])
  227,2: @app.route("/api/v1/onoffswitch/<switchName>/<action>/", methods=["GET", "PUT"])
  253,2: @app.route("/api/v1/ws")
  298,2: @app.route("/favicon.ico", methods=["GET"])
  303,2: @app.route("/favicon.webp", methods=["GET"])
  308,2: @app.route("/", methods=["GET"])

Answer 5 · 2024-02-17T20:26:43.000Z

Okay. Yeah, it looks like even the simpler regexes that I'm using for regular path components end up consuming the entire stack, given a sufficiently long URL. I did not realize the regexp module is so expensive to run, I need to see if I can replace it with a simpler parser.

Answer 6 · 2024-02-17T20:29:32.000Z

cool, thanks! :)

Answer 7 · 2024-02-17T20:32:29.000Z

first draft/attempt without RegExp - but now I get

MemoryError: memory allocation failed, allocating 2344 bytes

on startup already :(

edit: OK, turns out micropython loads all uploaded .py files, even the ones not imported/used, so I removed the unpatched microdot.py but now I get an IndexOutOfRange error in match() - I'm not good enough with Python :(

class URLPattern:
    def __init__(self, url_pattern):
        self.url_pattern = url_pattern
        self.segments = url_pattern.strip("/").split("/")
        self.args = []
        for segment in self.segments:
            if segment.startswith("<") and segment.endswith(">"):
                segment = segment[1:-1]
                if ":" in segment:
                    type_, name = segment.split(":", 1)
                else:
                    type_ = "string"
                    name = segment
                self.args.append({"type": type_, "name": name})

    def match(self, path):
        path_segments = path.strip("/").split("/")
        if len(path_segments) != len(self.segments):
            return
        args = {}
        for i, segment in enumerate(self.segments):
            if segment.startswith("<") and segment.endswith(">"):
                name = self.args[i]["name"]
                type_ = self.args[i]["type"]
                value = path_segments[i]
                if type_ == "int":
                    try:
                        value = int(value)
                    except ValueError:
                        return
                elif type_ == "path":
                    value = "/".join(path_segments[i:])
                    args[name] = value
                    return args
                args[name] = value
            elif segment != path_segments[i]:
                return
        return args

Answer 8 · 2024-02-18T15:07:05.000Z

@sblive Give the update microblog.py on the main branch a try and let me know if it works any better. This version will only use regexes if you add a path or re segment to your URL pattern. For any others, the parsing is done without regexes and without recursion.

Answer 9 · 2024-02-18T20:51:55.000Z

yes it works, even with longer URLs, thanks! :)