drolbr/Overpass-API

Invalid regular expression: "^[А-ЯЁ ]+$"


[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[А-ЯЁ ]+$"];
out body qt;
>;
out skel qt;

While your query works on the overpass-api.de instance, some other instances, such as kumi.systems, fail with the error message above. Some C library implementations of POSIX regular expressions don't seem to handle ranges containing Cyrillic characters properly.

As a quick workaround, you might try some other Overpass instance, or maybe avoid the range altogether by explicitly specifying all characters (not properly tested):

[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ]+$"];
out body qt;
>;
out skel qt;
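
For anyone who would rather generate the explicit list than type it, here is a minimal sketch of the idea. The helper names `utf8_two_byte` and `explicit_class` are made up for illustration and are not part of Overpass. It assumes every letter in the range encodes to two UTF-8 bytes, which holds for U+0410 (А) through U+042F (Я).

```cpp
#include <cassert>
#include <string>

// Encode a code point in U+0080..U+07FF as two UTF-8 bytes.
// Hypothetical helper; the Cyrillic letters used here all fall in this range.
std::string utf8_two_byte(char32_t cp) {
    assert(cp >= 0x80 && cp <= 0x7FF);
    std::string out;
    out += static_cast<char>(0xC0 | (cp >> 6));    // leading byte: 110xxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte: 10xxxxxx
    return out;
}

// Build "^[...]+$" with every character from first to last listed explicitly,
// followed by `extras`, so the regex engine never has to interpret a range.
std::string explicit_class(char32_t first, char32_t last, const std::string& extras) {
    assert(first <= last);
    std::string pattern = "^[";
    for (char32_t cp = first; cp <= last; ++cp) {
        pattern += utf8_two_byte(cp);
    }
    pattern += extras;
    pattern += "]+$";
    return pattern;
}
```

Calling `explicit_class(0x0410, 0x042F, "Ё ")` gives a class that lists А through Я, then Ё and a space. That is the same set of characters as in the query above, only with Ё at the end.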

A minimal example (https://cpp.godbolt.org/z/qz6Tn56j9) fails on some systems.
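
The linked example is not reproduced here. As a rough sketch of how such a check could look, the snippet below compiles a pattern with the POSIX `regcomp` function and reports the error text, if any. The function name `regex_compile_error` is hypothetical. The sketch does not set a locale itself, so the result for the Cyrillic range depends on the system and on the locale the calling program has applied.

```cpp
#include <regex.h>
#include <string>

// Try to compile `pattern` as a POSIX extended regular expression.
// Returns an empty string on success, otherwise the message from regerror().
std::string regex_compile_error(const std::string& pattern) {
    regex_t re;
    int rc = regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
    if (rc == 0) {
        regfree(&re);  // release the compiled expression
        return "";
    }
    char buf[256];
    regerror(rc, &re, buf, sizeof buf);  // translate the error code to text
    return buf;
}
```

Passing `"^[А-ЯЁ ]+$"` to this function on an affected and on an unaffected system should show whether the pattern is rejected at compile time.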

Interesting! This problem impacts my Overpass instance, which I set up using the instructions from https://overpass-api.de/full_installation.html on a Debian docker image. I'm wondering if I've overlooked something.

FROM debian:bookworm-slim

# Install dependencies
RUN apt-get update && apt-get install -y \
    wget \
    g++ \
    make \
    expat \
    libexpat1-dev \
    zlib1g-dev \
    liblz4-dev \
    lighttpd \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Download, extract and compile Overpass
RUN wget https://dev.overpass-api.de/releases/osm-3s_latest.tar.gz -O osm-3s_latest.tar.gz && \
    mkdir ./src && \
    tar -xzf osm-3s_latest.tar.gz -C ./src --strip-components=1 && \
    rm osm-3s_latest.tar.gz && \
    cd src && \
    ./configure --prefix="/app" --enable-lz4 && \
    make dist install clean && \
    cp -r rules .. && \
    cd .. && \
    rm -r ./src
...

By the way, I'm getting the same issue on Ubuntu 22.04, which is also based on Debian bookworm. For some reason, the previous Debian version, bullseye, seems to work fine.

You could try replacing the first line in your Dockerfile with FROM debian:bullseye-slim to see if it helps. We still need to figure out what exactly is causing this issue on the newer Debian version.


I think I found the cause of that.
To check the currently applied locale:

std::cout << "Current Locale: " << setlocale(LC_ALL, NULL) << std::endl;

But maybe there is a better way to set the UTF-8 locale in the first place.

I have read that Python officially supports systems that have at least one of the following locales installed:

  • C.UTF-8
  • C.utf8
  • UTF-8

Maybe the same could be done in the overpass-api case.
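
A minimal sketch of what that could look like: try a list of candidate locale names in order and keep the first one `setlocale` accepts. The function name `select_first_locale` is made up for illustration, and the use of `LC_ALL` simply mirrors the check above. I'm not claiming this is how overpass-api sets its locale today.

```cpp
#include <clocale>
#include <string>
#include <vector>

// Try each candidate locale name in order and keep the first one that
// setlocale() accepts. Returns the name reported by setlocale(), or an
// empty string if no candidate could be applied.
std::string select_first_locale(const std::vector<std::string>& candidates) {
    for (const std::string& name : candidates) {
        const char* applied = std::setlocale(LC_ALL, name.c_str());
        if (applied != nullptr) {
            return applied;
        }
    }
    return "";
}
```

With the names from the list above, the call would be `select_first_locale({"C.UTF-8", "C.utf8", "UTF-8"})`. An empty result would mean none of them is installed, and the program could then log a warning instead of silently staying in the C locale.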

...btw, I do confirm that switching to FROM debian:bullseye-slim fixed the issue.

It looks like buggy regex engines in the base system are a real problem. The final solution, even if it is a workaround, should be to open an avenue for using the regex engine of choice. I don't know whether that choice would be made at install time or at runtime.

If the app uses the C locale (since the requested locale is not installed), I don't see it as much of a regex engine issue. The app should simply support a wider range of UTF-8 locales, as other apps do.