netdata/netdata

[Bug]: Chart filtering to prometheus appears to be broken

johnalotoski opened this issue · 3 comments

Bug description

Netdata should be able to filter charts from the Prometheus exporter, per the docs.

However, adding a Prometheus exporter filter no longer seems to have any effect.

Expected behavior

The configured filter is applied to the exported Prometheus metrics.

Steps to reproduce

Create a netdata.conf that does some filtering. In this example, try to filter out everything:

[prometheus:exporter]
    send charts matching = !*

Check whether any filtering has occurred. Nothing has been filtered:

# curl -s 'localhost:19999/api/v1/allmetrics?format=prometheus' | wc -l
2026

Finer-grained filtering also seems to make no difference. For example, with a pattern of statsd_*:

[prometheus:exporter]
    send charts matching = statsd_*

I would expect to see only charts starting with statsd_, but similar to above, we see all charts:

# curl -s 'localhost:19999/api/v1/allmetrics?format=prometheus' | wc -l
2026
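
For reference, the two patterns used above can be checked against chart names with a small sketch of simple-pattern matching. This is only an approximation written for illustration, not Netdata's implementation: it assumes a space-separated list of patterns, `*` as the only wildcard, a leading `!` for negative matches, and first-match-wins evaluation. The function name `simple_pattern_matches` is hypothetical.

```python
import re


def simple_pattern_matches(pattern_list: str, name: str) -> bool:
    """Approximate a space-separated simple-pattern list (assumed semantics).

    Patterns are tried left to right. The first pattern that matches decides
    the result: a plain pattern accepts the name, a '!'-prefixed one rejects it.
    A name that matches no pattern is rejected.
    """
    for token in pattern_list.split():
        negative = token.startswith("!")
        glob = token[1:] if negative else token
        # Treat '*' as "any run of characters"; everything else is literal.
        regex = ".*".join(re.escape(part) for part in glob.split("*"))
        if re.fullmatch(regex, name):
            return not negative
    return False


# '!*' should reject every chart, 'statsd_*' should keep only statsd charts.
for chart in ("system.cpu", "statsd_timer_example"):
    print(chart, simple_pattern_matches("!*", chart), simple_pattern_matches("statsd_*", chart))
```

Under these assumptions, `!*` rejects every chart name and `statsd_*` accepts only names beginning with `statsd_`, which is the behaviour expected from the two configurations above.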

Installation method

other

System info

Linux sanchonet1-test-a-1 6.1.84 #1-NixOS SMP PREEMPT_DYNAMIC Wed Apr  3 13:19:55 UTC 2024 x86_64 GNU/Linux
/etc/lsb-release:DISTRIB_CODENAME=tapir
/etc/lsb-release:DISTRIB_DESCRIPTION="NixOS 23.11 (Tapir)"
/etc/lsb-release:DISTRIB_ID=nixos
/etc/lsb-release:DISTRIB_RELEASE="23.11"
/etc/lsb-release:LSB_VERSION="23.11 (Tapir)"
/etc/os-release:BUILD_ID="23.11pre-git"
/etc/os-release:ID=nixos
/etc/os-release:LOGO="nix-snowflake"
/etc/os-release:NAME=NixOS
/etc/os-release:PRETTY_NAME="NixOS 23.11 (Tapir)"
/etc/os-release:SUPPORT_END="2024-06-30"
/etc/os-release:VERSION="23.11 (Tapir)"
/etc/os-release:VERSION_CODENAME=tapir
/etc/os-release:VERSION_ID="23.11"

Netdata build info

Packaging:
    Netdata Version ____________________________________________ : v1.43.2
    Installation Type __________________________________________ : unknown
    Package Architecture _______________________________________ : unknown
    Package Distro _____________________________________________ : unknown
    Configure Options __________________________________________ : REMOVED FOR CLOSURE SIZE REASONS
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /nix/store/r1vk56fkg3dpdbly303ir2hk909naw2h-netdata-1.43.2/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /nix/store/r1vk56fkg3dpdbly303ir2hk909naw2h-netdata-1.43.2/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /nix/store/r1vk56fkg3dpdbly303ir2hk909naw2h-netdata-1.43.2/share/netdata/web
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.1.84
    Operating System ___________________________________________ : NixOS
    Operating System ID ________________________________________ : nixos
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : 23.11 (Tapir)
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 2
    CPU Frequency ______________________________________________ : 2199000000
    CPU Architecture ___________________________________________ : 4072284160
    RAM Bytes __________________________________________________ : 85899345920
    Disk Capacity ______________________________________________ : x86_64
    Virtualization Technology __________________________________ : amazon
    Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
    Container __________________________________________________ : none
    Container Detection ________________________________________ : systemd-detect-virt
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : NO (disabled)
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (lz4)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine ___________________________________________________ : YES
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    map ________________________________________________________ : YES
    save _______________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : NO
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : NO
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Judy (high-performance dynamic arrays and hashtables) ______ : YES (bundled)
    dlib (robust machine learning toolkit) _____________________ : YES (bundled)
    protobuf (platform-neutral data serialization protocol) ____ : NO
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : YES
    libcrypto (cryptographic functions) ________________________ : YES
    libm (mathematical functions) ______________________________ : YES
    jemalloc ___________________________________________________ : YES
    TCMalloc ___________________________________________________ : NO
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : NO
    ebpf (monitor system calls) ________________________________ : NO
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : YES
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : NO
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : NO
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

There are two options to filter prometheus metrics:

  1. The first is filtering via the config file, which doesn't appear to work, as discussed above.
  2. The second is including the filter param directly in the URL, which does work from the CLI, for example:
# curl -s 'localhost:19999/api/v1/allmetrics?format=prometheus&filter=statsd_*' | wc -l
2
  • However, when using automation to scrape this endpoint, various scrape clients, such as grafana-agent, automatically percent-encode pattern characters in the URL (for example, * becomes %2A), with no apparent way to prevent this, and the scrape fails with this approach too:
May 08 23:16:37 grafana-agent-start[261377]: ts=2024-05-08T23:16:37.608255206Z caller=scrape.go:1384 level=debug agent=prometheus component="scrape manager" target="http://localhost:8125/api/v1/allmetrics?filter=statsd_%2A&format=prometheus" msg="Scrape failed" err="Get \"http://localhost:8125/api/v1/allmetrics?filter=statsd_%2A&format=prometheus\": context deadline exceeded"
  • It would be nice if the URL method also recognized the encoded form of the pattern matchers, so that clients which automatically encode special characters don't break.
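
To illustrate the encoding in question, here is a minimal sketch using Python's standard `urllib.parse`. It only demonstrates the percent-encoding round trip for the `filter` value. It makes no claim about how grafana-agent builds its URLs or how Netdata parses them, and the helper names `encode_filter_query` and `decode_filter_query` are hypothetical.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode


def encode_filter_query(pattern: str) -> str:
    """Build the query string the way a strictly encoding client might."""
    return urlencode({"filter": pattern, "format": "prometheus"})


def decode_filter_query(query: str) -> str:
    """Recover the filter value after standard percent-decoding."""
    return parse_qs(query)["filter"][0]


pattern = "statsd_*"

# '*' is not in the unreserved set, so a strict encoder rewrites it as %2A.
print(quote(pattern, safe=""))      # statsd_%2A
print(unquote("statsd_%2A"))        # statsd_*

query = encode_filter_query(pattern)
print(query)                        # filter=statsd_%2A&format=prometheus
print(decode_filter_query(query))   # statsd_*
```

The encoded and literal forms carry the same value once the query string is percent-decoded, so a server that decodes the parameter before pattern matching would see `statsd_*` in both cases.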

Hi, @johnalotoski. I can't reproduce the problem in v1.45.4.

$ cat exporting.conf | grep -v "#"
[prometheus:exporter]
    send charts matching = !*

$ curl -s 'localhost:19999/api/v1/allmetrics?format=prometheus' | wc -l
1

Have you restarted Netdata after updating exporting.conf?

Also, you can filter using the filter URL parameter:

$ curl 'localhost:19999/api/v1/allmetrics?format=prometheus&filter=*system.softirq*'
netdata_info{instance="pve-deb-work",application="netdata",version="v1.45.3-10-gb589731c8"} 1 1715237847219
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="HI",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="TIMER",family="softirqs"} 0.2000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="NET_TX",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="NET_RX",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="BLOCK",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="IRQ_POLL",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="TASKLET",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="SCHED",family="softirqs"} 0.4000001 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="HRTIMER",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="RCU",family="softirqs"} 0.4000001 1715237830000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="HI",family="softirqs"} 0.0000000 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="TIMER",family="softirqs"} 53.1620671 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="NET_TX",family="softirqs"} 0.0000000 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="NET_RX",family="softirqs"} 12.3700883 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="BLOCK",family="softirqs"} 0.5258970 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="TASKLET",family="softirqs"} 0.1428571 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="SCHED",family="softirqs"} 69.8122657 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="HRTIMER",family="softirqs"} 0.0000000 1715237844000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="RCU",family="softirqs"} 137.9602871 1715237844000
$
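
When checking whether a filter took effect, counting lines with `wc -l` shows how much came back but not which charts. A small sketch like the following, assuming the exposition format shown above where each metric line carries a `chart="..."` label, lists the charts present in a response. The function name `charts_in_exposition` is hypothetical, and the sample text is copied from the output above.

```python
import re
from collections import Counter

# Matches the chart label as it appears in the output above.
CHART_LABEL = re.compile(r'chart="([^"]*)"')


def charts_in_exposition(text: str) -> Counter:
    """Count metric lines per chart label in Prometheus exposition text."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' style comments.
        if not line or line.startswith("#"):
            continue
        match = CHART_LABEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = """\
netdata_info{instance="pve-deb-work",application="netdata",version="v1.45.3-10-gb589731c8"} 1 1715237847219
netdata_system_softirq_latency_milliseconds_average{chart="system.softirq_latency",dimension="HI",family="softirqs"} 0.0000000 1715237830000
netdata_system_softirqs_softirqs_persec_average{chart="system.softirqs",dimension="HI",family="softirqs"} 0.0000000 1715237844000
"""

for chart, lines in sorted(charts_in_exposition(SAMPLE).items()):
    print(chart, lines)
```

The `netdata_info` line has no `chart` label, so it is not counted. Feeding the function the body of a filtered response shows directly whether any unexpected charts slipped through.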

Hi @ilyam8, thanks for trying to reproduce; I do have this config issue resolved on my side now.

For filtering by URL parameter, with the example you provided, I did address that above in the Additional Info section -- see the log example there. Some common scrape clients, such as grafana-agent, escape the pattern matchers, such as * in the URL, so setting params for filtering through these clients doesn't work because of the automatic URL escaping. I wish it did work there too.