wgrep
wgrep (world wide web grep) searches for patterns in a website's directory hierarchy over HTTP, following hypertext references.
By default, it searches for patterns inside paragraphs.
The tool is inspired by GNU grep(1) and wget(1).
wgrep PATTERN URL [flags]
For details, please read the CLI documentation.
wgrep --recursive|-r PATTERN URL
wgrep --ignore-case|-i PATTERN URL
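The effect of -r and -i can be illustrated with a minimal Go sketch of a breadth-first, optionally case-insensitive search. This is not wgrep's implementation. The page type and the search function are hypothetical stand-ins for fetching pages over HTTP and extracting their paragraphs. The sketch also assumes that PATTERN is compiled as a Go regular expression and that -i maps to Go's (?i) flag; the README does not state either of these.

```go
package main

import (
	"fmt"
	"regexp"
)

// page is a hypothetical in-memory stand-in for a fetched HTML page:
// the text of its paragraph elements and the URLs it links to.
type page struct {
	paragraphs []string
	links      []string
}

// search walks the site breadth-first from start, visiting each URL at most
// once, and returns "URL: paragraph" for every paragraph matching pattern.
// With recursive=false only the start page is searched.
func search(site map[string]page, start, pattern string, recursive, ignoreCase bool) ([]string, error) {
	if ignoreCase {
		// Assumption: -i is modeled with Go's (?i) case-insensitive flag.
		pattern = "(?i)" + pattern
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}

	var matches []string
	visited := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:]
		p, ok := site[current]
		if !ok {
			continue // broken link: nothing to search
		}
		for _, text := range p.paragraphs {
			if re.MatchString(text) {
				matches = append(matches, current+": "+text)
			}
		}
		if !recursive {
			break // only the start page is searched
		}
		for _, link := range p.links {
			if !visited[link] {
				visited[link] = true
				queue = append(queue, link)
			}
		}
	}
	return matches, nil
}

func main() {
	site := map[string]page{
		"https://example.com/": {
			paragraphs: []string{"Welcome to the blog."},
			links:      []string{"https://example.com/posts/a", "https://example.com/posts/b"},
		},
		"https://example.com/posts/a": {
			paragraphs: []string{"Notes on Kubernetes RBAC."},
			links:      []string{"https://example.com/"},
		},
		"https://example.com/posts/b": {
			paragraphs: []string{"Unrelated post."},
		},
	}

	// Roughly analogous to: wgrep -r -i kubernetes https://example.com/
	matches, err := search(site, "https://example.com/", "kubernetes", true, true)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Println(m)
	}
}
```

The visited set keeps pages that link back to each other (here, posts/a links to the start page) from being searched twice. Without -r, only the start page would be scanned. Without -i, "kubernetes" would not match "Kubernetes".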
Referenced locations whose host name differs from the one in the URL argument are skipped by default. In addition, it's possible to include only the locations whose HTTP path matches a specific pattern.
Just as grep's --include flag restricts a search to matching files, wgrep's --include flag filters pages by URL when searching recursively.
The include filter pattern is a regular expression in Go's regexp (RE2) syntax.
wgrep -r --include "my-section\/.+" PATTERN URL
By default, the element filter is set to "p", the HTML element that represents a paragraph. However, this filter can be customized with the --element|-e flag:
wgrep --element|-e "article" PATTERN URL
The element filter accepts GoQuery (CSS-style) selectors. For example, this allows selecting elements by their class attribute:
wgrep -e ".my-class" PATTERN URL
For more information about the selector syntax, please refer to the GoQuery documentation.
$ wgrep --include "posts\/" -ri kubernetes https://blog.maxgio.me
https://blog.maxgio.me/posts/k8s-stride-05-denial-of-service/:
Users that are authorized to make patch requests to the Kubernetes API server can send a specially crafted patch of type json-patch (e.g. kubectl patch - type json or Content-Type: application/json-patch+json) that consumes excessive resources while processing, causing a denial of service on the API server.
https://blog.maxgio.me/posts/stride-threat-modeling-kubernetes-elevation-of-privileges/: Hello everyone, a long time has passed after the 5th part of this journey through STRIDE thread modeling in Kubernetes has been published.
If you recall well, STRIDE is a model of threats for identifying security threats, by providing a mnemonic for security threats in six categories:
https://blog.maxgio.me/posts/stride-threat-modeling-kubernetes-elevation-of-privileges/:
In Kubernetes Role-Based Access Control authorizes or not access to Kubernetes resources through roles, but we also have underlying infrastructure resources, and Kubernetes provides primitives to authorize workload to access operating system resources, like Linux namespaces.
...