DumbLinks is a simple C++ program that fetches a webpage and extracts all hyperlinks from it. Using the libcurl library, it performs an HTTP request to retrieve the HTML content of a user-specified URL, then parses the HTML with regular expressions to find all `<a>` tags and extract the URLs from their `href` attributes.
- Fetch Web Content: Retrieves HTML content from a URL entered by the user.
- Automatic Redirect Handling: Follows HTTP redirects to fetch the final destination content.
- Link Extraction: Parses the HTML to extract all hyperlinks (`<a href="...">`).
- Console Output: Displays the list of found links directly in the console.
- Compile the Program:
  Ensure you have libcurl installed on your system, then compile the program with:

  ```sh
  g++ -o DumbLinks DumbLinks.cpp -lcurl
  ```

- Run the Program:
  Execute the compiled binary:

  ```sh
  ./DumbLinks
  ```
- Enter the URL:
  When prompted, input the hostname or URL from which you want to extract links:

  ```
  Enter hostname: example.com
  ```

  Note: If you don't include `http://` or `https://`, the program will automatically prepend `http://` to the hostname.

- View Extracted Links:
  The program will fetch the webpage content, parse it, and display all found hyperlinks in the console:

  ```
  Enter hostname: example.com
  URL being requested: https://example.com
  Found link: https://www.iana.org/domains/example
  ```
- C++ Compiler: A compiler that supports C++11 or later (e.g., GCC, Clang).
- libcurl: The curl library for handling HTTP requests.
- Install on Debian/Ubuntu:

  ```sh
  sudo apt-get install libcurl4-openssl-dev
  ```

- Install on CentOS/Fedora:

  ```sh
  sudo dnf install libcurl-devel
  ```
- HTML Parsing: Uses regular expressions for parsing HTML, which may not handle all HTML edge cases or malformed HTML.
Contributions are welcome! Feel free to submit issues or pull requests to improve the functionality or fix bugs.
For any questions or suggestions, please contact @dumbbutt0.