Upa URL is a WHATWG URL Standard compliant URL parser library written in C++.
The library is self-contained with no dependencies and requires a compiler that supports C++11 or later. It is known to compile with Clang 4, GCC 4.9, and Microsoft Visual Studio 2015, or later versions of each. It can also be compiled to WebAssembly; see the demo: https://upa-url.github.io/demo/
This library is up to date with the URL Standard published on 19 August 2024 (commit 5861e02) and supports internationalized domain names as specified in the UTS46 Unicode IDNA Compatibility Processing version 15.1.0.
It implements:
- URL class: upa::url
- URLSearchParams class: upa::url_search_params
- URL record: upa::url has functions to examine URL record members: get_part_view(PartType t), is_empty(PartType t) and is_null(PartType t)
- URL equivalence: upa::equals function
- Percent decoding and encoding functions: upa::percent_decode, upa::percent_encode and upa::encode_url_component
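To illustrate what percent decoding does, here is a minimal standalone sketch. It is not the implementation of upa::percent_decode; the names hex_value and percent_decode_sketch are hypothetical and exist only for this example. It assumes the usual rule that a "%" followed by two hex digits becomes one byte, and that anything else is copied unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of a hex digit, or -1 if c is not one.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Illustrative sketch, not upa::percent_decode: replaces each "%XX"
// (two hex digits) with the byte it encodes and copies everything else,
// including malformed sequences, unchanged.
inline std::string percent_decode_sketch(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '%' && i + 2 < input.size()) {
            const int hi = hex_value(input[i + 1]);
            const int lo = hex_value(input[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(input[i]);
    }
    return out;
}
```

With this sketch, percent_decode_sketch("a%20b") yields "a b", while a trailing or malformed "%" is left as it is.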
It has some differences from the standard:
- Setters of the upa::url class are implemented as functions, which return true if value is accepted.
- The href setter does not throw on parsing failure, but returns false.
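The setter convention described above can be sketched with a toy class. This is only an illustration of the pattern (a setter function that reports acceptance through its return value); port_holder is a hypothetical type, and its validation is deliberately simplified and does not follow the URL Standard's port rules.

```cpp
#include <string>

// Hypothetical stand-in for a URL object; only a port value is modelled.
class port_holder {
public:
    // Setter written as a function: stores the value and returns true if
    // it is accepted, otherwise returns false and keeps the old value.
    // Simplified rule (an assumption of this sketch): 1 to 5 decimal
    // digits with a numeric value of at most 65535.
    bool port(const std::string& value) {
        if (value.empty() || value.size() > 5) return false;
        unsigned long number = 0;
        for (char c : value) {
            if (c < '0' || c > '9') return false;
            number = number * 10 + static_cast<unsigned long>(c - '0');
        }
        if (number > 65535) return false;
        port_ = value;
        return true;
    }

    // Getter with the same name, distinguished by taking no arguments.
    const std::string& port() const { return port_; }

private:
    std::string port_;
};
```

A caller can then test the result directly, for example `if (!holder.port("abc")) { /* value rejected, old port kept */ }`, instead of catching an exception.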
Upa URL contains features not specified in the standard:
- The upa::url class has path getter (to get pathname concatenated with search)
- Function to convert file system path to file URL: upa::url_from_file_path
- Function to get file system path from file URL: upa::path_from_file_url
- Experimental URLHost class (see proposal: whatwg/url#288): upa::url_host
- The upa::url_search_params class has a few additional functions: remove, remove_if
For string input, the library supports UTF-8, UTF-16, UTF-32 encodings and several string types, including std::basic_string, std::basic_string_view, null-terminated strings of any char type: char, char8_t, char16_t, char32_t, or wchar_t. See "String input" for more information.
The simplest way is to use two amalgamated files: url.h and url.cpp. You can download them from the releases page or, if you have Python installed, generate them by running the tools/amalgamate.sh script (tools/amalgamate.bat on Windows). The files will be created in the single_include/upa directory.
The library can be built and installed using CMake 3.13 or later. To build and install to the default directory (usually /usr/local on Linux), run the following commands:
cmake -B build -DUPA_BUILD_TESTS=OFF
cmake --build build
cmake --install build
To use the library, add find_package(upa REQUIRED) and link to the upa::url target in your CMake project:
find_package(upa REQUIRED)
...
target_link_libraries(exe-target PRIVATE upa::url)
The entire library source tree can be placed in a subdirectory (say url/) of your project and then included in it with add_subdirectory():
add_subdirectory(url)
...
target_link_libraries(exe-target PRIVATE upa::url)
Alternatively, the library sources can be downloaded at configure time with CMake's FetchContent module:
include(FetchContent)
FetchContent_Declare(upa
  GIT_REPOSITORY https://github.com/upa-url/upa.git
  GIT_SHALLOW    TRUE
  GIT_TAG        v1.0.2
)
FetchContent_MakeAvailable(upa)
...
target_link_libraries(exe-target PRIVATE upa::url)
If you are using the CPM.cmake script and have included it in your CMakeLists.txt, then:
CPMAddPackage("gh:upa-url/upa@1.0.2")
...
target_link_libraries(exe-target PRIVATE upa::url)
For use in C++17 or later projects, install Upa URL with vcpkg install upa-url.
For use in C++11 and C++14 projects, use vcpkg install upa-url[cxx11].
Source files that use this library must include upa/url.h:
#include "upa/url.h"
If you are using CMake, see the CMake section for how to link to the library. Otherwise, if you are using the amalgamated files, add the amalgamated url.cpp file to your project; if not, add all the files from the src/ directory to your project.
Parse an input string using the url::parse function and output the URL components:
#include "upa/url.h"
#include <iostream>
#include <string>

int main() {
    upa::url url;
    std::string input;
    std::cout << "Enter URL to parse, or empty line to exit\n";
    while (std::getline(std::cin, input) && !input.empty()) {
        if (upa::success(url.parse(input))) {
            std::cout << " href: " << url.href() << '\n';
            std::cout << " origin: " << url.origin() << '\n';
            std::cout << " protocol: " << url.protocol() << '\n';
            std::cout << " username: " << url.username() << '\n';
            std::cout << " password: " << url.password() << '\n';
            std::cout << " hostname: " << url.hostname() << '\n';
            std::cout << " port: " << url.port() << '\n';
            std::cout << " pathname: " << url.pathname() << '\n';
            std::cout << " search: " << url.search() << '\n';
            std::cout << " hash: " << url.hash() << '\n';
        } else {
            std::cout << " URL parse error\n";
        }
    }
}
Parse a URL against a base URL using the url constructor:
try {
    upa::url url{ "/new path?query", "https://example.org/path" };
    std::cout << url.href() << '\n';
}
catch (const std::exception& ex) {
    std::cerr << "Error: " << ex.what() << '\n';
}
Use setters of the url class:
upa::url url;
if (upa::success(url.parse("http://host/"))) {
    url.protocol("https:");
    url.host("example.com:443");
    url.pathname("kelias");
    url.search("z=7");
    url.hash("top");
    std::cout << url.href() << '\n';
}
Enumerate the search parameters of a URL:
upa::url url{ "wss://h?first=last&op=hop&a=b" };
for (const auto& param : url.search_params()) {
    std::cout << param.first << " = " << param.second << '\n';
}
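To show roughly what sits behind such an enumeration, the following standalone sketch splits a query string into name/value pairs. It is not the library's parser: split_query_sketch is a hypothetical name, and the sketch deliberately omits the percent decoding and "+" handling that real URLSearchParams parsing performs.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch, not the library's parser: splits "a=1&b=2" into
// (name, value) pairs. Empty items between '&' are skipped, and an item
// without '=' gets an empty value. No percent decoding is performed.
inline std::vector<std::pair<std::string, std::string>>
split_query_sketch(const std::string& query) {
    std::vector<std::pair<std::string, std::string>> out;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        if (end > start) {
            const std::string item = query.substr(start, end - start);
            const std::size_t eq = item.find('=');
            if (eq == std::string::npos)
                out.emplace_back(item, std::string());
            else
                out.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        }
        start = end + 1;  // continue after the '&' (or past the end)
    }
    return out;
}
```

For the query "first=last&op=hop&a=b" this produces three pairs, in the same order as they appear in the string.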
Remove search parameters whose names start with "utm_" (requires C++20):
upa::url url{ "https://example.com/?id=1&utm_source=twitter.com&utm_medium=social" };
url.search_params().remove_if([](const auto& param) {
    return param.first.starts_with("utm_");
});
std::cout << url.href() << '\n'; // https://example.com/?id=1
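On compilers without C++20, starts_with is not available. The sketch below shows a C++11-compatible prefix check based on std::string::compare, applied to a plain std::vector of pairs that stands in for the parameter list so the example runs without the library. The names param_list, has_prefix and remove_with_prefix are hypothetical. Assuming parameter names are std::string values, the same has_prefix call could be used inside the lambda passed to remove_if.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a list of (name, value) search parameters; an assumption
// made so that the sketch does not depend on the library.
using param_list = std::vector<std::pair<std::string, std::string>>;

// C++11-compatible equivalent of name.starts_with(prefix).
inline bool has_prefix(const std::string& name, const std::string& prefix) {
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Removes every parameter whose name starts with prefix, using the
// erase-remove idiom; the order of the remaining parameters is kept.
inline void remove_with_prefix(param_list& params, const std::string& prefix) {
    params.erase(
        std::remove_if(params.begin(), params.end(),
            [&prefix](const std::pair<std::string, std::string>& p) {
                return has_prefix(p.first, prefix);
            }),
        params.end());
}
```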
Convert a filesystem path to a file URL:
try {
    auto url = upa::url_from_file_path("/home/opa/file.txt", upa::file_path_format::posix);
    std::cout << url.href() << '\n'; // file:///home/opa/file.txt
}
catch (const std::exception& ex) {
    std::cerr << "Error: " << ex.what() << '\n';
}
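As a rough picture of what such a conversion involves, here is a standalone sketch for absolute POSIX paths. It is not the implementation of upa::url_from_file_path: file_url_sketch is a hypothetical name, the set of characters left unescaped is an assumption of this sketch and may differ from the library's, and errors are signalled with an empty string rather than an exception.

```cpp
#include <string>

// Illustrative sketch, not upa::url_from_file_path: builds a file URL
// from an absolute POSIX path. Returns an empty string if the path does
// not start with '/'. Bytes outside a conservative "safe" set are
// percent-encoded; the exact set is an assumption of this sketch.
inline std::string file_url_sketch(const std::string& path) {
    if (path.empty() || path[0] != '/') return std::string();
    static const char hex[] = "0123456789ABCDEF";
    std::string url = "file://";
    for (unsigned char c : path) {
        const bool keep =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '/' || c == '-' || c == '.' || c == '_' || c == '~';
        if (keep) {
            url.push_back(static_cast<char>(c));
        } else {
            url.push_back('%');
            url.push_back(hex[c >> 4]);
            url.push_back(hex[c & 0x0F]);
        }
    }
    return url;
}
```

For "/home/opa/file.txt" the sketch gives "file:///home/opa/file.txt", matching the example above; a path containing a space, such as "/tmp/a b.txt", gives "file:///tmp/a%20b.txt".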
This library is licensed under the BSD 2-Clause License. It contains portions of modified source code from the Chromium project, licensed under the BSD 3-Clause License, and the ICU project, licensed under the UNICODE LICENSE V3.
See the LICENSE file for more information.