This Python script scrapes all the license files and automates the detection of broken links, timeout errors, and other link issues.
- Pre-requisite
- Installation
- Usage
- Integrating with CI
- Unit Testing
- Troubleshooting
- Code of Conduct
- Contributing
- License
- Python3
- UTF-8 supported console
There are two suggested ways to install. Use User if you only want to run the script. Use Development if you want to work on the script itself.
- Clone the repo
git clone https://github.com/creativecommons/cc-link-checker.git
- Install dependencies
Using Pipfile (requires pipenv):
pipenv install
We recommend using pipenv to create a virtual environment and install dependencies
- Clone the repo
git clone https://github.com/creativecommons/cc-link-checker.git
- Create virtual environment and install all dependencies
pipenv install --dev
- To install the last successful environment:
pipenv install --dev --ignore-pipfile
- Either:
- Activate project's virtual environment:
pipenv shell
- Run the script:
pipenv run link_checker.py
The -h flag prints the help text for the script:
pipenv run link_checker.py -h
usage: link_checker.py [-h] [--local] [--output-errors [output_file]] [-q]
[--root-url ROOT_URL] [-v]
Check for broken links in Creative Commons licenses
optional arguments:
-h, --help show this help message and exit
--local Scrapes license files from local file system
--output-errors [output_file]
Outputs all link errors to file (default: errorlog.txt)
and creates junit-xml type summary(test-summary/junit-xml-
report.xml)
-q, --quiet Decrease verbosity. Can be specified multiple times.
--root-url ROOT_URL Set root URL (default: https://creativecommons.org)
-v, --verbose Increase verbosity. Can be specified multiple times.
The default mode shows which file is currently being checked, along with the warnings and errors encountered in its links.
pipenv run link_checker.py
This flag decreases the verbosity of the output. This mode is useful for reducing the noise. By default, WARNING and higher output is displayed.
pipenv run link_checker.py -q
This flag increases the verbosity of the output. This mode is useful for in-depth debugging. By default, WARNING and higher output is displayed.
pipenv run link_checker.py -v
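The effect of repeating -q and -v can be illustrated with a small sketch. This is not taken from link_checker.py; it assumes the common pattern of counting each flag with argparse and shifting the logging threshold one standard level per occurrence, starting from WARNING. The function name `resolve_log_level` is hypothetical.

```python
import argparse
import logging


def resolve_log_level(argv):
    """Map repeated -q/-v flags to a logging level, starting from WARNING.

    Illustrative only: the real script may compute its level differently.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-q", "--quiet", action="count", default=0)
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)
    # Standard levels are 10 apart: each -v lowers the threshold, each -q raises it.
    level = logging.WARNING + 10 * (args.quiet - args.verbose)
    # Clamp to the range of standard levels (DEBUG..CRITICAL).
    return max(logging.DEBUG, min(logging.CRITICAL, level))


if __name__ == "__main__":
    for flags in ([], ["-v"], ["-vv"], ["-q"], ["-qq"]):
        print(flags, logging.getLevelName(resolve_log_level(flags)))
```

Under this assumption, a single -v would show INFO and higher, and a single -q would show only ERROR and higher.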
This flag writes all link errors to a file. By default, the output is saved to errorlog.txt
pipenv run link_checker.py --output-errors
The output file can also be set explicitly by passing a value to the flag
pipenv run link_checker.py --output-errors output\results.txt
This flag also creates a junit-xml format summary of the script run, containing the number of error links and the number of unique error links. The file is written to test-summary/junit-xml-report.xml. This XML file can be passed to CI to show the failure result.
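As a rough illustration of how a CI step might read such a report, the sketch below totals the counters on each `testsuite` element. It assumes the report follows the conventional JUnit XML layout with `tests`, `failures`, and `errors` attributes; the exact contents of the file produced by link_checker.py are not documented here, and both `summarize_junit` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the conventional JUnit XML layout (not real script output).
SAMPLE = """<testsuites>
  <testsuite name="example" tests="2" failures="1" errors="0">
    <testcase name="a"/>
    <testcase name="b"><failure message="broken link"/></testcase>
  </testsuite>
</testsuites>"""


def summarize_junit(xml_text):
    """Sum the tests/failures/errors attributes over all <testsuite> elements."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0}
    # root.iter() also yields the root itself, so a bare <testsuite> root works.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


if __name__ == "__main__":
    print(summarize_junit(SAMPLE))
```

To read the report from disk instead, the text of test-summary/junit-xml-report.xml could be passed to the same function.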
This flag lets the script test license files stored locally rather than fetching each license file from GitHub.
The relative directory structure should be:
/
├── cc-link-checker/
│   ├── link_checker.py
│   ├── Pipfile
│   ├── Pipfile.lock
│   .
│   .
│
├── creativecommons.org/
│   ├── docroot
│   │   ├── legalcode
│   │   │   ├── by-nc-nd_4.0.html
│   .   .   .
│   .   .   .
│
This mode is helpful when running the script as part of CI.
Note: You can manually change the relative local path by changing the LICENSE_LOCAL_PATH global variable in the script.
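The directory layout above can be checked with a short sketch before running in local mode. The value given to `LICENSE_LOCAL_PATH` below is an assumption inferred from the tree shown, not copied from link_checker.py, and `list_local_licenses` is a hypothetical helper.

```python
from pathlib import Path

# Assumed value, inferred from the directory tree; the real variable may differ.
LICENSE_LOCAL_PATH = "../creativecommons.org/docroot/legalcode"


def list_local_licenses(script_dir, relative_path=LICENSE_LOCAL_PATH):
    """Return the sorted names of .html license files found relative to script_dir."""
    license_dir = (Path(script_dir) / relative_path).resolve()
    if not license_dir.is_dir():
        # The expected layout is missing, so there is nothing to check.
        return []
    return sorted(p.name for p in license_dir.glob("*.html"))


if __name__ == "__main__":
    # Run from inside the cc-link-checker directory.
    for name in list_local_licenses(Path.cwd()):
        print(name)
```

An empty result suggests that the two repositories are not checked out side by side as the layout requires.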
Because the script can scrape licenses from local storage, it can be used in CI in two steps:
- Clone this repo in the CI container
git clone https://github.com/creativecommons/cc-link-checker.git ~/cc-link-checker
- Run link_checker.py in local (--local) and output error (--output-errors) mode
python link_checker.py --local --output-errors
The configuration for GitHub Actions, for example, is present here.
Unit tests are written using the pytest framework. To run them:
- Install dev dependencies
pipenv install --dev
- Run unit tests
pipenv run pytest -v
- UnicodeEncodeError: This error is thrown when the console does not support UTF-8.
- Failing Lint build: We currently follow a customised black code style along with flake8. The black and flake8 configurations are present in the repo. Follow them to pass the CI build:
black ./
flake8 ./
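For the UnicodeEncodeError case listed above, a quick way to see whether the console is the cause is to inspect the encoding Python uses for standard output. The sketch below is a diagnostic aid and not part of link_checker.py; `is_utf8` is a hypothetical helper. `PYTHONIOENCODING` is a standard Python environment variable that overrides the encoding of the standard streams.

```python
import codecs
import sys


def is_utf8(encoding):
    """Return True if the given encoding name is an alias of UTF-8."""
    if not encoding:
        return False
    try:
        # codecs.lookup normalises aliases such as "UTF8" or "utf_8" to "utf-8".
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


if __name__ == "__main__":
    encoding = getattr(sys.stdout, "encoding", None)
    if is_utf8(encoding):
        print("Console encoding is UTF-8")
    else:
        print(f"Console encoding is {encoding!r}; try setting PYTHONIOENCODING=utf-8")
```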
The Creative Commons team is committed to fostering a welcoming community. This project and all other Creative Commons open source projects are governed by our Code of Conduct. Please report unacceptable behavior to conduct@creativecommons.org per our reporting guidelines.
We welcome contributions for bug fixes, enhancements, and documentation. Please follow CONTRIBUTING.md while contributing.