Automated link checker for legalcode and license URLs

Primary language: Python. License: MIT.

Creative Commons Link Checker

This Python script scrapes all the license files and automates the detection of broken links, timeout errors, and other link issues.

Prerequisites

  • Python 3
  • Console with UTF-8 support

Installation

There are two suggested ways to install. Follow the User steps if you only want to run the script. Follow the Development steps if you want to work on the script itself.

User

  1. Clone the repo
    git clone https://github.com/creativecommons/cc-link-checker.git
  2. Install dependencies using the Pipfile (requires pipenv):
    pipenv install

Development

We recommend using pipenv to create a virtual environment and install dependencies

  1. Clone the repo

    git clone https://github.com/creativecommons/cc-link-checker.git
  2. Create a virtual environment and install all dependencies

    pipenv install --dev
    • To install the last successful environment: pipenv install --dev --ignore-pipfile
  3. Either:

    • Activate project's virtual environment:
      pipenv shell
    • Run the script:
      pipenv run link_checker.py

Usage

-h or --help

This flag prints the script's help text.

pipenv run link_checker.py -h
usage: link_checker.py [-h] [--local] [--output-errors [output_file]] [-q]
                       [--root-url ROOT_URL] [-v]

Check for broken links in Creative Commons licenses

optional arguments:
  -h, --help            show this help message and exit
  --local               Scrapes license files from local file system
  --output-errors [output_file]
                        Outputs all link errors to file (default: errorlog.txt)
                        and creates junit-xml type summary(test-summary/junit-xml-
                        report.xml)
  -q, --quiet           Decrease verbosity. Can be specified multiple times.
  --root-url ROOT_URL   Set root URL (default: https://creativecommons.org)
  -v, --verbose         Increase verbosity. Can be specified multiple times.
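
The help text above is consistent with a standard argparse setup. The following is a minimal sketch that reproduces the same options; it is not the script's actual source. The function name build_parser and the use of None to mean "no error file requested" are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="link_checker.py",
        description="Check for broken links in Creative Commons licenses",
    )
    parser.add_argument(
        "--local",
        action="store_true",
        help="Scrapes license files from local file system",
    )
    parser.add_argument(
        "--output-errors",
        nargs="?",  # the file name after the flag is optional
        const="errorlog.txt",  # used when the flag is given without a value
        default=None,  # assumption: None means no error file is written
        metavar="output_file",
        help="Outputs all link errors to file (default: errorlog.txt)",
    )
    parser.add_argument(
        "-q", "--quiet", action="count", default=0,
        help="Decrease verbosity. Can be specified multiple times.",
    )
    parser.add_argument(
        "--root-url",
        default="https://creativecommons.org",
        help="Set root URL (default: https://creativecommons.org)",
    )
    parser.add_argument(
        "-v", "--verbose", action="count", default=0,
        help="Increase verbosity. Can be specified multiple times.",
    )
    return parser
```

Because argparse accepts unambiguous prefixes by default, a parser built this way would also accept --output-error as shorthand for --output-errors.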

Default mode

This mode shows which file is currently being checked, along with any warnings and errors encountered in its links.

pipenv run link_checker.py
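
To illustrate the kind of work done for each file, here is a hedged sketch of the first half of a link check: collecting the links in a license page and resolving them against the root URL. It uses only the standard library; the class and function names are hypothetical, and the real script may rely on different libraries and filtering rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: in-page anchors and mailto links are not checked.
        if href and not href.startswith(("#", "mailto:")):
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url="https://creativecommons.org"):
    """Return every checkable link in the page, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Each collected URL would then be requested, and failures such as error status codes or timeouts would be reported as the warnings and errors this mode displays.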

-q or --quiet

This flag decreases the verbosity of the output. This mode is useful for reducing the noise. By default, WARNING and higher output is displayed.

pipenv run link_checker.py -q

-v or --verbose

This flag increases the verbosity of the output. This mode is useful for in-depth debugging. By default, WARNING and higher output is displayed.

pipenv run link_checker.py -v
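
One plausible way to combine the -q and -v counts into a logging level is sketched below. It assumes that each flag shifts the level by one standard step from the WARNING default; the function name effective_level is hypothetical and the script's own mapping may differ.

```python
import logging


def effective_level(quiet=0, verbose=0):
    """Map -q/-v counts to a logging level, starting from WARNING."""
    # Standard levels are 10 apart: DEBUG=10, INFO=20, WARNING=30, ...
    level = logging.WARNING + 10 * (quiet - verbose)
    # Clamp so that repeated flags never leave the DEBUG..CRITICAL range.
    return max(logging.DEBUG, min(logging.CRITICAL, level))
```

Under these assumptions, a single -v would show INFO and higher, and a single -q would show only ERROR and higher.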

--output-errors

This flag writes all link errors to a file. By default, the output is saved to errorlog.txt.

pipenv run link_checker.py --output-errors

The output file can also be set explicitly by passing a value to the flag:

pipenv run link_checker.py --output-errors output/results.txt

This flag also creates a JUnit XML summary of the script run, containing the number of error links and the number of unique error links.

The summary is written to test-summary/junit-xml-report.xml. This XML file can be passed to CI to report failures.
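
As a rough illustration of what such a summary can look like, the sketch below builds a JUnit-style document with two test cases, one for all error links and one for unique error links. It uses only the standard library; the function name, test case names, and messages are hypothetical, and the script's real report may be structured differently.

```python
import xml.etree.ElementTree as ET


def build_junit_summary(error_links):
    """Return a JUnit-style XML string summarising a list of broken URLs."""
    counts = (
        ("Errors", len(error_links)),
        ("Unique errors", len(set(error_links))),
    )
    failures = sum(1 for _, count in counts if count > 0)
    suite = ET.Element(
        "testsuite",
        name="cc-link-checker",
        tests=str(len(counts)),
        failures=str(failures),
    )
    for name, count in counts:
        case = ET.SubElement(suite, "testcase", name=name)
        if count > 0:
            # A <failure> child is what marks a test case as failed.
            ET.SubElement(case, "failure", message=f"{count} broken link(s)")
    return ET.tostring(suite, encoding="unicode")
```

A CI system that understands JUnit reports would treat any test case containing a failure element as failed, which is how a run with broken links can be surfaced as a failing build.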

--local

This flag makes the script test license files stored locally rather than fetching each license file from GitHub.

The relative directory structure should be:

/
├── cc-link-checker/
│   ├── link_checker.py
│   ├── Pipfile
│   ├── Pipfile.lock
│   .
│   .
│
├── creativecommons.org/
│   ├── docroot
│   │   ├── legalcode
│   │   │   ├── by-nc-nd_4.0.html
│   .   .   .
│   .   .   .
│

This mode can be helpful when using the script in CI.

Note: You can manually change the relative local path by changing the LICENSE_LOCAL_PATH global variable in the script.
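
The sketch below shows how a relative path like this could be resolved against the script's directory to list the local license files. The value given to LICENSE_LOCAL_PATH here is an assumption derived from the directory tree above, and list_local_licenses is a hypothetical helper, not a function from the script.

```python
from pathlib import Path

# Assumed value, inferred from the directory layout shown above.
LICENSE_LOCAL_PATH = "../creativecommons.org/docroot/legalcode"


def list_local_licenses(script_dir, local_path=LICENSE_LOCAL_PATH):
    """Return the sorted names of the HTML license files found locally."""
    legalcode_dir = (Path(script_dir) / local_path).resolve()
    return sorted(path.name for path in legalcode_dir.glob("*.html"))
```

With the layout above, calling list_local_licenses with the cc-link-checker directory would pick up files such as by-nc-nd_4.0.html from the sibling creativecommons.org checkout.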

Integrating with CI

Because the script can scrape licenses from local storage, it can be used in CI in two easy steps:

  1. Clone this repo in the CI container

    git clone https://github.com/creativecommons/cc-link-checker.git ~/cc-link-checker
  2. Run link_checker.py in local (--local) and output-errors (--output-errors) mode

    python link_checker.py --local --output-errors

The configuration for GitHub Actions, for example, is present here.

Unit Testing

Unit tests are written using the pytest framework. To run them:

  1. Install dev dependencies
    pipenv install --dev
  2. Run unit tests
    pipenv run pytest -v

Troubleshooting

  • UnicodeEncodeError:

    This error is thrown when the console does not support UTF-8.

  • Failing Lint build:

    Currently we follow a customised black code style along with flake8. The black and flake8 configurations are present in the repo. Follow them to pass the CI build:

    black ./
    flake8 ./
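
For the UnicodeEncodeError case above, a quick way to confirm whether the console is the cause is to inspect the encoding Python reports for standard output. This is a small diagnostic sketch; the helper name console_supports_utf8 is hypothetical and is not part of the script.

```python
import sys


def console_supports_utf8(stream=None):
    """Return True if the given text stream (default: stdout) uses UTF-8."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None) or ""
    # Normalise spellings such as "UTF-8", "utf_8", and "utf8".
    return encoding.lower().replace("-", "").replace("_", "") == "utf8"
```

If the check returns False, setting the PYTHONIOENCODING environment variable to utf-8 before running the script, or switching the terminal to a UTF-8 locale, is a common remedy.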
    

Code of Conduct

CODE_OF_CONDUCT.md:

The Creative Commons team is committed to fostering a welcoming community. This project and all other Creative Commons open source projects are governed by our Code of Conduct. Please report unacceptable behavior to conduct@creativecommons.org per our reporting guidelines.

Contributing

We welcome contributions for bug fixes, enhancements, and documentation. Please follow CONTRIBUTING.md when contributing.

License