/urlfix

Check and Fix Outdated URLs

Primary LanguagePythonMIT LicenseMIT

urlfix: Check and Fix Outdated URLs

PyPI version fury.io DOI Project Status Codecov Test-Package PyPI license Documentation Status Total Downloads Monthly Downloads Weekly Downloads Maintenance GitHub last commit GitHub issues GitHub issues-closed

urlfix aims to find all outdated URLs in a given file and fix them.

Features List

  • Commandline and programmer-friendly modes.

  • Replace outdated URLs/links in a single file

  • Replace outdated URLs/links in a directory

  • Replace outdated URLs/links in the same file or in the same files in a directory i.e. inplace.

  • Replace outdated links in files in nested directories

  • Replace outdated links in files in sub-nested directories

Supported file formats

urlfix fixes URLs given a file of the following types:

  • MarkDown (.md)

  • Plain Text files (.txt)

  • RMarkdown (.rmd)

  • ReStructured Text (.rst)

  • PDF (.pdf)

  • Word (.docx)

  • ODF (.odf)

Installation

The simplest way to install the latest release is as follows:

pip install urlfix

To install the development version:

Open the Terminal/CMD/Git bash/shell and enter

pip install git+https://github.com/Nelson-Gon/urlfix.git

# or for the less stable dev version
pip install git+https://github.com/Nelson-Gon/urlfix.git@dev

Otherwise:

# clone the repo
git clone git@github.com:Nelson-Gon/urlfix.git
cd urlfix
pip install -e . 

Sample usage

Script Mode

To use at the commandline, please use:

python -m urlfix --mode "f" --verbose 1 --inplace 1 --inpath myfile.md

If not replacing within the same file, then:

python -m urlfix --mode "f" --verbose 1 --inplace 0 --inpath myfile.md --output-file myoutputfile.md

To get help:

python -m urlfix -h 

#usage: main.py [-h] -m MODE -in INPUT_FILE [-o OUTPUT_FILE] -v {False,false,0,True,true,1} -i {False,false,0,True,true,1}
#
#optional arguments:
#  -h, --help            show this help message and exit
#  -m MODE, --mode MODE  Mode to use. One of f for file or d for directory
#  -in INPUT_FILE, --input-file INPUT_FILE
#                        Input file for which link updates are required.
#  -o OUTPUT_FILE, --output-file OUTPUT_FILE
#                        Output file to write to. Optional, only necessary if not replacing inplace
#  -v {False,false,0,True,true,1}, --verbose {False,false,0,True,true,1}
#                        String to control verbosity. Defaults to True.
#  -i {False,false,0,True,true,1}, --inplace {False,false,0,True,true,1}
#                        Should links be replaced inplace? This should be safe but to be sure, test with an output file first.


Programmer-Friendly Mode

from urlfix.urlfix import URLFix
from urlfix.dirurlfix import DirURLFix

Create an object of class URLFix

urlfix_object = URLFix("testfiles/testurls.txt", output_file="replacement.txt")

Replacing URLs

After creating our object, we can replace outdated URLs as follows:

urlfix_object.replace_urls(verbose=1)

The above uses default arguments and will not replace a file inplace. This is a safety mechanism to ensure one does not damage their files.

Since we set verbose to True, we get the following output:

urlfix_object.replace_urls()

To replace silently, simply set verbose to False (which is the default).

urlfix_object.replace_urls()

If there are URLs known to be valid, pass these to the correct_urls argument to save some time.

urlfix_object.replace_urls(correct_urls=[urls_here]) # Use a Sequence eg tuple, list, etc

Replacing several files in a directory

To replace several files in a directory, we can use DirURLFix as follows.

  • Instantiate an object of class DirURLFix
replace_in_dir = DirURLFix("path_to_dir")
  • Call replace_urls
replace_in_dir.replace_urls()

Recursively replacing links in nested directories

To replace outdated links in several files located in several directories, we set recursive to True. Currently, replacing links in directories nested within nested directories is not (yet) supported.

recursive_object = DirURLFix("path_to_root_directory", recursive=True)

We can then proceed as above

recursive_object.replace_urls() # provide other arguments as you may wish. 

To report any issues, suggestions or improvement, please do so at issues.

If you would like to cite this work, please use:

Nelson Gonzabato (2021) urlfix: Check and Fix Outdated URLs https://github.com/Nelson-Gon/urlfix

Thank you very much.

“Before software can be reusable it first has to be usable.” – Ralph Johnson