Script to generate troublesome filenames from the big list of naughty strings: blns
The output should be useful for testing the file-handling capabilities of most systems that read files from disk. Digital preservation systems anticipate a lot of heterogeneous data and so this script is written with testing those systems in mind.
Big-List-of-Naughty-Files (BLNF) converts the Big List of Naughty Strings (BLNS) to file names and outputs sample files for each.
Insprired initially by Max Woolfe who converted the list to JSON. This script now takes the plain-text rendition from the BLNS repository to generate its output.
To get started, run the following:
git clone git@github.com:ross-spencer/big-list-of-naughty-files.git \
&& cd big-list-of-naughty-files
python -m venv venv
source venv/bin/activate
python -m pip install -r requirements/requirements.txt
python blnf.py
The script tries to convert as many strings from BLNS to file names as possible.
python blnf.py
outputs three folders:
* blnf-output/ <-- contains the two folders below
* files <-- contains files we were able to write to the file system.
* converted files <-- contains files we couldn't but simplified the
string for, so may still be challenging.
You are very likely to have better luck running this script on Linux. Microsoft control names for example are part of the set.
Is the script dangerous? I don't know. Don't run it as sudo
. When using the
output against your systems, try to only use it on test servers.
Files are output as follows:
├──blnf-output
├───files
└───files-converted
And take the form:
* `<blns-filename>.<unique-string>.blnf`
At the time of writing the script outputs: 2 directories, 511 files
on
Ubuntu 22.04.4 LTS
running Python 3.10.12
.
By default, all files share the same checksum:
e049fd364abc35f24fd66f056d30f95c
which might be an important in systems that
perform deduplication.
Filenames that are generated are guaranteed to be unique with the generation and appending of a UUID to the filename.
It might be beneficial to make each file's payload unique but predictable. The same with the filename.
It might be possible to achieve this by hashing the anticipated filename and writing it to the file's contents and then using a part of the hash as the filename in-place of the UUID.
Some interesting and innocuous looking strings I have found some systems struggle with when testing them alongside the BLNF:
!@#$%^&*()`~
<>?:"{}|_+
Ω≈ç√∫˜µ≤≥÷
Setup a virtual environment venv
and install the local development
requirements as follows:
python3 -m venv venv
source venv/bin/activate
python -m pip install -r requirements/local.txt
The project structure is as follows:
src
└── blnf
├── blnf.py
├── blns.txt
├── __init__.py
blns.txt
contains a snapshot of the big list of naughty strings for
processing.
python -m tox
python -m tox -e py3
python -m tox -e linting
Pre-commit can be used to provide more feedback before committing code. This reduces reduces the number of commits you might want to make when working on code, it's also an alternative to running tox manually.
To set up pre-commit, providing pip install
has been run above:
pre-commit install
This repository contains a default number of pre-commit hooks, but there may be others suited to different projects. A list of other pre-commit hooks can be found here.
The Makefile
contains helper functions for packaging and release.
Makefile functions can be reviewed by calling make
from the root of this
repository:
clean Clean the package directory
help Print this help message.
package-check Check the distribution is valid
package-deps Upgrade dependencies for packaging
package-source Package the source code
package-upload Upload package to pypi
package-upload-test Upload package to test.pypi
tar-source Package repository as tar for easy distribution
Packaging consumes the metadata in pyproject.toml
which helps to describe
the project on the official pypi.org repository. Have a look at the
documentation and comments there to help you create a suitably descriptive
metadata file.
To create a python wheel for testing locally, or distributing to colleagues run:
make package-source
A tar
and whl
file will be stored in a dist/
directory. The whl
file
can be installed as follows:
pip install <your-package>.whl
Publishing for public use can be achieved with:
make package-upload-test
ormake package-upload
make-package-upload-test
will upload the package to test.pypi.org
which provides a way to look at package metadata and documentation and ensure
that it is correct before uploading to the official pypi.org
repository using make package-upload
.