Python Debian Package Parser
Prolog
This document is markdown format and best viewed using a markdown capable viewer, such as pycharm or visual studio code. Alternatively copy/paste into an online markdown viewer such as https://dillinger.io
Introduction
This project uses python to show statistics of the debian packages (for a specific architecture) that have the most files associated with them.
It uses the online data at http://ftp.uk.debian.org/debian/dists/stable/main/. to do so.
See challenge.txt for furter details.
Terminology
Listing File
http://ftp.uk.debian.org/debian/dists/stable/main/.
from here available contents files (see below) can be discerned
Contents (index) File
"Architecture" specific (one->many) mapping of file->packages eg http://ftp.uk.debian.org/debian/dists/stable/main/Contents-[Arch].gz
official documentation:
https://wiki.debian.org/RepositoryFormat#A.22Contents.22_indices
Prerequisites
Python 3.10+
If you have multiple versions of python installed, please make sure the correct one is being referenced in the files in _prj/script_
Setup
ex_auto/prj/script> ./venv_create.sh
This will create a virtual environment and install the required packages (as listed in prj/pip_reqs.txt)
-
Note: on Ubuntu one may need to first separately install python3-venv
sudo apt install python3-venv
-
Note: Poetry has superior dependency handling, and should be preferred in a production environment. Direct use of pip was chosen here, however, as poetry requires global modifications on the developers machine, and I wanted to avoid requiring this of anyone else, wishing to run/test this project
-
Note: If you are behind a proxy you may need to configure https_proxy (eg export as an environment var) appropriately for pip to be able to download the packages, and the app to download the files. eg:
> export http_proxy=http://10.10.1.10:3128 > export https_proxy=http://10.10.1.10:1080
-
Execute
ex_auto> ./prj/script/venv/bin/python -m src.main udeb-all
collating for 'Contents-udeb-all.gz' ....
downloading http://ftp.uk.debian.org/debian/dists/stable/main/ ...
5.52kB [00:00, 7.63MB/s]
.... download complete
downloading http://ftp.uk.debian.org/debian/dists/stable/main/Contents-udeb-all.gz ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13.2k/13.2k [00:00<00:00, 9.49MB/s]
.... download complete
extracting...
Package statistics for 'udeb-all':
| | Package | Num Files |
|---:|:---------------------------------------------------|------------:|
| 1 | debian-installer/xkb-data-udeb | 293 |
| 2 | debian-installer/fonts-noto-unhinted-udeb | 270 |
| 3 | debian-installer/ca-certificates-udeb | 259 |
| 4 | debian-installer/console-keymaps-at | 128 |
| 5 | debian-installer/debootstrap-udeb | 72 |
| 6 | debian-installer/console-keymaps-acorn | 57 |
| 7 | debian-installer/console-setup-linux-charmaps-udeb | 56 |
| 8 | debian-installer/debian-edu-profile-udeb | 50 |
| 9 | debian-installer/kickseed-common | 40 |
| 10 | debian-installer/console-setup-freebsd-fonts-udeb | 36 |
Test
ex_auto> ./prj/script/venv/bin/pytest test/
========================================================================= test session starts ==========================================================================
platform darwin -- Python 3.10.7, pytest-7.2.0, pluggy-1.0.0
rootdir: /Users/declan/Documents/zone/high/dev/eigene/canonical/ex_auto
collected 15 items
test/test__cts_arch.py . [ 6%]
test/test__man_app.py ... [ 26%]
test/test__man_file.py ... [ 46%]
test/test__man_parser_.py .. [ 60%]
test/test__man_util.py .. [ 73%]
test/test__man_web.py .... [100%]
========================================================================== 15 passed in 0.77s ==========================================================================
Check Typing
ex_auto> ./prj/script/venv/bin/mypy --check-untyped-defs -p test -p src
Success: no issues found in 16 source files
Declans-iMac:ex_auto declan$ ./prj/script/venv/bin/pylint src test
Check Lint
ex_auto> ./prj/script/venv/bin/pylint src test
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Notes:
Some of the wording was ambiguous in the document at:
In any cases of ambiguity a reasonable best guess was made. For example:
'The first row of the table SHOULD have the columns "FILE" and "LOCATION" '
Rows don't have columns, and it is not fully clear if a literal
"FILE LOCATION"
should be expected and ignored, or if this is just trying to describe schema (latter was presumed)
'A filename relative to the root directory, without leading .'
Its actually a file path (filename cant contain "/")
"possibly taking advantage of the fact that package names cannot include
white space characters"
Its was furthermore presumed there can also be no spaces -between- package names
If any of the above presumptions are incorrect, then the code may need to be revised.