/pacoloco

Caching proxy server for Arch Linux pacman

Primary LanguageGoMIT LicenseMIT

Pacoloco - caching proxy server for pacman

Pacoloco is a web server that acts if it was an Arch Linux pacman repository. Every time pacoloco server gets a request from user it downloads this file from real Arch Linux mirror and bypasses it to the user. Additionally pacoloco saves this file to local filesystem cache and serves it to the future users. It also allows to prefetch updates of the most recently used packages.

How does it help?

Fast internet is still a luxury in many parts of the world. There are many places where access to internet is expensive and slow due to geographical and economical reasons.

Now think about a situation when multiple pacman users connected via fast local network. Each of these users needs to download the same set of files. Pacoloco allows to minimize the Internet workload by caching pacman files content and serving it over fast local network.

Pacoloco does not mirror the whole Arch repository. It only downloads files needed by local users. You can think of pacoloco as a lazy Arch mirror.

Install

Arch systems

Install pacoloco package from the official Arch repository. Then start its systemd service: # systemctl start pacoloco.

Docker

Pacoloco can be used with docker.

You can get a prebuilt image from GitHub's container registry (see also sidebar). Currently the images are built for amd64 and ARM (arm64, armv7) architectures.

docker pull ghcr.io/anatol/pacoloco

Available tags are: latest = git master and any git tags.

You can also build it yourself:

$ git clone https://github.com/anatol/pacoloco && cd pacoloco
$ docker build -t ghcr.io/anatol/pacoloco .

Run it like this:

$ docker run -p 9129:9129 \
    -v /path/to/config/pacoloco.yaml:/etc/pacoloco.yaml \
    -v /path/to/cache:/var/cache/pacoloco \
    ghcr.io/anatol/pacoloco

You need to provide paths or volumes to store application data.

Alternatively, you can use docker-compose:

---
version: "3.8"
services:
  pacoloco:
#   if a specific user id is provided, you have to make sure
#   the mounted directories have the same user id owner on host
#   user: 1000:1000
    container_name: pacoloco
#   to pull the image from github's registry:
    image: ghcr.io/anatol/pacoloco
#   or replace it for for self-building with:
#    build: https://github.com/anatol/pacoloco.git
    ports:
      - "9129:9129"
    volumes:
      - /path/to/cache:/var/cache/pacoloco
      - /path/to/config/pacoloco.yaml:/etc/pacoloco.yaml
    restart: unless-stopped
#   to set time zone within the container for cron and log timestamps:
#    environment:
#      - TZ=Europe/Berlin

Build from sources

Optionally you can build the binary from sources using go build command.

Configure

The server configuration is located at /etc/pacoloco.yaml. Here is an example how the config file looks like:

address: 127.0.0.1
port: 9129
cache_dir: /var/cache/pacoloco
purge_files_after: 360000 # 360000 seconds or 100 hours, 0 to disable
download_timeout: 3600 # download will timeout after 3600 seconds
repos:
  archlinux:
    urls:
      - http://mirror.lty.me/archlinux
      - http://mirrors.kernel.org/archlinux
  quarry:
    url: http://pkgbuild.com/~anatolik/quarry/x86_64
  sublime:
    http_proxy: http://bar.company.com:8989 # Proxy could be enabled per-repo, shadowing the global `http_proxy` (see below)
    url: https://download.sublimetext.com/arch/stable/x86_64
  archlinux-reflector:
    mirrorlist: /etc/pacman.d/reflector_mirrorlist # Be careful! Check that pacoloco URL is NOT included in that file!
http_proxy: http://foo.company.com:8989 # Enable this only if you have pacoloco running behind a proxy
user_agent: Pacoloco/1.2
prefetch: # optional section, add it if you want to enable prefetching
  cron: 0 0 3 * * * * # standard cron expression (https://en.wikipedia.org/wiki/Cron#CRON_expression) to define how frequently prefetch, see https://github.com/gorhill/cronexpr#implementation for documentation.
  ttl_unaccessed_in_days: 30  # defaults to 30, set it to a higher value than the number of consecutive days you don't update your systems
  # It deletes and stop prefetch packages(and db links) when not downloaded after ttl_unaccessed_in_days days that it had been updated.
  ttl_unupdated_in_days: 300 # defaults to 300, it deletes and stop prefetch packages which hadn't been either updated upstream or requested for ttl_unupdated_in_days.
  • cache_dir is the cache directory, this location needs to read/writable by the server process.
  • purge_files_after specifies inactivity duration (in seconds) after which the file should be removed from the cache. This functionality uses unix "AccessTime" field to find out inactive files. Default value is 0 that means never run the purging.
  • address is the servers listening address. When left empty, the server will start opening a listener on all available addresses. When any of the IPs is a public IP, it will make pacoloco available to the internet.
  • port is the server port.
  • download_timeout is a timeout (in seconds) for internet->cache downloads. If a remote server gets slow and file download takes longer than this will be terminated. Default value is 0 that means no timeout.
  • repos is a list of repositories to mirror. Each repo needs name and url of its Arch mirrors. Note that url can be specified either with url or urls properties, one and only one can be used for each repo configuration. Each repo could have its own http_proxy, which would shadow the global http_proxy (see below).
  • http_proxy is only to be used if you have pacoloco running behind a proxy
  • user_agent user agent used to fetch the files from repositories. Default value is Pacoloco/1.2.
  • The prefetch section allows to enable packages prefetching. Comment it out to disable it.
  • To test out if the cron value does what you'd expect to do, check cronexpr implementation or test it
  • For what regards mirrorlist, be sure that pacoloco itself is NOT included in the chosen mirrorlist file. It can be integrated with reflector too, either by changing reflector's output path or by including pacoloco directly for standard repos in /etc/pacman.conf (e.g. adding a Server=... entry or a custom mirrorlist file which includes only pacoloco URL).

With the example configured above http://YOURSERVER:9129/repo/archlinux looks exactly like an Arch pacman mirror. For example a request to http://YOURSERVER:9129/repo/archlinux/core/os/x86_64/openssh-8.2p1-3-x86_64.pkg.tar.zst will be served with file content from http://mirror.lty.me/archlinux/core/os/x86_64/openssh-8.2p1-3-x86_64.pkg.tar.zst

Once the pacoloco server is up and running it is time to configure the user host. Modify user's /etc/pacman.conf with

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist

[quarry]
Server = http://yourpacoloco:9129/repo/quarry

[sublime-text]
Server = http://yourpacoloco:9129/repo/sublime

And /etc/pacman.d/mirrorlist with

Server = http://yourpacoloco:9129/repo/archlinux/$repo/os/$arch

That's it. Since now pacman requests will be proxied through our pacoloco server.

Handling multiple architectures

pacoloco does not care about the architecture of your repo as it acts as a mere proxy.

Thus it can handle multiple different arches transparently. One way to do it is to add multiple repositories with names foobar_$arch e.g.:

repos:
  archlinux_x86_64:
    urls:
      - http://mirror.lty.me/archlinux
      - http://mirrors.kernel.org/archlinux
  archlinux_armv7h:
    url: http://mirror.archlinuxarm.org
  archlinux_x86:
    url: http://mirror.clarkson.edu/archlinux32

Then modify user's /etc/pacman.d/mirrorlist and add

For x86_64:

Server = http://yourpacoloco:9129/repo/archlinux_$arch/$repo/os/$arch

For armv7h:

Server = http://yourpacoloco:9129/repo/archlinux_$arch/$arch/$repo

For x86:

Server = http://yourpacoloco:9129/repo/archlinux_$arch/$arch/$repo

Please note that archlinux_$arch is the repo name in pacoloco.yaml.

Credits

Huge thanks to all the people who contributed to this project! Pacoloco would not be able to become successful without your help.