Index your objects
The purpose of this project is to allow using cheap single-board computers (SBCs), each with one or two cheap HDDs, to store important data. No RAID: it only works well with expensive disks, still has a single point of failure in the controller, and is difficult to recover. No NAS/NFS: clustering it is too difficult. An HTTP-based object store is the way to go.
The goal is not to replicate POSIX/NFS but to store large WORM (write once, read many) files with basic metadata in a way that is better than a POSIX filesystem.
Inspired by projects like:
Consume S3 API(s) (from MinIO or the like) and expose a rich metadata store.
pip3 install https://github.com/ctengel/objectindex/archive/refs/heads/main.zip
There are then a few different ways to use this:
- RESTful API:
  FLASK_APP=obj_idx.api OBJIDX_SETTINGS=/path/to/api.cfg flask run --host=0.0.0.0
  - need MinIO running and set up first
    - see obj-idx-admin setup
    - see the MinIO setup below
  - need PostgreSQL running and set up
    - see OBJIDX_SETTINGS=/path/to/api.cfg python3 -m obj_idx.db_create
    - see the PostgreSQL setup below
  - need API config file (see below)
- GUI:
  FLASK_APP=obj_idx.gui OBJIDX_GUI_SETTINGS=/path/to/gui.cfg flask run --port 5001 --host=0.0.0.0
  - need GUI config file (see below)
- CLI client:
  obj-idx-client
Hardware and such:
- Raspberry Pi 3B, 3B+, 400
  - starting specifically with 3B+
  - tuning may be needed for Pis older than 4/400
- External USB hard drive with SMR
  - note that HDDs like this don't play well with additional USB devices (such as an SSD) plugged in alongside them; to do this you will need an extra power source such as a powered USB hub
  - ext4 format
    - strongly considering xfs
  - standalone/non-erasure-coded MinIO
    - note that MinIO deprecated the single-node, single-drive standalone mode in late 2022 and introduced single-drive erasure coding, which is what is used now
- 32GB microSDHC card
  - keep the swap here; putting it on USB just overloads USB power and bandwidth
- Download 2022-04-04-raspios-bullseye-arm64-lite.img.xz or similar from https://www.raspberrypi.com/software/operating-systems/ and write it to the SD card (check the device name first, as dd overwrites it):
  xzcat 2022-04-04-raspios-bullseye-arm64-lite.img.xz | sudo dd of=/dev/sda bs=4096
- Boot
- sudo raspi-config
  - ssh
  - hostname
  - disable autologin
  - locale
  - handle wifi killswitch?
  - etc
- /etc/dhcpcd.conf:
  interface eth0
  static ip_address=192.168.1.254/24
  static routers=192.168.1.1
  static domain_name_servers=192.168.1.1
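Before rebooting onto a static address, it can be worth checking that the router actually sits inside the subnet given by ip_address. A minimal sketch using Python's standard ipaddress module, fed with the example values above; it is not part of objectindex, and the helper name check_static_config is hypothetical:

```python
import ipaddress


def check_static_config(ip_address: str, router: str) -> ipaddress.IPv4Interface:
    """Raise ValueError if the router cannot be reached on the host's subnet."""
    iface = ipaddress.IPv4Interface(ip_address)  # e.g. 192.168.1.254/24
    gateway = ipaddress.IPv4Address(router)      # e.g. 192.168.1.1
    if gateway not in iface.network:
        raise ValueError(f"router {gateway} is outside {iface.network}")
    if gateway == iface.ip:
        raise ValueError("router address must differ from the host address")
    return iface
```

For example, check_static_config("192.168.1.254/24", "192.168.1.1") returns the interface, while the same call with a router of 10.0.0.1 raises ValueError.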
- sudo apt update; sudo apt upgrade
- sudo parted -a optimal /dev/sdX
  $ sudo parted -a optimal /dev/sdX
  GNU Parted 3.4
  Using /dev/sdX
  Welcome to GNU Parted! Type 'help' to view a list of commands.
  (parted) help
  ...
  (parted) mklabel
  New disk label type? gpt
  Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to continue?
  Yes/No? y
  (parted) mkpart
  Partition name? []? ...
  File system type? [ext2]? ext4
  Start? 0%
  End? 100%
  (parted) print
  Model: ...
  Disk /dev/sdb: 2000GB
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt
  Disk Flags:

  Number  Start   End     Size    File system  Name  Flags
   1      1049kB  2000GB  2000GB  ext4         ...

  (parted) quit

  The resulting layout on the 5TB drive used for obj1data:
  Model: Seagate BUP Portable (scsi)
  Disk /dev/sda: 5001GB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags:

  Number  Start   End     Size    File system  Name      Flags
   1      1049kB  5001GB  5001GB  ext4         obj1data
- sudo mkfs.ext4 /dev/sda1
- sudo mkdir /mnt/obj1data
- sudo blkid -s PARTUUID /dev/sda1
- /etc/fstab (fill in the PARTUUID reported by blkid):
  PARTUUID= /mnt/obj1data ext4 defaults,noatime 0 2
  - add noauto to the options to prevent an attempt to mount at boot, if swapping removable drives
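The blkid and fstab steps above can be combined into one small helper. A minimal sketch, assuming blkid prints a line of the form /dev/sda1: PARTUUID="..." (its default output for blkid -s PARTUUID); fstab_line is a hypothetical helper, not part of objectindex:

```python
import re

# Matches blkid output such as: /dev/sda1: PARTUUID="0a1b2c3d-01"
BLKID_RE = re.compile(r'^(?P<dev>\S+): PARTUUID="(?P<partuuid>[^"]+)"')


def fstab_line(blkid_output: str, mountpoint: str = "/mnt/obj1data",
               removable: bool = False) -> str:
    """Build an /etc/fstab entry from `blkid -s PARTUUID <device>` output."""
    match = BLKID_RE.match(blkid_output.strip())
    if match is None:
        raise ValueError(f"no PARTUUID found in: {blkid_output!r}")
    options = "defaults,noatime"
    if removable:
        # noauto: don't try to mount at boot when drives get swapped
        options += ",noauto"
    return f"PARTUUID={match['partuuid']} {mountpoint} ext4 {options} 0 2"
```

For example, feed it the stdout of subprocess.run(["sudo", "blkid", "-s", "PARTUUID", "/dev/sda1"], capture_output=True, text=True).stdout and check the result by eye before appending it to /etc/fstab.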
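- sudo useradd -mU minio
  - alternatively, sudo groupadd -g 1234 minio; sudo useradd -m -u 1234 -g 1234 minio may be used to set a specific UID/GID
  - sudo userdel -r minio can be used to uninstall
- sudo mount /mnt/obj1data, so that the chown below applies to the new filesystem rather than the empty mount point
- sudo chown minio:minio /mnt/obj1data
- sudo apt install screen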
We need to periodically monitor and tune hardware:
- check the SoC temperature:
  /usr/bin/vcgencmd measure_temp
- raise the disk I/O timeouts (see https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/); these /sys settings do not persist across reboots:
  echo 1440 | sudo tee /sys/block/sda/device/timeout
  echo 720 | sudo tee /sys/block/sda/device/eh_timeout
  - see also /etc/sysctl.d
- check SMART for the disk:
  sudo smartctl -a /dev/sda
- other articles:
  - https://unix.stackexchange.com/questions/541463/how-to-prevent-disk-i-o-timeouts-which-cause-disks-to-disconnect-and-data-corrup
  - https://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf
  - https://www.usenix.org/system/files/login/articles/login_summer17_03_aghayev.pdf
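The temperature check above is easy to script for periodic monitoring. A minimal sketch, assuming vcgencmd measure_temp prints something like temp=48.3'C; the 70.0 threshold and the function names are hypothetical, not recommendations from this project:

```python
import re
import subprocess

# Hypothetical alert threshold in degrees Celsius; tune for your enclosure/cooling.
WARN_AT_C = 70.0

# vcgencmd prints e.g. "temp=48.3'C"
TEMP_RE = re.compile(r"temp=([0-9]+(?:\.[0-9]+)?)'C")


def parse_temp(output: str) -> float:
    """Extract degrees Celsius from `vcgencmd measure_temp` output."""
    match = TEMP_RE.search(output)
    if match is None:
        raise ValueError(f"unexpected vcgencmd output: {output!r}")
    return float(match.group(1))


def read_temp() -> float:
    """Run vcgencmd on the Pi and return the SoC temperature."""
    result = subprocess.run(["/usr/bin/vcgencmd", "measure_temp"],
                            capture_output=True, text=True, check=True)
    return parse_temp(result.stdout)


def temp_status(temp_c: float, warn_at: float = WARN_AT_C) -> str:
    """Format a one-line status such as 'ok 48.3C' or 'WARN 72.0C'."""
    level = "WARN" if temp_c >= warn_at else "ok"
    return f"{level} {temp_c:.1f}C"
```

On the Pi, calling print(temp_status(read_temp())) from cron or a screen session gives a crude periodic log.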
sudo shutdown -r now; exit
wget https://dl.min.io/server/minio/release/linux-arm64/minio
- alternatively, GO111MODULE=on go install github.com/minio/minio@latest will compile and install to ~/go/bin/minio
  - see the official MinIO docs for more
wget https://dl.min.io/server/mc/release/linux-arm64/mc
chmod a+x minio mc
MINIO_ROOT_USER=minio MINIO_ROOT_PASSWORD=password /home/minio/minio server /mnt/obj1data --address 0.0.0.0:9000 --console-address 0.0.0.0:9001
- can be done as a script like ./start.sh and run in a screen session
- next, set up buckets, users, replication, etc.
./mc alias set minio http://localhost:9000 minio password
- for more info see the mc docs
./mc admin info minio
./mc admin user add minio USER password
./mc mb minio/bucket
- grant the user access to the bucket:
vim userbucketpolicy.json
- put the bucket name(s) in there
./mc admin policy add minio BUCKET-policy userbucketpolicy.json
./mc admin policy set minio BUCKET-policy user=USER
./mc admin user info minio USER
./mc update && ./mc admin update minio/
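userbucketpolicy.json can also be generated rather than edited by hand. A minimal sketch, assuming the AWS IAM-style policy format that MinIO accepts; it grants s3:* on the listed buckets and their objects (broader than strictly necessary), and the function names are hypothetical:

```python
import json


def bucket_policy(buckets: list[str]) -> dict:
    """Build a policy allowing all S3 actions on the given buckets only."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions (e.g. listing)
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions (get/put/delete)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:*"], "Resource": resources},
        ],
    }


def write_policy(path: str, buckets: list[str]) -> None:
    """Write the policy as pretty-printed JSON for `mc admin policy add`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(bucket_policy(buckets), f, indent=2)
```

For example, write_policy("userbucketpolicy.json", ["bucket"]) produces the file used by the mc admin policy add command. Narrowing Action to specific s3: actions is a reasonable refinement if the user should have less than full access.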
Find the name of the systemd mount unit for the data drive, to use in After= below:
$ systemctl list-units | grep '/path/to/objectstore' | awk '{ print $1 }'
/etc/systemd/system/minio.service:
[Unit]
Description=MinIO Object Storage Service
Wants=network-online.target
After=network-online.target objectstoremountpoint.mount
[Service]
ExecStart=/home/minio/start.sh
WorkingDirectory=/home/minio
User=minio
Group=minio
[Install]
WantedBy=multi-user.target
$ sudo systemctl start minio
$ sudo systemctl status minio
$ sudo systemctl enable minio
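Once the service is enabled, a quick liveness check can confirm that MinIO is answering. A minimal sketch using only the standard library, assuming MinIO's unauthenticated liveness endpoint /minio/health/live on the API port (9000 in the start command above); minio_is_live is a hypothetical helper:

```python
import urllib.request


def minio_is_live(base_url: str = "http://127.0.0.1:9000", timeout: float = 5.0) -> bool:
    """Return True if MinIO's liveness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (error status codes) and socket timeouts all derive from OSError
        return False
```

Running minio_is_live() on the Pi after sudo systemctl start minio should return True once the server has finished starting.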
Some info on getting PostgreSQL running on Fedora:
- https://developer.fedoraproject.org/tech/database/postgresql/about.html
- https://docs.fedoraproject.org/en-US/quick-docs/postgresql/
- /usr/share/doc/postgresql/README.rpm-dist
Initial steps to be performed as a sudoer:
sudo dnf install postgresql-server
sudo postgresql-setup --initdb
sudo systemctl start postgresql
sudo su -c "createuser -P USER" postgres # note you will be prompted to create a password
sudo su -c "createdb -O USER DB" postgres
Note also that /var/lib/pgsql/data/pg_hba.conf may need to be modified, e.g. to use scram-sha-256 instead of ident.
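If the database is reached over TCP with password authentication (for example scram-sha-256 for host connections in pg_hba.conf), SQLALCHEMY_DATABASE_URI needs the USER and password created above, rather than the socket-style postgresql:///objidx used in the sample config. A minimal sketch for building that string, assuming PostgreSQL listens on localhost at the default port 5432; database_uri is a hypothetical helper:

```python
from urllib.parse import quote


def database_uri(user: str, password: str, db: str,
                 host: str = "localhost", port: int = 5432) -> str:
    """Build a SQLAlchemy PostgreSQL URI with percent-encoded credentials."""
    # quote(..., safe="") also escapes "/" and "@" so they can't break the URI
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{db}")
```

The returned string can then be used as the SQLALCHEMY_DATABASE_URI value in api.cfg.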
The following steps are to be run as the user who will run the API:
OBJIDX_SETTINGS=../samp.cfg python3 -m obj_idx.db_create
pg_dump --schema-only DB > schema.sql
The db_create.py script will empty a database and create the tables in the schema. It uses the same config file as the web app, for example:
DEBUG = True
SQLALCHEMY_DATABASE_URI = 'postgresql:///objidx'
SQLALCHEMY_TRACK_MODIFICATIONS = False
OBJIDX_S3 = 'http://user:pass@localhost:9000/'
OBJIDX_BUCKETS = ['bucket1']
- OBJIDX_S3 is a special URL for S3, with the credentials embedded
- OBJIDX_BUCKETS is a list of buckets that may be used
- The rest are standard Flask and SQLAlchemy options
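One way to read such a URL is to split it into an endpoint and a key pair. A minimal sketch using urllib.parse; the returned keys mirror the endpoint, access_key, secret_key and secure arguments of the MinIO Python SDK's Minio() constructor, but how objectindex itself parses OBJIDX_S3 is an assumption here and may differ. parse_s3_url is a hypothetical helper:

```python
from urllib.parse import unquote, urlsplit


def parse_s3_url(url: str) -> dict:
    """Split a URL like http://user:pass@localhost:9000/ into S3 client settings."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.username is None or parts.password is None:
        raise ValueError("URL must embed user:pass credentials")
    # Fall back to the scheme's default port when none is given
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return {
        "endpoint": f"{parts.hostname}:{port}",
        "access_key": unquote(parts.username),  # urlsplit does not percent-decode
        "secret_key": unquote(parts.password),
        "secure": parts.scheme == "https",
    }
```

Percent-encoding in the URL (e.g. %40 for @) lets credentials contain characters that would otherwise break the URL.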
Example GUI config file:
DEBUG = True
OBJIDX_URL="http://127.0.0.1:5000/" # change if running on a different host
OBJIDX_AUTH="user" # currently just a username, as there is no auth at the API level yet; ideally passed through in future
A failed upload must first be cleared by PUT/PATCHing the object at /object/<object-uuid>/ with {"deleted": true} to signify that the upload has stopped.
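A minimal sketch of that call using only the standard library, assuming the API is reachable at OBJIDX_URL (http://127.0.0.1:5000/ in the GUI config above) and accepts a JSON body; PATCH is used here rather than PUT, and the function names are hypothetical:

```python
import json
import urllib.request


def clear_failed_upload_request(api_url: str, object_uuid: str) -> urllib.request.Request:
    """Build a PATCH marking an object as deleted after a failed upload."""
    url = f"{api_url.rstrip('/')}/object/{object_uuid}/"
    body = json.dumps({"deleted": True}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PATCH",
                                  headers={"Content-Type": "application/json"})


def clear_failed_upload(api_url: str, object_uuid: str) -> int:
    """Send the PATCH and return the HTTP status code."""
    with urllib.request.urlopen(clear_failed_upload_request(api_url, object_uuid)) as resp:
        return resp.status
```

Building the request separately from sending it makes it easy to inspect the exact URL and body before touching a live API.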