This is a collection of ansible code to setup seaweedfs which is nearly-functional. Various machine-specific values such as usernames, disk IDs, mountpoints, and so on have been left with placeholders which will need to be replaced with appropriate values for your particular system. You will likely need to fill in, copy-paste, comment out, and remove various blocks of YAML; since each machine has its own disk drives and usernames. It is expected that you as a system administrator will skim through to get a basic understanding of the actions performed.
The appropriate places to start skimming through are:
- the main playbook
main.pb.yml
- the main variables/config declaration
vars/vars.yml
Most of the code assumes you are using Fedora-like Linux, but some Debian-like equivalents are present, though likely not enough on their own for use on Debian-like Linux without some extra work to insert Debian-style alternative actions throughout the playbooks. Distribution-specific actions tend to have conditionals associated with them to restrict them to the appropriate distribution family.
In short, this is provided more as a starting point than a works-out-of-the-box solution. It's 95% of the way ready and you need to slap together the 5% specific to your machine with what's here.
Things that need doing first.
System packages on controller:
$ sudo dnf install ansible python3
Ansible packages on controller:
$ ansible-galaxy collection install community.general
$ ansible-galaxy collection install community.posix
$ ansible-galaxy collection install community.postgresql
$ ansible-galaxy install git+https://github.com/DevoInc/ansible-role-systemd-service.git
$ ansible-galaxy install 0x0i.systemd
How to run the playbooks.
- When running remotely:
-i .hosts.yml
- When controller is the target:
-i localhost.yml
For a fresh install, run the following stages in numerical order.
For lazy copypasta use.
#!/bin/bash
## lazy-deploy-everything.sh
cd ~/seaweedfs
git pull
sudo whoami # Cache sudo auth.
ansible-playbook -vv manual-mounts.pb.yml
ansible-playbook -vv main.pb.yml
ansible-playbook -vv manual-pgsql.pb.yml
- This is mainly for my own sanity.
Need certstrap to create certificates (If using certificates)
$ ansible-playbook -vv manual-binaries.pb.yml
Setting up disks is infrequently done and kind of risky, so it's a seperate step.
Setup disks (on target):
$ ansible-playbook -vv manual-mounts.pb.yml
-
Does not depend on other plays.
-
! Be careful when fucking with filesystems and such !
This is most setup other than the special tasks listed above.
Deploy (on target):
$ ansible-playbook -vv main.pb.yml
- Requires mounts so that dirs can be setup.
Setting up database stuff is infrequently done, so it's a seperate step.
Setup postgresql (on target):
$ ansible-playbook -vv manual-pgsql.pb.yml
- Requires weed user setup, weed dirs
Configuring seaweedfs.
You'll need to do at least some of this.
Check if you've already made keys:
ls -lah files/certs/
Keygen output will go into out/ If keys not generated already, run script to generate them:
$pwd
.../REPODIR/
$ chmod +x scripts/onetime-setup-keygen.sh
$ ./scripts/onetime-setup-keygen.sh
$ mkdir -vp files/certs/
$ mv out/* files/certs*
- Certstrap seems to dislike being given a name ending in a numeral.
Create security.toml template via weed scaffold:
$ weed scaffold -config=security 2> security.toml
To generate a 32 character hexadecimal string:
$ < /dev/urandom tr -dc a-f0-9 | head -c${1:-32};echo;
Example /etc/seaweedfs/security.toml
:
## /etc/seaweedfs/security.toml
# Put this file to one of the location, with descending priority
# ./security.toml
# $HOME/.seaweedfs/security.toml
# /etc/seaweedfs/security.toml
[jwt.signing]
key = "blahblahblahblah"
# all grpc tls authentications are mutual
[grpc]
ca = "/etc/seaweedfs/SeaweedFS_CA.crt"
[grpc.volume]
cert = "/etc/seaweedfs/volume.crt"
key = "/etc/seaweedfs/volume.key"
[grpc.master]
cert = "/etc/seaweedfs/master.crt"
key = "/etc/seaweedfs/master.key"
[grpc.filer]
cert = "/etc/seaweedfs/filer.crt"
key = "/etc/seaweedfs/filer.key"
[grpc.client]
cert = "/etc/seaweedfs/client.crt"
key = "/etc/seaweedfs/client.key"
-
Seaweedfs S3-host credentials and access are defined inside:
files/s3_conf.json
-
JSON has strict syntax.
-
Only applied when the
weed s3
process starts.
Fedora uses firewalld.
Example task located at tasks/firewalld.yml
TODO: Writeme
- Used to serve files to outside world while restricting requests.
- Allows seaweedfs to never have to see the outside internet.
- Use inventory file:
-i .hosts.yml
Generic invokation template:
$ ansible-playbook -vv -i .hosts.yml PLAYBOOK.pb.yml
Full syntax-check:
$ ansible-playbook -vv --syntax-check -i .hosts.yml main.pb.yml
Full check:
$ ansible-playbook -vv --check -i .hosts.yml main.pb.yml
Run testing shim playbook:
$ ansible-playbook -vv -i .hosts.yml test-mounts-tasklist.pb.yml
- Real host definition filename starts with a dot so it will be gitignored out to keep password secret.
- Sudo currently requires a password on host.
Validate connection works:
$ ansible-playbook -vv -i .hosts.yml validate-inventory.pb.yml
Overview of units (Does not contain all logs):
$ sudo systemctl status 'seaweedfs_*'
Logs for a unit:
$ sudo journalctl -xe -b -u seaweedfs_mount_filerroot
Follow as new messages appear:
$ journalctl -xe -b -u seaweedfs_filer_2.service -f
$ journalctl --boot --pager-end --follow --since='5 minutes ago' --unit=seaweedfs_filer_3.service
$ journalctl --boot --pager-end --follow --since='5 minutes ago' --unit=seaweedfs_volume_foo
$ journalctl --boot --pager-end --follow --since='5 minutes ago' --unit=seaweedfs_s3api_1.service
Show all seaweedfs units at once in journalctl:
## Make arguments for units:
$ systemctl --no-legend list-unit-files "seaweedf*" | cut -d' ' -f1 | xargs systemd-escape | awk '{print "-u"; print $0;}' RS=" " | tr '\n' ' '
## Give arguments to journalctl
$ journalctl --boot --pager-end --follow --since='5 minutes ago' $(systemctl --no-legend list-unit-files "seaweedf*" | cut -d' ' -f1 | xargs systemd-escape | awk '{print "-u"; print $0;}' RS=" " | tr '\n' ' ' )
Try restarting the service:
$ sudo systemctl restart postgresql.service
Fucking with privileges for authenticated users. Don't rely on these to be completely correct.
GRANT CONNECT ON DATABASE "seaweedfs-filer" TO "seaweedfs";
GRANT USAGE ON DATABASE "seaweedfs-filer" TO "seaweedfs";
GRANT USAGE ON SCHEMA public TO "seaweedfs";
GRANT SELECT ON DATABASE "seaweedfs-filer" TO "seaweedfs";
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "seaweedfs";
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO "seaweedfs";
- see https://medium.com/@swaplord/grant-read-only-access-to-postgresql-a-database-for-a-user-35c57897dd0b
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "seaweedfs";
GRANT CONNECT ON DATABASE "seaweedfs-filer" to "seaweedfs";
GRANT USAGE ON SCHEMA public TO Postgres;
GRANT ALL PRIVILEGES ON DATABASE "seaweedfs-filer" TO "seaweedfs";
GRANT ALL ON "filemeta" TO "seaweedfs";
Config file for postgres user authentication. postgres may need to be restarted to apply changes to this. Usually located at "/var/lib/pgsql/data/pg_hba.conf"
To open weed shell:
$ weed shell
To specify specific master/filer:
$ weed shell -master=localhost:9333 -filer=localhost:8888
You can send commands to STDIN:
$ weed shell <<< "volume.list"
$ echo "volume.list" | weed shell
HEREDOCs work as normal:
weed shell <<EOF
volume.list
lock
volume.check.disk
volume.fix.replication
unlock
EOF
More complicated pipelines are possible:
#!/bin/bash
## weed-shell-complex-example.sh
## Demonstrate that weed shell can be given long scripts.
bucket_name="asagi"
bucket_replication="001"
echo "#[${0##*/}]" "Starting "at $(date -u +%Y-%m-%d_%H:%M:%s_%z)"
echo "#[${0##*/}]" "Creating bucket ${bucket_name} with replication ${bucket_replication}"
echo "#[${0##*/}]" "And fixing replication"
( tee "dbg.weed-shell.heredoc" \
| grep -v -e '^#' -e '^$' \
| tee "dbg.weed-shell.stdin" | tee /dev/tty \
| weed shell \
-master="localhost:9333" \
-filer="localhost:8888" \
| tee "dbg.weed-shell.stdout"
) <<EOF-weedshell
## This way you can have comments
#
## help command:
help lock
## -help argument:
lock -help
#
## Variables work too:
s3.bucket.create -name=${bucket_name} -replication ${bucket_replication}
#
#
## Ensure volumes are replicated properly
volume.list
lock
volume.check.disk
volume.fix.replication
unlock
volume.list
## Exit weed shell
exit
EOF-weedshell
echo "#[${0##*/}]" "Finished "at $(date -u +%Y-%m-%d_%H:%M:%s_%z)"
Two different ways to get help message, usually give different piecess of information.
> help COMMAND
> COMMAND -help
List volumes:
> volume.list
Creating a bucket:
> s3.bucket.create -name=asagi -replication 001
create bucket under /buckets
created bucket asagi
weed filer.copy
Usage: weed filer.copy file_or_dir1 [file_or_dir2 file_or_dir3] http://localhost:8888/path/to/a/folder/
Uploading a bunch of garbage:
## Dir to work in:
mkdir -vp "/home/ctrls/garbage/"
## Create garbage data:
## for n in {0..9}; do head -c "${size}" < /dev/urandom > "${junkdir}/${n}.${size}.junk" ; done; ## Random test
for n in {00..99}; do head -c "5M" < /dev/urandom > "/home/ctrls/garbage/${n}.5M.junk" ; done; ## Random test
## Disk device should spit out data faster than RNG.
head -c '100G' < /dev/sda > /home/ctrls/garbage/junkfile
## RAM dump should be super fast if we have privs to do that.
for n in {0..9}; do head -c "10G" < /dev/mem > "/home/ctrls/garbage/${n}.10G.junk" ; done;
## Bucket to throw garbage into:
weed shell <<< 's3.bucket.create -name=testing -replication 001'
## Upload the garbage
weed filer.copy /home/ctrls/garbage/ http://localhost:8888/buckets/testing/
$ weed filer.copy -replication=000 -check.size -debug /home/ctrls/garbage/ http://localhost:8888/buckets/testing/
Make more garbage files for testing:
for n in {00..99}; do head -c "5M" < /dev/urandom > "/home/ctrls/garbage/b${n}.5M.junk" ; done; ## Random test
$ for n in {000..999}; do head -c "5M" < /dev/urandom > "/home/ctrls/garbage/c${n}.5M.junk" ; done; ## Random test
Upload lots of uncompressible noise files via weed filer.copy:
$ mkdir -vp /home/ctrls/garbage/1M-a/
$ for n in {0000..9999}; do head -c "1M" < /dev/urandom > "/home/ctrls/garbage/1M-a/.1M.${n}.junk" ; done; ## Random test files.
$ time weed filer.copy -replication=001 -check.size -debug /home/ctrls/garbage/ http://localhost:8888/buckets/testing/
Create lots of uncompressible noise files in-place on weed mount:
$ mkdir -vp /weedmnt/filer-root/buckets/garbage/1M-a/
$ for n in {0000..9999}; do head -c "1M" < /dev/urandom > "/weedmnt/filer-root/buckets/garbage/1M-a/.1M.${n}.junk" ; done; ## Random test files.
Generate random garbage in 10M files:
$ mkdir -vp /weedmnt/filer-root/buckets/garbage/10M-a/
$ for n in {0000..9999}; do head -c "10M" < /dev/urandom > "/weedmnt/filer-root/buckets/garbage/10M-a/10M.${n}.junk" ; done; ## Random test files.
Stat every file in every bucket:
$ date;time du -hd0 /weedmnt/filer-root/buckets/
Automatic backup of the filer can be done via
tasks/filer-meta-bkup-multidest.yml
templates/backup-filer-multidest.cronjob.sh.j2
Filer is saved to a temporary file, then that is copied to backup dirs.
Old backups are renamed to keep most recent 3 backups.
I figure a good place to put these is the HDDs the volumeservers use, since the point is largely to prevent any single disk failing from breaking anything.
vars.yml
filermeta_backup_tempdir: '/var/lib/seaweedfs/'
filermeta_backup_destdirs:
- '/media/myhdd_1/seaweedfs-filer-backup'
- '/media/myhdd_2/seaweedfs-filer-backup'
To run the cronjob as the seaweedfs user:
$ sudo -u seaweedfs /var/lib/seaweedfs/scripts/backup-filer-multidest.cronjob.sh
To restore ownership if you forget to run as seaweedfs:
$ sudo chown -vR 420420:420420 /media/*/seaweedfs-filer-backup/
- Remember to add your rcone config file at
files/rclone.conf
so ansible can use it.
To adjust options for running rclone command:
## This will only work if the rclone job was started with the '--rc' argument!
$ rclone rc core/bwlimit rate=1M
{
"bytesPerSecond": 1048576,
"rate": "1M"
}
To do basic validation without having to actually run a deployment:
$ ansible-playbook --check main.pb.yml --skip-tags 'precautions,download'
Tag: precautions
is protective filer backup etc done at playbook run time.
Tag: download
is downloading seaweedfs and certstrap binaries from github.