Move ENRAM repository to new bucket and directory structure
Closed this issue · 4 comments
peterdesmet commented
See this post to flatten file structure
- Check if a radar year contains unique files only:
aws s3 ls lw-enram/be/jab/2020/ --recursive | awk '{print $4}' | xargs -I {} basename {} | sort | uniq -d
# Should print nothing (uniq -d only detects adjacent duplicates, hence the sort)
- Move files:
aws s3 ls lw-enram/be/jab/2020/ --recursive | awk '{print $4}' | xargs -I {} sh -c 'aws s3 cp s3://lw-enram/{} s3://enram-vp/baltrad/hdf5/bejab/2020/$(basename {}) --dryrun'
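The uniqueness check from the first step can also be sketched in Python, which avoids depending on the order of the listing. This is a minimal sketch, not part of the original commands: the function name `duplicate_basenames` is hypothetical, and the second key in the sample list is made up for illustration.

```python
from collections import Counter
from posixpath import basename


def duplicate_basenames(keys):
    """Return the file names that occur under more than one key."""
    counts = Counter(basename(key) for key in keys)
    return sorted(name for name, n in counts.items() if n > 1)


# Sample keys: the first follows the layout in this issue, the second is hypothetical.
keys = [
    "be/jab/2020/02/05/00/bejab_vp_20200205T004000Z_0x9.h5",
    "be/jab/2020/02/05/00/bejab_vp_20200205T004500Z_0x9.h5",
]
print(duplicate_basenames(keys))  # prints [] when flattening is safe
```

An empty list means every file name is unique, so the year can be flattened into a single directory without overwriting anything.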
More elaborate example using variables (that currently returns an error):
aws s3 ls lw-enram/be/jab/2020/ --recursive | awk '{print $4}' | xargs -I % sh -c 'year=$(echo % | cut -d'/' -f 3);file=$(basename %);aws s3 cp s3://lw-enram/% s3://lw-enram/baltrad/h5/$year/$file --dryrun'
flyway files:
- year: 2016
- move to ecog-04003
baltrad files:
- year: 2017-2022
- move to baltrad
peterdesmet commented
@niconoe It would be good if I could define the country, radar and year as variables. My attempt only works for the source path, not the destination path:
country="be"
radar="jab"
year="2020"
aws s3 ls lw-enram/$country/$radar/$year/
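One way to reuse the same variables for both the source and the destination is to build the paths in Python and only then hand them to the AWS CLI. The sketch below is an assumption about how that could look: `build_paths` and `cp_args` are hypothetical helper names, and the destination layout is taken from the `aws s3 cp` command earlier in this issue.

```python
from posixpath import basename


def build_paths(country, radar, year):
    """Return the source prefix (for `aws s3 ls`) and the flat destination prefix."""
    source_prefix = f"lw-enram/{country}/{radar}/{year}/"
    dest_prefix = f"s3://enram-vp/baltrad/hdf5/{country}{radar}/{year}/"
    return source_prefix, dest_prefix


def cp_args(key, dest_prefix, dryrun=True):
    """Build the argument list for one `aws s3 cp` call (nothing is executed here)."""
    args = ["aws", "s3", "cp", f"s3://lw-enram/{key}", dest_prefix + basename(key)]
    if dryrun:
        args.append("--dryrun")
    return args


source_prefix, dest_prefix = build_paths("be", "jab", "2020")
print(" ".join(cp_args("be/jab/2020/02/05/00/bejab_vp_20200205T004000Z_0x9.h5", dest_prefix)))
```

The argument list could be passed to `subprocess.run` once the dry run output looks right.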
peterdesmet commented
Pseudo code for copying files:
source_bucket = "s3://lw-enram"
dest_bucket = "s3://aloft"
for path in source_bucket:
    # Example source path: "s3://lw-enram/be/jab/2020/02/05/00/bejab_vp_20200205T004000Z_0x9.h5"
    # Parse path
    radar = dir1 & dir2 # bejab
    year = dir3 # 2020
    month = dir4 # 02
    day = dir5 # 05
    file = basename # bejab_vp_20200205T004000Z_0x9.h5
    file_ext = extension # h5
    # Set source
    if year == 2016:
        source = "ecog-04003"
    else:
        source = "baltrad"
    # Copy file
    if file_ext != "h5":
        skip
    elif file exists at destination:
        skip
    else:
        copy file to {dest_bucket}/{source}/hdf5/{radar}/{year}/{month}/{day}/{file}
        # Example dest path: "s3://aloft/baltrad/hdf5/bejab/2020/02/05/bejab_vp_20200205T004000Z_0x9.h5"
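The pseudocode above could be turned into Python roughly as follows. This is a sketch under assumptions, not the actual implementation: `destination_key`, `copy_all` and the `dry_run` flag are hypothetical names, keys are assumed to follow the `country/radar/year/month/day/.../file` layout of the example path, and the boto3 import is kept inside `copy_all` so the path logic can be tried without AWS access.

```python
def destination_key(source_key):
    """Map a source key to its destination key, or None if it should be skipped."""
    parts = source_key.split("/")
    if len(parts) < 6:
        return None  # too shallow to hold country/radar/year/month/day/file
    country, radar_code, year, month, day = parts[:5]
    file_name = parts[-1]
    if not file_name.endswith(".h5"):
        return None  # only HDF5 files are copied
    source = "ecog-04003" if year == "2016" else "baltrad"
    return f"{source}/hdf5/{country}{radar_code}/{year}/{month}/{day}/{file_name}"


def copy_all(source_bucket="lw-enram", dest_bucket="aloft", dry_run=True):
    """Copy every eligible object, skipping those already at the destination."""
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            dest = destination_key(key)
            if dest is None:
                continue
            try:
                s3.head_object(Bucket=dest_bucket, Key=dest)
                continue  # file exists at destination
            except ClientError as error:
                if error.response["Error"]["Code"] != "404":
                    raise
            print(f"copy s3://{source_bucket}/{key} -> s3://{dest_bucket}/{dest}")
            if not dry_run:
                s3.copy({"Bucket": source_bucket, "Key": key}, dest_bucket, dest)
```

Calling `destination_key("be/jab/2020/02/05/00/bejab_vp_20200205T004000Z_0x9.h5")` yields the example destination path without the bucket prefix. `copy_all()` only prints the planned copies until it is called with `dry_run=False`.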
niconoe commented
Update: implementation in progress (simple Python scripts that only require the boto3 package).
peterdesmet commented