How to Run the Benchmark

  • Create an Amazon S3 bucket and an EC2 instance with access to it via IAM
  • Create an ElastiCache Redis instance

To Start on an Ubuntu 20.04 EC2 Instance

sudo apt update
sudo apt install python3-pip
git clone https://github.com/Loonride/s3-test-1.git
cd s3-test-1
pip3 install -r requirements.txt
export REDIS_HOST=...

python3 benchmark.py
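
Optionally, verify connectivity before kicking off benchmark.py (a minimal sketch; my-benchmark-bucket is a placeholder for whatever bucket you created, not part of the repo):

import os

import boto3
import redis

# REDIS_HOST comes from the environment variable exported above
r = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
r.ping()  # raises if the ElastiCache instance is unreachable

# boto3 picks up credentials from the instance's IAM role
s3 = boto3.client("s3")
s3.head_bucket(Bucket="my-benchmark-bucket")  # placeholder bucket name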

To Retrieve Results, Then Plot

./scp_data.sh <EC2 ip>
python plot.py

Results on EC2

  • Using Redis cache.t3.medium, EC2 t2.small
  • 100 runs each
  • Redis/S3 always emptied before each PUT-GET-DELETE sequence
  • A single Redis connection is reused, while boto3 issues a separate API request for each S3 operation
  • Payloads are random bytes of a fixed length (the measurement loop is sketched below)
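
For reference, the Redis half of the measurement loop looks roughly like this (a simplified sketch, not the exact benchmark.py code; the S3 trial is analogous with put_object/get_object/delete_object):

import os
import time

import redis

# one shared connection for the whole benchmark
r = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)

def run_redis_trial(size):
    r.flushall()  # store emptied before each PUT-GET-DELETE sequence
    payload = os.urandom(size)  # random bytes of a fixed length
    timings = {}
    for op, fn in [("PUT", lambda: r.set("task", payload)),
                   ("GET", lambda: r.get("task")),
                   ("DELETE", lambda: r.delete("task"))]:
        start = time.time()
        fn()
        timings[op] = time.time() - start
    return timings

# 100 runs per payload size
for size in (10**3, 10**4, 10**5, 10**6, 10**7, 10**8):
    results = [run_redis_trial(size) for _ in range(100)]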

[Plot: Redis vs. S3 PUT/GET/DELETE latency by payload size on EC2]

Redis 1000 Bytes Means: PUT 0.001s, GET 0.001s, DELETE 0.001s
Redis 10000 Bytes Means: PUT 0.001s, GET 0.001s, DELETE 0.001s
Redis 100000 Bytes Means: PUT 0.001s, GET 0.001s, DELETE 0.001s
Redis 1000000 Bytes Means: PUT 0.012s, GET 0.009s, DELETE 0.001s
Redis 10000000 Bytes Means: PUT 0.088s, GET 0.092s, DELETE 0.001s
Redis 100000000 Bytes Means: PUT 0.989s, GET 1.057s, DELETE 0.001s
S3 1000 Bytes Means: PUT 0.047s, GET 0.025s, DELETE 0.029s
S3 10000 Bytes Means: PUT 0.038s, GET 0.020s, DELETE 0.022s
S3 100000 Bytes Means: PUT 0.063s, GET 0.026s, DELETE 0.024s
S3 1000000 Bytes Means: PUT 0.096s, GET 0.040s, DELETE 0.023s
S3 10000000 Bytes Means: PUT 0.288s, GET 0.253s, DELETE 0.024s
S3 100000000 Bytes Means: PUT 1.645s, GET 1.810s, DELETE 0.025s

Results on External Endpoint (Kir's Laptop)

Important: these results are much more susceptible to network latency, since the laptop communicating with S3 is outside the AWS internal network, just as any other FuncX endpoint would be.

[Plot: S3 PUT/GET/DELETE latency by payload size from the external endpoint]

S3 1000 Bytes Means: PUT 0.115s, GET 0.053s, DELETE 0.057s
S3 10000 Bytes Means: PUT 0.105s, GET 0.055s, DELETE 0.056s
S3 100000 Bytes Means: PUT 0.134s, GET 0.062s, DELETE 0.059s
S3 1000000 Bytes Means: PUT 0.493s, GET 0.105s, DELETE 0.060s
S3 10000000 Bytes Means: PUT 3.655s, GET 0.462s, DELETE 0.056s
S3 100000000 Bytes Means: PUT 36.265s, GET 3.626s, DELETE 0.058s

Cost Structure Analysis

  • All prices here are based on the AWS us-east-1 region as of 10/6/21
  • Outbound data transfer costs are $0.09 per GB for both S3 and EC2. However, note that this cost could be removed if results were sent directly to a user-owned S3 instance.
  • Integrating S3 alongside Redis could probably allow us to reduce the size of the Redis cache, meaning the low cost of S3 is a small price to pay for savings from Redis.

S3

S3 charges $0.023 per GB per month, which is a trivial cost compared to Redis rates and the PUT/GET rates of S3.

The PUT rate is $0.005 per 1,000 PUTs and the GET rate is $0.0004 per 1,000 GETs. DELETEs are free.

Cost for 1 million 1MB (1,000 GB total) tasks

(Assume each task sat in S3 for ~1 week, i.e. a quarter of a month, with 1 PUT and 1 GET each, plus outbound transfer)

monthly cost = (S3 storage cost) + (S3 PUT/GET cost) + (outbound transfer)
monthly cost = 0.023 * 1000 * 0.25 + 0.0054 * 1000 + 0.09 * 1000
monthly cost = 5.75 + 5.4 + 90
monthly cost = $101.15
monthly cost w/o outbound transfer = $11.15

Redis

For the cache.r5.xlarge instance we are currently using, the on-demand price is $0.431/hour.

Cost for 1 million 1MB (1,000 GB total) tasks

monthly cost = (redis monthly cost) + (outbound transfer)
monthly cost = 0.431 * 730 + 0.09 * 1000
monthly cost = 314.63 + 90
monthly cost = $404.63
monthly cost w/o outbound transfer = $314.63
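
Both estimates reduce to a few lines of arithmetic; here is a quick sanity check using the prices quoted above:

GB = 1000  # 1 million tasks x 1 MB each

s3_storage = 0.023 * GB * 0.25                     # ~1 week = 1/4 month: $5.75
s3_requests = (0.005 + 0.0004) * 1_000_000 / 1000  # 1M PUTs + 1M GETs: $5.40
outbound = 0.09 * GB                               # $90.00
print(round(s3_storage + s3_requests + outbound, 2))  # 101.15

redis_monthly = 0.431 * 730                        # 730 hours/month: $314.63
print(round(redis_monthly + outbound, 2))          # 404.63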

Conclusion

Redis has massive costs compared to S3 and is less scalable, so we should only be using Redis for small tasks that require fast PUT/GET operations. The benefit of keeping Redis for fast-cache storage is reducing PUT/GET overhead in performance-critical situations. S3 can add close to 0.1s of overhead for small tasks, while Redis adds <0.002s of overhead, which is a significant difference for small tasks that can take on average <0.5s to execute using the executor interface.

Storage/Caching Rules Proposal

  • Tasks <100KB should be placed in Redis, and all others should go straight to S3 (see the routing sketch after this list)
  • If Redis is almost full, tasks should go straight to S3
  • One parameter might be that a user can only have x tasks sitting in Redis before their tasks are forced to go into S3
  • User types and options
    • General users will have this scheme with strict limits on their usage of Redis
    • High-performance paying users might be allowed less strict Redis usage (although we would not be able to guarantee this unless parameters are tuned perfectly to prevent Redis overflow)
    • Users could opt for S3-only if their workload is not performance-critical (the incentive for this would be a longer time in S3 before eviction)
  • When the web service or WebSocket service queries a task, it will first check Redis, and if it is a cache miss it will go to S3
  • We should make S3 directly queryable by the user so that we don't have to copy from S3 then send from our services
    • The only drawback here is requiring the boto3 client as part of the SDK
    • This also requires ensuring the task is deleted after the user queries it via the SDK; I imagine S3 has some kind of read-once permission to force this so that the user cannot repeatedly query it
  • Somewhat long eviction period for Redis assuming we tune parameters correctly (e.g. 1 week)
  • Much longer sitting period is allowed in S3 (e.g. 1 month before eviction)
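
A minimal sketch of the proposed routing and lookup logic (the bucket name, the user quota fields, and the 90% fullness check are illustrative assumptions, not an existing API):

import os

import boto3
import redis

r = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
s3 = boto3.client("s3")
BUCKET = "funcx-task-store"    # placeholder bucket name
REDIS_SIZE_LIMIT = 100 * 1000  # tasks <100KB go to Redis

def redis_almost_full():
    # illustrative fullness check against the configured maxmemory
    mem = r.info("memory")
    return mem["maxmemory"] > 0 and mem["used_memory"] > 0.9 * mem["maxmemory"]

def store_task(task_id, payload, user):
    # Go straight to S3 for large payloads, a near-full Redis,
    # users over their Redis task quota, or S3-only users
    if (len(payload) >= REDIS_SIZE_LIMIT
            or redis_almost_full()
            or user.redis_task_count >= user.redis_task_limit
            or user.s3_only):
        s3.put_object(Bucket=BUCKET, Key=task_id, Body=payload)
    else:
        r.set(task_id, payload)

def fetch_task(task_id):
    # Check Redis first; on a cache miss, fall back to S3
    payload = r.get(task_id)
    if payload is None:
        payload = s3.get_object(Bucket=BUCKET, Key=task_id)["Body"].read()
    return payload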

Extra Notes

Make whisker plots to check extreme cases

It's costly to establish an HTTPS connection for every S3 request, whereas with Redis you can create a single connection and reuse it the whole time.
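
For illustration, with boto3 the mitigation is to create the client once and reuse it, since each fresh client starts with an empty HTTPS connection pool (the bucket and key arguments here are placeholders):

import boto3

s3 = boto3.client("s3")  # create once, reuse for every request

def get_reusing_pool(bucket, key):
    # later calls reuse the client's pooled HTTPS connection
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

def get_with_fresh_client(bucket, key):
    # a brand-new client starts with an empty connection pool,
    # so every call pays connection setup again
    return boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()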