S3 Object Lambda measurement

This repository contains the source code of the microbenchmarks and use cases featured in the research paper "On Data Processing Through the Lenses of S3 Object Lambda", presented at IEEE INFOCOM 2023.

Getting started

The benchmarks and use cases have been written and executed using Python 3.8 running on Ubuntu 20.04.

Setup

Clone this repository to your local machine:

git clone https://github.com/pablogs98/Object-Lambda-Benchmark

From the repository's root directory, install its Python dependencies:

pip3 install -r requirements.txt

Make sure that your AWS account and the AWS CLI are correctly set up (see the AWS CLI documentation for configuration details).
Install any additional system dependencies. For instance, PycURL requires libcurl.
Make sure PYTHONPATH includes the repository's root directory.
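The setup checklist above can be verified with a small script. This is a minimal sketch that is not part of the repository: the function names are hypothetical, and it assumes the script is run from the repository's root directory. It reports whether the interpreter is at least Python 3.8 (the version the benchmarks were run with), whether PYTHONPATH contains the repository root, whether the `aws` executable is on PATH, and whether PycURL can actually be imported.

```python
import importlib
import os
import shutil
import sys
from pathlib import Path


def python_version_at_least(minimum=(3, 8)):
    """True if the interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= tuple(minimum)


def pythonpath_includes(repo_root, pythonpath=None):
    """True if repo_root appears among the PYTHONPATH entries."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(repo_root).resolve()
    return any(Path(p).resolve() == target for p in pythonpath.split(os.pathsep) if p)


def missing_modules(names):
    """Return the module names that fail to import.

    A C extension such as pycurl also fails here if its shared library
    (libcurl) cannot be loaded, which a plain find_spec() lookup would miss.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    repo_root = Path.cwd()  # assumption: run from the repository's root directory
    print("Python >= 3.8:", python_version_at_least())
    print("PYTHONPATH includes repo root:", pythonpath_includes(repo_root))
    print("aws CLI on PATH:", shutil.which("aws") is not None)
    print("Missing modules:", missing_modules(["pycurl"]))
```

Running it (e.g. as a hypothetical `check_env.py`) before the first experiment surfaces a missing libcurl or an unset PYTHONPATH early, instead of as an import error midway through a benchmark.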

Deploying functions

Functions are deployed automatically when an example is executed. However, the deployment packages must be generated beforehand and placed in the root directory of the corresponding microbenchmark or use case (or in a configurable path). The utils module provides scripts that generate the deployment packages for Node.js and Python functions.
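The repository's own utils scripts are not reproduced here. As a rough illustration of what such a script produces, the following sketch builds a Python deployment package, that is, a .zip archive with the handler module at its root (the layout AWS Lambda expects for Python .zip packages), and looks the package up either in a use case's root directory or in an explicitly given directory. `build_python_package` and `locate_package` are hypothetical names, and the lookup logic is an assumption about how the configurable path behaves.

```python
import zipfile
from pathlib import Path


def build_python_package(src_dir, output_zip):
    """Zip the files under src_dir so that the handler module sits at the archive root."""
    src = Path(src_dir)
    out = Path(output_zip).resolve()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself if it lives inside src_dir
            if path.is_file() and path.resolve() != out:
                zf.write(path, arcname=path.relative_to(src).as_posix())
    return out


def locate_package(use_case_dir, package_name, package_dir=None):
    """Look in package_dir if given, otherwise in the use case's root directory."""
    base = Path(package_dir) if package_dir is not None else Path(use_case_dir)
    package = base / package_name
    if not package.is_file():
        raise FileNotFoundError(f"deployment package not found: {package}")
    return package
```

Keeping archive paths relative to the source directory matters: if the handler ended up under a subfolder inside the .zip, Lambda would not find it under the configured handler name.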

The deployment of Java functions is documented separately.

Datasets

The datasets used for experimentation are publicly available and can be downloaded from the following locations:

Use case	Dataset
Grep	GHTorrent
Parallel tree reduction (streaming pipelines)	HDFS logs
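Neither use case's implementation is reproduced here. The following hypothetical sketch only illustrates the two ideas: a grep-style S3 Object Lambda handler, which fetches the original object through the presigned `inputS3Url` in the event's `getObjectContext` and returns only the matching lines via the S3 client's `write_get_object_response`, and a tree reduction that merges partial results pairwise, one level at a time. All function names are illustrative; threads stand in for the concurrent function invocations a real deployment would use, and the handler loads the whole object into memory for simplicity.

```python
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def grep_lines(data, pattern):
    """Keep only the lines of data (bytes) that match the regular expression pattern."""
    regex = re.compile(pattern.encode())
    return b"".join(line for line in data.splitlines(keepends=True) if regex.search(line))


def make_grep_handler(pattern, s3_client=None, fetch=None):
    """Build an S3 Object Lambda handler that returns only the matching lines."""

    def handler(event, context):
        ctx = event["getObjectContext"]
        url = ctx["inputS3Url"]  # presigned URL of the original object
        if fetch is not None:
            original = fetch(url)
        else:
            with urllib.request.urlopen(url) as resp:
                original = resp.read()
        client = s3_client
        if client is None:
            import boto3  # bundled with the AWS Lambda Python runtime
            client = boto3.client("s3")
        # Send the filtered bytes back as the response to the caller's GetObject request
        client.write_get_object_response(
            Body=grep_lines(original, pattern),
            RequestRoute=ctx["outputRoute"],
            RequestToken=ctx["outputToken"],
        )
        return {"status_code": 200}

    return handler


def tree_reduce(values, combine):
    """Merge partial results pairwise, level by level; each level runs concurrently."""
    level = list(values)
    if not level:
        raise ValueError("tree_reduce() of an empty sequence")
    with ThreadPoolExecutor() as pool:
        while len(level) > 1:
            pairs = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
            merged = list(pool.map(lambda pair: combine(*pair), pairs))
            if len(level) % 2:  # an odd element is carried up to the next level
                merged.append(level[-1])
            level = merged
    return level[0]
```

In a deployed function, the module would expose something like `lambda_handler = make_grep_handler(r"ERROR")`; injecting `s3_client` and `fetch` keeps the handler testable without AWS. Because odd elements are carried up unchanged and pairs keep their order, `combine` only needs to be associative, not commutative, and n partial results are merged in about log2(n) levels.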

References

Pablo Gimeno Sarroca and Marc Sànchez-Artigas. "On Data Processing Through the Lenses of S3 Object Lambda." In IEEE INFOCOM 2023.

Acknowledgements

This project has received funding from the European Union's Horizon Europe (HE) Research and Innovation Programme (RIA) under Grant Agreements No. 101092646 and No. 101092644.