Go and Rust Benchmark

The project focuses on utilizing the concurrent programming concepts of the Go and Rust programming languages for the activities of the data processing cycle, with a PostgreSQL database as data storage.


Table of Contents

  1. Introduction
  2. Getting Started
  3. List of programs
  4. Software Resources
  5. About


Introduction

      The project focuses on the implementation and utilization of the Go and Rust programming languages in the data processing cycle, with a PostgreSQL database as data storage. Each language's paradigm, characteristics, and focus are applied to build programs for data preparation, data retrieval, data processing, and data storage. (A diagram of the data processing cycle is included in the project documentation.)

      Large datasets totalling 5,865,567 rows were obtained from the UK Government website during data collection and verified through data validation to inspect the quality and logical weaknesses of the data contents. The specifications of the datasets are as follows:

No  Name of Dataset  Columns  Rows       Size
01  Education        21       32,707     4.2 MB
02  Company          55       3,595,702  1.8 GB
03  Postcode         35       1,754,882  667.5 MB

The raw datasets in CSV format are backed up and imported into the PostgreSQL database through data transformation. Defects discovered in the large datasets, such as inconsistent, incorrect, and duplicated entries, are eliminated with data encoding, data normalization, and data cleaning. Ultimately, the unnormalized and unorganized data are migrated into normalized tables as the new storage through data migration, establishing a relational database management system free from anomalies.
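
As a rough illustration of the duplicate-elimination step, the Go sketch below drops exact duplicate rows from a CSV file. The file name and the whitespace-trimming rule are assumptions for the example, not the project's actual cleaning rules:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("postcode.csv") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	seen := make(map[string]bool) // rows already encountered
	kept := 0

	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Normalize whitespace so near-duplicates compare equal.
		for i, field := range record {
			record[i] = strings.TrimSpace(field)
		}
		key := strings.Join(record, "\x1f") // unit separator as a safe join
		if seen[key] {
			continue // drop the duplicate row
		}
		seen[key] = true
		kept++
	}
	fmt.Println("unique rows kept:", kept)
}
```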

      Go and Rust programs are developed to support data processing activities such as data transformation, data cleaning, and data migration. The execution performance of programs built with the two concurrent programming languages and with different programming styles is compared and discussed in detail.
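
For example, the sequential and concurrent retrieval styles being compared can be sketched in Go roughly as follows; the DSN and table names are hypothetical placeholders, and the real programs scan full tables rather than row counts:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sync"
	"time"

	_ "github.com/lib/pq"
)

func countRows(db *sql.DB, table string) int {
	var n int
	// Table names come from a fixed list, so concatenation is safe here.
	if err := db.QueryRow("SELECT count(*) FROM " + table).Scan(&n); err != nil {
		log.Fatal(err)
	}
	return n
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/uk?sslmode=disable") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	tables := []string{"education", "company", "postcode"} // hypothetical table names

	// Sequential style: one table after another.
	start := time.Now()
	for _, t := range tables {
		countRows(db, t)
	}
	fmt.Println("sequential:", time.Since(start))

	// Concurrent style: one goroutine per table.
	start = time.Now()
	var wg sync.WaitGroup
	for _, t := range tables {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			countRows(db, t)
		}(t)
	}
	wg.Wait()
	fmt.Println("concurrent:", time.Since(start))
}
```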

      PL/pgSQL scripts are developed to create the database entities' data structures, objects, and schemas, and to perform data migration within the PostgreSQL database. The lightweight scripts execute multiple written queries in a single run to perform database creation, manipulation, and control efficiently.
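
A minimal sketch of this idea, driving a server-side PL/pgSQL DO block from a Go program; the table names are hypothetical, and the project's real scripts live in the data-query repositories listed under Phase 2 and are normally run with psql:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

// A multi-statement PL/pgSQL block: table creation plus a migration query.
const script = `
DO $$
BEGIN
    -- Create the normalized target table if it does not exist yet.
    CREATE TABLE IF NOT EXISTS postcode_normalized (
        id       serial PRIMARY KEY,
        postcode text NOT NULL
    );
    -- Move distinct values out of the unnormalized staging table.
    INSERT INTO postcode_normalized (postcode)
    SELECT DISTINCT postcode FROM postcode_raw;
END
$$;`

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/uk?sslmode=disable") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The whole DO block executes as a single statement on the server side.
	if _, err := db.Exec(script); err != nil {
		log.Fatal(err)
	}
}
```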

      The project successfully demonstrates that concurrent programming delivers better performance and throughput for data processing than sequential programming. Data duplication, data inconsistency, and data incompleteness were successfully eliminated to establish high data quality.


Getting Started

Documentation

The documentation contains:

  1. Introduction
  2. Literature Review
  3. Project Design
  4. Implementation Methodology
  5. Implementation Plan
  6. Results and Findings
  7. Discussion
  8. Conclusion

Download the document HERE


List of programs

Phase 1

Phase 1 is conducted in one semester to demonstrate the proof of concept (POC) and prototype of the project. The program files are stored in the FYP-Phase1/src/FYP1 directory and are tabulated in the table below:

File Name           Description
import-csv-psql.go  Transforms 300 rows of data from CSV files into the PostgreSQL database
sequential-psql.go  Retrieves 300 rows of data sequentially from different tables in the PostgreSQL database
concurrent-psql.go  Retrieves 300 rows of data concurrently from different tables in the PostgreSQL database
sequential-csv.go   Retrieves 300 rows of data sequentially from different raw CSV datasets
concurrent-csv.go   Retrieves 300 rows of data concurrently from different raw CSV datasets
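
A rough sketch of the concurrent CSV style in the spirit of concurrent-csv.go, with hypothetical file names standing in for the real datasets:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"sync"
)

// readAll loads every record of one CSV file and reports the row count.
func readAll(path string) int {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	records, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	return len(records)
}

func main() {
	files := []string{"education.csv", "company.csv", "postcode.csv"} // hypothetical file names

	// One goroutine per file; the sequential variant would simply loop.
	var wg sync.WaitGroup
	for _, path := range files {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			fmt.Printf("%s: %d rows\n", path, readAll(path))
		}(path)
	}
	wg.Wait()
}
```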

Phase 2

Phase 2 is conducted in the following semester to develop and implement the proposed idea of the final year project. The program files are stored in separate repositories:

No  Repository              Description
01  Data Encoding           Execution steps that perform line-by-line text substitution based on the regular-expression patterns supplied in the commands, converting records or fields into a specialized format.
02  Education Data Queries  PL/pgSQL scripts for data transformation, normalized table creation, and data migration of the UK Education data (32,707 rows).
03  Postcode Data Queries   PL/pgSQL scripts for data transformation, normalized table creation, and data migration of the UK Postcode data (1,754,882 rows).
04  Company Data Queries    PL/pgSQL scripts for data transformation, normalized table creation, and data migration of the UK Company data (3,595,702 rows).
05  Go-Read-PSQL            Go program that retrieves 5,865,567 rows of data, averaging 36 columns, from three tables in the PostgreSQL database with sequential and concurrent execution.
06  Go-Read-CSV             Go program that retrieves 5,865,567 rows of data, averaging 36 columns, from three raw CSV files with sequential and concurrent execution.
07  Rs-Read-PSQL            Rust program that retrieves 5,865,567 rows of data, averaging 36 columns, from three tables in the PostgreSQL database with sequential and concurrent execution.
08  Rs-Read-CSV             Rust program that retrieves 5,865,567 rows of data, averaging 36 columns, from three raw CSV files with sequential and concurrent execution.
09  Go-Migrate-Postcode     Go program that migrates 1,754,882 rows from the unnormalized postcode table to the normalized tables in the PostgreSQL database.
10  Go-Migrate-Company      Go program that migrates 3,595,702 rows from the unnormalized company table to the normalized tables in the PostgreSQL database.
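
A minimal sketch of the migration pattern used by the Go-Migrate programs, assuming hypothetical table and column names; the actual schemas are defined in the data-query repositories above:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/uk?sslmode=disable") // hypothetical DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Read every row from the unnormalized staging table.
	rows, err := db.Query("SELECT postcode, district FROM postcode_raw") // hypothetical columns
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Insert into the normalized table inside one transaction, so a
	// failure partway through leaves the target table untouched.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	stmt, err := tx.Prepare("INSERT INTO postcode_normalized (postcode, district) VALUES ($1, $2)")
	if err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		var postcode, district string
		if err := rows.Scan(&postcode, &district); err != nil {
			log.Fatal(err)
		}
		if _, err := stmt.Exec(postcode, district); err != nil {
			log.Fatal(err)
		}
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```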

Software Resources

  1. Linux Ubuntu 16.04.3 LTS 64-bit.
  2. Go compiler 1.8.3.
  3. Rust compiler 1.20.0.
  4. PostgreSQL database 9.5.8.
  5. Eclipse for Parallel Application Developers Oxygen Release (4.7.0) IDE.
  6. TeXstudio 2.10.8.
  7. Visual Paradigm 14.1 free edition for non-commercial use.

About

Contributor

  • Chai Ying Hua

Status

This project was submitted as a Final Year Project to Multimedia University.