Open source Greenplum resource list

GNU General Public License v3.0 (GPL-3.0)

awesome-greenplum

https://upload.wikimedia.org/wikipedia/commons/9/97/Greenplumlogotype.jpg

Greenplum is the most advanced open source MPP big data platform. This repo lists compatible open source utilities and extensions. No commercial products, no ads. Feel free to create issues and PRs if you know of anything that is missing.

Why another repo? The old one is no longer maintained and is out of date.

This repo is under GPL v3.

Content

Official Component

As the largest Greenplum contributor, Pivotal has a lot of open source projects for Greenplum. This category lists open source Greenplum components from Pivotal. Most of them are bundled in the Pivotal Greenplum distribution.

  • gpdb - Greenplum Database itself
  • ORCA - ORCA is the query optimizer for both Greenplum and Postgres. It performs very well on complex queries over partitioned tables.
  • MADlib - MADlib is a machine learning and deep learning library for both Greenplum and Postgres.
  • PXF - PXF is an extensible framework that allows a distributed database like GPDB to query external data files.
  • gpbackup - GPBackup is the official “GPDB Backup Utility”
  • gpupgrade - gpupgrade is the Greenplum Database major version upgrade utility
  • pl/container - PL/Container is GPDB execution sandboxing for Python and R
  • diskquota - PostgreSQL disk quota extension

Performance test

  • tpc-h - Traditional OLAP test.
  • tpc-ds - Data warehouse test.
  • pgbench - Bundled TPC-B performance tools for Postgres and Greenplum.
  • sysbench - Performance test utility for Postgres, MySQL, CPU, memory, IO, etc.
  • sysbench-tpcc - Sysbench scripts to generate a tpcc-like workload for MySQL and PostgreSQL

Client tool

  • psql - Bundled official client tool
  • pgcli - pgcli is Postgres CLI with autocompletion and syntax highlighting
  • pgweb - web based cross-platform client for PostgreSQL databases
  • dbeaver - Free universal database tool and GUI SQL client in Java
  • pgadmin 4 - pgAdmin 4 is a rewrite of the popular pgAdmin3 management tool for PostgreSQL. It is written as a web application in Python, using jQuery and Bootstrap for the client side processing and UI
  • adminer - Database management in a single PHP file
  • pgmodeler - PostgreSQL Database Modeler - is an open source data modeling tool designed for PostgreSQL
  • phppgadmin - the premier web-based administration tool for postgresql
  • usql - Universal command-line interface for SQL databases.

Driver

Most Postgres drivers work well with Greenplum.

  • psqlODBC - psqlODBC is the official PostgreSQL ODBC Driver
  • pgjdbc - pgjdbc is an open source JDBC driver written in Pure Java (Type 4), and communicates in the PostgreSQL native network protocol.
  • pg - Pure Go Postgres driver for database/sql.
  • pgx - Another pure Go driver and toolkit for PostgreSQL.
  • psycopg 2 - Psycopg is the most popular PostgreSQL adapter for the Python programming language. At its core it fully implements the Python DB API 2.0 specifications.
  • asyncpg - A fast PostgreSQL Database Client Library for Python/asyncio.
  • aiopg - aiopg is a library for accessing a PostgreSQL database from asyncio.
  • queries - Queries is a BSD licensed opinionated wrapper of the psycopg2 library for interacting with PostgreSQL.
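As a rough illustration of the note above that Postgres drivers generally work with Greenplum, the sketch below connects through psycopg2 and reads the server version string. The host name `mdw`, database `sales`, and user `gpadmin` are placeholders, and the helper names `greenplum_dsn` and `server_version` are invented for this example. psycopg2 is a third-party package and must be installed separately.

```python
from urllib.parse import quote


def greenplum_dsn(host, dbname, user, password="", port=5432):
    """Build a libpq-style connection URI for the Greenplum master.

    All arguments are placeholders supplied by the caller; 5432 is only
    an assumed default and may differ in your cluster.
    """
    auth = quote(user, safe="")
    if password:
        # Percent-encode the password so characters like '@' stay valid.
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{quote(dbname, safe='')}"


def server_version(dsn):
    """Connect with psycopg2 and return the output of SELECT version()."""
    import psycopg2  # third-party; imported here so the DSN helper works without it

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()


# Example (requires a reachable cluster):
#   print(server_version(greenplum_dsn("mdw", "sales", "gpadmin")))
```

On a Greenplum server the returned string mentions Greenplum Database alongside the underlying PostgreSQL version, which is a quick way to confirm which system the driver reached.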

Connection Pool

  • pgpool2 - Pgpool-II is middleware that sits between PostgreSQL servers and a PostgreSQL database client and supports connection pooling, replication, load balancing, etc.
  • pgbouncer - lightweight and fast connection pooler for PostgreSQL based on libevent.
  • pgbouncer-rr - A patch of pgbouncer that supports query routing and rewriting. It can only be used on AWS because of its license.
  • odyssey - Scalable PostgreSQL connection pooler from yandex

Utility

Data flow

This section lists ETL (extract-transform-load) and CDC (change data capture) tools that support Greenplum as a source or a target.

as target

  • gpfdist - Bundled super fast data loading utility of Greenplum.
  • outsourcer - Outsourcer automates the tasks typically done manually by ETL developers to source data from SQL Server and Oracle and load into Greenplum database.
  • gplink - GPLink links JDBC connections to Greenplum External Tables.
  • kettle - Pentaho Data Integration ( ETL ) a.k.a Kettle
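gpfdist is typically used by pointing a readable external table at a running gpfdist server and then loading from that table with ordinary SQL. The sketch below only generates the DDL text. The function name `gpfdist_external_table_ddl`, the table `ext_sales`, the host `etl1`, and port 8081 are assumptions made for illustration, and identifiers are interpolated without quoting, so it is suitable for trusted input only.

```python
def gpfdist_external_table_ddl(table, columns, host, port, path, delimiter=","):
    """Return CREATE EXTERNAL TABLE DDL that reads a CSV file served by gpfdist.

    columns is a list of (name, sql_type) pairs. No quoting or escaping is
    applied, so pass trusted identifiers only.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION ('gpfdist://{host}:{port}/{path}')\n"
        f"FORMAT 'CSV' (DELIMITER '{delimiter}');"
    )


# Assumes gpfdist is already serving the directory that holds sales.csv,
# for example started on the ETL host with: gpfdist -d /data -p 8081
ddl = gpfdist_external_table_ddl(
    "ext_sales",
    [("id", "int"), ("amount", "numeric")],
    host="etl1",
    port=8081,
    path="sales.csv",
)
print(ddl)
```

After the external table exists, a statement such as `INSERT INTO sales SELECT * FROM ext_sales;` (with `sales` being a hypothetical regular table) lets the segments pull the rows in parallel from gpfdist.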

as source

TO BE FINISHED

Procedural Language

Greenplum ships pl/tcl, pl/perl, pl/pgsql and pl/python together with its source code. Below are other available procedural languages.

  • pljava - This version of PL/Java is modified by Greenplum. It is based on pl/java 1.5.0.
  • pl/sh - PL/sh is a procedural language handler for PostgreSQL that allows you to write stored procedures in a shell of your choice.
  • plgo - easily create postgresql extensions in golang
  • pllua-ng - Re-implementation of pllua, embedded Lua for postgresql
  • pllua - PL/Lua is an implementation of Lua as a loadable procedural language for PostgreSQL

Extension

Bundled

  • contrib - All supported Postgres extensions in the Greenplum source code.
  • gpcontrib - Greenplum special extensions.

Index

Postgres Ecosystem

Most Postgres extensions work on Greenplum as long as they are not MPP sensitive. By default, such an extension runs on the master node just as it would on Postgres.

Some components come from awesome-postgres.

fdw of pg

  • fdw list - The FDW (foreign data wrapper) extension list on the Postgres wiki

etl of pg

  • pgmigrate - Simple tool to evolve PostgreSQL schema easily.
  • pg_chameleon - MySQL to PostgreSQL replica system
  • flyway - Database migration tool. Supports many kinds of databases.
  • ora2pg - Ora2Pg is a free tool used to migrate an Oracle database to a PostgreSQL compatible schema.
  • pgclimb - Export data from PostgreSQL into different data formats.
  • pgloader - pgloader is a data loading tool for PostgreSQL, using the COPY command.
  • pgsync - Sync Postgres data between databases
  • pg_bulkload - High speed data loading utility for PostgreSQL
  • pg_migrate - Manage postgres schema, triggers, procedures, and views.
  • pgfutter - Import CSV and JSON into PostgreSQL the easy way
  • mysql-postgresql-converter - Lanyrd’s MySQL to PostgreSQL conversion script.
  • pgddl - DDL eXtractor functions for PostgreSQL (ddlx)
  • migrate - Database migrations written in Go. Use as CLI or import as library.
  • psql-streamer - Stream database events from PostgreSQL to Kafka

monitor of pg

  • pg_activity - pg_activity is a top like application for PostgreSQL server activity monitoring.
  • pgwatch2 - PostgreSQL metrics monitor/dashboard
  • pg_view - Get a detailed, real-time view of your PostgreSQL database and system metrics
  • check_postgres - Nagios check_postgres plugin for checking status of PostgreSQL databases.
  • check_pgactivity - Nagios remote agent
  • postgresql-metrics - Tool that extracts and provides metrics on your PostgreSQL database.
  • libzbxpgsql - Monitor PostgreSQL with Zabbix
  • netdata - Full-featured, real-time performance monitoring, including Postgres
  • pgcenter - Command-line admin tool for observing and troubleshooting Postgres.
  • pganalyze - pganalyze statistics collector for gathering PostgreSQL metrics and log data
  • pginsight - CLI tool to easily dig deep inside your Postgresql database.
  • pg_insights - Convenient SQL for monitoring Postgres database health.
  • postgres-checkup - Postgres Health Check and SQL Performance Analysis.
  • pg_web_stats - Web UI to view pg_stat_statements

development of pg

  • plpgsql_check - plpgsql_check is the next generation of plpgsql_lint. It lets you check source code by explicitly calling plpgsql_check_function.
  • pgtap - PostgreSQL Unit Testing Suite
  • sqlcheck - Automatically identify anti-patterns in SQL queries
  • pg-formatter - A PostgreSQL SQL syntax beautifier.
  • postgrest - REST API for any Postgres database
  • prest - pREST, written in Go, is a way to serve a RESTful API from any database

utility of pg

  • pgcmp - Tool for comparing Postgres database schemas
  • sqitch - Sane database change management
  • pgbadger - Fast PostgreSQL Log Analyzer.
  • apgdiff - Another PostgreSQL Diff Tool
  • pgxnclient - A command line client for the PostgreSQL Extension Network.
  • config_log - PostgreSQL custom background worker (BGW) for monitoring configuration log changes
  • postgresql_anonymizer - postgresql_anonymizer is an extension to mask or replace personally identifiable information (PII) or commercially sensitive data from a PostgreSQL database.
  • orafce - The “orafce” project implements some functions from the Oracle database.

backup of pg

  • barman - BaRMan - Backup and Recovery Manager for PostgreSQL

audit of pg

  • pgMemento - Audit trail with schema versioning for PostgreSQL using transaction-based logging
  • Pyrseas - Provides utilities for Postgres database schema versioning.
  • cyanaudit - Cyan Audit is a PostgreSQL utility providing comprehensive and easily-searchable logs of DML (INSERT/UPDATE/DELETE) activity in your database.
  • pgaudit - PostgreSQL Audit Extension
  • postgresql-audit - Audit trigger for PostgreSQL
  • e-maj - Logs and rolls back table updates
  • pg_permission - A simple set of views to see ALL permissions in a PostgreSQL database
  • temporal_tables - A temporal table is a table that records the period of time when a row is valid

enhancement of pg

TO BE FINISHED

Community