
StarRocks is a next-generation, sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.


StarRocks

StarRocks is a next-generation, high-speed MPP database for nearly all data analytics scenarios. Our goal is to make data analytics easy and fast: users can run high-speed analytics directly in a wide range of scenarios without complex data preprocessing. Query speed, especially for multi-table JOIN queries, far exceeds that of similar products thanks to a streamlined architecture, a fully vectorized engine, a newly designed cost-based optimizer (CBO), and modern materialized views. StarRocks also supports efficient real-time data analytics.

Moreover, StarRocks supports flexible and diverse data modeling, such as flat tables, star schemas, and snowflake schemas. Because it is compatible with the MySQL protocol and standard SQL syntax, StarRocks works smoothly with the MySQL ecosystem, for example MySQL clients and common BI tools. It is an integrated data analytics platform that offers high availability and simple maintenance and does not rely on any external components.

We recommend you read the Introduction to StarRocks first.

Architecture

StarRocks's streamlined architecture consists mainly of two modules, the Frontend (FE) and the Backend (BE). It does not depend on any external components, which makes it easy to deploy and maintain. The system also eliminates single points of failure through seamless horizontal scaling of FE and BE nodes and replication of both metadata and data.

Figure: Architecture of StarRocks

Technology

  • Native vectorized SQL engine: StarRocks uses vectorization to make full use of the CPU's parallel computing power, achieving sub-second query response times in multi-dimensional analysis, 5 to 10 times faster than previous systems.
  • Simple architecture: StarRocks does not rely on any external systems. The simple architecture makes it easy to deploy, maintain and scale out. StarRocks also provides high availability, reliability, scalability and fault tolerance.
  • Standard SQL: StarRocks supports ANSI SQL syntax, with full support for the TPC-H and TPC-DS queries. It is also compatible with the MySQL protocol, so various clients and BI tools can be used to access StarRocks.
  • Smart query optimization: StarRocks optimizes complex queries with its cost-based optimizer (CBO). A better execution plan greatly improves data analysis efficiency.
  • Real-time updates: StarRocks's update model can perform upsert and delete operations by primary key while keeping queries efficient during concurrent updates.
  • Intelligent materialized views: StarRocks's materialized views are updated automatically during data loading and selected automatically when a query is executed.
  • Convenient query federation: StarRocks can directly query data in Hive, MySQL, and Elasticsearch without importing it first.
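To illustrate what vectorized, column-oriented execution means in general, here is a minimal C++ sketch. It is not StarRocks code. It evaluates a hypothetical query `SELECT SUM(b) FROM t WHERE a > threshold` in two ways: row at a time over an array of structs, and in fixed-size batches over separate column arrays. The schema, the function names, and the batch size of 4096 are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-oriented layout for a table t(a INT, b BIGINT).
struct Row {
    int32_t a;
    int64_t b;
};

// Row-at-a-time evaluation of: SELECT SUM(b) FROM t WHERE a > threshold.
// Each row is filtered and aggregated individually, with a data-dependent branch per row.
int64_t sum_where_row_at_a_time(const std::vector<Row>& rows, int32_t threshold) {
    int64_t sum = 0;
    for (const Row& r : rows) {
        if (r.a > threshold) {
            sum += r.b;
        }
    }
    return sum;
}

// Assumed batch size, for illustration only.
constexpr std::size_t kBatchSize = 4096;

// Columnar, batch-at-a-time evaluation of the same query.
// Each column is a separate contiguous array. For every batch, a filter pass
// builds a 0/1 selection mask, then an aggregation pass consumes it. Neither
// inner loop has a data-dependent branch, so compilers can often auto-vectorize
// them into SIMD instructions.
int64_t sum_where_vectorized(const std::vector<int32_t>& col_a,
                             const std::vector<int64_t>& col_b,
                             int32_t threshold) {
    assert(col_a.size() == col_b.size());  // columns of one table have equal length
    int64_t sum = 0;
    std::vector<uint8_t> mask(kBatchSize);
    for (std::size_t start = 0; start < col_a.size(); start += kBatchSize) {
        const std::size_t n = std::min<std::size_t>(kBatchSize, col_a.size() - start);
        // Filter pass: evaluate the predicate for the whole batch.
        for (std::size_t i = 0; i < n; ++i) {
            mask[i] = static_cast<uint8_t>(col_a[start + i] > threshold);
        }
        // Aggregation pass: multiply by the 0/1 mask instead of branching.
        for (std::size_t i = 0; i < n; ++i) {
            sum += col_b[start + i] * static_cast<int64_t>(mask[i]);
        }
    }
    return sum;
}
```

Both functions return the same result. The batched version differs only in how the work is laid out: tight loops over contiguous column data instead of per-row branching over whole records. The actual speedup depends on the compiler, the CPU, and the data.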

Use cases

StarRocks delivers strong performance in a wide range of data analytics scenarios, including multi-dimensional filtering and analysis, real-time data analytics, and ad hoc analysis, and it supports thousands of concurrent users. As a result, StarRocks is widely used for business intelligence, real-time data warehousing, user profiling, dashboards, order analysis, operations and monitoring analysis, anti-fraud, and risk control. At present, over 100 medium-sized and large enterprises in various industries, including Airbnb, JD.com, Tencent, Trip.com, and other well-known companies, use StarRocks in their online production environments, where thousands of StarRocks servers are running stably.

Upstream

Apache Doris (incubating) is the upstream project of StarRocks. We are very grateful to the Apache Doris (incubating) community for contributing such an excellent OLAP database.

StarRocks was developed from version 0.13 of Apache Doris (incubating), released in early 2020. We adopted the framework and columnar storage engine of Apache Doris (incubating) and added a fully vectorized execution engine, a CBO, a real-time update engine, and other important features.

Of the approximately 700K lines of code currently in StarRocks*, about 40% is identical to Apache Doris (incubating) and remains under the Apache 2.0 license; the remaining 60% consists of additions or modifications.

We will continue to contribute to Apache Doris (incubating) and help build the open source ecosystem.

* Statistics from GitHub, September 2021

Build

Because of the third-party dependencies, we recommend building StarRocks with the development Docker image we provide.

For detailed instructions, please refer to build.

Install

Download the current release here.
For detailed instructions, please refer to deploy.

Links

Community

LICENSE

Code in this repository is provided under the Elastic License 2.0. Some portions are available under open source licenses. Please see our FAQ.

Contributing to StarRocks

Thank you for your interest in StarRocks! For us to accept your pull request, please follow the guidelines in CONTRIBUTING.md.