Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o]
Verb: I can, know, understand, have knowledge.
Scio is a Scala API for Google Cloud Dataflow inspired by Spark and Scalding. See the current API documentation for more information.
- Scala API close to that of Spark and Scalding core APIs
- Fully managed service*
- Unified batch and streaming programming model*
- Integration with Google Cloud products: Cloud Storage, BigQuery, Pub/Sub, Datastore, Bigtable*
- HDFS source/sink
- Interactive mode with Scio REPL
- Type safe BigQuery
- Integration with Algebird and Breeze
- Pipeline orchestration with Scala Futures
- Distributed cache
* provided by Google Cloud Dataflow
The ubiquitous word count example can be run directly with SBT in local mode, using README.md
as input.
sbt "project scio-examples" "run-main com.spotify.scio.examples.WordCount --input=README.md --output=wc"
cat wc/part-00000-of-00001.txt
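For orientation, here is a minimal sketch of what a word count job written against the Scio API might look like. It is not the source of the bundled `com.spotify.scio.examples.WordCount`: the object name `WordCountSketch` and the helper `tokenize` are hypothetical, and the tokenizing regex is an assumption. It expects the same `--input` and `--output` arguments as the command above, and the final `sc.close()` call reflects the Scio releases contemporary with this README, so other versions may differ.

```scala
import com.spotify.scio._

// Hypothetical object name, used for illustration only.
object WordCountSketch {

  // Split a line on runs of characters that are neither letters nor
  // apostrophes, and drop the empty tokens that splitting can produce.
  def tokenize(line: String): Seq[String] =
    line.split("[^a-zA-Z']+").filter(_.nonEmpty).toSeq

  def main(cmdlineArgs: Array[String]): Unit = {
    // Parse command line arguments and create the pipeline context.
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))                        // one element per input line
      .flatMap(line => tokenize(line))                // one element per word
      .countByValue                                   // (word, count) pairs
      .map { case (word, count) => s"$word: $count" } // format as text
      .saveAsTextFile(args("output"))                 // write sharded text files

    // Submit the pipeline for execution.
    sc.close()
  }
}
```

The chain of `flatMap`, `countByValue` and `map` reads much like the equivalent Spark or Scalding job, which is the API similarity described in the feature list.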
- Scio Wiki - wiki page
- ScalaDocs - current API documentation
- Scio REPL - tutorial for the interactive Scio REPL
- Scio, Spark and Scalding - comparison of these frameworks
- Type safe BigQuery - tutorial for the type safe BigQuery API
- HDFS - using Scio with HDFS files
Scio includes the following artifacts:
- scio-core: core library
- scio-test: test utilities, add to your project as a "test" dependency
- scio-bigquery: Add-on for BigQuery, included in scio-core but can also be used standalone
- scio-bigtable: Add-on for Bigtable
- scio-extra: Extra utilities for working with collections, Breeze, etc.
- scio-hdfs: Add-on for HDFS
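As a rough illustration of how these artifacts could be declared in an SBT build, the fragment below adds the core library as a regular dependency and the test utilities under the "test" scope. The `com.spotify` organization is assumed from the package name used in the examples, and `scioVersion` is a placeholder that must be replaced with an actual release number.

```scala
// build.sbt fragment (sketch). "x.y.z" is a placeholder, not a real release.
val scioVersion = "x.y.z"

libraryDependencies ++= Seq(
  // Core library, needed by every Scio pipeline.
  "com.spotify" %% "scio-core" % scioVersion,
  // Test utilities, visible only to test code.
  "com.spotify" %% "scio-test" % scioVersion % "test"
)
```

Add-ons such as scio-bigtable or scio-hdfs would be declared the same way if a pipeline needs them.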
Copyright 2016 Spotify AB.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0