links
Just a bunch of useful links
Scala
- Scala Design Patterns - great stuff, how to do (or not do) traditional Java / OOP patterns in Scala
- The Human Side of Scala - great post on styling Scala for readability
- Sneaking Scala Through the Back Door - how to promote Scala in an organization
- Effective Scala - Twitter's guide to writing good Scala code
- Between Zero & Hero - tips and tricks for the intermediate Scala developer
- Scala School 2 - Twitter's next-generation interactive Scala tutorial
- Type of Types - an unfinished tutorial on the Scala type system
- Monads are not Metaphors - a great explanation of monads
- Important compiler flags
- Recursive Types - signatures like class Foo[T <: Foo[T]], useful for inheritance and proper return types. Though if you hit this, there are probably better ways of solving the problem, i.e. via composition.
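A minimal sketch of the recursive (F-bounded) type pattern, using hypothetical Shape/Circle/Square types that are not from any linked post. Because each subclass passes itself as T, a method written once in the base trait can return the concrete subclass instead of the base type.

```scala
// F-bounded trait: T must be a Shape of itself.
trait Shape[T <: Shape[T]] {
  def scale(factor: Double): T
  // Defined once here, yet returns the concrete subtype T.
  def doubled: T = scale(2.0)
}

final case class Circle(radius: Double) extends Shape[Circle] {
  def scale(factor: Double): Circle = Circle(radius * factor)
}

final case class Square(side: Double) extends Shape[Square] {
  def scale(factor: Double): Square = Square(side * factor)
}

object Shapes {
  // The bound also lets generic code keep the precise element type.
  def doubleAll[T <: Shape[T]](shapes: List[T]): List[T] = shapes.map(_.doubled)
}
```

Without the bound, doubled could only promise a Shape, and callers would have to cast back to Circle.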
Serialization
- Simple Binary Encoding - supposedly 20-50x faster than Google Protobuf !!
- Comparison of Cap'n Proto, SBE, FlatBuffers from the Cap'n Proto people
- Jawn - @d3's new fast JSON parser, parses to multiple ASTs including rojoma-json, spray-json, argonaut
- Extracting case class param names using Macros
- Fast-Serialization - a drop in replacement for Java Serialization but much faster
- Akka's ByteString class - immutable rope class for fast byte additions
Concurrency, Actors
- CKite - Raft implementation in Scala, built on Finagle, MapDB, etc.
- SafeFuture, CancellableFuture, etc. - very useful
- Execute Futures serially - in a non-blocking fashion
- Scala.Rx - "Reactive variables" - smart variables that automatically update themselves when the values they depend on change
- Monifu - a nice set of wrappers around j.u.c.Atomic*, as well as super-lightweight cancellable tasks and futures utilities. Accompanying blog post.
- CEP using Akka Streams - great example of using Akka's new Streams for distributed stream processing with backpressure
- akka instrumentation - an experiment to walk the actor tree and see stuff at runtime
- rxmon - Akka monitoring via RxJava
- Actor Provisioning pattern - if you have a long, failure-prone initialization procedure for an actor, this trait splits the work out to, say, another actor and dispatcher
- Running an Akka cluster with Docker Containers
- Ask, Tell, and Per-Request Actors - why one company moved from Ask/Futures to per-request actors
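A minimal sketch of running Futures serially without blocking. The helper name serially is hypothetical and not taken from the linked post: foldLeft chains each call inside flatMap, so the next Future is only created after the previous one completes. By contrast, Future.traverse invokes the function on every element up front, so all the work starts concurrently.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SerialFutures {
  // Applies f to each input one after another; no thread blocks while waiting.
  def serially[A, B](inputs: Seq[A])(f: A => Future[B])
                    (implicit ec: ExecutionContext): Future[Vector[B]] =
    inputs.foldLeft(Future.successful(Vector.empty[B])) { (accF, a) =>
      // f(a) runs only once accF has completed, which enforces ordering.
      accF.flatMap(acc => f(a).map(acc :+ _))
    }
}
```

If any step fails, the resulting Future fails and the remaining inputs are never started.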
Async Database Libs
- Asyncpools - Akka-based async connection pool for Slick. Akka 2.2 / Scala 2.10.
- Postgresql-Async - Netty-based async drivers for PostgreSQL and MySQL
Caching
- Cacheable - a clever memoization / caching library (with Guava, Redis, Memcached or EHCache backends) using Scala 2.10 macros to remember function parameters
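For comparison with what Cacheable automates via macros, here is a hand-rolled memoization sketch. The memoize helper is hypothetical and is not Cacheable's API. It caches results in an unbounded TrieMap keyed by the argument, so nothing is ever evicted, and under contention f may run more than once for the same key.

```scala
import scala.collection.concurrent.TrieMap

object Memo {
  // Wraps a one-argument function so repeated arguments hit the cache.
  def memoize[A, B](f: A => B): A => B = {
    val cache = TrieMap.empty[A, B] // concurrent map, safe to share across threads
    a => cache.getOrElseUpdate(a, f(a))
  }
}
```

For functions of several arguments, one option is to memoize a version that takes a tuple.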
Big Data Processing
- Great list of Big Data Projects
- Debasish G's list of streaming papers and algorithms - especially the material on CountMinSketch and HyperLogLog
- Summingbird - for any dataset that can be aggregated using a monoid, promises to unify Storm, Hadoop, and in the future, Akka and Spark with a single DSL. Also has a neat library of monoids built in.
- Making Zookeeper Resilient - an excellent blog post from Pinterest
- Probability Monad - super useful for stats or random data generation
- stringmetric - approximate string matching and phonetic algorithms
- Factorie - a Scala library for Natural Language Processing
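To make the CountMinSketch and monoid ideas above concrete, here is a minimal, immutable Count-Min Sketch keyed by strings. It is a sketch under stated assumptions, not the Summingbird or Algebird implementation: each row uses the standard library's MurmurHash3 seeded with the row index, and depth/width are left to the caller. Merging is cell-wise addition, which is associative with an all-zero sketch as identity, so sketches built on different partitions can be combined in any grouping.

```scala
import scala.util.hashing.MurmurHash3

final case class CountMinSketch(depth: Int, width: Int, table: Vector[Vector[Long]]) {
  // Column for `item` in a given row; the row index doubles as the hash seed.
  private def bucket(item: String, row: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(item, row), width)

  def add(item: String, count: Long = 1L): CountMinSketch =
    copy(table = table.zipWithIndex.map { case (counters, row) =>
      val i = bucket(item, row)
      counters.updated(i, counters(i) + count)
    })

  // Never underestimates; collisions can only inflate the count.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min

  // Monoid "plus": cell-wise addition of two sketches with the same shape.
  def ++(other: CountMinSketch): CountMinSketch = {
    require(depth == other.depth && width == other.width, "dimensions must match")
    copy(table = table.zip(other.table).map { case (a, b) =>
      a.zip(b).map { case (x, y) => x + y }
    })
  }
}

object CountMinSketch {
  // Monoid identity: all counters zero.
  def empty(depth: Int, width: Int): CountMinSketch = {
    require(depth > 0 && width > 0, "depth and width must be positive")
    CountMinSketch(depth, width, Vector.fill(depth, width)(0L))
  }
}
```

Wider rows reduce collision error and more rows reduce the chance that every row collides; the merge law is what lets a framework fold partial sketches together.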
Spark
- Jaws - Spark SQL REST server, includes query cancellation, logs, load balancing.
Geospatial and Graph
- GeoTrellis - distributed raster processing, adding Vector/geom support, Akka Cluster and Spark implementations!
- Spatial framework for Hadoop - PostGIS-like operators / UDFs for Hive. We want this for Spark!
- trails - parser combinators for graph traversal. Supports Tinker/Blueprints/Neo4j APIs.
- scala-graph - in-memory graph API based on Scala collections. Work in progress.
Collections, Numeric Processing, Fast Loops
- Breeze, Spire, and Saddle - Scala numeric libraries
- spire-ops - a set of macros for no-overhead implicit operator enrichment
- ScalaXY - collection of macros for performant for loops, extension methods etc
- Squants - The Scala API for Quantities, Units of Measure and Dimensional Analysis
- FastTuple - a dynamic (runtime-defined) C-style struct library, with support for off-heap storage. Would work really well for in-memory queries.
- and the excellent blog covers all of the on- and off-heap access and allocation patterns on the JVM very thoroughly.
- Unboxing, Runtime Specialization - a cool post on how to do really fast aggregations using unboxed integers
- product-collections - useful library for working with collections of tuples
- SuperFastHash - also see Murmur3
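Scala's standard library already ships a MurmurHash3 implementation in scala.util.hashing, so for non-cryptographic hashing you may not need another dependency. A small sketch; the Hashing object and its shardFor/bytesShard helpers are hypothetical, showing one common use (mapping keys to one of n shards).

```scala
import scala.util.hashing.MurmurHash3

object Hashing {
  // 32-bit MurmurHash3 of a string, folded into [0, shards).
  def shardFor(key: String, shards: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key), shards)

  // Same idea for raw bytes, e.g. serialized keys.
  def bytesShard(data: Array[Byte], shards: Int): Int =
    Math.floorMod(MurmurHash3.bytesHash(data), shards)
}
```

Math.floorMod keeps the result non-negative even though the hash itself may be negative.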
Big Data Storage
- Phantom - Scala DSL for Cassandra, supports CQL3 collections, CQL generation from data models, async API based on Datastax driver
- Athena - Asynchronous Cassandra client built on Akka-IO
- Stubbed Cassandra - super useful for testing C* apps
- Pithos - an S3-API-compatible object store for Cassandra
- Sirius - Akka-based in-memory fast key-value store for JVM objects, with Paxos consistency, persistence/txn logs, HA recovery
- Storehaus - Twitter's key-value wrapper around Redis, MySQL, and other stores. Has neat merge() functionality for aggregation of values, lists, etc.
- MapDB - Not a database, but rather a database engine with tunable consistency / ACIDness; support for off-heap memory; fast performance; indexing and other features.
- HPaste - a nice Scala client for HBase
Web / REST / General
- Scalaj-http - really simple HTTP client API. The latest Spray-client has been vastly simplified as well, though.
- REPL as a service - would be kick ass if integrated into Spark
- IScala - Scala backend for IPython. Looks promising. There is also Scala Notebook, but it's more of a research project.
- Scaposer - i18n / .po file library
- Adding Reflection to Scala Macros - example of using reflection in an annotation macro to add automatic ByteBuffer serialization to case classes :)
- Scaldi - a lightweight dependency injection library, with Akka integration
- How to use Typesafe Config across multiple environments
- Scala-rainbow - super simple terminal color output, easier than Console.XXX
- SExt - supplies some missing Standard Library functions, like pretty-printing data structures, unfold, etc.
- ScalaUtils - ===, !== with tolerance for floats, and an Or type for easy validation (Int Or One[ErrorMessage])
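ScalaUtils' Or type and tolerance-based === are library-specific. As a dependency-free approximation of the same two ideas, here is a sketch using plain Either and an explicit tolerance. The names approxEqual and parseAge are hypothetical, and Either[ErrorMessage, Int] stands in for Int Or One[ErrorMessage]; note that Either puts the good value on the Right, whereas Or puts it on the left.

```scala
import scala.util.Try

object Validation {
  type ErrorMessage = String

  // Floating-point comparison within an explicit tolerance.
  def approxEqual(a: Double, b: Double, tolerance: Double): Boolean =
    math.abs(a - b) <= tolerance

  // Right(age) on success, Left(message) describing what went wrong.
  def parseAge(s: String): Either[ErrorMessage, Int] =
    Try(s.trim.toInt).toOption match {
      case Some(n) if n >= 0 => Right(n)
      case Some(n)           => Left(s"$n is negative")
      case None              => Left(s"'$s' is not a number")
    }
}
```

Unlike ScalaUtils' One/Every error containers, this Either stops at the first error rather than accumulating several.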
Build, Tooling
- Run Scala scripts with dependencies - i.e. you don't need a project file
- sbt-assembly 0.10.2 supports adding a shell script to your jar to make it executable! No more "java ...." to start your Scala program, and no more ps ax | grep java | grep ....
- Other useful SBT plugins - sbt-sonatype, sbt-pom-reader, sbt-sound, plugins page
- SCoverage - statement coverage tool, much more useful than line-based or branch-based tools. Has an SBT plugin. Blog post on why it's an improvement.
- sbt-jmh - plugin for running JMH (Java Microbenchmark Harness) benchmarks from SBT projects
- SBT Shell Prompt with Git and project name :) (SBT 0.13 only)
- SBT updates - tool for discovering updated versions of SBT dependencies
- Thyme and Parsley - microbenchmarking and profiling tools, seem useful
- ScalaStyle - Scala style checker / linter
- Linter - Scala linter compiler plugin
- utest - a tiny test framework
- lions share - a neat JVM heap and GC analysis tool, with charts and SBT integration.
SBuild seems like a promising replacement for SBT. Still Scala, but much much simpler, more like Scala version of Make. With MVN dependency and ScalaTest support.
JVM Other
- Quick dumping your JVM heap using GDB -- too bad it doesn't work on OSX.
- jHiccup -- "Hiccup" or GC pause analysis tool
- Bintray - friendlier alternative to Sonatype OSS / Maven central. Also see bintray-sbt plugin.
Databases
Indexing and OLAP
- Adaptive Radix Trees - cache friendly indexing for in-memory databases
- Quotient Cubes - semantic grouping and rollup algorithm for OLAP cubes. Ruby implementation.
- Top K queries and cubes
- Scalable In-memory Aggregation - column-oriented, in memory with bitmap indexing and memoization
ML and Data Science
- LearnDS - A set of IPython notebooks for learning data science
Distributed Systems
- Raft Visualization - great 5-min visualization of the distributed consensus protocol
Sublime Text
I love Sublime and use it for everything, even Scala! Going to put my Sublime stuff in a separate page.
Best Practices and Design
- Semver - Semantic versioning, how to deal with dev workflows and corner cases -- a must read
- Pragmatic RESTful API Design - really good stuff
- Blameless Post-Mortems - why they are crucial to good culture
- GitHub Flow - how github.com does continuous deploys, uses pull requests for an automated, process-free development workflow. Some gems include naming branches descriptively and using github.com to browse the work currently in progress by looking at active branches.
- Pull Requests and other good Github Practices
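A minimal sketch of semver precedence and breaking-change detection, assuming plain MAJOR.MINOR.PATCH strings only (pre-release and build metadata are rejected rather than handled). The SemVer type here is hypothetical. Comparing numerically rather than as strings matters: "1.10.0" sorts before "1.9.3" lexicographically, but it is the newer version.

```scala
import scala.util.Try

final case class SemVer(major: Int, minor: Int, patch: Int) extends Ordered[SemVer] {
  // Precedence: major, then minor, then patch, each compared numerically.
  def compare(that: SemVer): Int =
    Ordering[(Int, Int, Int)].compare((major, minor, patch), (that.major, that.minor, that.patch))

  // A major bump signals an incompatible API change; under 0.y.z anything may change.
  def isBreakingUpgradeFrom(old: SemVer): Boolean =
    major != old.major || (major == 0 && this != old)
}

object SemVer {
  private val Pattern = """(\d+)\.(\d+)\.(\d+)""".r

  // None for anything that is not exactly MAJOR.MINOR.PATCH (or overflows Int).
  def parse(s: String): Option[SemVer] = s match {
    case Pattern(ma, mi, pa) => Try(SemVer(ma.toInt, mi.toInt, pa.toInt)).toOption
    case _                   => None
  }
}
```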
Other Random Stuff
- JQ - JSON processor for the shell. Super useful with RESTful servers.
- Underscore-CLI - a Node.js-based command-line JSON parser
- MacroPy - Scala-like macros, case classes, pattern matching, parser combinators for Python (!!)
- Scala 2.11 vs Swift - Apple's new iOS language is often compared to Scala.
- Gherkin - a Lisp implemented in bash !!
- Nimrod - a neat, compile-straight-to-binary, static systems language with beautiful Python-like syntax, union types, generics, macros, first-class functions. What Go should have been.
- Bret Victor - a set of excellent essays and talks from a great visual designer