Just a bunch of useful links. BTW, see the Rust links as well.
- Scala Design Patterns - great stuff on how to do (or not do) traditional Java / OOP patterns in Scala
- The Human Side of Scala - great post on styling Scala for readability
- Sneaking Scala Through the Back Door - how to promote Scala in an organization
- Effective Scala - Twitter's guide to writing good Scala code
- SBT - a declarative DSL - an excellent guide to SBT tasks and settings
- Between Zero & Hero - tips and tricks for the intermediate Scala developer
- Better Type Classes - also see one of the first links for a good intro to type classes
- Type classes and generic derivation - how to avoid common boilerplate for type classes and case classes using Shapeless HLists
- Type of Types - an unfinished tutorial on the Scala type system
- Monads are not Metaphors - a great explanation of monads
- Selfless Trait Pattern - lets users either mix in a trait or import an object
- Scalacaster - classic data structures in Scala
- Important compiler flags
- Recursive Types - signatures like class Foo[T <: Foo[T]], useful for inheritance and proper return types. Though if you hit this, there are probably better ways of solving the problem, i.e. via composition.
- Preprocessor - combines different Scala type techniques (phantom types, recursive types, self types) to build a type-safe computation pipeline
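A minimal sketch of the recursive (F-bounded) signature from the Recursive Types entry, in plain Scala. Animal, Dog, Cat and shout are hypothetical names chosen for illustration. The payoff is that a method declared once in the base trait returns the concrete subtype:

```scala
// F-bounded ("recursive") type parameter: T must itself be an Animal[T].
// The self-type `self: T =>` forces implementers to pass their own type as T.
trait Animal[T <: Animal[T]] { self: T =>
  def name: String
  // Declared once here, but returns the concrete subtype (Dog, Cat, ...).
  def rename(newName: String): T
}

final case class Dog(name: String) extends Animal[Dog] {
  def rename(newName: String): Dog = copy(name = newName)
}

final case class Cat(name: String) extends Animal[Cat] {
  def rename(newName: String): Cat = copy(name = newName)
}

// Generic helper that keeps the caller's concrete type in its result.
def shout[T <: Animal[T]](a: T): T = a.rename(a.name.toUpperCase)

val loudDog: Dog = shout(Dog("rex")) // static type is Dog, not Animal[_]
```

The entry's caveat still applies: a type class or plain composition often achieves the same result with less type-level ceremony.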
- Simple Binary Encoding - supposedly 20-50x faster than Google Protobuf!!
- Comparison of Cap'n Proto, SBE, FlatBuffers from the Cap'n Proto people
  - Cap'n Proto's native layout uses 64-bit words and relies on a separate packing/unpacking step for an efficient wire representation. Has RPC (but not for Java) and bitset support. Java support is third-party.
  - FlatBuffers is from Google. 32-bit word size, more compact native representation, native Java support.
  - Both Cap'n Proto and FlatBuffers allow random access of lists, whereas SBE is really only for streaming access
- Using Unsafe for C-like memory access speeds - a great guide. Many Unsafe operations are JVM intrinsics, which the JIT compiles directly to machine code
- Scala-offheap - fast, safe off-heap objects
- FastTuple - a dynamic (runtime-defined) C-style struct library, with support for off-heap storage. Only works for primitives right now :(
  - Also, the excellent blog covers all of the on- and off-heap access and allocation patterns on the JVM very thoroughly.
- ObjectLayout - efficient struct-within-array data structures
- jvm-unsafe-utils - a library from @rxin (of Spark/Shark fame) for working with Unsafe
- Agrona and blog post - an off-heap ByteBuffer wrapper with atomic / thread-safe update operations. Good for building off-heap data structures.
- Sidney - an experimental columnar nested struct serializer, with Parquet-like repetition counts
- OHC - Java off-heap cache
- Boon ByteBuf and the JavaDoc - a very easy-to-use, auto-growable ByteBuffer replacement, good for efficient IO
- Jawn - @d6's new fast JSON parser, parses to multiple ASTs including rojoma-json, spray-json, argonaut
- Grisu-scala - much faster double-to-string conversion
- Extracting case class param names using Macros
- Fast-Serialization - a drop-in replacement for Java Serialization, but much faster
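The struct-in-a-buffer idea behind FastTuple, ObjectLayout and Agrona can be sketched with nothing but the JDK's java.nio.ByteBuffer (no Unsafe). TradeArray, TradeLayout and the field layout are hypothetical, and the linked libraries go much further than this:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical C-style layout for one record (all offsets in bytes):
//   0..7   id    : Long
//   8..15  price : Double
//   16..19 qty   : Int
//   20..23 padding, keeping the 8-byte fields at 8-byte-aligned offsets
object TradeLayout {
  val IdOffset    = 0
  val PriceOffset = 8
  val QtyOffset   = 16
  val RecordSize  = 24
}

// An "array of structs" packed into one direct (off-heap) buffer instead of an
// Array of JVM objects: no per-record object headers, and the GC never scans
// the record data itself.
final class TradeArray(val capacity: Int) {
  import TradeLayout._

  private val buf: ByteBuffer =
    ByteBuffer.allocateDirect(capacity * RecordSize).order(ByteOrder.nativeOrder())

  // Byte offset of record i; bounds-checked so bad indices fail loudly.
  private def base(i: Int): Int = {
    require(i >= 0 && i < capacity, s"index $i out of bounds")
    i * RecordSize
  }

  // Absolute get/put calls never touch the buffer's position.
  def set(i: Int, id: Long, price: Double, qty: Int): Unit = {
    val b = base(i)
    buf.putLong(b + IdOffset, id)
    buf.putDouble(b + PriceOffset, price)
    buf.putInt(b + QtyOffset, qty)
  }

  def id(i: Int): Long      = buf.getLong(base(i) + IdOffset)
  def price(i: Int): Double = buf.getDouble(base(i) + PriceOffset)
  def qty(i: Int): Int      = buf.getInt(base(i) + QtyOffset)
}

val trades = new TradeArray(1000)
trades.set(0, id = 42L, price = 99.5, qty = 10)
trades.set(999, id = 7L, price = 1.25, qty = 3)
```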
- Retry for futures. Also SafeFuture, CancellableFuture, etc. - very useful
- Throttling Scala Futures - using a custom executor
- Futiles - really useful set of utilities for working with and sequencing Futures, converting to and from Try, timeouts, etc.
- Scala.Rx - "Reactive variables" - smart variables that auto-update themselves when the values they depend on change
- Monifu - a nice set of wrappers around j.u.c.Atomic, as well as super-lightweight cancellable tasks and futures utilities. Accompanying blog post.
- Colossus - an extremely fast, NIO and Akka-based microservice framework. Read their blog post.
- Socko and Xitrum - two very fast web frameworks built on Akka and Netty
- Kamon - great-looking actor monitoring, apparently via bytecode weaving; no code changes required
- akka-tracing - a distributed tracing Akka extension based on Twitter's Zipkin, which can be used as a performance diagnostics and debugging tool. Supports Spray!
- DI in Akka - great guide to using MacWire with Akka for DI
- Akka Cluster Inventory extension - very useful. All the other blog posts in the series are also excellent reads.
- Akka ZK cluster seed - another Akka extension to automatically register seed nodes with ZK
- Akka Data Replication - replicated, low-latency in-memory datastore built using Akka cluster and CRDTs
- Actor Provisioning pattern - if you have a long, failure-prone initialization procedure for an actor, this trait splits out the work to, say, another actor and dispatcher
- Reactive Visualization for Akka streams!!
- Running an Akka cluster with Docker Containers
- Why Async - an excellent overview of async architecture, from async I/O all the way up to the application layer
- Ask, Tell, and Per-Request Actors - why one company moved from Ask/Futures to per-request actors
- Dos and Donts deploying Akka in Production - an excellent read, full of advice even for non-Akka JVM apps
- CKite - a Raft implementation in Scala, using Finagle, MapDB, etc.
- Dirigiste - dynamic, scalable / smarter thread pools
- Scala-gopher - a #golang-style CSP / channels implementation for Scala. Other niceties: defer()
- Akka Streams Extensions - helpers, connectors with PostgreSQL, and more
- Reactive Kafka
- Zoom - reactive programming with ZK, in Scala using ReactiveX
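Here is a minimal retry sketch using only scala.concurrent, in the spirit of the Retry for futures and Futiles entries. `retry` and `flaky` are hypothetical names, not either library's API. Unlike a production helper, this sketch has no delay or backoff between attempts:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Run `op`, and on failure run it again, up to `retries` extra attempts.
// `op` is by-name, so every attempt builds a fresh Future. No backoff/delay.
def retry[T](retries: Int)(op: => Future[T])(implicit ec: ExecutionContext): Future[T] =
  op.recoverWith {
    // When the guard is false the partial function is undefined,
    // so the last failure propagates unchanged.
    case _ if retries > 0 => retry(retries - 1)(op)
  }

// Hypothetical flaky call: fails on the first two attempts, then succeeds.
val attempts = new AtomicInteger(0)
def flaky(): Future[String] = Future {
  if (attempts.incrementAndGet() < 3) throw new RuntimeException("boom")
  "ok"
}

val result = Await.result(retry(5)(flaky()), 5.seconds)
```

To add backoff, you would typically schedule the next attempt (for example via a ScheduledExecutorService) rather than blocking a thread with sleep.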
- Asyncpools - Akka-based async connection pool for Slick. Akka 2.2 / Scala 2.10.
- Postgresql-Async - Netty-based async drivers for PostgreSQL and MySQL
- Relate - a very lightweight, fast Scala wrapper on top of JDBC
- Cacheable - a clever memoization / caching library (with Guava, Redis, Memcached or EHCache backends) using Scala 2.10 macros to remember function parameters
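The core memoization idea behind Cacheable fits in a few lines of standard-library Scala. This sketch is an in-process cache only, with no Guava/Redis/Memcached backend and no macros. `memoize` and `slowSquare` are hypothetical names, not Cacheable's API:

```scala
import scala.collection.concurrent.TrieMap

// Wrap a one-argument function so each result is computed once per argument.
// TrieMap is the standard library's lock-free concurrent map; under contention
// f may still run more than once for the same key.
def memoize[A, B](f: A => B): A => B = {
  val cache = TrieMap.empty[A, B]
  (a: A) => cache.getOrElseUpdate(a, f(a))
}

// Hypothetical "expensive" function; the counter records real invocations.
var calls = 0
val slowSquare: Int => Int = { n => calls += 1; n * n }
val fastSquare = memoize(slowSquare)

fastSquare(4) // computed
fastSquare(4) // served from the cache
fastSquare(5) // computed
```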
- Great list of Big Data Projects
- List of Database Papers
- List of free big data sources - includes some Socrata datasets, climate data, data from Google, tweets, etc.
- Debasish G's list of streaming papers and algorithms - especially the material on Count-Min Sketch and HyperLogLog
- Cubert - CUBE operator + fast "cost-based" block storage on Hadoop / Tez / Spark
- Kylin - OLAP CUBEs from Hive tables, includes query layer
- Aesop - a scalable pub-sub / change propagation system, especially between different datastores, with reliability. Based on LinkedIn Databus; supports pull or push producers.
- Making Zookeeper Resilient - an excellent blog post from Pinterest
- ImpalaToGo - run Cloudera Impala directly on S3 files without HDFS!
- Calcite - new Apache project, offers ANSI SQL syntax over regular files and other input sources
- redash.io - data visualization / collaboration. TODO: integrate this with Spark SQL / Hive...
- Fast SQL Query Parser in Scala - based on the Scala-LMS project, compiles a query down to C!
- Probability Monad - super useful for stats or random data generation
- stringmetric - approximate string matching and phonetic algorithms
- Factorie - a Scala library for Natural Language Processing based on factor graphs
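Since Count-Min Sketch comes up in the streaming papers entry, here is a minimal textbook version for string keys. It is not taken from any linked project. Seeding the standard library's MurmurHash3 with the row index is a simplification: the textbook analysis calls for pairwise-independent hash functions, and this sketch assumes that is close enough.

```scala
import scala.util.hashing.MurmurHash3

// Count-Min Sketch: `depth` rows of `width` counters. Each row hashes the item
// with a different seed; the estimate is the minimum over the item's counters.
// With non-negative counts, collisions only inflate counters, so estimates
// never undercount.
final class CountMinSketch(width: Int, depth: Int) {
  require(width > 0 && depth > 0, "width and depth must be positive")
  private val table = Array.ofDim[Long](depth, width)

  // Row index doubles as the hash seed (an assumed simplification).
  private def bucket(item: String, row: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(item, row), width) // always in [0, width)

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count

  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min
}

val cms = new CountMinSketch(width = 1000, depth = 5)
Seq("a", "b", "a", "c", "a").foreach(item => cms.add(item))
```

Roughly, width controls the size of the overcount error and depth controls how likely you are to stay within it. Memory stays at width × depth counters no matter how many distinct items stream through.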
- spark-jobserver - REST Job Server for Spark jobs; low-latency query server
- docker-spark - easily deploy a Spark cluster
- Andy's Spark Notebook
- Magellan - Geospatial analytics on Spark
- Kafka Spark Consumer - a low-level consumer which avoids the data loss issues with the high level consumer built into Spark Streaming
- Tuning Spark Streaming for throughput
- Supplemental Spark Projects - lots of other interesting projects, including IPython notebooks, dataframe stuff, stream + historical data processing, and more.
- GeoTrellis - distributed raster processing on Spark. Also see GeoMesa - distributed vector database + feature filtering
- ApertureTiles - system using Spark to generate a tile pyramid for interactive analytical geo exploration
- Twofishes - Foursquare's Scala-based coarse forward and reverse geocoder
- trails - parser combinators for graph traversal. Supports Tinker/Blueprints/Neo4j APIs.
- scala-graph - in-memory graph API based on Scala collections. Work in progress.
- Breeze, Spire, and Saddle - Scala numeric libraries
- spire-ops - a set of macros for no-overhead implicit operator enrichment
- Framian - a new data frame implementation from the authors of Spire
- Scala DataTable - an immutable, updatable table with heterogeneous column types. Easily add columns or rows, with easy Scala collection APIs for iteration.
- ScalaXY - collection of macros for performant for loops, extension methods etc
- Squants - The Scala API for Quantities, Units of Measure and Dimensional Analysis
- An immutable priority map for Scala
- Unboxing, Runtime Specialization - a cool post on how to do really fast aggregations using unboxed integers
- product-collections - useful library for working with collections of tuples. Also, great strongly-typed CSV parser.
- SuperFastHash - also see Murmur3
- Phantom - Scala DSL for Cassandra, supports CQL3 collections, CQL generation from data models, async API based on Datastax driver
- Athena - asynchronous Cassandra client built on Akka-IO
- CCM - easily build local Cassandra clusters for testing!
- SSTableAttachedSecondaryIndex - improved Cassandra secondary indexes (2i), with OR queries and many other enhancements. Requires a modified C* build.
- Stubbed Cassandra - super useful for testing C* apps
- Pithos - an S3-API-compatible object store for Cassandra
- Doradus - a graph / OLAP store on top of Cassandra
- Khronus - time series DB built on Cassandra + Akka Cluster
- Stratio-Cassandra - a fork with Lucene full-text search and CQL support (see the blog). Also see Stargate.
- Sirius - Akka-based in-memory fast key-value store for JVM objects, with Paxos consistency, persistence/txn logs, HA recovery
- CurioDB - distributed persistent Redis built on Akka cluster, etc. :)
- Ivory - an immutable, versioned, RDF-triple / fact store for feature extraction / machine learning
- Hibari - ordered key-value store using chain replication for strong consistency
- Storehaus - Twitter's key-value wrapper around Redis, MySQL, and other stores. Has a neat merge() functionality for aggregation of values, lists, etc.
- ArDB - like Redis, but with spatial indexes and pluggable storage engines
- MapDB - not a database, but rather a database engine with tunable consistency / ACIDness, support for off-heap memory, fast performance, indexing and other features
- HPaste - a nice Scala client for HBase
- OctopusDB paper - interesting idea of using a WAL of RDF triples as the primary storage, with secondary views of row or column orientation
- An excellent talk on Akka Cluster and distributed systems from Jonas Bonér, including a summary of lots of distributed systems theory
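The merge() idea from the Storehaus entry boils down to combining values with an associative operation (a semigroup). Here is a minimal in-memory sketch of that idea. Semigroup, MergeableMapStore and the instances are hypothetical stand-ins written for illustration, not Storehaus's actual API. Paste it as one block so the trait and its companion object stay together:

```scala
// Minimal semigroup type class: an associative way to combine two values.
trait Semigroup[V] {
  def combine(a: V, b: V): V
}

object Semigroup {
  implicit val longSum: Semigroup[Long] = new Semigroup[Long] {
    def combine(a: Long, b: Long): Long = a + b
  }
  implicit def listConcat[A]: Semigroup[List[A]] = new Semigroup[List[A]] {
    def combine(a: List[A], b: List[A]): List[A] = a ++ b
  }
}

// Hypothetical single-threaded in-memory store whose merge() combines the new
// value with any existing one instead of overwriting it.
final class MergeableMapStore[K, V](implicit sg: Semigroup[V]) {
  private var data = Map.empty[K, V]

  def get(k: K): Option[V] = data.get(k)

  def merge(k: K, v: V): Unit =
    data = data.updated(k, data.get(k).fold(v)(old => sg.combine(old, v)))
}

val counts = new MergeableMapStore[String, Long]
counts.merge("clicks", 3L)
counts.merge("clicks", 4L)

val tags = new MergeableMapStore[String, List[String]]
tags.merge("post1", List("scala"))
tags.merge("post1", List("akka"))
```

Because combine is associative, merges can be batched or pre-aggregated before they reach the store without changing the final result. That property is what makes the pattern attractive for counters and append-only lists.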
- Scalaj-http - a really simple HTTP client API. That said, the latest spray-client has been vastly simplified as well.
- Quick Start to Twitter Finagle - though one should really look into Finatra
- REPL as a service - would be kick-ass if integrated into Spark
- Enumeratum - a Scala enum library, much better than the built-in Enumeration
- Ammonite - Scala DSL for easy Bash-like filesystem operations
- IScala - Scala backend for IPython. Looks promising. There is also Scala Notebook, but it's more of a research project.
- Scaposer - i18n / .po file library
- Adding Reflection to Scala Macros - example of using reflection in an annotation macro to add automatic ByteBuffer serialization to case classes :)
- Scaldi - a lightweight dependency injection library, with Akka integration
- Knobs - Scala config library with reactive change detection and env var substitution; can read from Typesafe Config/HOCON, ZK, AWS
- How to use Typesafe Config across multiple environments
- lamma.io - the easiest date generation library
- Pimpathon - a set of useful pimp-my-library extensions
- Scala-rainbow - super simple terminal color output, easier than Console.XXX
- Run Scala scripts with dependencies - i.e. you don't need a project file
- sbt-assembly 0.10.2 supports adding a shell script to your jar to make it executable! No more "java ...." to start your Scala program, and no more "ps ax | grep java | grep ....".
- acyclic - a compiler plugin to detect cyclic dependencies between source files. Eliminate them for faster builds!
- Other useful SBT plugins - sbt-sonatype, sbt-pom-reader, sbt-sound, plugins page
- SCoverage - statement coverage tool, much more useful than line-based or branch-based tools. Has an SBT plugin. Blog post on why it's an improvement.
- sbt-jmh - plugin for running SBT projects with the JMH microbenchmarking tool
- Comcast - a tool to inject network latency and other (less severe) network issues
- Adaptive microbenchmarking of big data - really neat JVM agent which allows turning benchmarking code on and off for better benchmarking
- SBT updates - tool for discovering updated versions of SBT dependencies
- Twitter Iago - perf load test tool based on replaying logs. Compare vs Gatling, for example.
- Thyme and Parsley - microbenchmarking and profiling tools, seem useful
- ScalaStyle - Scala style checker / linter
- Towards a Safer Scala - great talk/slides on tools for Scala linting and static analysis
- utest - a small micro test framework
- Lion's Share - a neat JVM heap and GC analysis tool, with charts and SBT integration
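For context on the Enumeratum entry: its enums are built on the sealed-trait / case-object pattern, which already works in plain Scala. Below is a minimal hand-written sketch of that pattern showing what you gain over scala.Enumeration. Color, hex and withName are hypothetical names, and this is not Enumeratum's API:

```scala
// A plain sealed-ADT "enum": the compiler knows every case, so pattern matches
// get exhaustiveness warnings, which scala.Enumeration values do not.
sealed abstract class Color(val name: String) extends Product with Serializable

object Color {
  case object Red   extends Color("red")
  case object Green extends Color("green")
  case object Blue  extends Color("blue")

  // Maintained by hand here; Enumeratum can derive this list for you.
  val values: Vector[Color] = Vector(Red, Green, Blue)

  def withName(s: String): Option[Color] = values.find(_.name == s)
}

// Removing any case below triggers a "match may not be exhaustive" warning.
def hex(c: Color): String = c match {
  case Color.Red   => "#f00"
  case Color.Green => "#0f0"
  case Color.Blue  => "#00f"
}
```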
SBuild seems like a promising replacement for SBT. Still Scala, but much simpler, more like a Scala version of Make. It has Maven dependency and ScalaTest support.
- Swiss Java Knife - super handy collection of JVM tools. For regular reporting of the top 10 threads by CPU usage, try:
  java -jar sjk.jar ttop -p PID -o CPU -n 10
- -XX:+PerfDisableSharedMem
- Al's Guide to Cassandra 2.1 Ops - awesome, not just for C* but for tools in general
- Al Tobey's flags for running JDK 8 apps. Note: G1GC! Also, MaxPermSize is no longer needed (PermGen was removed in JDK 8):
  -Xmx8G -Xms8G -Xss256k -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=0
- Tuning Spark apps for GC - excellent write-up from Intel
- Perils of writing isolating classloaders - Good read, tips on how to write a classloader that can isolate and load different versions of the same classes
- Quickly dumping your JVM heap using GDB - too bad it doesn't work on OS X.
- Start a JMX agent in a running JVM:
  jcmd <pid> ManagementAgent.start jmxremote.port=26010 jmxremote.ssl=false jmxremote.authenticate=false
- HeapAudit - a Java agent for lightweight production heap profiling
- Lion's Share - tools for memory analysis, produces Google Charts-compatible output
- jHiccup - a "hiccup" (GC pause) analysis tool
- Bintray - friendlier alternative to Sonatype OSS / Maven central. Also see bintray-sbt plugin.
- Changing JVM flags live - such as enabling GC logging without restarting JVM. Cool!
- Keywhiz - a store for infrastructure secrets
- Ranwhen - Visualize when your system was running, graphs in Terminal
- HTrace - distributed tracing library, can dump data to Zipkin or HBase
- cass_top - simple top utility for Cassandra clusters
- Grafana and Graphene - great replacement UIs for the clunky default Graphite UI
- Elastic Mesos - create Mesos clusters on AWS with ZK, HDFS
- Clustering Graphite - in depth look at how to scale out Graphite clusters
- Adaptive Radix Trees - cache friendly indexing for in-memory databases
- Nanocubes - Fast visualization of large spatiotemporal datasets. Amazing stuff. Paper and Github repo.
- Quotient Cubes - semantic grouping and rollup algorithm for OLAP cubes. Ruby implementation.
- Top K queries and cubes
- Scalable In-memory Aggregation - column-oriented, in memory with bitmap indexing and memoization
- LearnDS - A set of IPython notebooks for learning data science
- Machine Learning for developers
- Achieving Great Response Times in Distributed Systems - an excellent talk on how 99th-percentile latency can kill, and techniques to tame it
- Raft Visualization - great 5-min visualization of the distributed consensus protocol
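Here is a minimal nearest-rank percentile in Scala that makes the 99th-percentile point from the response-times talk concrete. The latency numbers are made up for illustration, and nearest-rank is just one of several percentile definitions:

```scala
// Nearest-rank percentile: sort the samples and take the value at 1-based
// rank ceil(p * n / 100). Exact, but O(n log n) per query; monitoring systems
// typically use histograms or streaming sketches instead.
def percentile(samples: Seq[Long], p: Double): Long = {
  require(samples.nonEmpty, "need at least one sample")
  require(p > 0 && p <= 100, "p must be in (0, 100]")
  val sorted = samples.sorted
  val rank = math.ceil(p * sorted.size / 100.0).toInt
  sorted(rank - 1)
}

// Hypothetical latencies in ms: 98 fast requests and 2 slow ones.
val latencies: Seq[Long] = Seq.fill(98)(10L) ++ Seq(500L, 500L)

val p50 = percentile(latencies, 50) // the median looks great
val p99 = percentile(latencies, 99) // the tail does not
```

Here the median is 10 ms and even the mean is only 19.8 ms, yet the 99th percentile is 500 ms. That gap is exactly what averages hide.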
I love Sublime and use it for everything, even Scala! Going to put my Sublime stuff in a separate page.
- Semver - Semantic versioning, how to deal with dev workflows and corner cases -- a must read
- Pragmatic RESTful API Design - really good stuff
- Blameless Post-Mortems - why they are crucial to good culture
- How to Pair with Jr Devs - really good advice. Make them type. Listen and be on the same level.
- GitHub Flow - how github.com does continuous deploys, uses pull requests for an automated, process-free development workflow. Some gems include naming branches descriptively and using github.com to browse the work currently in progress by looking at active branches.
- Pull Requests and other good Github Practices
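As a tiny companion to the Semver entry, here is a sketch of parsing and comparing plain MAJOR.MINOR.PATCH versions. SemVer, parse and isSafeUpgradeFrom are hypothetical names. The sketch ignores pre-release tags, build metadata and the spec's leading-zero rule:

```scala
// Hypothetical minimal semantic version: MAJOR.MINOR.PATCH only.
// Pre-release tags and build metadata are rejected; leading zeros aren't checked.
final case class SemVer(major: Int, minor: Int, patch: Int) extends Ordered[SemVer] {
  // Compare numerically, field by field (so 1.10.0 > 1.9.3, unlike string order).
  def compare(that: SemVer): Int =
    Ordering[(Int, Int, Int)].compare((major, minor, patch), (that.major, that.minor, that.patch))

  // Upgrades are expected to be non-breaking only within the same MAJOR version;
  // 0.x versions make no stability promise, so they never qualify here.
  def isSafeUpgradeFrom(older: SemVer): Boolean =
    major == older.major && major > 0 && this >= older
}

object SemVer {
  private val Pattern = """(\d+)\.(\d+)\.(\d+)""".r

  // Regex extractors must match the whole string; numbers beyond Int range would throw.
  def parse(s: String): Option[SemVer] = s match {
    case Pattern(ma, mi, pa) => Some(SemVer(ma.toInt, mi.toInt, pa.toInt))
    case _                   => None
  }
}
```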
- Awesome public datasets - no doubt some are Socrata sites!
- Mermaid - think of it as Markdown for diagrams. Would be awesome to integrate this into reveal.js!
- Markdeep - Markdown++ with diagrams; add a single line at the bottom to convert to HTML!
- How To Be a Great Developer - a reminder to be empathetic, humble, and make lives around us better. Awesome list.
- JQ - JSON processor for the shell. Super useful with RESTful servers.
- Underscore-CLI - a Node.js-based command-line JSON parser
- MacroPy - Scala-like macros, case classes, pattern matching, parser combinators for Python (!!)
- Scala 2.11 vs Swift - Apple's new iOS language is often compared to Scala.
- Gherkin - a Lisp implemented in bash!!
- Nimrod - a neat, compile-straight-to-binary, static systems language with beautiful Python-like syntax, union types, generics, macros, first-class functions. What Go should have been.
- Bret Victor - a set of excellent essays and talks from a great visual designer
becoz it's so darn painful.
- On OS X: make sure the setuid bit is not set on dtrace (see this Homebrew issue):
  sudo chmod -s /usr/sbin/dtrace
- Try chruby and ruby-install instead of rbenv. Installs rubies into /opt/rubies and is lighter weight; there is also chruby-fish for the fish shell.