Why learn q?

fast (faster than Python or R) and big-data ready. Why?
- prefers vectors over loops: vector-based ops are very fast (SIMD, SSE, AVX512 …)
- prefers binary over text
- prefers mmap over stdio read/write
- code lives right next to the data, so no transmission cost
- kdb+ is a columnar DBMS, so full-table scans make optimal use of locality
- in-memory for real-time data; historical data is stored on disk column-wise and read sequentially (in 4,096 blocks)
- terse, efficient, Turing Award-winning notation with a strong theoretical foundation
- compact data structures optimised to use L1 and L2 caches
- parallelisation built-in natively (functional programming)
- clever optimisations and reuse of just ~50 primitives
- column attributes optimise qsql queries
well-established, growing, well-paid, well-integrated with other tech

Myth debunked: “I have to admit that q is convenient for manipulating datasets, but q doesn’t have built-in packages like Python. q can’t do complicated statistical analysis like Python.”

q is a Turing-complete, general-purpose, functional-style programming language, and you can do anything in q that you could do in Python, only much simpler and faster. On the other hand, Python doesn’t have everything q does: q comes with a seamless built-in database. Python has to interface (slowly) with a separate database, often translating queries to SQL and incurring a cost due to object-relational impedance mismatch. That is a bottleneck in both runtime and developer time.

You can naturally do machine learning in q: for example, see this complete machine learning library written in just 1,000 lines, where more verbose languages would need several times as many. You can also serve websites, for example using WebSockets.

Myth debunked: “q reads like an ASCII dump, is too terse, write-only, and hard to be productive in.”

By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race. – A.N. Whitehead

The quantity of meaning compressed into small space by algebraic signs, is another circumstance that facilitates the reasonings we are accustomed to carry on by their aid. – Charles Babbage

At Harvard, Kenneth E. Iverson devised a mathematical notation for his work with economist Wassily Leontief. Leontief’s work won him the Nobel Prize. Iverson’s notation won him the Turing Award. It is the foundation of the programming languages APL, A+, J, k, and q. If Python is a bustling bazaar, q is a cathedral. Unlike Python, q is rooted in a strong theoretical foundation.

Why is it so fast?

By far the biggest selling point for kdb+ is its speed. Kdb+ is a column-oriented database, which means that, unlike most databases where rows of data are stored together, data in kdb+ is stored by column. Each column is stored in contiguous memory, both in-process and on-disk, allowing computations on columns to be performed with astonishing speed.
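The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the idea, not how kdb+ is implemented: the trade records, the column names and the use of the standard-library `array` type as a stand-in for a contiguous typed vector are all assumptions made for the example.

```python
from array import array

# Hypothetical trade records, invented for illustration.
rows = [
    {"sym": "AAPL", "price": 101.5, "size": 200},
    {"sym": "MSFT", "price": 250.0, "size": 100},
    {"sym": "AAPL", "price": 102.0, "size": 300},
]

# Row-oriented: each record keeps its fields together, so scanning one
# field means visiting every record.
row_total = 0
for r in rows:
    row_total += r["size"]

# Column-oriented: each field is its own vector; numeric columns are
# packed into contiguous typed buffers.
columns = {
    "sym": [r["sym"] for r in rows],
    "price": array("d", (r["price"] for r in rows)),  # 8-byte floats
    "size": array("q", (r["size"] for r in rows)),    # 8-byte integers
}

# A full-column aggregate reads one contiguous buffer and nothing else.
col_total = sum(columns["size"])

# Volume-weighted average price touches only the two columns it needs.
vwap = sum(p * s for p, s in zip(columns["price"], columns["size"])) / col_total
```

Both totals agree; the point is that the column version never has to look at `sym` or `price` to add up `size`.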

When the result of one vector operation is used as the input of the next, the data already in the CPU’s fastest cache can be accessed immediately, without going to the slower, larger caches or to main memory, the slowest and largest store.

In addition, modern CPUs provide dedicated instructions (SIMD extensions such as SSE and AVX-512) to accelerate vector processing. Kdb+ takes full advantage of these to achieve optimal performance.

Saving data in columns instead of rows also allows each column to be mapped in and out of memory when needed, thus reducing the need for all data to be loaded simultaneously.
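A rough sketch of mapping a single column file, using Python's standard `mmap` and `struct` modules. The file name `price`, the little-endian float64 encoding and the helper names `write_column` and `read_slice` are assumptions for the example; the real kdb+ on-disk format is different.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.calcsize("<d")  # 8 bytes per little-endian float64


def write_column(path, values):
    """Write the values back to back as raw binary, with no text encoding."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_slice(path, start, stop):
    """Map the column file and decode only elements [start, stop)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only the pages backing this byte range need to be touched.
            return list(struct.unpack_from(f"<{stop - start}d", m, start * ITEM))


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "price")  # one file per column (hypothetical name)
    write_column(path, [101.5, 250.0, 102.0, 99.25])
    middle = read_slice(path, 1, 3)
    size_on_disk = os.path.getsize(path)
```

Because each column lives in its own file, a query that needs only `price` never maps the other columns at all.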

All updates in kdb+ are performed in a single thread. This removes the need for any resource locking, and thus provides another speed enhancement.

Forget OOP: why functional?

OOP: Trillion dollar disaster, Object-relational impedance mismatch

Alan Kay: I’m sorry that I long ago coined the term “objects” for this topic because it gets many people to focus on the lesser idea. The big idea is “messaging”

Alan Kay’s big idea was to have independent programs (cells) communicate by sending messages to each other. The state of the independent programs would never be shared with the outside world (encapsulation). That’s it. OOP was never intended to have things like inheritance, polymorphism, the “new” keyword, and the myriad of design patterns.

Dijkstra: “our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed.”

Torvalds: “I’m a huge proponent of designing your code around the data, rather than the other way around.” “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

Joe Armstrong (Erlang): The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

encourages promiscuous sharing of mutable state
non-deterministic: compare 2+2 with calculator.Add(2,2), where dependencies of Calculator might change the result in subtle ways
complexity, with numerous design patterns
no theoretical foundation from a proper research institution, unlike the lambda calculus behind functional programming
human brains evolved to do things, not to organize the world into complex hierarchies of abstract objects
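The `calculator.Add(2,2)` point can be made concrete with a small sketch. The `Calculator` class, its `settings` dependency and the `scale` key are hypothetical, invented here only to show how shared mutable state can change a result behind the caller's back.

```python
class Calculator:
    """Hypothetical calculator that depends on shared, mutable settings."""

    def __init__(self, settings):
        self.settings = settings  # the same dict may be held by other code

    def add(self, a, b):
        return (a + b) * self.settings["scale"]


def add(a, b):
    """Pure function: the result depends on the arguments and nothing else."""
    return a + b


settings = {"scale": 1}
calc = Calculator(settings)

first = calc.add(2, 2)    # computed with scale 1
settings["scale"] = 10    # unrelated code mutates the shared state
second = calc.add(2, 2)   # same call, different answer

pure_first = add(2, 2)
pure_second = add(2, 2)   # always the same
```

The two method calls look identical at the call site yet return different values, while the pure function cannot be affected by anything outside its arguments.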

Where is it useful?

finance
blockchain - Cobalt DL: trade processing platform
any industry with real-time analytics or big data needs (recently IoT, manufacturing, retail, space, …)

What’s q/kdb+?

functional, array, dynamic, static, Turing-complete; in-memory database with disk persistence; high-level abstractions: C->k->q/qsql->kdb+; 350 KB interpreter runtime, REPL, interactive

no need for messaging middleware (Tibco et al.)

Today almost all trading happens electronically. If your requirement is not to lose any order or execution received from a client or an exchange, Tibco EMS meets it by:
- providing a durable topic, which holds the data until every subscriber consumes it
- guaranteeing that data will not get lost during network transmission

The q/kdb+ pub/sub architecture can satisfy both requirements. It logs topically organised messages to disk, and any subscriber that crashes can replay (consume) the persisted messages. Data loss over the network downstream of the publisher is remedied by the publisher keeping output queues on the socket and the subscriber reading off that queue.
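The log-then-publish idea can be sketched as follows. This is a toy model in Python, not the kdb+ tick implementation: the `Publisher` class, its method names, the `tick.log` file name and the JSON-lines log format are all assumptions for the example, and the network output queues are left out.

```python
import json
import os
import tempfile


class Publisher:
    """Hypothetical publisher that logs each message before delivering it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.subscribers = {}  # topic -> list of callbacks
        open(log_path, "a").close()  # make sure the log file exists

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        # Persist first, so the message survives a subscriber crash.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"topic": topic, "payload": payload}) + "\n")
        for callback in self.subscribers.get(topic, []):
            callback(payload)

    def replay(self, topic, callback, skip=0):
        """Re-deliver logged messages on a topic, skipping the first `skip`."""
        seen = 0
        with open(self.log_path) as log:
            for line in log:
                msg = json.loads(line)
                if msg["topic"] != topic:
                    continue
                if seen >= skip:
                    callback(msg["payload"])
                seen += 1


with tempfile.TemporaryDirectory() as d:
    pub = Publisher(os.path.join(d, "tick.log"))
    live = []
    pub.subscribe("trade", live.append)
    pub.publish("trade", {"sym": "AAPL", "price": 101.5})
    pub.publish("quote", {"sym": "AAPL", "bid": 101.4})
    pub.publish("trade", {"sym": "MSFT", "price": 250.0})

    # A subscriber that crashed and restarted rebuilds its state from the log.
    recovered = []
    pub.replay("trade", recovered.append)

    # One that had already processed the first trade resumes after it.
    resumed = []
    pub.replay("trade", resumed.append, skip=1)
```

The log plays the role of the durable topic: the publisher does not need to track which subscriber has consumed what, because each subscriber knows how many messages it has processed and can ask for the rest.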

Use cases

data analysis (OLAP, prototyping):
- machine learning in q, ML in 1000 lines
streaming and CEP (atomic OLTP):
- tick architecture
batch big data (batch OLTP/OLAP):
- fastest CPU database, 1.1bn taxi rides

Key takeaways

use FP, versatile, productive time, impactful, worthwhile