IBMStreams/administration

Proposal: streamsx.cassandra

Closed this issue · 9 comments

Proposal: streamsx.cassandra

streamsx.cassandra is a toolkit in active development (and production!) at The Weather Company.
It consists of an operator that writes Streams tuples to Cassandra.

The operator is a very thin Java facade for a Scala implementation. It's built using SBT.

Basic Capabilities

The operator is configured by specifying connection information in a ZooKeeper node.
Additionally, it provides mechanisms, also configurable in ZK, for writing values as NULLS.

Nearly all SPL types are supported, including sets, lists, and maps.

Null Value Mechanism

For a real-life example, say we have a tuple representing a report from a weather station:

stream<rstring stationID, uint64 timestamp, uint32 tempF, uint32 windspeedMPH, ......>

Old-school meteorological convention specifies that invalid observations are reported as -9999. When these observations are written in Cassandra, however, we don't want to keep them as -9999, we want to take advantage of Cassandra's ability to write nulls.
So if I specify in the JSON blob that I store in ZooKeeper:

{
  "tempF" : -9999
}

Any tuples that pass into the operator with the tempF value of -9999 will be written with a tempF value of NULL in Cassandra.

Licensing

The source is licensed under Apache V2.

Currently Supported Versions

  • Streams versions 4.0, 4.1
  • Cassandra versions 2.0, 2.1 (aka, Cassandra versions that use CQL 3.1)

Sample SPL Code

namespace com.weather.test;

composite CassandraTest {

    graph

        stream<rstring greeting, uint64 count, list<int32> testList, set<int32> testSet, map<int32, boolean> testMap, int32 nInt> Greeting = Beacon() {
            param
                iterations: 1000000u; //generate 1000000 tuples
                period : 0.5; //generate a tuple every 0.5 seconds
            output
                Greeting:
                    greeting =  "Hello Streams!",
                    count = IterationCount() + 1ul,
                    testList = [1,2,3],
                    testSet = {4, 5, 6},
                    testMap = {7: true, 8 : false, 9: true},
                    nInt = -2147483647;
        }

        () as CoolStuff = com.weather.streamsx.cassandra::CassandraSink(Greeting) {
            param
                connectionConfigZNode: "/cassandra_config";
                nullMapZnode: "/null_values";
        }
}

The znodes specify the connection to a dev Cassandra cluster and specifies that the null value for "nint" is -2147483647.

And here's a sample of the output, which I am pulling using CQL using a call that you should never ever use on a real table :)

cqlsh:testkeyspace> select * from testtable;

 count | greeting       | nint | testlist  | testmap                      | testset
-------+----------------+------+-----------+------------------------------+-----------
    19 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
     2 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    24 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
     3 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    35 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    30 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    16 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    ... etc etc

Future Work

  • Support for Streams 4.2 and Cassandra 3.x
  • Consistent Region support
  • Cassandra read operator

+1
FYI, for the configuration you currently store in Zookeeper we added secure application config in V4.2 that could be used for configuration. See details here http://www.ibm.com/support/knowledgecenter/SSCRJU_4.2.0/com.ibm.streams.dev.doc/doc/creating-secure-app-configs-dev.html

+1! very nice proposal btw!

+1

2016-10-07 0:04 GMT+03:00 Samantha Chan notifications@github.com:

+1! very nice proposal btw!


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#99 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGvlA2CSLe0VJaqiVc0FQUgWkWdSW0tJks5qxWJLgaJpZM4KQRUL
.

Best regards,
Leonid Gorelik.

Repository creation underway.

Sweet deal, I see the repo that got created!
Two questions

  1. Am I good to go ahead and start pushing code?
  2. I've been doing releases for internal teams at TWC, so I'm currently on version 1.3.0. Can I keep that versioning here or do I need to start over? Either is fine, just don't want to step on toes :)

@ecurtin before you can push code, can you please sign this document:
https://github.com/IBMStreams/administration/blob/master/IBMStreams-cla-individual.pdf

Yes, I think you can keep the current version number.

Repository created and set up. @ecurtin and @cin are initial committers.