Spark data source for Mark to Future cube binary files. It only supports reading data at the moment, but no writing. It is recommended to persist cube data in parquet format if required.
val spark = SparkSession.builder.master("local").getOrCreate
val df = spark.read.format("de.frosner.spark.mtf")
.option("csrFile", "src/test/resources/withxml/cube.csr")
.load("src/test/resources/withxml/cube.dat.0")
df.show()
+--------------+------------+--------------------+--------+-------------+-----+
| Time|BaseCurrency| Instrument|Currency| Scenario|Value|
+--------------+------------+--------------------+--------+-------------+-----+
|2000/01/01 (0)| EUR|[Instrument 1,Typ...| null|Base Scenario| 0.0|
|2000/01/01 (0)| EUR|[Instrument 1,Typ...| null| MC_1| 0.0|
|2000/01/01 (0)| EUR|[Instrument 1,Typ...| null| MC_2| 0.0|
|2000/01/01 (0)| EUR|[Instrument 2,Typ...| null|Base Scenario| 0.0|
|2000/01/01 (0)| EUR|[Instrument 2,Typ...| null| MC_1| 0.0|
|2000/01/01 (0)| EUR|[Instrument 2,Typ...| null| MC_2| 0.0|
|2000/01/01 (0)| EUR| null| CUR1|Base Scenario| 0.0|
|2000/01/01 (0)| EUR| null| CUR1| MC_1| 0.0|
|2000/01/01 (0)| EUR| null| CUR1| MC_2| 0.0|
|2000/01/01 (0)| EUR| null| CUR2|Base Scenario| 0.0|
|2000/01/01 (0)| EUR| null| CUR2| MC_1| 0.0|
|2000/01/01 (0)| EUR| null| CUR2| MC_2| 0.0|
|2000/01/01 (0)| EUR| null| CUR3|Base Scenario| 0.0|
|2000/01/01 (0)| EUR| null| CUR3| MC_1| 0.0|
|2000/01/01 (0)| EUR| null| CUR3| MC_2| 0.0|
+--------------+------------+--------------------+--------+-------------+-----+
Option | Description | Possible Values | Default |
---|---|---|---|
csrFile |
Path to the meta data XML file. The library will try to load it first from the driver and then from the executors. | local or cluster path | - |
checkCube |
Verify that the cube has the correct size | {true , false } |
false |
Option | Description | Possible Values | Default |
---|---|---|---|
numTimes * |
Number of time points simulated | int > 1 | - |
numInstruments * |
Number of instruments simulated | int > 1 | - |
numScenarios * |
Number of scenarios simulated | int > 1 | - |
endianType * |
Byte ordering in the data files | {LittleEndian , BigEndian } |
- |
valueType * |
Value type of the simulated values | {FloatType , DoubleType } |
- |
* only used and required if csrFile
is not specified
root
|-- Time: string (nullable = false)
|-- BaseCurrency: string (nullable = false)
|-- Instrument: struct (nullable = true)
| |-- ID: string (nullable = false)
| |-- InstrType: string (nullable = false)
| |-- Unit: string (nullable = false)
|-- Currency: string (nullable = true)
|-- Scenario: string (nullable = false)
|-- Value: float (nullable = false)
Column | Description |
---|---|
Time | Simulation time dimension of the cube |
BaseCurrency | Base currency of the simulation |
Instrument | Simulated instrument dimension if the simulatable dimension is an instrument, else null |
Currency | Simulated currency dimension if the simulatable dimension is a currency, else null |
Scenario | Scenario dimension of the cube |
Value | Simulated value |
Cube files consist of a meta data file (XML) and a data file (binary encoded sequence of numerics).
The meta data is read from an XML file. This file contains information about the different cube dimensions plus additional meta information about the simulation run.
Currently the following entries are parsed from the XML tree:
Node | Property |
---|---|
/resultInfo/prop(name=precision) |
value type |
/resultInfo/prop(name=endianType) |
byte ordering of values |
/timeDimensionInfo/timeList/timePoint |
time dimensions |
/simulatableDimensionInfo/SADescriptor |
instrument dimensions |
/scenarioDimensionInfo/scenarioList/scenarioInfo |
scenario dimensions |
An example file can be found in the test resources.
The data file is in binary format and encoded as a sequence of numerical values. It corresponds to a three dimensional cube (time t, instrument i, scenario s) containing values x which can be either float or double. The file consists of one binary record per time. Each time record consists of individual instrument records. Each instrument record contains the values for each scenario.
Given three instruments, two scenarios and two times, the structure looks as follows:
od -f cube.dat.0 | head
0000000 34234.83 53654.33 34234.45 66456.46
0000020 21312.945 53453.83 23233.32 66456.74
0000040 88888.945 11332.65 55552.22 31231.61
0000060 32984.69 76532.59 65400.29 43259.08
0000080 34234.83 53654.33 34234.45 66456.46
0000100 21312.945 53453.83 23233.32 66456.74
0000120 88888.945 11332.65 55552.22 31231.61
0000140 32984.69 76532.59 65400.29 43259.08