esProc is the package name for esProc SPL. esProc SPL is an open-source programming language for data processing that can perform computations independently. For the latest package and release notes, see Download esProc Community Edition Package.
SPL targets the mainstream architecture in which a computing engine is embedded in a Java application. An SPL script is the counterpart of a stored procedure in a relational database (RDB). A Java program calls an SPL script through the JDBC interface to execute it and perform structured computation.
- Combines the advantages of Java and goes beyond SQL
Comparison of SQL & SPL: Set-oriented Operations
Comparison of SQL & SPL: Select Operation
Comparison of SQL & SPL: Order-based Computations
Comparison of SQL & SPL: Equi-grouping
Comparison of SQL & SPL: Non-equi-grouping
Comparison of SQL & SPL: Order-based Computations
Comparison of SQL & SPL: Join Operations (Ⅰ)
Comparison of SQL & SPL: Join Operations (Ⅱ)
Comparison of SQL & SPL: Join Operations (Ⅲ)
Comparison of SQL & SPL: Static Transposition
Comparison of SQL & SPL: Complicated Static Transposition
- Well-designed, rich library functions and consistent syntax; easier to master and better-performing than Python
Example: Find the sales clerks whose sales rank in the top 8 in every month of 1995.
Python:

    import pandas as pd

    sale_file = 'E:\\txt\\SalesRecord.txt'
    sale_info = pd.read_csv(sale_file, sep='\t')
    # Derive the month from the sale date
    sale_info['month'] = pd.to_datetime(sale_info['sale_date']).dt.month
    # Total sales per clerk per month (sum only the amount column)
    sale_group = sale_info.groupby(by=['clerk_name', 'month'], as_index=False)['sale_amt'].sum()
    sale_group_month = sale_group.groupby(by='month')
    # Start with all clerks, then keep only those in every month's top 8
    set_name = set(sale_info['clerk_name'])
    for index, sale_g_m in sale_group_month:
        sale_g_m = sale_g_m.sort_values(by='sale_amt', ascending=False)
        sale_g_max_8 = sale_g_m.iloc[:8]
        sale_g_max_8_name = sale_g_max_8['clerk_name']
        set_name = set_name.intersection(set(sale_g_max_8_name))
    print(set_name)

SPL:

|   | A |
|---|---|
| 1 | E:\txt\SalesRecord.txt |
| 2 | =file(A1).import@t() |
| 3 | =A2.groups(clerk_name:name,month(sale_date):month;sum(sale_amt):amount) |
| 4 | =A3.group(month) |
| 5 | =A4.(~.sort(-amount).to(8)) |
| 6 | =A5.isect(~.(name)) |

- Seamless integration into Java applications
For more details, see Call SPL Script in Java.
For other integrations, see Call SPL in applications.
- Got SQL
  SQL has considerable computing power, but it is unavailable in many scenarios, which forces you to hard-code the logic in Java. SPL provides lightweight computing power that is independent of any database and can process data in any of these scenarios:
  - Structured text (txt/csv) calculation Ref. [1] [2] [3] [4] [5]
  - Java computing class library that surpasses Stream/Kotlin/Scala Ref. [1] [2]
  - SQL-like calculation and association calculation on MongoDB Ref. [1] [2] [3]
  - Post-calculation on Salesforce and SAP data Ref. [1] [2]
  - Post-calculation on various data sources: HBase, Cassandra, Redis, Elasticsearch, Kafka, … Ref. [1]
- Beyond SQL
  SQL struggles with complex set operations and order-based operations, so the data is often read out and processed in Java instead. SPL has complete set capabilities and, in particular, supports ordered and step-by-step calculation, which simplifies these operations.
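
To make "ordered and step-by-step calculation" concrete, here is a minimal plain-Python sketch (not SPL, and not taken from the esProc documentation) of a typical order-dependent task: finding the longest run of consecutive rises in a price series. The function name `longest_rising_run` and the sample prices are hypothetical. Each step depends on the previous row, which is the kind of logic that is awkward to express in set-based SQL.

```python
def longest_rising_run(prices):
    """Return the length of the longest run of consecutive rises.

    A "rise" is a price strictly greater than the previous one.
    """
    longest = 0
    current = 0
    for prev, cur in zip(prices, prices[1:]):
        if cur > prev:
            current += 1      # extend the current rising run
        else:
            current = 0       # the run is broken; start over
        longest = max(longest, current)
    return longest


# Hypothetical closing prices, already sorted by trading date.
closing = [10.0, 10.5, 10.2, 10.4, 10.9, 11.3, 11.1]
print(longest_rising_run(closing))
```

The loop carries state from one row to the next, so the input must already be in date order.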
- Cooperate DB
  The computing power of a database is closed: it cannot process data outside the database, so data often has to be imported into the same database through ETL before it can be processed.
  SPL provides open and simple computing power. It can read multiple databases directly, perform mixed-source calculations, and help the database compute better.
  - Fetch data in parallel to accelerate JDBC Ref. [1]
  - SQL migration among different types of databases Ref. [1]
  - Cross-database operations Ref. [1]
  - T+0 statistics and query Ref. [1]
  - Replace stored procedures to improve code portability and reduce coupling Ref. [1]
  - Avoid turning ETL into ELT or even LET
  - Reduce intermediate tables in the database
  - Report data source development: hot switching, multiple data sources, and higher development efficiency Ref. [1] [2] [3]
  - Implement microservices that use fewer resources and support hot switching Ref. [1] [2]
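
As a rough illustration of a cross-database operation, the following plain-Python sketch joins data from two separate databases in the application layer, without first loading both into one database. It is not SPL and does not show how SPL does this. The two in-memory SQLite databases, the table names, and the function `total_by_customer` are all hypothetical stand-ins.

```python
import sqlite3

# Two independent in-memory SQLite databases stand in for two
# separate production databases (an assumption for this sketch).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (1, 30.0), (3, 55.0)],
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
)


def total_by_customer(orders_conn, crm_conn):
    """Join data from two databases outside either database."""
    # Each database does the part it can do on its own.
    totals = orders_conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    # The cross-database join happens in the application layer.
    return {names[cid]: total for cid, total in totals}


result = total_by_customer(orders_db, crm_db)
print(result)
```

The aggregation is pushed down to the database that owns the data, and only the join runs outside.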
- Surpass DB
  High-performance algorithms are difficult to implement in SQL. The performance of big data operations depends entirely on the database's optimization engine, which is often unreliable in complex situations.
  SPL provides a large number of basic high-performance algorithms (many of them pioneered in the industry) and efficient storage formats. On the same hardware, it can deliver much better computing performance than a database and can comprehensively replace big data platforms and data warehouses.
  - In-memory search: binary search, sequence number positioning, position index, hash index, multi-layer sequence number positioning Ref. [1]
  - Dataset in external storage: parallel computing of text files, binary storage, double increment segmentation, columnar storage composite table, ordered storage and update
  - Search in external storage: binary search, hash index, sorting index, row-based storage and valued index, index preloading, batch search and set search, multi-index merging, full-text searching Ref. [1]
  - Traversing technique: post filter of cursor, multi-purpose traversal, parallel traversing and multi-cursors, aggregation extension, ordered traversing, program cursor, partially ordered grouping and sorting, sequence number grouping and controllable segmentation Ref. [1]
  - Association technique: foreign key addressing, foreign key serialization, index reuse, alignment sequence, large dimension table search, unilateral splitting, orderly merging, association positioning, schedule Ref. [1]
  - Multidimensional analysis: pre-summary and time period pre-summary, alignment sequence, tag bit dimension Ref. [1]
  - Distributed: free computing and data distribution, cluster multi-zone composite table, cluster dimension table, redundant fault tolerance, spare tire fault tolerance, Fork-Reduce, multi-job load balancing
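
Two of the in-memory search techniques named above, binary search and sequence number positioning, can be sketched in plain Python as follows. This only illustrates the general ideas and is not SPL's implementation. The function names and sample data are hypothetical.

```python
from bisect import bisect_left


def binary_search(sorted_keys, key):
    """Return the index of key in sorted_keys, or -1 if absent (O(log n))."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return -1


def locate_by_sequence_number(records, seq_no):
    """Fetch a record whose key is its 1-based position (O(1))."""
    if 1 <= seq_no <= len(records):
        return records[seq_no - 1]
    return None


# Hypothetical data: keys sorted ascending for binary search.
employee_ids = [1003, 1010, 1027, 1042, 1088]
# Hypothetical dimension table whose key is its row number 1..n.
departments = ["Sales", "R&D", "Finance"]

print(binary_search(employee_ids, 1027))
print(locate_by_sequence_number(departments, 2))
```

Binary search needs sorted keys and takes logarithmic time. When a key is simply the row's position, the lookup becomes a direct array access with no comparison at all.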
- For Excel
  Combining SPL with Excel enhances Excel's calculation ability and makes calculations easier to implement. Ref. [1]
  Through SPL's Excel plug-in, you can use SPL functions in Excel and call SPL scripts from VBA. Ref. [1]
  SPL provides Excel-oriented set operations:
  - Cell value and summary value calculation Ref. [1]
  - Set operations and membership judgment Ref. [1]
  - Duplication judgment, count and deduplication Ref. [1]
  - Sorting and ranking Ref. [1]
  - Special grouping and aggregate methods Ref. [1]
  - Association and comparison Ref. [1]
  - Row-column transposition Ref. [1]
  - Expansion and supplement Ref. [1]
- For Industry
  Industrial scenarios involve large amounts of time series data, yet databases often provide only SQL. SQL's ordered calculation capability is so weak that it can be used only for data retrieval and cannot assist in calculation.
  Industrial scenarios also involve many basic mathematical operations. SQL lacks these functions, so the data has to be read out for processing.
  SPL supports ordered calculation well and provides rich mathematical functions, such as matrix operations and fitting, so it can meet the calculation requirements of industrial scenarios more conveniently.
  - Time series cursor: aggregation by granularity, translation, adjacent reference, association and merging
  - Historical data compression and solidification, transparent reference
  - Vector and matrix operations
  - Various linear fitting methods: least squares, partial least squares, Lasso, ridge, …
  - …
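
Of the fitting methods listed above, ordinary least squares is the simplest. The following plain-Python sketch fits a straight line to a handful of readings. It is a generic textbook formulation, not SPL's fitting function, and the name `least_squares_line` and the sample readings are hypothetical.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sensor readings that lie exactly on y = 1 + 2x.
times = [0.0, 1.0, 2.0, 3.0]
readings = [1.0, 3.0, 5.0, 7.0]
intercept, slope = least_squares_line(times, readings)
print(intercept, slope)
```

The closed-form solution shown here covers only the one-variable case. Partial least squares, Lasso, and ridge need different formulations.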
  Industrial algorithms often require repeated experiments. SPL's development efficiency is very high, so you can try more approaches in the same amount of time:
  - Instrument anomaly discovery algorithm
  - Abnormal measurement sample locating
  - Curve lifting and oscillation pattern recognition
  - Constrained linear fitting
  - Pipeline transmission scheduling algorithm
  - …
- Tutorial: esProc download, installation, principles, and applications
- Function Reference: esProc syntax, applications, and examples
- Sample Program: Guide to all functions under menus in esProc
- Code Reference: esProc grid-style code examples
- User Reference: esProc programming by examples
- External Library Guide: Deployment of and connection to esProc external libraries
- esProc official website: http://www.scudata.com
- To download esProc executable files, go to http://c.raqsoft.com/article/1595817756260
- More detailed materials are available at http://c.raqsoft.com
esProc is under the Apache 2.0 license. See the LICENSE file for details.