Pinned Repositories
spark-dashboard
Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
SparkDLTrigger
Code and links to the data for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
SparkTraining
Material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/
Linux_tracing_scripts
Scripts and tools for troubleshooting and performance analysis in Linux. This includes dynamic tracing scripts with SystemTap both for system calls and for userspace function tracing.
Miscellaneous
Includes notes on using Apache Spark in general, notes on using Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other DB systems.
Oracle_DBA_scripts
A collection of old-school CLI scripts for Oracle RDBMS monitoring and performance troubleshooting.
PerfSheet4
PerfSheet4 is a tool for performance troubleshooting of Oracle databases. Query and visualize Oracle AWR data using pivot charts.
PyLatencyMap
PyLatencyMap is a tool for heat map visualization on the CLI. It is integrated with scripts to collect and visualize I/O latency heat maps from various sources, including SystemTap, DTrace, Oracle wait events, NetApp filers, and trace files.
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
LucaCanali's Repositories
LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
LucaCanali/Miscellaneous
Includes notes on using Apache Spark in general, notes on using Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other DB systems.
LucaCanali/Linux_tracing_scripts
Scripts and tools for troubleshooting and performance analysis in Linux. This includes dynamic tracing scripts with SystemTap both for system calls and for userspace function tracing.
LucaCanali/Oracle_DBA_scripts
A collection of old-school CLI scripts for Oracle RDBMS monitoring and performance troubleshooting.
LucaCanali/PyLatencyMap
PyLatencyMap is a tool for heat map visualization on the CLI. It is integrated with scripts to collect and visualize I/O latency heat maps from various sources, including SystemTap, DTrace, Oracle wait events, NetApp filers, and trace files.
LucaCanali/PerfSheet4
PerfSheet4 is a tool for performance troubleshooting of Oracle databases. Query and visualize Oracle AWR data using pivot charts.
LucaCanali/Stack_Profiling
Tools and scripts for stack profiling: Userspace, Kernel, OS state and optionally Oracle wait
LucaCanali/PerfSheet.js
PerfSheet.js is a tool for Oracle RDBMS performance troubleshooting. Use it to extract and visualize Oracle AWR time series data in the browser using JavaScript and dynamic pivot charts.
LucaCanali/ipython-sql
%%sql magic for IPython, hopefully evolving into full SQL client
LucaCanali/OraLatencyMap
OraLatencyMap is a performance widget running on SQL*plus (Oracle's CLI) to collect and visualize latency histograms for Oracle wait events using heat maps.
LucaCanali/spark-sql-perf
LucaCanali/hadoop
Fork of Apache Hadoop, used to work on S3A and HDFS instrumentation
LucaCanali/bcc
BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
LucaCanali/dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
LucaCanali/gallery
A set of examples for CERN SWAN, a Service for Web based ANalysis
LucaCanali/hbase-connectors
Apache HBase Connectors
LucaCanali/jupyter-extensions
Jupyter extensions for SWAN
LucaCanali/jupyterhub-extensions
Customized components of the JupyterHub server in SWAN (handlers, spawners, templates).
LucaCanali/oci-hdfs-connector
HDFS Connector for Oracle Cloud Infrastructure
LucaCanali/SLOB_2.5.4
Official SLOB distribution for version 2.5.4.0
LucaCanali/SLOB_distribution
A Git repository used only for distributing the official SLOB release.
LucaCanali/spark
Mirror of Apache Spark
LucaCanali/spark-root
Apache Spark Data Source for ROOT File Format
LucaCanali/SparkDLTrigger
Notebooks with code and sample data for the blog article: "Machine Learning Pipelines for High Energy Physics Using Apache Spark with BigDL and Analytics Zoo"
LucaCanali/sparkmonitor
Monitor Apache Spark from Jupyter Notebook
LucaCanali/tf-spawner
Spawn workers for TensorFlow MultiWorkerMirroredStrategy