Talk to be presented at 13eJournées informatiques IN2P3/Irfu
The presentation was written as a Jupyter(lab) notebook. For an interactive presentation, click on the binder badge above or here. The static version follows. The presentation notebook (01-Julia-dream-JIII2021.ipynb) and the accompagning ones can be downloaded from this repository.
Philippe Gras (Université Paris-Saclay, CEA/Irfu)
- Julia is a relatively new programming language
- Let's start with few words on the context...
- Computing plays a central role in the research done at LHC
- Theoretical predictions: simulation of the proton-proton collisions
- Simulation of the detector response
- Reconstruction of the phyics events
- Analysis of reconstructed events to perform measurements and new physics search.
- Behind an LHC experiment result publication there are tens of thousands of computing jobs that have run on the worldwide computing grid. Both ATLAS and CMS reached their 1000th papers in June 2020.
- Research code: developed by the main authors of the prepared publication
- C++ is widely used and the performance it offers is essential
- More and more combined with Python, that provides easy/fast coding it offers, and comes with an extensive library ecosystem. But it does not meet the performance provided by C++
- A programming language that offers at the same time the C++ and Python pros would be more convenient
Fast/easy coding fast | Fast running | |
---|---|---|
Python | ⇔ | C/C++ |
-
J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah tackled the problem in 2009 aiming to design a programing langue that providess both Fast/easy coding AND Fast running
- Birth of Julia, release 0.1 in 2013
- This breakthrough was recognized by awards attributed to the authors
-
In 12 years since its conceptualisation, Julia has been improved from release to release and has aggratated many package developers
-
Julia for HEP: Performance of Julia for High Energy Physics Analyses, Marcel Stanitzki & Jan Strube
HEP enjoys loop: we loop on physics events to loop on particles/physics objects. We often perform particle matching and clustering and for this we loop on events to loop on objects to loop on objects.
for event in billions_of_lhc_events
for tens_or_hundreds_of_objects in event
for tens_or_hundres_of_objects_to_match in event
...
end
end
end
- The outer loops hides several loops: datasets > files
- This is repeated several times for each analysis.
⇒ Lines of code executed billions of times even for a Kleenex code, written specially for a publication.
- A master rule for high-performance code in Python is to avoid writing loop in Python
- ⇒ push the loop to underlying compiled libraries. Approach of the
numpy
vectorisation.
- ⇒ push the loop to underlying compiled libraries. Approach of the
Let's run a simple loop
Python |
---|
45 ms |
#include <iostream>
#include <sys/time.h>x
int main(){
struct timeval t0, t1;
gettimeofday(&t0, 0);
double a = 0.;
for(unsigned i = 0; i <= 1000000; ++i) a += 1.0/i;
std::cout << "Computation Result: " << a << "\n";
gettimeofday(&t1, 0);
std::cerr << "Duration: " << (t1.tv_sec-t0.tv_sec)
+ 1.e-6*(t1.tv_usec-t0.tv_usec)
<< " seconds\n";
return 0;
}
run(`g++ -Wall -o simple-loop simple-loop.cc`)
run(`./simple-loop`)
;
run(`g++ -O3 -Wall -o simple-loop simple-loop.cc`)
run(`./simple-loop`)
;
Computation Result: 14.3927
Computation Result: 14.3927
Duration: 0.003238 seconds
Duration: 0.001518 seconds
Python | C++ |
---|---|
45 ms | 1.0 ms |
#
# Julia
#
function f()
a = 0.0
for i in 1:1_000_000 # ✨ Note the underscores that improves legibility
a = a + 1.0/i
end
return a
end
f()
@time b = f()
0.001369 seconds (1 allocation: 16 bytes)
14.392726722864989
Python | C++ | Julia |
---|---|---|
45 ms | 1.0 ms | 1.0 ms |
- You have already seen in the previous example that the code syntax and grammar is similar to Pythons. No ''std::map<std:string, std::vector>''..., no compilation step.
- Easy to learn
- Syntactic sugars similar to Python for a concise code: list comprehension, a < b < c, 1_000_000, support of symbols for variables... and more: e.g. a function call is "vectorized" (ala numpy) with a simple dot, f.(x)
- Interactive help, nice tools to debug, to optimize code, for introspection.
- Multidispatch eases remarkably use/extension of third-party libraries → explains the rapid grow of the Julia ecosystem.
Internet search engine and stack overflow play is an essential ingredient in nowadays programming workflow.
Julia is already widespread enough, to find all the information on the Internet.
In addition to usual resource, Julia has dedicated forum on Discourse, Slack, and Zulip with an active and friendly community.
Go to DuckDuckGo or your prefered search engine and make a try.
- Large set of libraries and active developement
- Julia is firstly used by scientific community ⇒ oriented to our needs
- I did the following game during the PyHEP2021 workshop: I've looked for a Julia equivalent each time a speaker mention a Python library (apart from HEP specific ones).
- Caveat: I've not checked that it covers all the features of the Python package
- The results of this survey shows the large activity around Julia
cmd (Olivier Mattelaer) ✓
FreeCAD interface (Christophe Delaere) ❌ In discussion
Telegram bot (Matias Senge) ✓ https://github.com/Arkoniak/Telegram.jl
DataFrames (Vincenzo Eduardo Padulano) ✓
Spark (Vicenzo and Andr F.) ✓ https://github.com/dfdx/Spark.jl
Dask (Vincenzo E. P., Graham Markal) ✓
Batch computing (Vincenzo E. P.) ✓ https://docs.julialang.org/en/v1/manual/parallel-computing/, https://github.com/JuliaParallel, https://juliagpu.org/
Apache Parquet (Andre Frankenthal) ✓
Jupyter/Binder/SWAN ✓
Bokeh (Bruno Alves) ✓ https://github.com/samuelcolvin/Bokeh.jl
CUDA (Graham Markall) ✓ https://juliagpu.org/cuda/
Hypothesis (Santam Roy Choudhury, property testing) ❌ (beside an unmaintained QuickCheck projet)
Virtualenv (Henry Schneider, Packaging talk) ✓ built in the std package manager
Unit test tools (Henry Schneider, Packaging talk) ✓ std package and more: Coverage, FactCheck
JIT/Numba (Graham Markal, Henry Schneider) ✓ Intrisic to the language
Machine learning ✓ Flux, JuliaML,
TensorFlow(Matthew Feickert)TensorFlow
GPyTorch/Gaussian Process ML (Irina Espejo Morales talk) GPML
- Non-HEP format
- HDF5 and Parquet are fully supported (also CSV and Excel, less relevant our data size)
- ROOT
- Two packages, developed by users.
- Emacs and vim support
- Atom and VScode support. Many features. Code can be run and debugged with the IDE, with support for plots.
- Jupyter
- Pluto. A new generation notebook with automatic update of cells.
- Debugger: Debugger, Rebugger, Juno debugger (for Atom IDE)
- Python made it easy with conda and pip. It's even easier in Julia
- A standard library part of the Julia installation
- Give instructions to the user, when he or she tries to import a missing package. Try it:
import Blink
ArgumentError: Package Blink not found in current path:
- Run `import Pkg; Pkg.add("Blink")` to install the Blink package.
Stacktrace:
[1] require(into::Module, mod::Symbol)
@ Base ./loading.jl:967
[2] eval
@ ./boot.jl:373 [inlined]
[3] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
In the REPL (interactive terminal application equivalent ipython), since version 1.7, the user is proposed to install the missing package and does not even need to enter the installation command.
💡 Dedicated command mode for package handling in the REPL:
julia> ]
(@v1.6) pkg> add Blink
(@v1.6) pkg> [Backspace]
julia>
- Python, C, Fortran code: direct call from Julia and Jupyter Julia kernels
- C++ code: call via a wrapper. Lacking a tool for automatic generation of wrapper like swig. Project for direct-call (ala cppyy) on hold and not working for recent versions of Julia.
The other way around
- Python code can call Julia as well
- C/C++ code can call Julia code
As simple as calling Julia code
# Enable Python call:
using PyCall
# Inport a python module:
math = pyimport("math")
# Use it as a Julia module:
math.sin(math.pi / 4)
0.7071067811865475
path = ccall(:getenv, Cstring, (Cstring,), "SHELL")
unsafe_string(path)
"/bin/bash"
For C, you will typically write a wrapper in ''Julia'' to handle errors, like:
function getenv(var::AbstractString)
val = ccall(:getenv, Cstring, (Cstring,), var)
if val == C_NULL
error("getenv: undefined variable: ", var)
end
return unsafe_string(val)
end
getenv (generic function with 1 method)
println(getenv("USER"))
println(getenv("SMOKE")) # ⇒ will through an exception unless you have SMOKE in your environment
pgras
getenv: undefined variable: SMOKE
Stacktrace:
[1] error(::String, ::String)
@ Base ./error.jl:42
[2] getenv(var::String)
@ Main ./In[7]:4
[3] top-level scope
@ In[8]:2
[4] eval
@ ./boot.jl:373 [inlined]
[5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
- Julia code can also be embedded in C/C++
$ python3 -m pip install julia # install PyJulia
... # you may need `--user` after `install`
$ python3
>>> import julia
>>> julia.install() # install PyCall.jl etc.
>>> from julia import Base # short demo
>>> Base.sind(90)
1.0
⚡ It's an extremely simple analysis, way far from usual LHC analysis
Let's go,
Running times for different implentations of the dimuon analysis:
Julia | Python event loop | Python RDataFrame JIT-compiled C++ |
Python RDataFrame JIT-compiled python (Numba) |
---|---|---|---|
35 s | 4h 5min | 60 s | 125s |
No need to think about peformance when writing the code
- Julia is not just another language. It answers a need,
Reconciling fast running with an easy and fast coding
- It comes with a large Ecosystem
- It's now mature enough to be used for HEP
It's the right time to catch the Julia's train - Study of its potential for HEP started within the HEP Software foundation last September a report is in preparation.