Find me at SIAM MDS 2024

I will be hosting MT24 (MamBayes: Model-Free Long-Horizon Time Series Forecasting with Mamba) on Thursday at 5:30.

I will also be presenting a poster titled Normalizing Flows for Simulation-Based Inference of Non-Markovian and Multiscale Stochastic Processes during the Friday session.

Blog
Tutorials on Bayesian statistics
  1. Why do we need Bayesian statistics? Part I – Asserting if a coin is biased -- GitHub
  2. Why do we need Bayesian statistics? Part II – The lighthouse problem -- GitHub
  3. Why do we need Bayesian statistics? Part III – Learning multivariate distributions -- GitHub
Other blog posts
Selected Publications
Avoiding matrix exponentials for large transition rate matrices

Published: P Pessoa, M Schweiger, S Pressé (2024) Journal of Chemical Physics, 160, 094109

Preprint: Available at arXiv

Abstract:

Exact methods for exponentiation of matrices of dimension N can be computationally expensive in terms of execution time (N^3) and memory requirements (N^2), not to mention numerical precision issues. A type of matrix often exponentiated in the sciences is the rate matrix. Here we explore five methods to exponentiate rate matrices, some of which apply even more broadly to other matrix types. Three of the methods leverage a mathematical analogy between computing matrix elements of a matrix exponential and computing transition probabilities of a dynamical process (technically a Markov jump process, MJP, typically simulated using Gillespie). In doing so, we identify a novel MJP-based method relying on restricting the number of "trajectory" jumps based on the magnitude of the matrix elements with favorable computational scaling. We then discuss this method's downstream implications on mixing properties of Monte Carlo posterior samplers. We also benchmark two other methods of matrix exponentiation valid for any matrix (beyond rate matrices and, more generally, positive definite matrices) related to solving differential equations: Runge-Kutta integrators and Krylov subspace methods. Under conditions where both the largest matrix element and the number of non-vanishing elements scale linearly with N (reasonable conditions for rate matrices often exponentiated), computational time scaling with the most competitive methods (Krylov and one of the MJP-based methods) reduces to N^2 with total memory requirements of N.
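
As a rough illustration of the MJP analogy described in the abstract, the sketch below uses uniformization (also called randomization), a standard textbook technique in which a row of exp(Qt) is written as a Poisson-weighted sum over the number of jumps and truncated once the remaining Poisson mass falls below a tolerance. It is not taken from the paper and is not claimed to be any of the five benchmarked methods. The function name `uniformized_row`, the dict-of-rows sparse format, and the parameters `tol` and `max_jumps` are assumptions made for this example.

```python
import math


def uniformized_row(rates, start, t, tol=1e-12, max_jumps=100_000):
    """Approximate row `start` of exp(Q t) for a rate matrix Q.

    `rates[i]` is a dict {j: q_ij} holding the positive off-diagonal rates
    out of state i; the diagonal is implied by rows of Q summing to zero.
    Assumes lam * t is moderate (exp(-lam * t) must not underflow).
    """
    n = len(rates)
    exit_rates = [sum(row.values()) for row in rates]
    lam = max(exit_rates)  # uniformization rate, at least every |q_ii|

    # v holds the distribution after k jumps of the uniformized chain.
    v = [0.0] * n
    v[start] = 1.0
    if lam == 0.0 or t == 0.0:
        return v

    weight = math.exp(-lam * t)  # Poisson probability of zero jumps
    result = [weight * x for x in v]
    accumulated = weight  # Poisson mass included so far
    k = 0
    while 1.0 - accumulated > tol and k < max_jumps:
        # One jump: v <- v P with P = I + Q / lam, touching only nonzeros.
        new_v = [v[i] * (1.0 - exit_rates[i] / lam) for i in range(n)]
        for i, row in enumerate(rates):
            if v[i] == 0.0:
                continue
            for j, q in row.items():
                new_v[j] += v[i] * q / lam
        v = new_v
        k += 1
        weight *= lam * t / k  # Poisson probability of exactly k jumps
        accumulated += weight
        for i in range(n):
            result[i] += weight * v[i]
    return result
```

For a two-state chain with rates a (0 to 1) and b (1 to 0), `uniformized_row([{1: a}, {0: b}], 0, t)[0]` can be checked against the closed form b/(a+b) + a/(a+b) exp(-(a+b)t). Only vectors of length N and the nonzero rates are stored, so the full N by N exponential is never formed.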

How many submissions are needed to discover friendly suggested reviewers?

Published: P Pessoa, S Pressé (2023) PLoS ONE, 18(4), e0284212

Preprint: Available at arXiv

Abstract:

It is common in scientific publishing to request from authors reviewer suggestions for their own manuscripts. The question then arises: How many submissions are needed to discover friendly suggested reviewers? To answer this question, as the data we would need is anonymized, we present an agent-based simulation of (single-blinded) peer review to generate synthetic data. We then use a Bayesian framework to classify suggested reviewers. To set a lower bound on the number of submissions possible, we create an optimistically simple model that should allow us to more readily deduce the degree of friendliness of the reviewer. Despite this model’s optimistic conditions, we find that one would need hundreds of submissions to classify even a small reviewer subset. Thus, it is virtually unfeasible under realistic conditions. This ensures that the peer review system is sufficiently robust to allow authors to suggest their own reviewers.

Bose-Einstein statistics for a finite number of particles

Published: P Pessoa (2021) Physical Review A, 104, 043318

Preprint: Available at arXiv

Abstract:

This paper presents a study of the grand canonical Bose-Einstein (BE) statistics for a finite number of particles in an arbitrary quantum system. The thermodynamical quantities that identify BE condensation, namely the fraction of particles in the ground state and the specific heat, are calculated here exactly in terms of temperature and fugacity. These calculations are complemented by a numerical calculation of fugacity in terms of the number of particles, without taking the thermodynamic limit. The main advantage of this approach is that it does not rely on approximations made in the vicinity of the usually defined critical temperature; rather, it makes calculations with arbitrary precision possible, irrespective of temperature. Graphs for the calculated thermodynamical quantities are presented in comparison to the results previously obtained in the thermodynamic limit. In particular, it is observed that for the gas trapped in a three-dimensional box, the derivative of specific heat reaches smaller values than what was expected in the thermodynamic limit; here, this result is also verified with analytical calculations. This is an important result for understanding the role of the thermodynamic limit in phase transitions, and it makes it possible to further study BE statistics without relying on either the thermodynamic limit or approximations near the critical temperature.
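
The numerical step mentioned in the abstract, obtaining the fugacity z for a fixed finite number of particles, can be sketched as root finding on the grand canonical relation N = sum_i g_i z / (exp(beta e_i) - z). The example below is a minimal illustration under stated assumptions, not the paper's code or the IGQG interface: energies are shifted so the ground state sits at zero (so 0 < z < 1), the root is found by plain bisection in double precision rather than arbitrary precision, and the names `occupations` and `solve_fugacity` and the `(energy, degeneracy)` level format are hypothetical.

```python
import math


def occupations(z, beta, levels):
    """Mean BE occupation of each level for fugacity z.

    `levels` is a list of (energy, degeneracy) pairs with the ground-state
    energy equal to zero, so every denominator is positive for 0 < z < 1.
    Assumes beta * energy is small enough that math.exp does not overflow.
    """
    return [g * z / (math.exp(beta * e) - z) for e, g in levels]


def solve_fugacity(n_particles, beta, levels, iterations=200):
    """Bisect for the z in (0, 1) whose total occupation equals n_particles."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # interval has shrunk to floating-point resolution
        # Total occupation increases monotonically with z on (0, 1).
        if sum(occupations(mid, beta, levels)) < n_particles:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a single non-degenerate level, `solve_fugacity(9, 1.0, [(0.0, 1)])` should recover z = N/(N+1) = 0.9, and the ground-state fraction for a multi-level system follows as `occupations(z, beta, levels)[0] / n_particles`.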

Information geometry for Fermi–Dirac and Bose–Einstein quantum statistics

Published: P Pessoa, C Cafaro (2021) Physica A: Statistical Mechanics and its Applications, 576, 126061

Preprint: Available at arXiv

Abstract:

Information geometry is an emergent branch of probability theory that consists of assigning a Riemannian differential geometry structure to the space of probability distributions. We present an information geometric investigation of gases following the Fermi–Dirac and the Bose–Einstein quantum statistics. For each quantum gas, we study the information geometry of the curved statistical manifolds associated with the grand canonical ensemble. The Fisher–Rao information metric and the scalar curvature are computed for both fermionic and bosonic models of non-interacting particles. In particular, by taking into account the ground state of the ideal bosonic gas in our information geometric analysis, we find that the singular behavior of the scalar curvature in the condensation region disappears. This is a counterexample to a long-held conjecture that curvature always diverges in phase transitions.

Entropic dynamics on Gibbs statistical manifolds

Published: P Pessoa, F Xavier Costa, A Caticha (2021) Entropy, 23(5), 494

Preprint: Available at arXiv

Abstract:

Entropic dynamics is a framework in which the laws of dynamics are derived as an application of entropic methods of inference. Its successes include the derivation of quantum mechanics and quantum field theory from probabilistic principles. Here, we develop the entropic dynamics of a system, the state of which is described by a probability distribution. Thus, the dynamics unfolds on a statistical manifold that is automatically endowed with a metric structure provided by information geometry. The curvature of the manifold has a significant influence. We focus our dynamics on the statistical manifold of Gibbs distributions (also known as canonical distributions or the exponential family). The model includes an “entropic” notion of time that is tailored to the system under study; the system is its own clock. As one might expect, entropic time is intrinsically directional; there is a natural arrow of time that is led by entropic considerations. As illustrative examples, we discuss dynamics on a space of Gaussians and the discrete three-state system.

Selected Software Packages
SMN (Sparse matrices in Numba)

Library with a class for sparse matrices that is compatible with the popular Numba compilation tool for fast machine code in Python.

IGQG (Information Geometry of Quantum Gases)

Library with functions needed for my work on Bose-Einstein condensation. These tools are based on the mpmath library for arbitrary numeric precision.