Extended stochastic dynamics: theory, algorithms, and applications in multiscale modelling and data science
This thesis addresses the sampling problem in high-dimensional spaces, i.e., the computation of averages with respect to a given probability density that is a function of many variables. Such sampling problems arise in many application areas, including molecular dynamics, multiscale models, and the Bayesian sampling techniques used in emerging machine learning applications. Of particular interest are thermostat techniques, formulated as stochastic-dynamical systems, that preserve the canonical Gibbs ensemble defined by an exponentiated energy function. In this thesis we explore theory, algorithms, and numerous applications in this setting.
We begin by comparing numerical methods for particle-based models. The class of methods considered includes dissipative particle dynamics (DPD) as well as a newly proposed stochastic pairwise Nosé-Hoover-Langevin (PNHL) method. Splitting methods are developed and studied in terms of their thermodynamic accuracy, two-point correlation functions, and convergence. When computational efficiency is measured by the ratio of thermodynamic accuracy to CPU time, we report significant advantages for the PNHL method over popular alternative schemes in the low-friction regime, without degradation of the convergence rate. We then propose a pairwise adaptive Langevin (PAdL) thermostat that fully captures the dynamics of DPD and can thus be applied directly in the setting of momentum-conserving simulation. These methods are potentially valuable for nonequilibrium simulation of physical systems. We again report substantial improvements in both equilibrium and nonequilibrium simulations compared to popular schemes in the literature. We also discuss the proper treatment of Lees-Edwards boundary conditions, an essential part of modelling shear flow.
Finally, we study numerical methods for sampling probability measures in high dimension where the underlying model is only approximately identified with a gradient system. These methods are important in multiscale modelling and in the design of new machine learning algorithms for inference and parameterization for large datasets, challenges that are increasingly important in "big data" applications. In addition to providing a more comprehensive discussion of the foundations of these methods, we propose a new numerical method for the adaptive Langevin/stochastic gradient Nosé-Hoover thermostat that achieves a dramatic improvement in numerical efficiency over the most popular stochastic gradient methods reported in the literature. We demonstrate that the newly established method inherits a superconvergence property (fourth order convergence to the invariant measure for configurational quantities) recently demonstrated in the setting of Langevin dynamics. Furthermore, we propose a covariance-controlled adaptive Langevin (CCAdL) thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.
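To illustrate the idea behind an adaptive Langevin thermostat, the following is a minimal sketch, not the scheme developed in the thesis. It assumes a one-dimensional harmonic potential U(q) = q²/2, a gradient corrupted by Gaussian noise of constant amplitude, and a simple symmetric splitting of the dynamics. The function name `adaptive_langevin` and the parameters `mu` (thermostat mass), `sigma_a` (applied noise amplitude), and `grad_noise` (gradient noise amplitude) are hypothetical choices made for this example. The auxiliary variable `xi` acts as a friction that is driven by the difference between the instantaneous kinetic energy and the target temperature, so that it can compensate for noise of unknown strength.

```python
import math
import numpy as np


def adaptive_langevin(n_steps=200_000, h=0.1, kT=1.0, mu=10.0,
                      sigma_a=1.0, grad_noise=1.0, seed=0):
    """Sample q for U(q) = q^2/2 using a noisy gradient and an adaptive friction xi.

    Illustrative sketch only: the potential, parameter names, and splitting
    order are assumptions of this example.
    """
    rng = np.random.default_rng(seed)
    # Pre-draw all Gaussian increments: two noisy kicks and one thermostat noise per step.
    noise = rng.standard_normal((n_steps, 3)).tolist()
    q, p, xi = 0.0, 1.0, 0.0
    qs = np.empty(n_steps)
    xis = np.empty(n_steps)
    for k, (r1, r2, r3) in enumerate(noise):
        # Half kick with a noisy gradient estimate (exact gradient is q).
        p -= 0.5 * h * (q + grad_noise * r1)
        # Half drift of the position.
        q += 0.5 * h * p
        # Half update of the friction: it grows when p^2 exceeds kT.
        xi += 0.5 * h * (p * p - kT) / mu
        # Exact Ornstein-Uhlenbeck update of p for frozen xi.
        if abs(xi) > 1e-8:
            var = -math.expm1(-2.0 * xi * h) / (2.0 * xi)
            p = math.exp(-xi * h) * p + sigma_a * math.sqrt(var) * r2
        else:
            p += sigma_a * math.sqrt(h) * r2
        # Mirror the first three sub-steps to keep the scheme symmetric.
        xi += 0.5 * h * (p * p - kT) / mu
        q += 0.5 * h * p
        p -= 0.5 * h * (q + grad_noise * r3)
        qs[k] = q
        xis[k] = xi
    return qs, xis


if __name__ == "__main__":
    qs, xis = adaptive_langevin()
    burn = len(qs) // 10
    print("variance of q:", np.var(qs[burn:]))
    print("mean friction xi:", np.mean(xis[burn:]))
```

For this toy problem the canonical distribution has configurational variance equal to `kT`, so the sample variance of `q` after burn-in should lie close to 1 even though the strength of the gradient noise is never supplied to the thermostat. This sketch draws a fresh noisy gradient at each half kick for simplicity; a production scheme would typically reuse one gradient evaluation per step.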