## Sketch and project: randomized iterative methods for linear systems and inverting matrices

##### Abstract

Probabilistic ideas and tools have recently begun to permeate into several fields where they
had traditionally not played a major role, including fields such as numerical linear algebra
and optimization. One of the key ways in which these ideas influence these fields is via the
development and analysis of randomized algorithms for solving standard and new problems of
these fields. Such methods are typically easier to analyze, and often lead to faster and/or more
scalable and versatile methods in practice.
This thesis explores the design and analysis of new randomized iterative methods for
solving linear systems and inverting matrices. The methods are based on a novel sketch-and-project
framework. By sketching we mean, to start with a difficult problem and then randomly
generate a simple problem that contains all the solutions of the original problem. After sketching
the problem, we calculate the next iterate by projecting our current iterate onto the solution
space of the sketched problem.
The starting point for this thesis is the development of an archetype randomized method
for solving linear systems. Our method has six different but equivalent interpretations: sketch-and-project, constrain-and-approximate, random intersect, random linear solve, random update
and random fixed point. By varying its two parameters – a positive definite matrix (defining
geometry), and a random matrix (sampled in an i.i.d. fashion in each iteration) – we recover
a comprehensive array of well known algorithms as special cases, including the randomized
Kaczmarz method, randomized Newton method, randomized coordinate descent method and
random Gaussian pursuit. We also naturally obtain variants of all these methods using blocks
and importance sampling. However, our method allows for a much wider selection of these two
parameters, which leads to a number of new specific methods. We prove exponential convergence
of the expected norm of the error in a single theorem, from which existing complexity results
for known variants can be obtained. However, we also give an exact formula for the evolution
of the expected iterates, which allows us to give lower bounds on the convergence rate.
We then extend our problem to that of finding the projection of given vector onto the
solution space of a linear system. For this we develop a new randomized iterative algorithm:
stochastic dual ascent (SDA). The method is dual in nature, and iteratively solves the dual of
the projection problem. The dual problem is a non-strongly concave quadratic maximization
problem without constraints. In each iteration of SDA, a dual variable is updated by a carefully
chosen point in a subspace spanned by the columns of a random matrix drawn independently
from a fixed distribution. The distribution plays the role of a parameter of the method. Our
complexity results hold for a wide family of distributions of random matrices, which opens the
possibility to fine-tune the stochasticity of the method to particular applications. We prove
that primal iterates associated with the dual process converge to the projection exponentially
fast in expectation, and give a formula and an insightful lower bound for the convergence rate.
We also prove that the same rate applies to dual function values, primal function values and the
duality gap. Unlike traditional iterative methods, SDA converges under virtually no additional
assumptions on the system (e.g., rank, diagonal dominance) beyond consistency. In fact, our
lower bound improves as the rank of the system matrix drops. By mapping our dual algorithm to
a primal process, we uncover that the SDA method is the dual method with respect to the sketch-and-project method from the previous chapter. Thus our new more general convergence results
for SDA carry over to the sketch-and-project method and all its specializations (randomized
Kaczmarz, randomized coordinate descent...etc). When our method specializes to a known
algorithm, we either recover the best known rates, or improve upon them. Finally, we show
that the framework can be applied to the distributed average consensus problem to obtain an
array of new algorithms. The randomized gossip algorithm arises as a special case.
In the final chapter, we extend our method for solving linear system to inverting matrices,
and develop a family of methods with specialized variants that maintain symmetry or positive
definiteness of the iterates. All the methods in the family converge globally and exponentially,
with explicit rates. In special cases, we obtain stochastic block variants of several quasi-Newton
updates, including bad Broyden (BB), good Broyden (GB), Powell-symmetric-Broyden (PSB),
Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Ours are the
first stochastic versions of these updates shown to converge to an inverse of a fixed matrix.
Through a dual viewpoint we uncover a fundamental link between quasi-Newton updates and
approximate inverse preconditioning. Further, we develop an adaptive variant of the randomized
block BFGS (AdaRBFGS), where we modify the distribution underlying the stochasticity of
the method throughout the iterative process to achieve faster convergence. By inverting several
matrices from varied applications, we demonstrate that AdaRBFGS is highly competitive when
compared to the well established Newton-Schulz and approximate preconditioning methods. In
particular, on large-scale problems our method outperforms the standard methods by orders
of magnitude. The development of efficient methods for estimating the inverse of very large
matrices is a much needed tool for preconditioning and variable metric methods in the big data
era.