Professor of Mathematics and Statistics
Stanford University, USA
Department of Statistics
390 Jane Stanford Way
Stanford, CA 94305, USA
I am interested in probability theory, statistics, and mathematical physics. The following are some selected papers and preprints, classified by subject area. For the complete list of publications and preprints, click here or visit my page in Google Scholar.
A state space for 3D Euclidean Yang-Mills theories. Preprint. 2021.
(Coauthored with Sky Cao.)
It is believed that Euclidean Yang-Mills theories behave like the massless Gaussian free field (GFF) at short distances. This makes it impossible to define the main observables for these theories, the Wilson loop observables, in dimensions greater than two, because line integrals of the GFF do not exist in such dimensions. Taking forward a proposal of Charalambous and Gross, this article shows that it is possible to define Euclidean Yang-Mills theories on the 3D unit torus as "random distributional gauge orbits", provided that they indeed behave like the GFF in a certain sense. One of the main technical tools is the existence of the Yang-Mills heat flow on the 3D torus starting from GFF-like initial data, which is established in the companion paper below. A key consequence of this construction is that under the GFF assumption, one can define a notion of "regularized Wilson loop observables" for Euclidean Yang-Mills theories on the 3D unit torus.
The Yang-Mills heat flow with random distributional initial data. Preprint. 2021.
(Coauthored with Sky Cao.)
This paper constructs local solutions to the Yang-Mills heat flow (in the DeTurck gauge) for a certain class of random distributional initial data, which includes the 3D Gaussian free field. The main idea, which goes back to work of Bourgain as well as work of Da Prato and Debussche, is to decompose the solution into a rougher linear part and a smoother nonlinear part, and to control the latter by probabilistic arguments. In the above companion work, the main results of this paper are used to propose a way towards the construction of 3D Yang-Mills measures.
A probabilistic mechanism for quark confinement. Comm. Math. Phys. 2021.
The confinement of quarks is one of the enduring mysteries of modern physics. There is a longstanding physics heuristic that confinement is a consequence of "unbroken center symmetry". This article gives mathematical confirmation of this heuristic, by rigorously defining of center symmetry in lattice gauge theories and proving that a theory is confining when center symmetry is unbroken. Furthermore, a sufficient condition for unbroken center symmetry is given: It is shown that if the center of the gauge group is nontrivial, and correlations decay exponentially under arbitrary boundary conditions, then center symmetry does not break.
Rigorous solution of strongly coupled SO(N) lattice gauge theory in the large N limit. Comm. Math. Phys. 2019.
Gauge-string duality is a general notion in physics which claims that gauge theories — which are theories of the quantum world — are sometimes dual to certain string theories, which are theories of gravity. The main result of this paper is a rigorous computation of Wilson loop expectations in strongly coupled SO(N) lattice gauge theory in the large N limit, in any dimension. The formula appears as an absolutely convergent sum over trajectories in a kind of string theory on the lattice, demonstrating an explicit gauge-string duality.
Local KPZ behavior under arbitrary scaling limits. To appear in Comm. Math. Phys.
One of the main difficulties in proving convergence of discrete models of surface growth to the Kardar-Parisi-Zhang (KPZ) equation in dimensions higher than one is that the correct way to take a scaling limit, so that the limit is nontrivial, is not known in a rigorous sense. To understand KPZ growth without being hindered by this issue, this article introduces a notion of "local KPZ behavior", which roughly means that the instantaneous growth of the surface at a point decomposes into the sum of a Laplacian term, a gradient squared term, a noise term that behaves like white noise, and a remainder term that is negligible compared to the other three terms and their sum. The main result is that for a general class of surfaces, which contains the model of directed polymers in a random environment as a special case, local KPZ behavior occurs under arbitrary scaling limits, in any dimension.
Convergence of deterministic growth models. To appear in Arch. Rational Mech. Anal.
(Coauthored with Panagiotis E. Souganidis.)
This paper proves the uniform in space and time convergence of the scaled heights of large classes of deterministic growth models that are monotone and equivariant under translations by constants. The limits are unique viscosity solutions of first- or second-order partial differential equations depending on whether the growth models are scaled hyperbolically or parabolically. The results simplify and extend a recent work by the first author on convergence to deterministic KPZ growth to more general surface growth models. The proofs are based on the methodology developed by Barles and the second author to prove convergence of approximation schemes.
It is a longstanding conjecture that in the model of first-passage percolation on a lattice, two important numbers, known as the fluctuation exponent and the wandering exponent, are related through a universal relation that does not depend on the dimension. This is sometimes called the KPZ relation. This paper gives a rigorous proof of the KPZ relation assuming that the exponents exist in a certain sense.
Nonlinear large deviations. Adv. Math. 2016.
(Coauthored with Amir Dembo.)
This paper develops techniques for understanding large deviations of sparse random graphs. The techniques for dense graphs, based on Szemerédi's regularity lemma, are not applicable in the sparse regime. The technology developed here applies more broadly to large deviations for nonlinear functions of independent random variables, going beyond classical methods which cater mostly to linear functions.
The large deviation principle for the Erdős-Rényi random graph. European J. Comb. 2011.
(Coauthored with S. R. S. Varadhan.)
This paper develops a theory of large deviations for random graphs, using Szemerédi's regularity lemma and the theory of graph limits.
Superconcentration and Related Topics. Springer, Cham. 2014.
This monograph studies three features of Gaussian random fields, called superconcentration, chaos, and multiple valleys, and explores the relations between them. It is shown that superconcentration is equivalent to chaos, and chaos implies multiple valleys. Superconcentration has been a known feature in probability theory for a while (under different names). This book connects it to chaos and multiple valleys. Two main results in the book are proofs of the disorder chaos and multiple valley conjectures for mean-field spin glasses.
The missing log in large deviations for triangle counts. Random Structures Algorithms. 2012.
This paper solves the problem of sharp large deviation estimates for the upper tail of the number of triangles in an Erdős-Rényi random graph, by establishing a logarithmic factor in the exponent that was missing till now. It is possible that the method of proof may extend to general subgraph counts.
Stein's method for concentration inequalities. Probab. Theory Related Fields. 2007.
This paper introduces a version of Stein's method for proving concentration inequalities, building on methods developed in my Ph.D. thesis. Previously, Stein's method was used only for proving central limit theorems.
This paper introduces a new class of probabilistic inequalities called second order Poincaré inequalities, which can be used to prove central limit theorems for complicated functions of independent random variables where we can only gather information about the sizes of first and second order derivatives. Applications to linear statistics of eigenvalues of random matrices are worked out.
A new method of normal approximation. Ann. Probab. 2008.
This paper introduces a new version of Stein's method that reduces a large class of normal approximation problems to variance bounding exercises, thus making a connection between central limit theorems and concentration of measure. Unlike Skorokhod embeddings, the object whose variance must be bounded has an explicit formula that makes it possible to carry out the program more easily. As an application, a general CLT is proved for functions that are obtained as combinations of many local contributions, where the definition of "local" itself depends on the data. Several examples are given, including the solution to a nearest-neighbor CLT problem posed by P. Bickel.
A generalization of the Lindeberg principle. Ann. Probab. 2006.
This paper generalizes Lindeberg's proof of the central limit theorem to an invariance principle for arbitrary smooth functions of independent and weakly dependent random variables. The result is then used to prove universality of limiting spectral distributions of certain kinds of random matrices.
Convergence of gradient descent for deep neural networks. Preprint. 2022.
Optimization by gradient descent has been one of main drivers of the "deep learning revolution". Yet, despite some recent progress for extremely wide networks, it remains an open problem to understand why gradient descent often converges to global minima when training deep neural networks. This article presents a new criterion for convergence of gradient descent to a global minimum, which is provably more powerful than the best available criteria from the literature, namely, the Lojasiewicz inequality and its generalizations. This criterion is used to show that gradient descent with proper initialization converges to a global minimum when training any feedforward neural network with smooth and strictly increasing activation functions, provided that the input dimension is greater than or equal to the number of data points.
A new coefficient of correlation. J. Amer. Statist. Assoc. 2021.
Is it possible to define a coefficient of correlation which is (a) as simple as the classical coefficients like Pearson's correlation or Spearman's correlation, and yet (b) consistently estimates some simple and interpretable measure of the degree of dependence between the variables, which is 0 if and only if the variables are independent and 1 if and only if one is a measurable function of the other, and (c) has a simple asymptotic theory under the hypothesis of independence, like the classical coefficients? This article answers this question in the affirmative, by producing such a coefficient.
Matrix estimation by Universal Singular Value Thresholding. Ann. Statist. 2015.
Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure". Surprisingly, this simple estimator achieves the minimax optimal error rate up to a constant factor. Many examples are worked out.
A new perspective on least squares under convex constraint. Ann. Statist. 2014.
Consider the problem of estimating the mean of a Gaussian random vector when the mean vector is assumed to be in a given convex set. The most natural solution is to take the Euclidean projection of the data vector on to this convex set; in other words, performing "least squares under a convex constraint". Many problems in modern statistics and statistical signal processing theory are special cases of this general situation. Examples include the lasso and other high-dimensional regression techniques, function estimation problems, matrix estimation and completion, shape-restricted regression, constrained denoising, linear inverse problems, etc. This paper presents three general results about this problem, namely, (a) an exact computation of the main term in the estimation error by relating it to expected maxima of Gaussian processes (existing results only give upper bounds), (b) a theorem showing that the least squares estimator is always admissible up to a universal constant in any problem of the above kind and (c) a counterexample showing that least squares estimator may not always be minimax rate-optimal. The result from part (a) is then used to compute the error of the least squares estimator in two examples of contemporary interest.
Estimating and understanding exponential random graph models. Ann. Statist. 2013.
(Coauthored with Persi Diaconis.)
This paper introduces a method for the theoretical analysis of exponential random graph models. The method is based on a large-deviations approximation to the normalizing constant shown to be consistent using theory developed by Chatterjee and Varadhan.
Average Gromov hyperbolicity and the Parisi ansatz. Adv. Math. 2021.
(Coauthored with Leila Sloman.)
Gromov hyperbolicity of a metric space measures the distance of the space from a perfect tree-like structure. The measure has a worst-case aspect to it, in the sense that it detects a region in the space which sees the maximum deviation from tree-like structure. This article introduces an "average-case" version of Gromov hyperbolicity, which detects whether the "most of the space", with respect to a given probability measure, looks like a tree. The main result of the paper is that if this average hyperbolicity is small, then the space can be approximately embedded in a tree. The result is applied to construct hierarchically organized pure states in any model of a spin glass that satisfies the Parisi ultrametricity ansatz.
The sample size required in importance sampling. Ann. App. Probab. 2018.
(Coauthored with Persi Diaconis.)
The goal of importance sampling is to estimate the expected value of a given function with respect to a probability measure Q using a random sample of size n drawn from a different probability measure P. If the two measures are nearly singular with respect to each other, which is often the case in practice, the sample size required for accurate estimation is large. The main result of this article shows that in a fairly general setting, a sample of size approximately exp(D) is necessary and sufficient for accurate estimation by importance sampling, where D is the Kullback-Leibler divergence of P from Q. In particular, the required sample size exhibits a kind of cut-off in the logarithmic scale.
Invariant measures and the soliton resolution conjecture. Comm. Pure Appl. Math. 2014.
The soliton resolution conjecture for the focusing nonlinear Schrödinger (NLS) equation is the vaguely worded claim that a global solution of the NLS equation, for generic initial data, will eventually resolve into a radiation component that disperses like a linear solution, plus a localized component that behaves like a soliton or multi-soliton solution. This paper gives a tentative formulation and proof of the soliton resolution conjecture for the discrete NLS equation.
Gravitational allocation to Poisson points. Ann. Math. 2010.
(Coauthored with Ron Peled, Yuval Peres and Dan Romik.)
This paper introduces and analyzes a technique for allocating regions in space to the points of a Poisson point process, such that the regions have equal volume, are connected, and collectively form a partition of the whole space.