This talk considers questions at the interface between statistics and computation that arise in solving high-dimensional inference problems. First, given a practical (polynomial-time) algorithm, what are the statistical limitations of its performance? Second, how do such practical guarantees compare to information-theoretic bounds, which apply to the performance of any algorithm regardless of its computational complexity?
We analyze these issues in high-dimensional versions of two well-known inference problems: (a) model selection in Markov random fields; and (b) the sparse PCA or eigenvector problem. For both problems, we begin by analyzing simple polynomial-time methods based on convex relaxations: \ell_1-constrained logistic regression for Markov random fields; and a semidefinite programming (SDP) relaxation, due to Aspremont et al, for sparse PCA. For both methods, we provide thresholds on the sample size n that controls their success/failure as a function of the problem size p, and the sparsity index k. Using information-theoretic methods, we then analyze the performance of optimal methods for these problems, and compare them to the practical ones.
Based on joint works with Arash Amini, John Lafferty, Pradeep Ravikumar, Prasad Santhanam.
