# notes

As a general note, please contact me if you find errors in my notes so that I may fix them.

- Talks
- Ravi Kannan on Linear Algebra in Machine Learning [pdf]
- Nicholas Boumal on ManOpt: Manifold Optimization [pdf]
- Michael Jordan on Computational Thinking, Inferential Thinking, and Data Science [pdf]
- Joel Tropp on Universality Laws for Randomized Dimension Reduction [pdf]
- Joel Tropp on Finding Structure With Randomness [pdf]
- Michael Jordan on Nonparametric Bayesian and Combinatorial Stochastic Processes [pdf]
- Jacob Steinhardt on Unsupervised Risk Estimation with Only Structural Assumptions [pdf]
- Yuanzhi Li on Weighted Low-Rank Matrix Approximation [pdf]
- Zeyuan Allen Zhu on Linear Coupling Framework of Gradient and Mirror Descent [pdf]
- Naman Agarwal and Brian Bullins on LiSSA: A Linear Time Second-Order Stochastic Algorithm [pdf]
- Andrew Barron on Computationally Feasible Greedy Algorithms for Neural Networks [pdf]
- Samy Bengio on Neural Image Captioning [pdf]
- Zaid Harchaoui on Convolutional Kernel Neural Networks [pdf]
- Lester Mackey on Divide-and-Conquer Matrix Completion [pdf]
- Gillat Kol on Interactive Information Theory [pdf]
- Sanjeev Arora on Reversible Deep Nets [pdf]
- Santosh Vempala on the Complexity of Detecting Planted Solutions [pdf]
- Amir Ali Ahmadi on Optimizing over Nonnegative Polynomials [pdf]
- Francisco Pereira on Decoding Generic Representations of fMRI [pdf]
- Elad Hazan on Simulated Annealing and Interior Point Methods [pdf]
- Dana Lahat on Joint Independent Subspace Analysis and Blind Source Separation [pdf]
- Barbara Engelhardt on Bayesian Structured Sparsity Using Gaussian Fields [pdf]
- Dimitris Bertsimas on Statistics and Machine Learning from a Modern Optimization Lens [pdf]
- Sébastien Bubeck on Optimal Regret Bounds for the General Convex Multi-Armed Bandit Setting [pdf]
- Han Liu on Nonparametric Graphical Models [pdf]
- Mehryar Mohri on Deep Boosting [pdf]
- [pdf]
- Young Kun Ko on the Hardness of Sparse PCA [pdf]
- Ben Recht on Perceptron Learning and Stability
- Tom Griffiths on Rationality, Heuristics, and the Cost of Computation [pdf]
- Anna Choromanska on Optimization for Large-Scale Machine Learning [pdf]
- Richard Socher on Dynamic Memory Networks [pdf]

- Classes
- Useful Links

## Talks

I attend various academic talks and seminars and sometimes scribe them.

### Ravi Kannan on Linear Algebra in Machine Learning [pdf]

Ravi Kannan from Microsoft Research gave a talk at COLT 2016 on the role of linear algebra in machine learning over the years, taking a historical perspective and covering the main highlights as well as some more recent results.

### Nicholas Boumal on ManOpt: Manifold Optimization [pdf]

Nicholas Boumal from Princeton gave a talk at the Alg-ML Reading Group on manifold optimization and ManOpt, his tool for solving optimization problems.

### Michael Jordan on Computational Thinking, Inferential Thinking, and Data Science [pdf]

Professor Michael Jordan from Berkeley gave a general talk on the trends towards intersecting the statistical, computer science, and optimization communities. The discussion includes inferential differential privacy, inferential communication theory, inferential computation theory, and finally, an aside on a general framework for accelerated Nesterov methods.

### Joel Tropp on Universality Laws for Randomized Dimension Reduction [pdf]

Professor Joel Tropp from CalTech CMS gave a talk at Alg-ML on the quality of random matrices from fairly arbitrary distributions such that the image of the random projection matrix does not intersect the origin (nothing in the original dataset gets collapsed to zero). It turns out that computationally efficient random matrix distributions, like sparse Rademacher matrices, provably obey the same phase transition laws as Gaussian random matrices for the condition that the image does not intersect the nullspace, which allows for considerable computational speedups. Additionally, there are also provable bounds on the stability of such random matrix projections.

### Joel Tropp on Finding Structure With Randomness [pdf]

Professor Joel Tropp from CalTech CMS gave a talk on another approach to approximating low-rank SVD in a manner appropriate for very large matrices.

### Michael Jordan on Nonparametric Bayesian and Combinatorial Stochastic Processes [pdf]

Professor Michael Jordan gave a talk on applications and theory of nonparametric Bayesian statistics.

### Jacob Steinhardt on Unsupervised Risk Estimation with Only Structural Assumptions [pdf]

Jacob Steinhardt gives a talk on his recent work on provable methods for estimating the risk of unlabeled data using multiple views of the data.

### Yuanzhi Li on Weighted Low-Rank Matrix Approximation [pdf]

Yuanzhi Li gives a background review on various approaches towards weighted low-rank approximations.

### Zeyuan Allen Zhu on Linear Coupling Framework of Gradient and Mirror Descent [pdf]

Dr. Zeyuan Allen Zhu explains Nesterov’s accelerated gradient descent with a nice framework involving analysis of alternating gradient and mirror descent, and also shows that strong linear coupling approaches can also be applied to the analysis of other settings (for instance, coordinate gradient descent and even some nonconvex cases) to get robust improved convergence rates, in addition to intuitively explaining the opaque Nesterov acceleration.

### Naman Agarwal and Brian Bullins on LiSSA: A Linear Time Second-Order Stochastic Algorithm [pdf]

In this result, joint with Professor Elad Hazan at Princeton, the authors come up with a new algorithm based on a novel estimator of the inverse Hessian which allows for a theoretical linear time convergence rate for an algorithm which uses full second-order information. Empirically, the algorithm (called LiSSA) also performs well with respect to its competition.

### Andrew Barron on Computationally Feasible Greedy Algorithms for Neural Networks [pdf]

Professor Andrew Barron from Yale discussed ways approaches from the greedy algorithms literature with accompanying statistical risk bounds can be applied to bounding the expected error of single-layer neural networks.

### Samy Bengio on Neural Image Captioning [pdf]

Dr. Samy Bengio from Google Brain discusses Google’s Show-and-Tell paper on captioning images using a sequence-to-sequence deep learning approach.

### Zaid Harchaoui on Convolutional Kernel Neural Networks [pdf]

Professor Zaid Harchaoui from NYU’s Courant Institute discuss Convolutional Kernel Neural Nets, which can be trained in an unsupervised manner and manage to outperform well-known CNN architectures with a fraction of the data. The formalism takes advantage of kernels to define an approximate minimization problem, adding more insight into how to tackle the problem of explaining convolutional networks theoretically.

### Lester Mackey on Divide-and-Conquer Matrix Completion [pdf]

Professor Lester Mackey from Stanford talks on his recent work for the fast-parallelization of matrix completion under a novel divide and conquer framework.

### Gillat Kol on Interactive Information Theory [pdf]

Dr. Gillat Kol from IAS gives a talk on information theoretic measures and bounds on the compression of information transmission in the setting of interactive information theory, where no player knows the whole message being compressed.

### Sanjeev Arora on Reversible Deep Nets [pdf]

Professor Arora, Tengyu Ma, and Yingyu Liang discuss their results on giving a generative model for reversible deep nets. They also give a better training method that improves upon Dropout.

### Santosh Vempala on the Complexity of Detecting Planted Solutions [pdf]

Professor Santosh Vempala of Georgia Tech gave a talk on showing that for statistical algorithms (e.g. PCA, EM, and so on), solving planted clique and planted \(k\)-SAT is at least exponential time in the size of the input.

### Amir Ali Ahmadi on Optimizing over Nonnegative Polynomials [pdf]

Professor Amir Ali Ahmadi from Princeton spoke about formulating convex optimization problems in terms of finding nonnegative polynomials to provably optimize various problems in controls, dynamical systems, and machine learning, improving upon the time complexity of previous sum-of-squares SDP relaxations. He also describes robust dynamics optimization (RDO), a framework for solving optimization problems over a set of points defined by a dynamical system.

### Francisco Pereira on Decoding Generic Representations of fMRI [pdf]

Dr. Francisco Pereira from Siemens talked about his work using sentences and pictures to localize fMRI voxels for representing semantic content in brains.

### Elad Hazan on Simulated Annealing and Interior Point Methods [pdf]

Professor Elad Hazan gave a talk to Princeton’s Algorithm-ML reading group on a result demonstrating the existence of a universal barrier function for interior point methods in the membership oracle model of convex optimization. This barrier is related to the heuristic simulated annealing approach often taken in non-convex optimization.

### Dana Lahat on Joint Independent Subspace Analysis and Blind Source Separation [pdf]

This talk gives an overview of blind source separation with various statistical independence assumptions, generalizing ICA to learning subspaces of low-rank instead of just rank \(1\) subspaces.

### Barbara Engelhardt on Bayesian Structured Sparsity Using Gaussian Fields [pdf]

Professor Barbara Engelhardt gave a talk on her work on identifying associations between SNPs and phenotypes with sparse machine learning methods. She also spoke on how these methods can be translated to brain studies.

### Dimitris Bertsimas on Statistics and Machine Learning from a Modern Optimization Lens [pdf]

Professor Dimitris Bertsimas from MIT gave a talk on using the mixed integer-programming (MIO) framework as a new lens through which to view machine learning, statistics, and optimization problems.

### Sébastien Bubeck on Optimal Regret Bounds for the General Convex Multi-Armed Bandit Setting [pdf]

Dr. Sébastien Bubeck from Microsoft Research gave a talk to the Alg-ML reading group on his \(2015\) result on a tight minimax regret bound for the setting of general convex bandit optimization in dimensions greater than one.

### Han Liu on Nonparametric Graphical Models [pdf]

Professor Han Liu from Princeton gave an overview of his recent research and theoretical results on nonparametric graphical models.

### Mehryar Mohri on Deep Boosting [pdf]

Professor Mehryar Mohri from the Courant Institute speaks about ensemble boosting methods that take advantage of complex hypothesis classes along with the use of standard simple weak learners. He also presents an interpretation of boosting as truly being about model selection.

### Percy Liang on Learning Hidden Computational Processes [pdf]

Professor Percy Liang from Stanford discussed approaches to solving question-answering tasks on a new hand-built dataset.

### Young Kun Ko on the Hardness of Sparse PCA [pdf]

This talk summarizing two results on sparse principal components analysis was given to the Braverman Reading Group at Princeton.

### Ben Recht on Perceptron Learning and Stability

Professor Ben Recht from Berkeley discussed a notion of stability applied to stochastic gradient descent to explain why it reaches the same local optima in the nonconvex setting. Notes still to be added.

### Tom Griffiths on Rationality, Heuristics, and the Cost of Computation [pdf]

Professor Tom Griffiths from Berkeley discussed a notion of rationality which takes into account computation time as a resource. From his perspective, we can explain why some decision-making procedures which appear to be suboptimal are actually optimal from a sparse-resource respective.

### Anna Choromanska on Optimization for Large-Scale Machine Learning [pdf]

Dr. Anna Choromanska from the Courant Institute talks about new learning algorithms for decision trees and spin-glass interpretations of deep learning.

### Richard Socher on Dynamic Memory Networks [pdf]

Dr. Richard Socher’s last lecture for the Stanford class 224d (Deep Learning for NLP) was on a recent paper by his startup, Metamind. He spoke about using a novel deep learning architecture to solve question-and-answer problems, and also about how to generalize all of NLP to a question-and-answer framework with this kind of model.

## Classes

For now, I have individual class notes for some classes I’ve taken. Usually, I prioritize scribing classes for which there is no good book already existing. At some point in the future, I plan to put together big documents of notes parametrized by subject in one big document.

### MAT 529: Metric Embeddings [pdf]

Holden Lee (see links at the bottom of the page) and I are collaborating on notes for this class. As Spring 2016 continues, they will continue to be updated and hosted at Holden’s website. The source is available on github . After the notes are complete, I will post a final pdf on this website as well.

### APC 529: Coding Theory and Random Graphs [pdf]

These are my notes on the random graphs portion of APC 529, taught by Professor Emmanuel Abbe of Princeton. The topics include an introduction to random graphs, the Erdos-Renyi model, graph properties and phase transition phenomena (for giant component and connectivity), some spectral graph theory, and finally an introduction to recent work on the Stochastic Block Model (SBM).

### MAT 340: Applied Algebra [pdf]

Applied algebra covered the basic theorems of group theory with a section on representation theory at the end. These notes will be updated in the future with coverage of the Sylow theorems and finite simple groups.

### COS 511: Theoretical Machine Learning

I took scribe notes for most lectures of this class in its Spring 2015 iteration. Topics in the notes consist of an introduction to statistical learning theory, the Online Convex Optimization (OCO) framework, regularization, Bandit Convex Optimization (BCO), boosting, some game theory all from the point of view of OCO, and finally, an explicit connection between OCO and statistical learning theory in the form of theorems which convert regret analysis into ERM guarantees. There are also two guest lectures, one by Professor Jake Abernethy of University of Michigan, and one by Professor Sanjeev Arora of Princeton. A lot of the notes follow the excellent book by Professor Elad Hazan.

### COS 510: Programming Languages

Here are my scribe notes on the Curry-Howard Isomorphism.

### APC 486: Transmission and Compression of Information

Here are my scribe notes on probabilistic source models.

## Useful Links

This section holds links to other notes, class webpages, research groups, and other people’s webpages which I have found interesting and helpful.

### Holden Lee’s blog

I collaborate with Holden on some notes, and he has a lot more interesting material about math and machine learning on his website.

### Billy Fang’s website

In particular, his blog and notes are interesting and useful respectively.