Fall 2022: High-dimensional Probability for Machine Learning
Instructor: Prof. Namrata Vaswani
Contact: namrata@iastate.edu, 3021 Coover Hall, 4-4012
Time: 12:40-2pm Tues-Thurs
Location: Carver 0204
Overview: This course will cover the topics from my Machine Learning: A Signal Processing Perspective course (EE 425, which moves at a slower pace) together with the material below, all in one semester.
This course will teach (a) key topics from non-asymptotic
random matrix theory (bounds on minimum and maximum singular values of many
classes of high-dimensional random matrices, as well as on sums of a large
number of random matrices) as well as (b) other high-dimensional probability
topics like Chaining that are commonly used in Foundational Data Science and
Machine Learning research. It will conclude with a discussion of a few recent
papers that use the material; these papers may change from year to year. Most
of the material involves heavy use of linear algebra, and only basic
probability concepts. The goal will be to enable students to understand or
develop provably accurate and fast algorithms for practically relevant ML
problems.
Prerequisites: MATH 510 or 507 (or equivalent); EE 322 or EE 523 or STAT 542 (or equivalent). (A strong grasp of EE/STAT 322 and MATH 317 concepts may suffice too.)
Motivation for learning this material: See Fig 3.6 on page 53 of https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf: in 2D, most points of a standard Gaussian point cloud lie inside and on a circle, while in n dimensions, most points lie ONLY near the surface of a hyper-sphere of radius \sqrt{n}.
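This concentration-of-norm phenomenon is easy to check numerically. The sketch below is my own illustration, not part of the course materials (the helper name norm_spread is made up): it samples standard Gaussian vectors and compares the mean and spread of their Euclidean norms to \sqrt{n}.

```python
# Numerical check: for X ~ N(0, I_n), ||X||_2 concentrates around sqrt(n).
import numpy as np

rng = np.random.default_rng(0)

def norm_spread(n, N=10_000):
    """Return the mean and standard deviation of ||X||_2 over N draws of X ~ N(0, I_n)."""
    norms = np.linalg.norm(rng.standard_normal((N, n)), axis=1)
    return norms.mean(), norms.std()

for n in (2, 100, 10_000):
    mean, std = norm_spread(n)
    # The mean grows like sqrt(n) while the spread stays O(1):
    # in high dimensions the point cloud lives in a thin shell of radius sqrt(n).
    print(f"n={n:>6}: mean norm = {mean:.1f} (sqrt(n) = {n**0.5:.1f}), std = {std:.2f}")
```

In 2D the spread is comparable to the radius, so points fill the disk; by n = 10,000 the norms all sit within a few units of 100, which is the thin-shell picture of Fig 3.6.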
Textbook:
1. My ML notes: link tbd
2. High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin; download here: https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html#
Syllabus, grading and introduction: see this file
Grading based on: (1) theory home-works below or Python coding home-works; (2) mid-term exam or small project; (3) final exam or second project; (4) class participation
Accommodation: Please email me and I can provide most reasonable accommodations, or call 515-294-7220. More details are in the Syllabus file above.
Class notes
o Folder with updated class notes
o ML algorithms file: https://www.dropbox.com/scl/fi/sw41wn8i9u4pc4aszg3g8/ML_algorithms.pdf?rlkey=potcspucrmrip2kocwsfdsabm&dl=0
o Slides on first few chapters of Vershynin book: https://www.dropbox.com/scl/fo/a4t4yldr8n3rd7pcchrll/h?rlkey=skfrbpqd4sj8y92tjwjij55nu&dl=0
  § Old version also in: https://drive.google.com/drive/u/0/folders/1WAa5eK9J-93T5V_SWMxQWtMJ3UDsF__H
o Introduction
  o Introduction and brief discussion of applications: first few slides of file
o Probability background and recap
  o Recap of EE 322: file, Dropbox link (the Dropbox link is the latest version, but the link may occasionally break during editing)
  o Probability background: file, Dropbox link
o Week 1, 2
  o Concentration bounds for different types of scalar r.v.s: Chap 2
  o Scalar Bernoulli, general bounded, sub-Gaussian, and sub-exponential random variables summary: file, Dropbox link
  o Detailed derivations: handwritten
    § Detailed derivations of equivalent definitions and properties for sub-Gaussian r.v.s
    § Detailed derivations of equivalent definitions and properties for sub-exponential r.v.s
  o Applications to boosting randomized algorithms and analyzing degrees of random graphs: also see book Chap 2 and homework
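As a quick numerical companion to the Chap 2 material, the sketch below (my own illustration, not from the notes; the helper name empirical_tail and the choices N = 200, t = 0.1 are made up) compares the empirical tail probability of a Bernoulli sample mean with the two-sided Hoeffding bound P(|mean - mu| >= t) <= 2 exp(-2 N t^2).

```python
# Hoeffding's inequality for bounded r.v.s, checked on Bernoulli(1/2) samples.
import numpy as np

rng = np.random.default_rng(1)

def empirical_tail(N, t, trials=20_000):
    """Fraction of trials in which the sample mean of N Bernoulli(1/2)
    draws deviates from 1/2 by at least t."""
    means = rng.integers(0, 2, size=(trials, N)).mean(axis=1)
    return np.mean(np.abs(means - 0.5) >= t)

N, t = 200, 0.1
emp = empirical_tail(N, t)
bound = 2 * np.exp(-2 * N * t**2)   # Hoeffding: 2 exp(-2 N t^2)
print(f"empirical tail: {emp:.4f}, Hoeffding bound: {bound:.4f}")
```

The empirical tail is well below the bound, as expected: Hoeffding is a worst-case bound over all [0,1]-valued distributions. This same exponential decay in N is what makes majority-vote boosting of randomized algorithms work.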
o Week 2, 3, 4
  o Linear algebra review and preliminaries: file, Dropbox link
o Week 5
  o High-dimensional random vectors and matrices, use of epsilon-nets: Chap 3 (3.1-3.4), Chap 4 (4.1-4.7)
  o High-dimensional random vectors and matrices, use of epsilon-nets: file, Dropbox link
  o Handwritten notes:
    § Chap 3 detailed handwritten notes
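The epsilon-net bounds of Chap 4 can be sanity-checked numerically. Below is my own sketch (the helper name extreme_singvals and the sizes m = 2000, n = 500 are made up): for an m x n matrix A with i.i.d. N(0,1) entries, the extreme singular values satisfy, with high probability, roughly sqrt(m) - sqrt(n) <= s_min(A) <= s_max(A) <= sqrt(m) + sqrt(n).

```python
# Extreme singular values of a tall i.i.d. Gaussian matrix vs the
# non-asymptotic predictions sqrt(m) -/+ sqrt(n).
import numpy as np

rng = np.random.default_rng(2)

def extreme_singvals(m, n):
    """Return (s_min, s_max) of an m x n matrix with i.i.d. N(0,1) entries."""
    s = np.linalg.svd(rng.standard_normal((m, n)), compute_uv=False)
    return s[-1], s[0]   # svd returns singular values in descending order

m, n = 2000, 500
s_min, s_max = extreme_singvals(m, n)
lo, hi = np.sqrt(m) - np.sqrt(n), np.sqrt(m) + np.sqrt(n)
print(f"s_min = {s_min:.1f} (prediction {lo:.1f}), s_max = {s_max:.1f} (prediction {hi:.1f})")
```

Both extremes land within a few units of the predictions, i.e., a tall Gaussian matrix is an approximate isometry after scaling by 1/sqrt(m).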
o Week
  o Quadratic forms, symmetrization: Chap 6, 6.1-6.6
  o Quadratic forms and symmetrization: file, Dropbox link
o Concentration without independence
  o Sec 5.1, 5.2
  o Proof of the Matrix Bernstein inequality, Sec 5.4
  o Parts of Chap 5, TBD – term papers
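To make the Matrix Bernstein setting concrete, here is my own small sketch (not from the notes; the helper name random_sum_norm and the choices d = 50, N = 500 are made up): a sum S of independent, zero-mean, symmetric random matrices with ||X_i|| <= L, whose operator norm is controlled via the variance parameter sigma^2 = ||sum_i E[X_i^2]|| and L.

```python
# Sum of independent zero-mean random matrices vs the matrix Bernstein
# expectation-bound scale sqrt(2 sigma^2 log(2d)) + (L/3) log(2d).
import numpy as np

rng = np.random.default_rng(3)

def random_sum_norm(d=50, N=500):
    """Operator norm of S = sum_i eps_i v_i v_i^T: N independent zero-mean
    rank-one sign matrices, each with ||X_i|| = 1 (so L = 1)."""
    S = np.zeros((d, d))
    for _ in range(N):
        v = rng.choice([-1.0, 1.0], size=d) / np.sqrt(d)  # unit vector
        eps = rng.choice([-1.0, 1.0])                     # random sign => E[X_i] = 0
        S += eps * np.outer(v, v)
    return np.linalg.norm(S, 2)

d, N = 50, 500
S_norm = random_sum_norm(d, N)
# Here E[X_i^2] = E[v v^T] = I/d, so sigma^2 = ||(N/d) I|| = N/d.
sigma2, L = N / d, 1.0
bound = np.sqrt(2 * sigma2 * np.log(2 * d)) + L * np.log(2 * d) / 3
print(f"||S|| = {S_norm:.2f} vs matrix Bernstein scale = {bound:.2f}")
```

The observed norm sits comfortably below the bound, and the sqrt(sigma^2 log d) term dominates here since sigma^2 >> L^2, which is the typical regime where matrix Bernstein is sharp up to the log factor.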
o Random processes and Chaining
  o Short summary: file, Dropbox link
  o Parts of Chap 7, 8
o Applications of Chaps 1-4
  o Low-rank and sparse recovery problems, AltMin and GD algorithms: file
    § Also contains a detailed proof of the application to the LRPR problem (my work)
  o Application to PCA and community detection in networks: handwritten
  o More: TBD
Old offering of part of this course as EE 520 in Fall 2019: https://www.ece.iastate.edu/~namrata/MachineLearning_class/
Home-works
o Policies:
  o Okay to use any source - books, internet, friends; cite whatever source, webpage, or classmate helped you.
  o Carefully LaTeXed solutions are needed (do not just copy what you find online or what a friend tells you).
  o HW 3, 4 can be handwritten. I would suggest LaTeXing the ones with long solutions, but it is not required.
  o A subset of problems will be graded (to be decided after the submission date).
o Homework 1: due Tuesday Sept 13
  o 1.2.2, 1.2.3
  o 2.2.8, 2.2.9
o Homework 2: due Thursday Sept 22
  o 2.4.2, 2.4.3, 2.4.4
  o 2.5.10, 2.6.5, 2.6.6; show that a sub-Gaussian r.v. is sub-exponential
  o Extra credit: 2.5.5, 2.5.11, 2.6.9
    § Hints for extra credit: 2.5.5, 2.5.11
  o Solutions: see Chap 2 slides
o Homework 3: due Monday Oct 31
  o 3.1.5, 3.4.3, find an example where the claim of Ex 3.4.5 is true
  o Quantify Remark 3.2.5 (extra credit)
  o 4.4.3, 4.4.4, 4.1.6
o Homework 4: due at the end of the semester
  o Community detection in networks: explore further beyond what is covered in Sec 4.5 of the book. This can be a small group project for the entire class; I will make a Discord channel for discussions on this. Depending on each person's interests, this exploration can be one or all of: (i) more general community models, (ii) better algorithms that do not require q to be large to work, (iii) theoretical results for (i) and (ii) and intuition on why the proof goes through, along with refs for the proofs.
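As a possible baseline for this exploration, here is my own minimal sketch of spectral clustering on the two-community stochastic block model in the spirit of Sec 4.5 (not from the course materials; the function names and the parameters n = 200, p = 0.8, q = 0.2 are made up).

```python
# Two-community SBM: edge prob p within a community, q across communities.
# The sign pattern of the second eigenvector of the adjacency matrix
# recovers the communities when p - q is large enough.
import numpy as np

rng = np.random.default_rng(4)

def sbm_two_communities(n=200, p=0.8, q=0.2):
    """Adjacency matrix of a symmetric SBM: first n/2 nodes are community +1,
    the rest -1. Returns (A, true labels)."""
    labels = np.repeat([1, -1], n // 2)
    probs = np.where(np.equal.outer(labels, labels), p, q)
    A = (rng.random((n, n)) < probs).astype(float)
    A = np.triu(A, 1)
    return A + A.T, labels          # symmetric, no self-loops

def spectral_labels(A):
    """Community guess: signs of the eigenvector for the 2nd largest eigenvalue."""
    w, V = np.linalg.eigh(A)        # eigenvalues in ascending order
    return np.sign(V[:, -2])

A, truth = sbm_two_communities()
guess = spectral_labels(A)
acc = max(np.mean(guess == truth), np.mean(guess == -truth))  # labels only defined up to sign
print(f"fraction correctly classified: {acc:.3f}")
```

With this large a gap p - q, recovery is essentially exact; the interesting directions above start when the gap shrinks or the model is generalized.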
  o Chapter 6: re-write (by hand or typed) the proofs of Sec 6.1 and 6.2
o Old from 2021
  o Ex 4.4.3 (this proof will need the proof of 4.4.2)
  o Ex 4.4.4
  o Ex 4.1.6 and write out the proof of Lemma 4.1.5
  o Ex 4.7.3
  o Lower bound sigma_min(A) by using an epsilon-net argument (this will use the upper bound on ||A||)
  o 3.1.5
  o 3.4.3
  o Quantify Remark 3.2.5
  o Find an example where the claim of Ex 3.4.5 is true
  o Extra credit: 3.3.1, 3.3.7
  o (Can hand-write or LaTeX, but write clearly; since you are allowed to use any source, including friends or the internet, to find the solution idea, I would like to see effort in writing a clear-cut proof.)
Term Paper and Scribe topics
- See Discord channels on each.