Professor Weiqing Gu Harvey Mudd College Fall 2016
M 06:30-09:15PM, SHAN B460 (Lecture); Tr 06:30-08:00PM, SHAN B460 (Optional Section)
This is a course in how to utilize data: infer, predict, coerce, and classify. We will cover a large breadth of material, spanning supervised and unsupervised learning, recommender systems, and Bayesian modelling, to a high level of mathematical rigor. Upon successful completion of the course, students should be fully equipped to enter industry as data scientists, read active research in the field of Machine Learning, and approach huge problems (in data size and otherwise) seen in the real world. Another goal of this course is for students to become comfortable using Amazon Web Services and GitHub, as these tools are extremely prevalent in industry and academia for developing and deploying models. To that end, all code for homework and your final project will be hosted on GitHub.
There will be mandatory Monday lectures with readings to be completed before class (detailed below). A section held each Thursday will either review prerequisite material, go over supplementary material, or investigate an interesting application of our coursework. Sections will be taught by either the instructor or a teaching assistant. For the review sections (the first two), attendance is recommended for anyone who (for the Linear Algebra review) doesn't know what any of {Cholesky Decomposition, SVD, inner product, outer product} are, or (for the Probability review) doesn't know what any of {Bayes' Rule, binomial distribution, Bernoulli distribution, multinomial distribution, Poisson distribution, Gaussian distribution, covariance} are. Notes will be posted prior to the meetings, so check those out beforehand and see if you feel comfortable with the material. We expect around half of the students to be comfortable with more than 75% of the linear algebra and probability we will be using in the course. This is fine! Just come to the review sections. For sections after the review sections (i.e. Convex Optimization Overview, Sparsity; SVM Training), attendance is again not required but highly (!) recommended. Sections are designed to give you inspiration and insight into your final project, and to shed light on the material in a new way.
Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
All readings are compulsory, but some are more compulsory than others. To encourage the goal of reading active research in the field, half-page reading summaries for all non-Murphy readings will be due at the beginning of class. They must be legible, and must demonstrate with a high degree of confidence that you have read the paper. Credit will be given on a {0-10} scale for each summary. Your summaries should be written at a high level and should focus on the main point of the readings (i.e. avoid complicated math). As long as your summary is reasonable, you will be given full credit.
For coding: feel free to use any of {R, Julia, Python, Matlab}. If you want to use something not on that list, just ask us with a good reason and we'll probably say yes. The homework is split approximately evenly between two kinds of work: mathematical analysis and extension of our course material, and application of algorithms to real-world data.
The midterm will be a 3-hour take-home exam covering all topics seen through October 10 (inclusive). You will receive the exam in class on Oct. 10 (the Monday before break) and have until Oct. 24 (the Monday after break) to complete it. More detailed instructions will be given on the exam, but you are to turn the exam in to the designated box outside Prof. Gu's office immediately after completion, or, if she is not there, under the door. The hard cut-off for handing in the exam is the beginning of class on Oct. 24.
This is by far the largest component of the course. You will discover, explore, and attack a real-world problem of your choosing. There are 3 types of projects you can work on, shown below in order of increasing difficulty.
Only one copy of each item need be turned in per group.
Students who need disability-related accommodations are encouraged to discuss this with the instructor as soon as possible.
Name | Email (@hmc.edu) |
---|---|
Conner DiPaolo (head grader/tutor) | cdipaolo |
Paul David (CGU Graduate Student) | paul.david (@cgu.edu) |
Kathryn Dover | kdover |
Zoe Tucker | ztucker |
Ricky Pan | rpan |
Natchanon Suaysom | nsuaysom |
Mek Jenrungrot | mjenrungrot |
Herrick Fang | hfang |
Bo Zhang | bzhang |