To educate myself about this exciting field, I have started ‘cramming’ the fundamentals. Although libraries offer many cool and sophisticated tools that I could use in a plug-and-play fashion, I do not find that approach satisfying. I want to know why things work.

Personally, I believe that a clear understanding of the fundamentals will pay off in the long run. One indicator of such understanding is being able to explain things intuitively, as in the Feynman Technique. This belief determines the goal and strategy of my learning.

A great thing about machine learning is that there are so many good materials online and an active community. Because the aim is to understand the fundamentals and I enjoy learning by doing, I have settled on the following approach for the first round:

  1. Learn the theory.
  2. Come up with a toy example to test things out, poke around, and check my understanding.
  3. If available, try a more realistic and complex data set.

Learning materials

Prior knowledge

  • I took the machine learning MOOC by Andrew Ng four years ago (I wish I had known the importance of fundamentals and intuition at the time).
  • Linear algebra is really useful; my experience comes mainly from studying dynamical systems theory.
  • Some probability and statistics.
  • Machine learning: I have learnt a little about deep learning (ANNs, CNNs) and Bayesian networks, have research experience with compressed sensing, and am very familiar with PCA.

Week 0

  • Chapters 1 & 2 of Bishop (2006): Pattern Recognition and Machine Learning
    • The book is very well written, but I do not find it great material for ‘cramming’.
  • Udacity: Intro to Data Science
    • The Udacity course gave me some practice with numpy and pandas and helped me transition from MATLAB.

Week 1

  • Stanford CS229 Machine Learning
    • I mainly follow this Stanford course because it was cleverly designed to introduce students to the field, which means rigour and structure. (There are also problem sheets to work through.) Another benefit is the motivation it provides.
  • Bishop (2006)
    • Sometimes, using multiple materials can be confusing because of the different notations. I refer to Bishop (2006) to get a second perspective and to answer questions raised by the lecture notes.
    • The book also helps link the concepts together into a bigger picture.
  • Googling
    • There are lots of videos, Python notebooks, and tutorials online. The scikit-learn website has many useful examples that are lightweight and insightful.

Results summary

Theory notes:

Python notebooks

  1. Logistic regression
    • Basic gradient descent implementation (a minimal sketch appears after this list)
    • Insightful example of the importance of feature ‘normalisation’, such as that used for SVMs
  2. Naive Bayes
    • Toy example to check understanding
  3. SVM
    • Simple example illustrating the mapping to a higher-dimensional space using 2nd-order polynomial features.
    • Visualise decision boundaries on a grid (sketched after this list).
  4. k-Nearest neighbours
    • Example from CS229 on the importance of vectorisation (sketched after this list)
    • The time saving is astonishing!
  5. 2-layer neural network with logistic regression
    • Stochastic and vectorised implementations
    • Numerical gradient verification (sketched after this list)
    • MNIST test
  6. 2-layer neural network with softmax regression
    • MNIST test (the accuracy is not satisfactory).
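
Below is a minimal sketch of the kind of thing notebook 1 covers: batch gradient descent for logistic regression with simple feature standardisation. The synthetic data, learning rate, and iteration count are my own illustrative assumptions, not the notebook's values.

```python
# Minimal sketch: batch gradient descent for logistic regression on made-up data,
# with feature standardisation (the 'normalisation' that makes gradient descent behave).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data: two Gaussian blobs with very different feature scales.
X = np.vstack([rng.normal([0, 0], [1, 100], size=(100, 2)),
               rng.normal([3, 300], [1, 100], size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Standardise each feature, then add a bias column.
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.hstack([np.ones((X.shape[0], 1)), X])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))   # clip to avoid overflow

theta = np.zeros(X.shape[1])
lr = 0.1
for _ in range(500):
    grad = X.T @ (sigmoid(X @ theta) - y) / len(y)   # gradient of the log-loss
    theta -= lr * grad

accuracy = np.mean((sigmoid(X @ theta) > 0.5) == y)
print(theta, accuracy)
```

A small sketch of the idea behind notebook 3: lift 2-D points into a 2nd-order polynomial feature space so a linear SVM can handle a non-linear boundary, then evaluate the classifier on a dense grid to draw that boundary. The dataset (scikit-learn's make_circles) and model settings are assumptions chosen for illustration.

```python
# Minimal sketch: 2nd-order polynomial features + linear SVM, decision boundary on a grid.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

poly = PolynomialFeatures(degree=2, include_bias=False)   # (x1, x2, x1^2, x1*x2, x2^2)
clf = LinearSVC(C=1.0, max_iter=10000).fit(poly.fit_transform(X), y)

# Evaluate the classifier on a dense grid and draw the zero level set.
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 300), np.linspace(-1.5, 1.5, 300))
grid = np.c_[xx.ravel(), yy.ravel()]
zz = clf.decision_function(poly.transform(grid)).reshape(xx.shape)

plt.contour(xx, yy, zz, levels=[0], colors='k')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', s=15)
plt.show()
```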
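For notebook 4, a sketch of the vectorisation point: the same pairwise squared distances computed once with explicit loops and once with a single broadcasted expression. The array sizes are arbitrary; the point is the speed-up, not the exact numbers.

```python
# Minimal sketch: looped vs vectorised pairwise squared distances for kNN.
import time
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 50))
X_test = rng.normal(size=(200, 50))

def dists_loop(A, B):
    D = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            D[i, j] = np.sum((A[i] - B[j]) ** 2)
    return D

def dists_vec(A, B):
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, computed for all pairs at once
    return (np.sum(A**2, axis=1)[:, None]
            - 2 * A @ B.T
            + np.sum(B**2, axis=1)[None, :])

t0 = time.time(); D1 = dists_loop(X_test, X_train); t_loop = time.time() - t0
t0 = time.time(); D2 = dists_vec(X_test, X_train); t_vec = time.time() - t0
print(np.allclose(D1, D2), t_loop / t_vec)   # same result, large speed-up
```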
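Finally, a sketch of numerical gradient verification as in notebook 5, applied here to a small stand-in loss rather than the actual 2-layer network: the analytic gradient is compared against centred finite differences.

```python
# Minimal sketch: gradient checking with centred finite differences.
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2) + np.sum(np.sin(w))   # stand-in scalar loss

def grad(w):
    return w + np.cos(w)                              # analytic gradient to be checked

def numerical_grad(f, w, eps=1e-5):
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)      # centred difference
    return g

w = np.random.default_rng(0).normal(size=5)
print(np.max(np.abs(grad(w) - numerical_grad(loss, w))))   # should be tiny (~1e-10)
```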
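In the logistic-regression and neural-network sketches above, the vectorised gradient expressions (matrix products over the whole batch) are exactly what makes the implementations fast, which ties back to the vectorisation lesson from the kNN notebook.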

Other interesting things

  • Highly recommend the Talking Machines podcast! I had just learnt the EM algorithm before stumbling upon an old episode on it; the explanation was concise and insightful. I also learnt that people are using linear dynamical systems (LDS) in machine learning, for example in combination with hidden Markov models. The example Ryan talked about echoes my idea of using LDS in robotics.
  • The Microsoft Research podcast is also really interesting. The breadth of the research and the impact of the ‘grand goal’ are fascinating.
  • I recently finished the book Weapons of Math Destruction. The book promotes building fairness, transparency, etc. into algorithms. It explains the ideas behind recent headline stories (Facebook and Cambridge Analytica*) and many more. *This company has nothing to do with the University of Cambridge!

Other thoughts

  • Began to understand why neural networks are so versatile.
  • Vectorisation is extremely powerful! (Although the vectorised version of the 2-layer NN looks similar to the stochastic one, it took me a whole night. I found it worthwhile, though, for the happiness when it finally worked :))
  • I used to think that all we need is the right cost function and a smart implementation of the optimisation. Those are important, but these two weeks of learning revealed a more insightful picture from the Bayesian perspective. It was an epiphany to read that the least-squares cost function gives the maximum-likelihood solution under Gaussian noise (Bishop (2006), §1.2.5); the derivation is sketched below.
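
For reference, a sketch of that derivation in roughly Bishop's notation: assume each target is the model prediction plus Gaussian noise with precision β; maximising the log-likelihood over w is then exactly minimising the sum-of-squares error.

```latex
% Model: t_n = y(x_n, w) + \epsilon, with \epsilon \sim \mathcal{N}(0, \beta^{-1}).
\begin{aligned}
p(\mathbf{t}\mid\mathbf{x},\mathbf{w},\beta)
  &= \prod_{n=1}^{N}\mathcal{N}\!\bigl(t_n \mid y(x_n,\mathbf{w}),\,\beta^{-1}\bigr),\\[4pt]
\ln p(\mathbf{t}\mid\mathbf{x},\mathbf{w},\beta)
  &= -\frac{\beta}{2}\sum_{n=1}^{N}\bigl\{y(x_n,\mathbf{w})-t_n\bigr\}^{2}
     +\frac{N}{2}\ln\beta-\frac{N}{2}\ln(2\pi).
\end{aligned}
% The last two terms do not depend on w, so maximising the log-likelihood over w
% is the same as minimising the sum-of-squares error \sum_n (y(x_n, w) - t_n)^2.
```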

Some ideas

  • Visualisation of NNs and the potential application of sparsity in NNs.
  • Using an SVM on the low-dimensional representation for the Dog-Cat classification. See if it improves classification using SPOCC vectors.