3yrs into Machine Learning as a PhD student

Machine Learning from a PhD student's perspective

Introduction

So it has been almost three and a half years since I started my PhD in Autonomous Underwater Vehicles (AUVs), where I'm using Machine Learning, Deep Learning and Computer Vision to create a framework for autonomous underwater intervention. Phew, that was a quick way to describe three and a half years in one sentence. Anyway, to the point. What have I learned during those years, and how easy (or not) is it to get into the magic world of AI? Well, I realised that it's not easy at all. Getting into ML is not an easy task and takes a lot of patience.

Initially I thought that the concepts of Machine Learning, Computer Vision, and Deep Learning could be learnt in a year or two. After all, I knew most of the mathematical concepts needed for ML, since I had been in a STEM field for the previous four years. I had relatively advanced knowledge of Algebra and Matrix Algebra, I was reasonably fluent in calculus, etc. I was also good at problem-solving, not only through my years in university but also as a professional marine engineer for more than 10 years.

What I learned in those 2 years is that there is no such thing as "I know Machine Learning". Fields such as Machine Learning evolve rapidly, particularly in recent years, which makes it hard to learn the entire area (if that is even possible), even after many years in the field.

So, in this article, I would like to share the experience I have had over the past 3 years, mainly studying Machine Learning for my PhD research project. This point of view will definitely be biased towards the academic side, since I lack the experience of someone working in industry.

First, I'll go through the road I followed from the beginning until now. Then I'll spend some time on what I did right and wrong. Last, I want to give some of my thoughts on whether the journey was worth it or not.


My roadmap to ML

We do indeed live in extraordinary times in terms of the information and learning material available for almost anything we want to learn. Of course, with so many options readily available to us, most of the time for free, it's difficult to choose and get the work done. The vast variety of material, training videos and courses was the enemy for me: I was so excited about learning so many interesting things about Machine Learning, Deep Learning, Computer Vision, the advances in Autonomous Underwater Vehicles (AUVs) and much more, that it took me a while to realise I was not productive at all, learning concepts that were irrelevant to my cause.

Realising the problem was the catalyst to sit down and plan what I needed to learn at the time and which concepts I should focus on. So, my Machine Learning study plan included the following:

Programming and Python

Programming was at the top of my priorities because I had no previous experience, though I had done some assignments during my Bachelor’s and Master’s degrees using mainly Matlab and, to some extent, Python.
Of course, for someone starting to learn ML the language of choice is, in most cases, Python. I don’t want to dwell here on how I learned Python or what resources I used (I might go through my experience learning Python in another post); I'd rather give a quick overview. So, I started to learn Python by doing what everybody else does: watching YouTube videos. Soon I realised that watching videos and doing tutorials wouldn't help me get into programming. So I started to learn the basics by developing small programs that would ultimately be useful to my PhD project. These programs helped me automate folder and file manipulation, image pre-processing, object extraction from annotated images, web scraping to download underwater images and videos, frame extraction from underwater videos, etc. These "side projects" weren't the most efficient or the most elegant in terms of construction, but they helped me avoid falling into the loop of spending my time watching videos, pretending that I understood what I was watching, and in the end not knowing how to loop over a list. Programming my own stuff helped me to understand that:

Machine Learning online material

For maths I wanted to get up to speed as quickly as possible, since I already knew most of the material I needed, and for this reason I mostly used online materials. Some excellent resources include:

In addition to the online materials, I used some maths textbooks that I found interesting to study:

Online Courses

Projects

I started to do the projects that the courses provided, and then once I was confident enough I applied what I knew to my own project.

Application to my project

What I did right

Sometimes I think that I could have spent my time and energy more efficiently when studying Machine Learning. But in a broader view, I did learn the concepts in the correct order. I started with the mathematics required to be confident when using ML, and at the same time studied Python and programming; then I started to study Machine Learning, followed by Deep Learning and Computer Vision. Finally, I used the concepts to work on my PhD project. To put this studying approach on a time scale, the time I devoted to each of these concepts is as follows:

  1. Mathematics - The first step was to revise the mathematics that I already knew and study the concepts that I was less familiar with, such as statistics. So, I spent a good month revising Linear Algebra, Matrix Algebra and calculus, and another month studying statistics (2 months).
  2. Programming (Python) - It may sound like an exaggeration to say that I have been studying programming, and Python in particular, for the past 3 years, but it is not far from the truth. OK, I didn't spend the last 3 years solely studying programming and Python, but I was constantly practising and trying to develop my skills (3 years).
  3. Machine Learning - Again, as with programming, I am constantly studying and researching ML, DL and CV. The field is evolving so rapidly that you need to stay alert to the advances happening. It's quite difficult to keep up, but fortunately there is a solution for that. Some time ago I came across an amazing tool, the ResearchRabbit platform, and it saved my day. ResearchRabbit is an online platform that can sync with reference management tools such as Zotero and Mendeley, and visually represent the papers connected to the paper we are working with, as well as all the papers that reference it.

What I did wrong

When you start to learn something new, you never think that the way you prepare, the research you do and, in general, the path you follow might be wrong. And even if you do know that you might need to do things differently, by the time you realise it you have already spent a good amount of time. So, for me there isn't anything that I wish I had done differently, except that in some cases I spent much more time than was really needed on topics that weren't so important, or at least not important to my project. For example, I remember spending a fair amount of time revising Linear Algebra (LA) and Matrices when I started with ML. This of course isn't so bad, but I think I could have used that time more wisely, spending 1/3 of it on LA revision and the rest on implementing that knowledge in Python.
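
As an illustration of what implementing Linear Algebra in Python can look like, here is a minimal sketch. It is not code from the PhD project: the function names (`transpose`, `matmul`, `solve`, `fit_line`) and the toy data are assumptions made up for this example. It fits a straight line by least squares using the normal equations, written from scratch so that the transpose, the matrix product and Gaussian elimination are all visible.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap the rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | b] as a copy, so inputs stay untouched.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept via (A^T A) w = A^T y."""
    design = [[float(x), 1.0] for x in xs]  # one row [x, 1] per sample
    design_t = transpose(design)
    ata = matmul(design_t, design)
    aty = [row[0] for row in matmul(design_t, [[float(y)] for y in ys])]
    slope, intercept = solve(ata, aty)
    return slope, intercept


if __name__ == "__main__":
    # Toy data lying on the line y = 2x + 1.
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

In practice a library such as NumPy would replace all three helper functions, but writing them out once is the kind of exercise that turns LA revision into programming practice.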

Final thoughts

As my PhD studies inevitably approach their end, I had the urge to go through where I started and what I've done so far. This post is more a way of clearing my thoughts about the journey I decided to take almost 3 years ago, and I realised that I enjoy learning programming, reading about Machine Learning, and doing research.