Week 2 started with a pair programming challenge on HTML to ease us into web scraping. Web scraping has been my favourite part of the bootcamp so far: it’s so empowering to be able to turn something you come across every day into data you can use, and it lets you “peer under the hood” of the websites you visit. For example, one of my peers thought it was odd that products with low rankings and few reviews could make it onto the first page of an Amazon search. Being on the first page is a boon to sales, as customers rarely bother to go beyond the first few pages of results, so it seemed consumer welfare was taking a hit under the current arrangement. Being able to take a stab at answering such questions about large organisations that are incredibly protective of their data (even if you don’t have as much success as you’d like) is pretty cool!
We spent most of Week 2 and all of Week 3 on linear regression and its interpretation, using it as a first step towards setting up a pipeline: scraping data, modelling a continuous variable, validating the model using various metrics (R squared, RMSE, MAE) and then testing it against holdout data. I really enjoyed a pair programming exercise where we effectively carried out gradient descent manually before learning the theory, which I thought was a really good way to develop our intuition for it. I also really enjoyed getting a better understanding of regularisation by looking at it geometrically: deliberately limiting the space in which we allow the optimisation to occur, so that the model doesn’t become “too optimal” on the training data and overfit.
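To give a flavour of that pipeline, here is a minimal sketch of my own (not the bootcamp’s actual exercise) that fits a one-variable linear regression by gradient descent and then scores it on holdout data with R squared, RMSE and MAE. The data is synthetic, and the function names, learning rate and step count are all assumptions I picked for illustration; it uses only the Python standard library.

```python
import math
import random


def fit_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def evaluate(xs, ys, w, b):
    """Return R squared, RMSE and MAE for the line y = w*x + b."""
    preds = [w * x + b for x in xs]
    n = len(ys)
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(y - p) for y, p in zip(ys, preds)) / n,
    }


# Synthetic data (an assumption for illustration): y = 3x + 1 plus a little noise.
random.seed(0)
xs = [random.uniform(0, 2) for _ in range(200)]
ys = [3.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

# Keep the last 40 points aside as holdout data.
train_x, train_y = xs[:160], ys[:160]
hold_x, hold_y = xs[160:], ys[160:]

w, b = fit_gd(train_x, train_y)
metrics = evaluate(hold_x, hold_y, w, b)

print(f"w={w:.3f}, b={b:.3f}")
print({name: round(value, 3) for name, value in metrics.items()})
```

The fitted slope and intercept should land close to the 3 and 1 used to generate the data. Scoring on points the model never saw during fitting is what makes the holdout metrics an honest check.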
We also learnt a bit about hypothesis testing, and I was keen to emphasise when you can infer a causal relationship. I didn’t leave my economist training at the door!
These two weeks concluded with a project: scraping data from a website and then using that data to model a problem with linear regression. You’ll find the write-up of my project here.