Challenge: Having just moved to San Francisco, I needed to find a specialist wheelchair-accessible car (my partner needs to be able to get into the back still on his wheels!). I was shocked by the prices at a local specialist dealership, where the cheapest, oldest cars start at $30k… The plan:
- Predict how much a car should cost based on the characteristics you’re usually told when buying (age, brand, engine size, etc.).
- Compare similar cars at the dealership and on Craigslist to see how big the dealership’s mark-up is.
- Build a web app whose search function is suited to filtering by additional accessibility features.
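For the mark-up comparison, one simple approach is to group listings into “similar cars” and compare median prices between the two sources. The sketch below assumes listings have already been parsed into dicts with hypothetical `make`, `model`, `year` and `price` keys, and treats two cars as similar when all three of make, model and year match; `markup_by_group` is an illustrative name, and the listings in the example are made-up placeholders, not real data.

```python
from collections import defaultdict
from statistics import median


def markup_by_group(dealer, market):
    """Return {(make, model, year): markup} for groups present in both sources.

    Markup is the dealer's median price divided by the market median, minus 1,
    so 0.25 means the dealer charges 25% above the market median.
    Assumes each listing is a dict with 'make', 'model', 'year', 'price' keys.
    """

    def medians(listings):
        # Collect prices per (make, model, year) group, then take the median.
        groups = defaultdict(list)
        for car in listings:
            groups[(car["make"], car["model"], car["year"])].append(car["price"])
        return {key: median(prices) for key, prices in groups.items()}

    dealer_med = medians(dealer)
    market_med = medians(market)
    # Only groups seen in both sources can be compared.
    return {
        key: dealer_med[key] / market_med[key] - 1
        for key in dealer_med.keys() & market_med.keys()
    }


# Placeholder listings, purely to exercise the function.
dealer = [{"make": "MakeA", "model": "ModelX", "year": 2010, "price": 30000}]
market = [
    {"make": "MakeA", "model": "ModelX", "year": 2010, "price": 20000},
    {"make": "MakeA", "model": "ModelX", "year": 2010, "price": 30000},
    {"make": "MakeB", "model": "ModelY", "year": 2012, "price": 15000},
]
markups = markup_by_group(dealer, market)
print(markups)
```

Medians are used rather than means so that a few wildly overpriced (or junk) listings don’t dominate a group. An exact make/model/year match may not account for the wheelchair conversion itself, which is part of what a specialist dealership charges for, so a real comparison would probably need a finer notion of “similar”.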
The first step was getting the data to build the model on. After scraping the listings on a local specialist dealership’s website, I had a list of c. 600 cars with their prices and characteristics. Hmm, not enough data to do much validation and testing with. So I scraped all car and truck listings across the US on Craigslist, which returned c. 80k listings. Now we’re in business! It also turned up some real beauties…
So I restricted the data to cars that were at least driveable, then added features to the model one at a time, checking each time whether the new feature reduced my prediction error. The most complex model I tested was a linear regression with second-order polynomial terms and interaction terms. The next step is to cross-validate the model: with only a single training/test split, repeatedly checking each new feature against the same test set risked tuning the model to that test set and overfitting to it. Watch this space!
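A minimal sketch of this kind of model and evaluation, written with NumPy rather than whatever library the original analysis used: `poly2_features` expands the inputs into intercept, linear, squared and pairwise interaction terms, and `kfold_rmse` estimates the out-of-sample error of an ordinary-least-squares fit on those terms with k-fold cross-validation. Both function names, RMSE as the error measure, k=5 and the synthetic data are assumptions for illustration. The inputs are assumed to be numeric already; categorical features such as brand would need one-hot encoding first.

```python
import itertools

import numpy as np


def poly2_features(X):
    """Degree-2 expansion: intercept, each feature, each squared feature,
    and every pairwise interaction x_i * x_j with i < j."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] ** 2 for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(d), 2)]
    return np.column_stack(cols)


def kfold_rmse(X, y, k=5, seed=0):
    """Mean RMSE over k folds of a least-squares fit on degree-2 features."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    # The expansion is a per-row transform, so building it once before
    # splitting does not leak information between folds.
    Phi = poly2_features(X)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
        resid = Phi[test] @ coef - y[test]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))


# Synthetic, made-up data standing in for two numeric features
# (e.g. age and engine size); not real listings.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 2))
y = 30 - 2 * X[:, 0] + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 200)
rmse = kfold_rmse(X, y)
print(rmse)
```

Because each fold takes a turn as the held-out set, every listing is used for evaluation exactly once, which reduces the risk of tuning feature choices to one fixed test set. `np.linalg.lstsq` is used instead of inverting the normal equations because it still returns a (minimum-norm) solution when columns are collinear, which happens, for example, when a one-hot column is squared.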