Data Science meets life: finding a car

Challenge: Having just moved to San Francisco, I needed to find a specialist wheelchair-accessible car (my partner needs to be able to get in the back still on his wheels!). I was shocked at the prices at a local specialist dealership, where the cheapest, oldest cars start at $30k.

Solution:

Predict how much a car should cost on the basis of the characteristics you're generally told when you're buying (age, brand, engine size, etc.).
Compare similar cars at the dealership and on Craigslist to see how much of a mark-up there is.
Build a web app with a search function suited to finding additional accessibility features.
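
To make the mark-up comparison in the second step concrete, here is a minimal sketch of how a listed price could be compared with a model's predicted price. The function name `markup_pct` and the predicted prices are hypothetical, chosen for illustration only; they are not figures from the project.

```python
def markup_pct(listed_price, predicted_price):
    """Percentage by which a listed price exceeds the model's predicted price."""
    if predicted_price <= 0:
        raise ValueError("predicted_price must be positive")
    return 100.0 * (listed_price - predicted_price) / predicted_price


# Hypothetical figures for illustration only, not results from the project.
listings = [
    {"source": "dealership", "listed": 30000.0, "predicted": 24000.0},
    {"source": "craigslist", "listed": 21000.0, "predicted": 20000.0},
]

# Mark-up per source, as a percentage of the predicted price.
markups = {
    row["source"]: markup_pct(row["listed"], row["predicted"])
    for row in listings
}

for source, pct in markups.items():
    print(f"{source}: {pct:.1f}% above predicted price")
```

A positive value means the car is listed above what the model expects for its characteristics; comparing the typical value across the two sources gives a rough sense of the dealership premium.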

The first step was getting the data on which to build the model. Having scraped the listings website of a local specialist dealership, I ended up with a list of about 600 cars, with their prices and characteristics. Hmm, not enough data to do much validating and testing with. So I scraped all car and truck listings across the US on Craigslist. This returned about 80k listings. Now we're in business! It returned such beauties as…

[Image: old car]

So I restricted the data to cars that were at least driveable, then iteratively added features to my model, testing each time whether the addition reduced how far off the mark my predictions were. The most complex model I tested was a linear regression with second-order polynomial terms and interaction terms. The next stage is cross-validating my model: with just training and test datasets, repeatedly checking changes against the test dataset put me at risk of learning too much from it and overfitting to it. Watch this space!
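
As a rough illustration of that next stage, here is a minimal sketch of k-fold cross-validation around a linear regression with second-order polynomial and interaction terms. It is not the project's code: the two features (age and engine size), the synthetic prices, and every function name are assumptions made for the example, and it uses only the standard library so that it runs anywhere.

```python
import random


def poly2_features(row):
    """Expand [x1, x2, ...] into intercept, linear, squared and interaction terms."""
    feats = [1.0] + list(row)
    n = len(row)
    for i in range(n):
        for j in range(i, n):
            feats.append(row[i] * row[j])  # i == j: square, i < j: interaction
    return feats


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(w, feats):
    return sum(wi * xi for wi, xi in zip(w, feats))


def kfold_rmse(rows, y, k=5, seed=0):
    """Return the held-out RMSE for each of k folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    X = [poly2_features(r) for r in rows]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit([X[i] for i in train], [y[i] for i in train])
        sq_errors = [(predict(w, X[i]) - y[i]) ** 2 for i in fold]
        scores.append((sum(sq_errors) / len(sq_errors)) ** 0.5)
    return scores


# Synthetic data (an assumption for this sketch): age in years, engine size in
# litres, price in thousands of dollars with some random noise added.
rng = random.Random(42)
rows = [[rng.uniform(0, 15), rng.uniform(1.0, 4.0)] for _ in range(60)]
prices = [
    30 - 2.0 * a + 0.05 * a * a + 3.0 * e - 0.1 * a * e + rng.gauss(0, 0.5)
    for a, e in rows
]

scores = kfold_rmse(rows, prices, k=5)
print("RMSE per fold:", [round(s, 2) for s in scores])
print("Mean RMSE:", round(sum(scores) / len(scores), 2))
```

Because every listing is held out exactly once, the average error across folds is a steadier guide for choosing features than a single test split, and a final test set can be kept untouched until the end.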

See a technical write-up of my progress so far here and my presentation of the project here.

Vicky Clayton
