School of Data - Learn how to find, process, analyze and visualize data

billwilliams · on Dec 23, 2012

Before you become a professional data jerk (like myself), people tell you 80% of the job is data scraping, cleaning, organizing, piping etc. And the last 20% is the analysis and stats (or predictive whosiwhatsits). They're lying. It's 85% cleaning, organizing etc., 5% doing "real" stats and 10% convincing people you're not lying. This site seems to nicely outline many of the tasks that fall outside of the "fun" 5%.

disgruntledphd2 · on Dec 23, 2012

I love your breakdown, especially the 10%, so sadly true. "But is it right?" is the question I get asked a lot, and it's a tough one to answer without getting into all the intricacies of what you actually did to the data to get it into a form where you could answer the question asked.

stfu · on Dec 22, 2012

Loving it. This is really easy and accessible stuff, and uses real-world data and questions early on in the process. Perfect way to get people to experience the subject and bring a bit more fun and inspiration to an otherwise not that exciting area.

mblake · on Dec 23, 2012

After quite a few years studying 'data', I can confidently say that you can take the mini-courses from School of Data, then plunge into Coursera courses, then stop by your local library, throw money (if you have it) at cherry-picked books from Amazon.com, bribe friends from college to get you papers from science journals, and at the end of this you will still find things you don't know.

Statistics, Probability, Data Analysis, Data Mining, Decision Theory, (Digital) Signal Analysis, Machine Learning, Algorithmics, Graph Theory etc.

af3 · on Dec 22, 2012

An undergrad class in Statistics will do a better job, I believe.

paulgb · on Dec 22, 2012

I think it really depends on what you're trying to do. I'm all for statistical rigour, and stats will help you with a nice structured dataset. A lot of the time, though, knowing how to scrape data and convert between formats opens opportunities that a pure statistician wouldn't have. Most of the interesting visualizations I've seen lately don't involve much stats at all.

dbecker · on Dec 23, 2012

Neither a conventional undergrad class nor this is "better." They just have different focuses.

But many (perhaps "most") intro stats classes don't involve any programming. So if you want to "implement" anything, an intro stats class may not get you there (even if it gives you a better foundation to understand what various statistical manipulations actually mean.)

DanBC · on Dec 22, 2012

Probably, but undergrad classes in Statistics are not available to everyone.

Did you notice any specific weak areas?

keithpeter · on Dec 22, 2012

I think this is aimed at an audience outside the academic system, and I like the focus on active involvement with data. I think there could be an activist underpinning here - Paulo Freire style data literacy. (http://www.infed.org/thinkers/et-freir.htm)

Having said that, my Maths teacher self wants to do some work on the glossary. In the spirit of 'code talks' I'll post some definitions up and link them to the issue tracker and see what happens...

maxvs · on Dec 22, 2012

>Probably, but undergrad classes in Statistics are not available to everyone.

If you have an internet connection you can learn Statistics; there are a lot of good resources. For example:

https://www.coursera.org/course/stats1 (maybe not undergrad level, but good place to start)

http://ocw.mit.edu/courses/mathematics/18-05-introduction-to...

joshz · on Dec 23, 2012

Speaking of Coursera, Data Analysis[1] starts in January - "applied statistics course focusing on data analysis".

[1] https://www.coursera.org/course/dataanalysis

d0gsbody · on Dec 22, 2012

Undergrad classes don't teach you how to scrape: http://schoolofdata.org/handbook/recipes/scraping/

af3 · on Dec 22, 2012

scraping is not related to data.

achompas · on Dec 22, 2012

Why not? Scraping is part of the data acquisition and cleanup process. You need to do it unless you're working with Bloomberg terminals or Census data.

keithpeter · on Dec 22, 2012

I agree. If I want to engage with my local government on a local issue (e.g. anti-social behaviour) I need data. The data is increasingly available on Web sites. Hence scraping and format conversion become important...

clicks · on Dec 22, 2012

Yep.

Just the other day I was thinking -- I end up losing in debates because I'm unable to cite data. Scraping and acquiring data is a key part of research, so I'm very much looking for a text that presents the big picture as well as the nitty gritty details from beginning to end.