Coursera Course On Data Science Tools: Can you really know a tool unless you have used it?

April 7, 2014

By Bob Gourley

I believe every enterprise CTO develops a knack for asking hard questions about software, including its functionality, how to install, configure, and maintain it, and how well it will work with other enterprise capabilities. Enterprise CTOs are also quick to seek information on how well software works with identity management and policy management systems and how well it scales. In most cases these things can theoretically be understood without being a user of the software, but every enterprise techie I know at least likes to see a demo of a capability before making a decision. And in many cases a much deeper dive is required.

When it comes to analytical tools, I have seen so many demos I can’t remember them all. And in some cases I have spent time and energy to download software, install it on my local systems or on cloud-based servers, and configure tools until I’m confident that I really understand what is going on. I know I’m not alone in doing this; most enterprise techies I know go through similar processes to stay current.

I have just started another effort meant to help me stay current. I signed up for a short Coursera course on the Data Scientist’s Toolbox. The course is held online, and students work to identify and classify data science problems and then use a wide variety of data science tools to address them. Here is more from the course description:

In this course you will get an introduction to the main tools and ideas in the data scientist’s toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.

Upon completion of this course you will be able to identify and classify data science problems. You will also have created your Github account, created your first repository, and pushed your first markdown file to your account.

The instructor for this class is Jeff Leek, assistant professor of Biostatistics at Johns Hopkins University. He is the co-editor of the Simply Statistics blog, which can also be found on Twitter at @SimplyStats.

I will be blogging about this coursework at our Analyst One site. I would love to see you in the class too. Please check it out and join us if you can. Find out more at “The Data Scientist’s Toolbox.”