A short demo on how to use IPython Notebook as a research notebook

As promised, here’s the IPython Notebook tutorial I mentioned in my introduction to IPython Notebook.

Downloading and installing IPython Notebook

IPython Notebook comes bundled with most of the other packages you’ll need in the Enthought Python Distribution. There’s also a smaller free version if you don’t have a .edu email address. From there, it’s just a matter of running the installer, clicking the Next and Accept buttons a bunch of times, and voilà! IPython Notebook is installed.

Running IPython Notebook

For Mac and Linux users, open up your terminal. Windows users need to open up their Command Prompt. Change directories in the terminal (using the cd command) to the working directory where you want to store your IPython Notebook data.

To run IPython Notebook, enter the following command:

ipython notebook

It may take a minute or two to set itself up, but eventually IPython Notebook will open in your default web browser and should look something like this:

[Screenshot: IPython Notebook]

(NOTE: currently, IPython Notebook only supports Firefox and Chrome.)

Creating a new notebook

Conveniently, Titus Brown has already posted a quick demo on YouTube. (Start at 2m16s.)



Now that we’ve covered the basics, let’s get into how to actually use all this as a research notebook.

Using IPython Notebook as a research notebook

The great part about the seamless integration of text and code in IPython Notebook is that it’s entirely conducive to the “form hypothesis – test hypothesis – evaluate data – form conclusion from data – repeat” process that we all follow (purposely or not) in science. For this example, let’s say we’re studying an Artificial Life swarm system and the effects of various environmental parameters on the swarm.

Here’s the example research notebook: [pdf] [ipynb w/ accompanying files]

I designed this demo research notebook to be a self-guided tour through the thought process of a researcher as he works on a research project, so hopefully it’s helpful to other researchers out there.

Statistics in IPython Notebook

UPDATE (10/19/2012): Please refer to my other blog post for an up-to-date guide on statistics in Python.

For those of you who (understandably) don’t want to search through an entire research notebook to figure out how to do statistics in IPython Notebook, here’s the cut-and-dried code.

Reading data

# Library for reading and parsing csv files
import csv

# My personal library that contains some useful helper functions
import rso_stats

# Read and parse data for file "control1.csv"
control1 = csv.reader(open('control1.csv', 'rb'), delimiter=',')
control1, control1_columns = rso_stats.parse_csv_data(control1)

control1 is the dictionary of parsed data.

control1_columns is the list of column names used to access the data dictionary, sorted in the same order as the csv data file.

NOTE: This uses a function from my custom Python library, which parses the data into convenient data dictionaries.
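Since rso_stats is a personal library, here is a minimal sketch of what a parser like this might look like using only the standard library. The name parse_csv_columns and its behavior (the first row supplies the column names, and cells are converted to floats where possible) are assumptions for illustration, not the author's actual implementation.

```python
import csv
import io


def parse_csv_columns(rows):
    """Hypothetical stand-in for rso_stats.parse_csv_data.

    Takes an iterable of csv rows (first row = header) and returns
    (data, columns): a dict mapping column name -> list of values,
    plus the column names in the same order as the file.
    """
    rows = iter(rows)
    columns = next(rows)
    data = {name: [] for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            try:
                data[name].append(float(value))
            except ValueError:
                # Keep non-numeric cells as plain strings
                data[name].append(value)
    return data, columns


# Small in-memory example so the sketch runs without a file on disk
sample = io.StringIO("generation,fitness\n0,0.5\n1,0.75\n")
data, columns = parse_csv_columns(csv.reader(sample, delimiter=','))
print(data[columns[1]])
```

The returned pair mirrors the (dictionary, column list) shape described above, so columns can be looked up by position through the column list.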

The data in the dictionaries can be accessed by:

# Access the first column's list of data
control1[control1_columns[0]]

# Access the fourth column's list of data
control1[control1_columns[3]]

Standard error of the mean

import numpy
from scipy import stats

mean = numpy.mean(dataset_list)

# Compute 2 standard errors of the mean of the values in dataset_list
stderr = 2.0 * stats.sem(dataset_list)

Bootstrapped 95% confidence intervals

The code below shows you how to compute bootstrapped 95% CIs for the mean. However, this function can bootstrap any range of CIs for any statistical function (mean, mode, standard deviation, etc.). Here’s the input parameter description:

Input parameters:
   data        = data to get bootstrapped CIs for
   statfun     = function to compute CIs over (usually, mean)
   alpha       = size of CIs (0.05 --> 95% CIs). default = 0.05
   n_samples   = # of bootstrap populations to construct. default = 10,000

Returns:
   bootstrapped confidence intervals, formatted for the matplotlib errorbar() function

import numpy
import rso_stats

CIs = rso_stats.ci_errorbar(dataset_list, numpy.mean)

NOTE: This uses a couple of functions from my custom Python library, since bootstrapping CIs isn’t currently supported by SciPy/NumPy.
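For readers without access to rso_stats, the sketch below shows one common way to do this: a percentile bootstrap written with NumPy. The function names bootstrap_ci and as_errorbar are hypothetical, and the custom library may well use a different bootstrap variant, so treat this as an illustration of the idea rather than a drop-in replacement. The parameters mirror the description above (statfun, alpha, n_samples); the seed argument is an addition for reproducibility.

```python
import numpy as np


def bootstrap_ci(data, statfun=np.mean, alpha=0.05, n_samples=10000, seed=None):
    """Percentile-bootstrap CI for statfun(data) (illustrative, not rso_stats)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Resample with replacement: one row of indices per bootstrap population
    idx = rng.integers(0, len(data), size=(n_samples, len(data)))
    boot_stats = np.array([statfun(data[row]) for row in idx])
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap statistics
    low, high = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high


def as_errorbar(data, ci, statfun=np.mean):
    """Convert (low, high) bounds into offsets usable as errorbar()'s yerr."""
    center = statfun(data)
    low, high = ci
    # Shape (2, 1): first row is the distance below, second the distance above
    return np.array([[center - low], [high - center]])


dataset_list = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2, 2.6, 2.0]
ci = bootstrap_ci(dataset_list, seed=0)
yerr = as_errorbar(dataset_list, ci)
print(ci, yerr.ravel())
```

The yerr array can be passed to matplotlib as errorbar(x, y, yerr=yerr) for a single point. Newer SciPy releases also include scipy.stats.bootstrap, which may be worth checking before rolling your own.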

Mann-Whitney-Wilcoxon RankSum test

from scipy import stats

z_stat, p_val = stats.ranksums(dataset1_list, dataset2_list)

Analysis of variance (ANOVA)

SciPy’s ANOVA function takes two or more dataset lists as its input parameters.

from scipy import stats

f_val, p_val = stats.f_oneway(dataset1_list, dataset2_list, dataset3_list, ...)
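To try the rank-sum and ANOVA calls above without real data, here is a small self-contained sketch. The three datasets are synthetic (drawn from normal distributions with made-up means), so the numbers themselves mean nothing; the point is only to show the inputs each function takes and what it returns.

```python
import numpy as np
from scipy import stats

# Synthetic data: two groups with the same mean, one shifted upward
rng = np.random.default_rng(42)
dataset1_list = rng.normal(loc=10.0, scale=1.0, size=30)
dataset2_list = rng.normal(loc=10.0, scale=1.0, size=30)
dataset3_list = rng.normal(loc=13.0, scale=1.0, size=30)

# Rank-sum test compares exactly two samples
z_stat, p_ranksum = stats.ranksums(dataset1_list, dataset3_list)

# One-way ANOVA accepts two or more samples
f_val, p_anova = stats.f_oneway(dataset1_list, dataset2_list, dataset3_list)

print("ranksums: z = %.3f, p = %.3g" % (z_stat, p_ranksum))
print("f_oneway: F = %.3f, p = %.3g" % (f_val, p_anova))
```

Both functions return a test statistic followed by a p-value, so the results can be unpacked the same way in either case.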

Hopefully everyone finds this useful. Get in touch if you have any more ideas on IPython Notebook as a research notebook, or if you’d like to figure out how to do some more statistical tests in Python.

Randy is a PhD candidate in Michigan State University's Computer Science program. As a member of Dr. Chris Adami's research lab, he studies biologically-inspired artificial intelligence and evolutionary processes.

Posted in ipython, productivity, statistics, tutorial
9 comments on “A short demo on how to use IPython Notebook as a research notebook”
  1. Thomas Kluyver says:

    A couple more libraries you might be interested in:

    Pandas (http://pandas.pydata.org/) provides data structures for things like tables of data, and loads of tools to manipulate them. I had my own module to read/write csv tables until I found this.

    Statsmodels (http://statsmodels.sourceforge.net/stable/) has a load of stats tools, although I don’t find the interface very easy.

    Thanks for the post – I’m also trying to do stats in Python.

  2. Thomas Kluyver says:

    There’s a fair bit of stuff that we were taught how to do in R that I don’t know how to do in Python (>=two way ANOVA, mixed effects models, and so on). I suspect most of the framework is there to do that sort of thing, but I don’t know enough of the nuts and bolts to work out what I need.

    I’ve also come across rpy2. It’s a bit more than just running an R command, it can translate Python objects so you can call R functions on them. The next version of pandas will be able to translate a DataFrame into an R data.frame, which will be very useful.

  3. I found your demo useful, I have never used python notebook and been living with python as is. The notebook option looks very convenient and that you can use it as a log with log notes included together with code. I will test this myself later on.
