IPython Notebook: Finally, the research notebook I’ve always been looking for is here!

I attended a Software Carpentry workshop hosted by Titus Brown and Greg Wilson this week and was introduced to, among many other things, a piece of software that I’ve been looking for ever since I started my graduate program: IPython Notebook. It can easily be installed with the majority of the other packages you’ll need in the Anaconda Python distribution.

I do the majority of my post-experiment data analysis in Python nowadays, since it’s one of the few sanely designed scripting languages out there with all the functionality I need. What I’ve been missing is a seamless user interface where I can both take notes about my research and perform my data analysis in the same location. IPython Notebook finally provides that.

Ever since I announced my conversion from RTF files to IPython Notebook as my primary means of taking research notes, I’ve received a lot of flak about how Python doesn’t support advanced statistical tests, such as bootstrapped confidence intervals, Mann-Whitney-Wilcoxon rank-sum tests, and ANOVA tests. After a day of searching with my lab mates, I finally turned up all the libraries I need:
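To give a feel for how little code one of these tests takes, here is a minimal sketch of a percentile bootstrap confidence interval for the mean, written against the standard library only. It is not the custom library mentioned in the comments further down; the function name `bootstrap_ci`, its parameters, and the sample values are all made up for illustration. For the other two tests, SciPy ships `scipy.stats.mannwhitneyu` and `scipy.stats.f_oneway`.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=10_000, alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for `stat` of `data`.

    Hypothetical helper for illustration; not part of any library.
    """
    if not data:
        raise ValueError("data must not be empty")
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read off the alpha/2 and 1 - alpha/2 percentiles of the resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)]
    return lower, upper


# Made-up measurements, for illustration only.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7, 4.6, 5.2]
low, high = bootstrap_ci(sample, seed=0)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Passing a different `stat` (say, `statistics.median`) gives an interval for that statistic instead, and fixing `seed` makes the result reproducible from one notebook run to the next.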

If you can’t find a Python library for a statistical test you need, post here and we’ll try to find it. The IPython notebook has plenty of uses beyond a research notebook, too. For example, Titus Brown recently posted the IPython notebook that he used to generate all of the graphs in one of his recent papers. Imagine the implications for science if scientists actually start showing the code they used to generate their graphs! (No more hiding that outlier point on the side of the graph…)

I’ll be putting up some tutorials and examples of how to use IPython Notebook for exploratory statistical data analysis soon, so stay posted!

Dr. Randy Olson is the Chief Data Scientist at FOXO Bioscience, where he is bringing advanced data science and machine learning technology to the life insurance industry.

6 comments on “IPython Notebook: Finally, the research notebook I’ve always been looking for is here!”
  1. Min RK says:

    Note that it is ‘IPython’, an abbreviation for ‘Interactive Python’, not ‘iPython’.

  2. Thanks for the kind words, Randal! I demoed the notebook two weeks ago during my visit to MSU (at Titus’ invitation) where I gave a couple of talks both on the entire IPython project and on its parallel computing capabilities. Sorry I missed you, but I’m very happy to see this kind of hands-on response from users!

    I just wanted to let you know that for statistical machinery, in addition to the basics contained in scipy.stats, you’d probably find both Pandas and Statsmodels quite useful. They provide a fair number of tools for data analysis and statistics, and both projects are very open to new contributors.

    • Randy Olson says:

      Pandas looks right up my alley. Thank you for pointing me to it. Now if we can just get bootstrapping merged into scipy.stats, I could bury my custom library. 🙂

      Have you looked into rpy2? It provides a direct interface with the R libraries. A couple of colleagues and I have it loading data, running stats on it, and plotting it in IPython Notebook, but the high-level interface is a little wonky. On the plus side, the low-level interface is really straightforward: rpy2.robjects.r("any R command").

      Thank you so much for making such a great tool for scientific computing. I’m not afraid to say that IPython Notebook has significantly changed how I do my research.

      I hope we have another chance to meet soon!

      P.S. Do you have a preferred medium for feedback about IPython Notebook?

      • Hi Randal,

        Yes, we’ve looked at rpy2: Jonathan Taylor, a friend from the stats dept at Stanford, just coded up the functionality to embed R in whole cells cleanly. I’m right in the middle of finishing up the syntactic support for that, and once we get it merged (a week or so, I hope), you’ll be able to type in one cell:

        %%R --inputs=X,Y --outputs=r
        … etc: rest of R code here

        and it will run all your R code nicely, using your python variables X and Y, and leaving you with an output variable ‘r’.

        So give it a few weeks, and we’ll have solid R integration.

        As for feedback, yes: ideally we have our discussions on our development mailing list. Bug/code-specific conversations tend to happen on the corresponding ticket or Pull Request on github, but for the ‘big picture’ discussions, the -dev list is the right venue. I only caught your blog by accident via a tweet of Titus’, but since I use twitter very rarely, that’s not a reliable channel in general.

        • Randy Olson says:

          That sounds perfect. I think that’ll pretty much erase the final item on the list of “reasons not to use Python/IPython Notebook for stats” that I’ve heard from my colleagues.

          Looking forward to the release. Perhaps I’ll put together another hands-on demo for the R integration in a few weeks, then.

3 Pings/Trackbacks for "IPython Notebook: Finally, the research notebook I’ve always been looking for is here!"
  1. […] As promised, here is the iPython Notebook tutorial I mentioned in my introduction to iPython Notebook. […]

  2. […] the following two tutorials from his student are helpful – here and […]

  3. […] May 2012, I started this blog to rave about the IPython Notebook, a new scientific computing tool that’s still an integral part of my research workflow today. […]