I attended a Software Carpentry workshop hosted by Titus Brown and Greg Wilson this week and was introduced to, among many other things, a piece of software that I’ve been looking for ever since I started my graduate program: IPython Notebook. The easiest way to install it is through the Anaconda Python distribution, which also bundles most of the other packages you’ll need.
I do the majority of my post-experiment data analysis in Python nowadays, since it’s one of the few sanely-designed scripting languages out there with all the functionality I need. What I’ve been missing is a seamless user interface where I can both take notes about my research and perform my data analysis in the same location. IPython Notebook finally provides that.
Ever since I announced my conversion from RTF files to IPython Notebook as my primary means of taking research notes, I’ve received a lot of flak about how Python doesn’t support advanced statistical methods, such as bootstrapped confidence intervals, Mann-Whitney-Wilcoxon rank-sum tests, and ANOVA. After a day of searching with my lab mates, I finally turned up all the libraries I need:
- Bootstrapped confidence intervals: https://pypi.python.org/pypi/scikits.bootstrap
- MWW RankSum test: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ranksums.html
- ANOVA: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html
If you can’t find a Python library for a statistical test you need, post here and we’ll try to find it. The IPython notebook has plenty of uses beyond a research notebook, too. For example, Titus Brown recently posted the IPython notebook that he used to generate all of the graphs in one of his recent papers. Imagine the implications for science if scientists actually start showing the code they used to generate their graphs! (No more hiding that outlier point on the side of the graph…)
I’ll be putting up some tutorials and examples of how to use IPython Notebook for exploratory statistical data analysis soon, so stay tuned!