TPOT Automated Machine Learning Competition

Can AutoML beat humans on Kaggle? Automated Machine Learning (AutoML) is poised to make a transformative impact on data science in 2017. At the University of Pennsylvania, we’ve been working hard to develop TPOT, a state-of-the-art open-source AutoML tool

Python 2.7 still reigns supreme in pip installs

The Python 2 vs. Python 3 divide has long been a thorn in the Python community’s side. On one hand, Python package developers face the challenge of supporting two incompatible versions of Python, which is time that could be better

Introducing TPOT, the Data Science Assistant

Some of you might have been wondering what the heck I’ve been up to for the past few months. I haven’t been posting much on my blog lately, and I haven’t been working on important problems like solving Where’s Waldo?

Python usage survey 2014

Remember that Python usage survey that went around the interwebs late last year? Well, the results are finally out and I’ve visualized them below for your perusal. This survey has been running for two years now (2013-2014), so where we

How to make beautiful data visualizations in Python with matplotlib

Want to learn more about data visualization with Python? Take a look at my Data Visualization Basics with Python video course on O’Reilly. It’s been well over a year since I wrote my last tutorial, so I figure I’m overdue.

Filling in Python’s gaps in statistics packages with Rmagic

Have you ever found yourself searching for a statistics package in Python, but it just isn’t available? This is the biggest reason I’ve heard when my colleagues say they’re unwilling to make the switch from R to Python for statistical

Statistical analysis made easy in Python with SciPy and pandas DataFrames

I finally got around to finishing up this tutorial on how to use pandas DataFrames and SciPy together to handle any and all of your statistical needs in Python. This is basically an amalgamation of my two previous blog posts

Using pandas DataFrames to process data from multiple replicate runs in Python

Per a recommendation in my previous blog post, I decided to follow up and write a short how-to on using pandas to process data from multiple replicate runs in Python. If you do research like mine, you’ll often

A short demo on how to use IPython Notebook as a research notebook

As promised, here’s the IPython Notebook tutorial I mentioned in my introduction to IPython Notebook. Downloading and installing IPython Notebook: You can download IPython Notebook with the majority of the other packages you’ll need in the Anaconda Python distribution. From

IPython Notebook: Finally, the research notebook I’ve always been looking for is here!

I attended a Software Carpentry workshop hosted by Titus Brown and Greg Wilson this week and was introduced to, among many other things, a piece of software that I’ve been looking for ever since I started my graduate program: IPython

