Plotting like xkcd

Posted on September 10, 2016 • Tagged with python, visualization

I have been using ggplot2 for years, and it is my favourite plotting system. The one thing I missed was an xkcd theme, until I recently discovered that ggplot, the Python plotting system based on ggplot2, comes with one.

Applying the xkcd theme with ggplot doesn't need much explanation: it's exactly the same as applying themes like theme_bw(). Let's just plot something using Fisher's Iris data set [1]:

from ggplot import *

# iris_data is assumed to be a pandas DataFrame holding the Iris data set
(ggplot(iris_data, aes(x='sepal_length', colour='class')) +
    stat_density() +
    theme_xkcd())

[Figure: xkcd_density]

# Parentheses let the expression continue across lines
(ggplot(iris_data, aes(x='class', y='petal_width')) +
    geom_boxplot() +
    theme_xkcd())

[Figure: xkcd_box]

So, when will The Oatmeal theme be added?


  1. Fisher's Iris data set is provided by the UCI Machine Learning Repository, University of California, Irvine.


Interesting papers from KDD2016

Posted on September 07, 2016 • Tagged with interesting papers

Skinny-dip: Clustering in a Sea of Noise

This paper introduces SkinnyDip, a clustering algorithm that leverages the dip test of unimodality. SkinnyDip is robust to noise and can detect clusters of varying shapes and densities. In addition, SkinnyDip does not perform pair-wise distance calculations, and its run-time grows linearly with the size of the data. link

Smart Reply: Automated Response Suggestion for Email

This paper describes the system architecture and algorithms used to build Google's Smart Reply feature in Inbox. link

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

It is always challenging to interpret complex models like random forests and neural networks. This paper introduces a novel technique that can explain the predictions of a complex classifier by training simple, interpretable models (e.g., a linear model) locally around the predictions. link
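To make the idea of a local surrogate concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the one-feature `black_box` function, the Gaussian proximity kernel, and the `explain_locally` helper are all assumptions made for illustration. It perturbs a point, weights the samples by how close they are to it, and fits a weighted straight line whose slope serves as the explanation.

```python
import math
import random

def black_box(x):
    """Hypothetical complex model: a nonlinear score of one feature."""
    return math.sin(x) + 0.1 * x * x

def explain_locally(predict, x0, n_samples=500, width=0.2, seed=0):
    """Fit a weighted linear model around x0; return (slope, intercept)."""
    rng = random.Random(seed)
    # Sample the neighbourhood of x0 by perturbing it with Gaussian noise.
    xs = [x0 + rng.gauss(0.0, width) for _ in range(n_samples)]
    ys = [predict(x) for x in xs]
    # Weight each sample by its proximity to x0 (Gaussian kernel, an assumption).
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    # Closed-form weighted least squares for y ~ slope * x + intercept.
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, y_mean - slope * x_mean

slope, intercept = explain_locally(black_box, x0=1.0)
print("local slope:", slope, "local intercept:", intercept)
```

The slope says how the prediction changes with the feature near x0 only; a different x0 can give a very different line, which is the point of explaining locally rather than globally.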

Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure

This paper proposes the mass-based dissimilarity, a dissimilarity measure that captures the key property of dissimilarity perceived by humans, i.e., two instances in a dense region are less similar to each other than two instances of the same pair-wise distance in a sparse region. link
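The property is easy to see in a toy one-dimensional setting. The sketch below is a deliberate simplification rather than the paper's estimator: it takes the dissimilarity of two points to be the fraction of the data lying in the interval between them, and both the `mass_dissimilarity` helper and the data are hypothetical.

```python
def mass_dissimilarity(data, x, y):
    """Fraction of data points in the interval spanned by x and y (1-D toy version)."""
    lo, hi = min(x, y), max(x, y)
    return sum(lo <= p <= hi for p in data) / len(data)

# Hypothetical data: a dense cluster in [0, 1) and sparse points from 5 to 14.
data = [i * 0.01 for i in range(100)] + [5.0 + i for i in range(10)]

# Two pairs with the same geometric distance (0.5) in regions of different density.
dense = mass_dissimilarity(data, 0.2, 0.7)
sparse = mass_dissimilarity(data, 6.8, 7.3)
print("dense region:", dense, "sparse region:", sparse)
```

Although both pairs are 0.5 apart, the pair in the dense cluster has far more data between it and so comes out as more dissimilar, which is the behaviour the measure is meant to capture.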

XGBoost: A Scalable Tree Boosting System

In this paper, the authors of XGBoost explain in detail the design of XGBoost and why it works well. link

Just One More: Modeling Binge Watching Behavior

Nowadays, many people watch several episodes or even a whole season of a TV show in a single session, which is referred to as "binge watching". This paper introduces a statistical mixture model that characterizes such binge watching behavior in a real-world Video-on-Demand service. link