Skinny-dip: Clustering in a Sea of Noise
This paper introduces SkinnyDip, a clustering algorithm that leverages the dip test of
unimodality. SkinnyDip is robust to noise and can detect clusters of varying shapes and densities.
In addition, SkinnyDip does not compute pairwise distances, and its run time
grows linearly with the size of the data. link
Smart Reply: Automated Response Suggestion for Email
This paper describes the system architecture and algorithms used to build Google's Smart Reply
feature in Inbox. link
“Why Should I Trust You?”: Explaining the Predictions of Any Classifier
Complex models such as random forests and neural networks are always challenging to interpret.
This paper introduces a novel technique that explains the predictions of any classifier
by training simple, interpretable models (e.g., a linear model) locally around each prediction.
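The local-surrogate idea can be illustrated with a minimal sketch. It is not the paper's implementation: `black_box`, `explain_locally`, the Gaussian perturbations, and the exponential proximity kernel are all assumptions chosen for illustration, and the input is a single scalar feature to keep the weighted regression in closed form.

```python
import math
import random

def black_box(x):
    """Hypothetical opaque classifier: returns P(class = 1) for a scalar x."""
    return 1.0 / (1.0 + math.exp(-(x * x - 4.0)))

def explain_locally(predict, x0, n_samples=500, scale=0.3, kernel_width=0.3, seed=0):
    """Fit a weighted linear model to `predict` in the neighbourhood of x0.

    Returns (slope, intercept) of the local linear surrogate.
    """
    rng = random.Random(seed)
    # 1. Perturb the instance to sample its neighbourhood.
    xs = [x0 + rng.gauss(0.0, scale) for _ in range(n_samples)]
    # 2. Query the black box on every perturbed sample.
    ys = [predict(x) for x in xs]
    # 3. Weight samples by proximity to x0 (assumed exponential kernel).
    ws = [math.exp(-((x - x0) ** 2) / (kernel_width ** 2)) for x in xs]
    # 4. Weighted least squares for one feature, in closed form.
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope_pos, _ = explain_locally(black_box, 1.0)
slope_neg, _ = explain_locally(black_box, -1.0)
print(slope_pos, slope_neg)
```

The sign of the slope is the explanation here: near x = 1 increasing the feature raises the predicted probability, while near x = -1 it lowers it, even though the black box as a whole is not linear.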
Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure
This paper proposes mass-based dissimilarity, a data-dependent measure that captures a key property of dissimilarity
as perceived by humans: two instances in a dense region are less similar to each other than two
instances the same pairwise distance apart in a sparse region. link
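A one-dimensional toy version shows the property. This is a simplification, not the paper's estimator: the hypothetical `mass_dissimilarity` function below just measures the fraction of the data set that falls in the interval spanned by the two points.

```python
def mass_dissimilarity(data, x, y):
    """Fraction of `data` lying in the closed interval between x and y.

    Toy stand-in for a data-dependent dissimilarity: the more data mass
    the region covering both points holds, the more dissimilar they are.
    """
    lo, hi = min(x, y), max(x, y)
    covered = sum(1 for v in data if lo <= v <= hi)
    return covered / len(data)

# Dense region: 11 points in [0, 1]. Sparse region: 4 points in [5, 8].
data = [i / 10 for i in range(11)] + [5.0, 6.0, 7.0, 8.0]

# Both pairs are exactly 1.0 apart in ordinary distance.
dense_pair = mass_dissimilarity(data, 0.0, 1.0)
sparse_pair = mass_dissimilarity(data, 5.0, 6.0)
print(dense_pair, sparse_pair)
```

Although both pairs are the same distance apart, the pair in the dense region spans 11 of the 15 points and the pair in the sparse region spans only 2, so the dense pair is scored as more dissimilar.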
XGBoost: A Scalable Tree Boosting System
In this paper, the authors of XGBoost explain in detail the design of XGBoost and why it works.
Just One More: Modeling Binge Watching Behavior
Nowadays, many viewers watch several episodes, or even a whole season, of a TV show in a single session,
a behavior referred to as "binge watching". This paper introduces a statistical mixture model that characterizes
such binge-watching behavior in a real-world Video-on-Demand service. link