Clustering Algorithms Resources
K-means
Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
- paper: http://jmlr.org/proceedings/papers/v37/ding15.html
- github: https://github.com/src-d/kmcuda
- code: http://research.csc.ncsu.edu/nc-caps/yykmeans.tar.bz2
Semi-supervised K-means++
k-Means Clustering Is Matrix Factorization
An efficient K-means algorithm for Massive Data
Boost K-Means
Compressive K-means
Convergence rate of stochastic k-means
Fast and Provably Good Seedings for k-Means using k-MC^2 and AFK-MC^2
- github: https://github.com/obachem/kmc2
- blog: https://www.infoq.com/news/2016/12/AFK-MC2-boosts-kMeans-seeding
An efficient K-means clustering algorithm for massive data
- keywords: Clustering, massive data, parallelization, unsupervised learning, K-means, K-means++, Mini-batch
- arxiv: https://arxiv.org/abs/1801.02949
Stream Clustering
Neural Network-based Clustering
Spectral Clustering
On Spectral Clustering: Analysis and an algorithm
- intro: NIPS 2001. Andrew Y. Ng, Michael I. Jordan, Yair Weiss
- paper: https://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf
- paper: http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf
Hierarchical Clustering
Online Clustering
Papers
On Clustering Validation Techniques (2001)
- intro: “This paper introduces the fundamental concepts of clustering while it surveys the widely known clustering algorithms in a comparative way”
- paper: http://web.itu.edu.tr/sgunduz/courses/verimaden/paper/validity_survey.pdf
Stream Clustering
Neural network-based clustering using pairwise constraints
- arxiv: http://arxiv.org/abs/1511.06321
- homepage: http://yenchanghsu.github.io/NNclustering/
- github: https://github.com/yenchanghsu/NNclustering
PAC-Bayesian Online Clustering
Compressive Spectral Clustering
Interactive Bayesian Hierarchical Clustering
Practical Introduction to Clustering Data
Rényi divergence minimization based co-regularized multiview clustering
Consistent Algorithms for Clustering Time Series
Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance
mst_clustering: Clustering via Euclidean Minimum Spanning Trees
- paper: http://joss.theoj.org/papers/10.21105/joss.00012
- paper: https://github.com/openjournals/joss-papers/blob/master/joss.00012/10.21105.joss.00012.pdf
- github: https://github.com/jakevdp/mst_clustering
k2-means for fast and accurate large scale clustering
Context Aware Nonnegative Matrix Factorization Clustering
Clustering by fast search and find of density peaks
- paper: http://science.sciencemag.org/content/344/6191/1492
- slides: http://conference.mipt.ru/img/conference/material-design-2014/talks/Laio-talk.pdf
- github: https://github.com/thomasp85/densityClust
- github: https://github.com/GuipengLi/Dcluster
- blog: http://eric-yuan.me/clustering-fast-search-find-density-peaks/
Comment on “Clustering by fast search and find of density peaks”
- arxiv: https://arxiv.org/abs/1501.04267
Datasets
Clustering datasets
- homepage: https://cs.joensuu.fi/sipu/datasets/
Books
Introduction to Clustering and Unsupervised Learning | PACKT Books
- intro: from “Machine Learning with R - Second Edition” by Brett Lantz
- book: https://www.packtpub.com/books/content/introduction-clustering-and-unsupervised-learning
Blogs
Finding the K in K-means by Parametric Bootstrap
Random walk vectors for clustering
- part I – similarity between objects: http://int8.io/random-walk-vectors-for-clustering-part-i-similarity-between-objects/
- part II – perspective switch: http://int8.io/random-walk-vectors-for-clustering-part-ii-perspective-switch/
- part III: http://int8.io/random-walk-vectors-for-clustering-iii/
- final: http://int8.io/random-walk-vectors-for-clustering-final/
A comparison between PCA and hierarchical clustering
- blog: http://www.kdnuggets.com/2016/02/qlucore-comparison-pca-hierarchical-clustering.html
Visualization of Centroid Movements for K-Means Clustering
- homepage: http://web.cecs.pdx.edu/~lane7/
K-Means Clustering on Handwritten Digits
- blog: http://johnloeber.com/docs/kmeans.html
Improved Seeding For Clustering With K-Means++ (★★★★★)
- blog: https://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/
Spectral Clustering – How Math is Redefining Decision Making
Visual comparison of machine learning algorithms: Clustering
- homepage: http://haifengl.github.io/smile/index.html#clustering
Clustering Algorithms: From Start To State Of The Art
- blog: https://www.toptal.com/machine-learning/clustering-algorithms
Hierarchical clustering, using it to invest
Spectral Clustering: A quick overview
- blog: https://charlesmartin14.wordpress.com/2012/10/09/spectral-clustering/
Why K-Means is not always a good idea
- blog: https://datasciencemadesimpler.wordpress.com/2016/03/05/why-k-means-is-not-always-a-good-idea/
High Quality, High Performance Clustering with HDBSCAN | SciPy 2016
- youtube: https://www.youtube.com/watch?v=AgPQ76RIi6A&list=PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6&index=10
Projects
MusicMappr: Find patterns in your favorite songs and remix them on the fly!
- intro: MusicMappr finds chunks of songs that are similar, and clusters them accordingly. You can visualize these clusters and play them back at will. This is for music lovers who are curious about the structures inherent to their favorite songs.
- github: https://github.com/fatsmcgee/MusicMappr
TfKmeans: An implementation of k-means clustering in TensorFlow
CUDA K-Means Clustering: A CUDA implementation of the k-means clustering algorithm
- homepage: http://serban.org/software/kmeans/
- github: https://github.com/serban/kmeans
kmeans_cuda: CUDA implementation of k-means
K-means in TensorFlow
- blog: http://nxn.se/post/145634722580/k-means-in-tensorflow
- gist: https://gist.github.com/vals/a01a37b14c4918df7937b30d43327837
VAE-Clustering
- intro: Unsupervised clustering with (Gaussian mixture) VAEs
- github: https://github.com/RuiShu/vae-clustering