Machine Learning Resources

Published: 27 Aug 2015 Category: machine_learning

Tutorials

Machine Learning for Developers

http://xyclade.github.io/MachineLearning/

Logistic Regression Vs Decision Trees Vs SVM

Machine learning: A practical introduction

blog: http://www.infoworld.com/article/3010401/big-data/machine-learning-a-practical-introduction.html

Tutorials on Machine Learning (Tom Dietterich)

http://web.engr.oregonstate.edu/~tgd/projects/tutorials.html

Machine Learning Tutorials

intro: “This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resources. Other awesome lists can be found in this list.”
homepage: http://ujjwalkarn.github.io/Machine-Learning-Tutorials/
github: https://github.com/ujjwalkarn/Machine-Learning-Tutorials/blob/master/README.md

A Visual Introduction to Machine Learning

part 1: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

Machine Learning – A gentle & structured introduction

A Comparison of Supervised Learning Algorithm

blog: http://blog.nycdatascience.com/students-work/a-comparison-of-supervised-learning-algorithm/

Statistical Learning and Kernel Methods

slides: http://matt.colorado.edu/compcogworkshop/talks/scholkopf.pdf

Getting Started with Machine Learning

https://www.infoq.com/articles/getting-started-ml

Getting Started with Machine Learning: For the absolute beginners and fifth graders

https://medium.com/@suffiyanz/getting-started-with-machine-learning-f15df1c283ea#.fqipdiyyn

Machine Learning Crash Course

part 1: https://ml.berkeley.edu/blog/2016/11/06/tutorial-1/
part 2: https://ml.berkeley.edu/blog/2016/12/24/tutorial-2/

Rules of Machine Learning: Best Practices for ML Engineering

http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

Machine Learning is Fun!

Machine Learning is Fun! - The world’s easiest introduction to Machine Learning

blog: https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471#.yy4r6gf6b

Machine Learning is Fun! Part 2 - Using Machine Learning to generate Super Mario Maker levels

blog: https://medium.com/@ageitgey/machine-learning-is-fun-part-2-a26a10b68df3#.r3jx7zqro

Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks

blog: https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721#.8jjnrfiix

Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning

blog: https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78#.fnu6ep6ac

Machine Learning Theory

Machine Learning Theory - Part 1: Introduction

https://mostafa-samir.github.io/ml-theory-pt1/

Machine Learning Theory - Part 2: Generalization Bounds

https://mostafa-samir.github.io/ml-theory-pt2/

Boosting

“Quick Introduction to Boosting Algorithms in Machine Learning”

http://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/

An Empirical Comparison of Three Boosting Algorithms on Real Data Sets with Artificial Class Noise (AdaBoost vs. LogitBoost vs. BrownBoost)

paper: http://www.lancs.ac.uk/~eckley/papers/McDonaldHandEckley2003.pdf

A (small) introduction to Boosting

blog: https://codesachin.wordpress.com/2016/03/06/a-small-introduction-to-boosting/

Boosting and AdaBoost for Machine Learning

blog: http://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/

Gradient Boosting

Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

blog: http://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/

Understanding Gradient Boosting, Part 1

blog: http://rcarneva.github.io/understanding-gradient-boosting-part-1.html

Gradient Boosting explained [demonstration]

blog: https://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html

A Kaggle Master Explains Gradient Boosting

http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/

Performance of various open source GBM implementations

intro: h2o VS. xgboost VS. lightgbm
github: https://github.com/szilard/GBM-perf

arboretum - Gradient Boosting on GPU

intro: Gradient Boosting powered by GPU(NVIDIA CUDA)
github: https://github.com/sh1ng/arboretum

Gradient Boosting from scratch

https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d

XGBoost

XGBoost: A Scalable Tree Boosting System

arxiv: http://arxiv.org/abs/1603.02754

XGBoost: eXtreme Gradient Boosting

intro: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
github: https://github.com/dmlc/xgboost

GPU Accelerated XGBoost

blog: http://dmlc.ml/2016/12/14/GPU-accelerated-xgboost.html

Awesome XGBoost

intro: This page contains a curated list of examples, tutorials, blogs about XGBoost use cases.
github: https://github.com/dmlc/xgboost/blob/master/demo/README.md

Complete Guide to Parameter Tuning in XGBoost (with codes in Python)

LinXGBoost: Extension of XGBoost to Generalized Local Linear Models

arxiv: https://arxiv.org/abs/1710.03634
github: https://github.com/ldv1/LinXGBoost

Tree Boosting With XGBoost - Why Does XGBoost Win “Every” Machine Learning Competition?

intro: Master thesis
thesis page: https://brage.bibsys.no/xmlui/handle/11250/2433761

XGBoost: Scalable GPU Accelerated Learning

intro: describes the multi-GPU gradient boosting algorithm implemented in the XGBoost library
arxiv: https://arxiv.org/abs/1806.11248

LightGBM

LightGBM, Light Gradient Boosting Machine

intro: LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
github: https://github.com/Microsoft/LightGBM

pyLightGBM: Python binding for Microsoft LightGBM

github: https://github.com/ArdalanM/pyLightGBM

Benchmarking LightGBM: how fast is LightGBM vs xgboost?

blog: https://medium.com/data-design/benchmarking-lightgbm-how-fast-is-lightgbm-vs-xgboost-7b5484746ac4#.xwkhql46h

GPU-acceleration for Large-scale Tree Boosting

intro: University of California, Davis & Google Research
intro: GPU Accelerated LightGBM for Histogram-based GBDT Training
arxiv: https://arxiv.org/abs/1706.08359
github: https://github.com/huanzhang12/lightgbm-gpu

Lessons Learned From Benchmarking Fast Machine Learning Algorithms

intro: XGBoost and LightGBM
blog: https://blogs.technet.microsoft.com/machinelearning/2017/07/25/lessons-learned-benchmarking-fast-machine-learning-algorithms/

CatBoost

CatBoost is an open-source gradient boosting library with categorical features support

intro: CatBoost is a machine learning method based on gradient boosting over decision trees.
homepage: https://catboost.yandex/
github: https://github.com/catboost/catboost

Bootstrap

Coding, Visualizing, and Animating Bootstrap Resampling

http://minimaxir.com/2015/09/bootstrap-resample/

Can we trust the bootstrap in high-dimension?

arxiv: http://arxiv.org/abs/1608.00696

Cascades

Making faces with Haar cascades and mixed integer linear programming

Classifiers

Measuring Performance of Classifiers

blog: http://shahramabyari.com/2016/02/22/measuring-performance-of-classifiers/

Convex Optimization

Convex Optimization: Algorithms and Complexity

cvx-optim.torch: Torch library for convex optimization

github: https://github.com/bamos/cvx-optim.torch

Decision Tree

Soft Decision Trees

paper: http://www.cmpe.boun.edu.tr/~ethem/files/papers/icpr2012_softtree.pdf
project page: http://www.cs.cornell.edu/~oirsoy/softtree.html
github: https://github.com/oir/soft-tree

Canonical Correlation Forests

arxiv: http://arxiv.org/abs/1507.05444
code: https://bitbucket.org/twgr/ccf

Decision Trees Tutorial

blog: https://algobeans.com/2016/07/27/decision-trees-tutorial/

End-to-end Learning of Deterministic Decision Trees

intro: Heidelberg University
arxiv: https://arxiv.org/abs/1712.02743

Extremely Fast Decision Tree

arxiv: https://arxiv.org/abs/1802.08780
github: https://github.com/chaitanya-m/kdd2018

Generative Models

A note on the evaluation of generative models

arxiv: http://arxiv.org/abs/1511.01844

Markov Networks

Markov Logic Networks

paper: http://homes.cs.washington.edu/~pedrod/papers/mlj05.pdf

Markov Chains

Evolution, Dynamical Systems and Markov Chains

http://www.offconvex.org/2016/03/07/evolution-markov-chains/

Markov Chains: Explained Visually

blog: http://setosa.io/ev/markov-chains/

Matrix Computations

Randomized Numerical Linear Algebra for Large Scale Data Analysis

http://researcher.watson.ibm.com/researcher/view_group.php?id=5131

Sketching-based Matrix Computations for Machine Learning

http://xdata-skylark.github.io/libskylark/

Matrix Factorization

Neural Network Matrix Factorization

arxiv: http://arxiv.org/abs/1511.06443

Beyond Low Rank + Sparse: Multi-scale Low Rank Matrix Decomposition

arxiv: http://arxiv.org/abs/1507.08751
github: https://github.com/frankong/multi_scale_low_rank

k-Means Clustering Is Matrix Factorization

arxiv: http://arxiv.org/abs/1512.07548
note: http://blog.csdn.net/cyh_24/article/details/50408884

CuMF_SGD: Fast and Scalable Matrix Factorization

arxiv: https://arxiv.org/abs/1610.05838
github: https://github.com/CuMF/cumf_sgd

Gaussian Processes

The Gaussian Processes Web Site

blog: http://www.gaussianprocess.org/

Chained Gaussian Processes

jmlr: http://jmlr.org/proceedings/papers/v51/saul16.html
arxiv: http://arxiv.org/abs/1604.05263
github: https://github.com/SheffieldML/ChainedGP

Introduction to Gaussian Processes

slides: http://learning.mpi-sws.org/mlss2016/slides/gp_mlss16.pdf

Multi-label Learning

Neural Network Models for Multilabel Learning

paper: http://pan.baidu.com/s/1bnFdYFX
github: https://github.com/abhishek-kumar/NNForMLL

Conditional Bernoulli Mixtures for Multi-label Classification

homepage: http://www.chengli.io/publications/li2016conditional.html
paper: http://www.chengli.io/publications/li2016conditional.pdf
slides: http://www.chengli.io/publications/li2016conditional_slides.pdf
github: https://github.com/cheng-li/pyramid
wiki: https://github.com/cheng-li/pyramid/wiki/CBM

Multi-Label Learning with Label Enhancement

https://arxiv.org/abs/1706.08323

Multi-Task Learning

Multitask Learning

intro: 1997
paper: http://www.cs.cornell.edu/~caruana/mlj97.pdf

Multi-Task Learning: Theory, Algorithms, and Applications (2012)

slides: http://www.public.asu.edu/~jye02/Software/MALSAR/MTL-SDM12.pdf

Nearest Neighbors

Annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

github: https://github.com/spotify/annoy

Hidden Markov Models (HMM)

tensorflow_hmm: A tensorflow implementation of an HMM layer

intro: Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms
github: https://github.com/dwiel/tensorflow_hmm

Online Learning

Lecture Notes on Online Learning

notes: http://www-stat.wharton.upenn.edu/~rakhlin/courses/stat991/papers/lecture_notes.pdf

Scale-Free Online Learning

arxiv: http://arxiv.org/abs/1601.01974

Online Learning with Expert Advice

lecture notes: http://courses.cs.washington.edu/courses/cse599s/14sp/scribes/lecture6/lecture6.pdf

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (v.2)

author: Leon Bottou
intro: SGD, ASGD, Stochastic Gradient SVM, Stochastic Gradient CRFs
homepage: http://leon.bottou.org/projects/sgd

Gradient descent with Python

blog: http://www.pyimagesearch.com/2016/10/10/gradient-descent-with-python/

Stochastic Gradient Descent (SGD) with Python

blog: http://www.pyimagesearch.com/2016/10/17/stochastic-gradient-descent-sgd-with-python/

Gradient Descent Learns Linear Dynamical Systems

blog: http://www.offconvex.org/2016/10/13/gradient-descent-learns-dynamical-systems/

Why is gradient descent robust to non-linearly separable data?

blog: https://medium.com/@vivek.yadav/why-is-gradient-descent-robust-to-non-linearly-separable-data-a50c543e8f4a#.rhtf3xi79

Boosted Regression Trees

DART: Dropouts meet Multiple Additive Regression Trees

Visualization

Visualising High-Dimensional Data

blog: http://blog.applied.ai/visualising-high-dimensional-data/
ipn(“t-SNE Demo”): https://s3-eu-west-1.amazonaws.com/appliedai.static/tsnedemo/htmlrenders/01_EndToEnd_DataViz.html

Interactive demonstrations for ML courses

blog: http://arogozhnikov.github.io/2016/04/28/demonstrations-for-ml-courses.html

Comprehensive Guide on t-SNE algorithm with implementation in R & Python

https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/

Tricks

Machine Learning Trick of the Day

(1): Replica Trick: http://blog.shakirm.com/2015/07/machine-learning-trick-of-the-day-1-replica-trick/
(2): Gaussian Integral Trick: http://blog.shakirm.com/2015/08/machine-learning-trick-of-the-day-2-gaussian-integral-trick/
(3): Hutchinson’s Trick: http://blog.shakirm.com/2015/09/machine-learning-trick-of-the-day-3-hutchinsons-trick/
(4): Reparameterisation Tricks: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
(5): Log Derivative Trick: http://blog.shakirm.com/2015/11/machine-learning-trick-of-the-day-5-log-derivative-trick/

Debug Machine Learning

Debugging Machine Learning Tasks

arxiv: http://arxiv.org/abs/1603.07292

Tackle Unbalanced Classes

Classic strategies:

class re-sampling
cost-sensitive training
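
The two classic strategies above can be sketched in a few lines of plain Python. This is a minimal illustration, not taken from any of the linked articles: the function names `oversample_minority` and `balanced_class_weights` are hypothetical, and the weighting formula `n_samples / (n_classes * class_count)` is one common heuristic (the same one scikit-learn uses for `class_weight="balanced"`).

```python
import random
from collections import Counter


def oversample_minority(X, y, seed=0):
    """Class re-sampling: randomly duplicate examples of the smaller
    classes until every class has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing examples with replacement from this class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Cost-sensitive training: per-class loss weights that grow as the
    class gets rarer, using n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Toy data: 8 negatives, 2 positives.
X = list(range(10))
y = [0] * 8 + [1] * 2
X_res, y_res = oversample_minority(X, y)
weights = balanced_class_weights(y)  # {0: 0.625, 1: 2.5}
```

Re-sampling changes the training set itself, while class weights leave the data untouched and instead scale each example's contribution to the loss; which one works better depends on the model and dataset.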

Dealing with Unbalanced Classes, SVM, Random Forests and Decision Trees in Python

Fighting Class Unbalance Supervised ML Problem

http://www.erogol.com/fighting-class-unbalance-supervised-ml-problem/

Survey of resampling techniques for improving classification performance in unbalanced datasets

arxiv: http://arxiv.org/abs/1608.06048

Learning from Imbalanced Classes

Towards Competitive Classifiers for Unbalanced Classification Problems: A Study on the Performance Scores

arxiv: http://arxiv.org/abs/1608.08984
github: https://github.com/jonathanSS/ClassImbalanceStudies

This Machine Learning Project on Imbalanced Data Can Add Value to Your Resume

https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-imbalanced-data-can-add-value-to-your-resume/

Dealing with unbalanced data: Generating additional data by jittering the original image

7 Techniques to Handle Imbalanced Data

http://www.kdnuggets.com/2017/06/7-techniques-handle-imbalanced-data.html

Mathematics

Some Notes on Applied Mathematics for Machine Learning

paper: http://research.microsoft.com/en-us/um/people/cburges/tech_reports/tr-2004-56.pdf

An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation

paper: https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf

Probability Cheatsheet

homepage: http://www.wzchen.com/probability-cheatsheet
github: https://github.com/wzchen/probability_cheatsheet

Probability Cheatsheet v2.0: http://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf

Kalman Filter

How Kalman Filters Work

Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation

paper: https://www.cl.cam.ac.uk/~rmf25/papers/Understanding%20the%20Basis%20of%20the%20Kalman%20Filter.pdf

L-BFGS

Code Stylometry

De-anonymizing Programmers via Code Stylometry

keywords: source code authorship, random forests
paper: http://www.princeton.edu/~aylinc/papers/caliskan-islam_deanonymizing.pdf

Recommendation / Recommender System

Master Recommender Systems

intro: Learn how to design, build, and evaluate recommender systems for commerce and content.
course page: https://www.coursera.org/specializations/recommender-systems

Human Curation and Convnets: Powering Item-to-Item Recommendations on Pinterest

paper: https://engineering.pinterest.com/sites/engineering/files/article/fields/field_image/human-curation-convnets%20%281%29.pdf

Top-N Recommendation with Novel Rank Approximation

arxiv: http://arxiv.org/abs/1602.07783
github: https://github.com/sckangz/SDM16

On the Effectiveness of Linear Models for One-Class Collaborative Filtering

An Adaptive Matrix Factorization Approach for Personalized Recommender Systems

arxiv: http://arxiv.org/abs/1607.07607

Implementing your own Recommender Systems in Python using Stochastic Gradient Descent

blog: http://online.cambridgecoding.com/notebooks/mhaller/implementing-your-own-recommender-systems-in-python-using-stochastic-gradient-descent-4#implementing-your-own-recommender-systems-in-python-using-stochastic-gradient-descent

How to Write Your Own Recommendation System

blog(part 1): http://elliot.land/how-to-write-your-own-recommendation-system-part-1
blog(part 2): http://elliot.land/how-to-write-your-own-recommendation-system-part-2

Addressing Cold Start for Next-song Recommendation

intro: ACM Recsys 2016
paper: http://mac.citi.sinica.edu.tw/~yang/pub/chou16recsys.pdf
github: https://github.com/fearofchou/ALMM

Using Navigation to Improve Recommendations in Real-Time

paper: http://dl.acm.org/citation.cfm?id=2959174

Local Item-Item Models For Top-N Recommendation

paper: http://dl.acm.org/citation.cfm?id=2959185

Lessons learned from building real-life recommender systems

intro: Recsys 2016 tutorial
slides: http://www.slideshare.net/xamat/recsys-2016-tutorial-lessons-learned-from-building-reallife-recommender-systems
mirror: https://pan.baidu.com/s/1eSdWcue

Algorithms Aside: Recommendation As The Lens Of Life

paper: http://dl.acm.org/citation.cfm?doid=2959100.2959164

Pairwise Preferences Based Matrix Factorization and Nearest Neighbor Recommendation Techniques

paper: http://dl.acm.org/citation.cfm?id=2959142
datasets: http://www.inf.unibz.it/~kalloori/

Mendeley: Recommendations for Researchers

intro: RecSys 2016
slides: http://saulvargas.es/slides/recsys2016/#/

Past, Present and Future of Recommender Systems: an Industry Perspective

intro: RecSys 2016
slides: http://www.slideshare.net/xamat/past-present-and-future-of-recommender-systems-and-industry-perspective
mirror: https://pan.baidu.com/s/1kVQ4SKZ

TF-recomm: Tensorflow-based Recommendation systems

github: https://github.com/songgc/TF-recomm

List of Recommender Systems

github: https://github.com/grahamjenson/list_of_recommender_systems

Related Pins at Pinterest: The Evolution of a Real-World Recommender System

intro: Pinterest, Inc.
arxiv: https://arxiv.org/abs/1702.07969

Lifelong Learning

Lifelong Machine Learning

NELL (Never Ending Language Learner)

Toward an architecture for neverending language learning

NEIL (Never Ending Image Learner)

NEIL: Extracting Visual Knowledge from Web Data

Expert Gate: Lifelong Learning with a Network of Experts

arxiv: https://arxiv.org/abs/1611.06194

Lifelong Machine Learning and Computer Reading the Web

intro: KDD 2016 Tutorial
paper: https://www.cs.uic.edu/~liub/Lifelong-Machine-Learning-Tutorial-KDD-2016.pdf

Lifelong Machine Learning for Natural Language Processing

intro: EMNLP 2016 Tutorial
slides: http://www.emnlp2016.net/tutorials/chen-liu-t3.pdf

Zero-Shot Learning

An embarrassingly simple approach to zero-shot learning

Zero-Shot Learning - The Good, the Bad and the Ugly

arxiv: https://arxiv.org/abs/1703.04394

One Shot Learning

Matching Networks for One Shot Learning

arxiv: http://arxiv.org/abs/1606.04080
github: https://github.com/zergylord/oneshot

Maximum Entropy

Maximum entropy probability distribution

https://www.wikiwand.com/en/Maximum_entropy_probability_distribution

Metric Learning

Distance Metric Learning: A Comprehensive Survey

intro: 2006
paper: https://www.cs.cmu.edu/~liuy/frame_survey_v2.pdf

Large Scale Metric Learning from Equivalence Constraints

intro: CVPR 2012. KISSME
paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.384.2335&rep=rep1&type=pdf

Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval

intro: NEC Laboratories America
arxiv: https://arxiv.org/abs/1212.6094

Finance and Trading

Efficient Portfolio optimisation by Hybridised Machine Learning

intro: Thesis 2014
mirror: http://pan.baidu.com/s/1eQvSyZ4

Feature Selection for Portfolio Optimization

paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2548800

The Efficient Frontier: Markowitz portfolio optimization in Python

blog: https://blog.quantopian.com/markowitz-portfolio-optimization-2/

Self-Study Plan for Becoming a Quantitative Trader

Pyfolio – a new Python library for performance and risk analysis

blog: https://blog.quantopian.com/pyfolio/
github: https://github.com/quantopian/pyfolio

Application of Machine Learning: Automated Trading Informed by Event Driven Data

intro: MIT master thesis
paper: https://dspace.mit.edu/bitstream/handle/1721.1/105982/965785890-MIT.pdf

Python Programming for Finance

youtube: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcOdF96TBtRtuQksErCEBYZ

Algorithmic trading in less than 100 lines of Python code

https://www.oreilly.com/learning/algorithmic-trading-in-less-than-100-lines-of-python-code

Designing an Algorithmic Trading Strategy with Python

https://www.youtube.com/watch?v=9XYjR6ge73M

Different Interpretation about Same Model

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

intro: ICML 2016
arxiv: https://arxiv.org/abs/1506.02142

Dropout as a Bayesian Approximation: Insights and Applications

http://mlg.eng.cam.ac.uk/yarin/PDFs/Dropout_as_a_Bayesian_approximation.pdf

k-Means Clustering Is Matrix Factorization

https://arxiv.org/abs/1512.07548

word embedding as matrix factorization

Neural Word Embedding as Implicit Matrix Factorization

https://levyomer.files.wordpress.com/2014/09/neural-word-embeddings-as-implicit-matrix-factorization.pdf

Deformable Part Models are Convolutional Neural Networks

intro: CVPR 2015
arxiv: https://arxiv.org/abs/1409.5403
paper: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Girshick_Deformable_Part_Models_2015_CVPR_paper.pdf

k-Means is a Variational EM Approximation of Gaussian Mixture Models

https://arxiv.org/abs/1704.04812

Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method

http://www.sciencedirect.com/science/article/pii/S0893608003001709

On the momentum term in gradient descent learning algorithms

http://www.sciencedirect.com/science/article/pii/S0893608098001166?np=y&npKey=142c3bf066ad1c36c5b4fd8713d0a8967413462675bae2f8d7b89933fa8cf228

EM as a coordinate descent

Backprop as Functor: A compositional perspective on supervised learning

intro: MIT
arxiv: https://arxiv.org/abs/1711.10455

Papers

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?

intro: evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today
intro: “The random forest is clearly the best family of classifiers”
paper: http://www.jmlr.org/papers/volume15/delgado14a/delgado14a.pdf

Are Random Forests Truly the Best Classifiers?

intro: questions the conclusion that random forests are the best classifiers
paper: http://jmlr.org/papers/volume17/15-374/15-374.pdf
notes: http://weibo.com/ttarticle/p/show?id=2309404007876694808654
my notes: jeez, I love the above two papers..

An Empirical Evaluation of Supervised Learning in High Dimensions

paper: http://lowrank.net/nikos/pubs/empirical.pdf

Machine learning: Trends, perspectives, and prospects

intro: M. I. Jordan and T. M. Mitchell. Science
paper: http://www.cs.cmu.edu/~tom/pubs/Science-ML-2015.pdf

Debugging Machine Learning Tasks

arxiv: http://arxiv.org/abs/1603.07292

LIME

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

intro: Local Interpretable Model-Agnostic Explanations (LIME)
homepage: http://homes.cs.washington.edu/~marcotcr/blog/lime/
arxiv: http://arxiv.org/abs/1602.04938
github: https://github.com/marcotcr/lime
github: https://github.com/marcotcr/lime-experiments
blog: https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
blog: http://dataskeptic.com/epnotes/trusting-machine-learning-models-with-lime.php
notes: https://blog.acolyer.org/2016/09/22/why-should-i-trust-you-explaining-the-predictions-of-any-classifier/

Datasets

Datasets for Machine Learning

blog: http://blog.webkid.io/datasets-for-machine-learning/

Books

Machine Learning plus Intelligent Optimization: THE LION WAY, VERSION 2.0

Level-Up Your Machine Learning

https://www.metacademy.org/roadmaps/cjrd/level-up-your-ml

An Introduction to the Science of Statistics: From Theory to Implementation (Preliminary Edition)

book: http://math.arizona.edu/~jwatkins/statbook.pdf

Python Machine Learning

github: https://github.com/rasbt/python-machine-learning-book

Machine Learning for Hackers

github: https://github.com/johnmyleswhite/ML_for_Hackers

A Course in Machine Learning

homepage: http://ciml.info/
github: https://github.com/hal3/ciml

An Introduction to Statistical Learning: with Applications in R

homepage: http://www-bcf.usc.edu/~gareth/ISL/
course page: https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
unofficial solutions: http://blog.princehonest.com/stat-learning/
github: https://github.com/asadoughi/stat-learning

Introduction to Machine Learning with Python

github(Notebooks and code): https://github.com/amueller/introduction_to_ml_with_python

Introduction to Machine Learning (Second Edition)

author: Ethem Alpaydin
book: https://static.aminer.org/upload/pdf/1821/326/1262/53e99a91b7602d9702304e89.pdf

Videos

Video resources for machine learning

http://dustintran.com/blog/video-resources-for-machine-learning/

Blogs

10 More lessons learned from building real-life Machine Learning systems — Part I

https://medium.com/@xamat/10-more-lessons-learned-from-building-real-life-ml-systems-part-i-b309cafc7b5e#.h7rh0gxlv

Machine Learning: classifier comparison using Plotly

http://nbviewer.jupyter.org/github/etpinard/plotly-misc-nbs/blob/master/ml-classifier-comp/ml-classifier-comp.ipynb

Fitting a model via closed-form equations vs. Gradient Descent vs Stochastic Gradient Descent vs Mini-Batch Learning. What is the difference?

github: https://github.com/rasbt/python-machine-learning-book/blob/master/faq/closed-form-vs-gd.md

A Friendly Introduction to Cross-Entropy Loss

https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/

How to choose algorithms for Microsoft Azure Machine Learning

blog: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-choice/

New to Machine Learning? Avoid these three mistakes

blog: https://medium.com/machine-intelligence-report/new-to-machine-learning-avoid-these-three-mistakes-73258b3848a4#.hi1iowlmf

Machine Learning Exercises In Python

Assessing Stability of K-Means Clusterings

blog: http://activisiongamescience.github.io/2016/08/19/Assessing-Stability-of-K-Means-Clusterings/

Cross-Validation Gone Wrong

Probabilistic Machine Learning in PyMC3

Bias in ML, and Teaching AI

Solutions for Skilltest Machine Learning : Revealed

https://www.analyticsvidhya.com/blog/2016/11/solution-for-skilltest-machine-learning-revealed/

Machine Learning Performance Improvement Cheat Sheet

intro: 32 Tips, Tricks and Hacks That You Can Use To Make Better Predictions.
blog: http://machinelearningmastery.com/machine-learning-performance-improvement-cheat-sheet/

What is better: gradient-boosted trees, or a random forest?

http://fastml.com/what-is-better-gradient-boosted-trees-or-random-forest/

A Practical Guide to Tree Based Learning Algorithms

https://sadanand-singh.github.io/posts/treebasedmodels/

Model evaluation, model selection, and algorithm selection in machine learning

Part I - The basics

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part1.html

Part II - Bootstrapping and uncertainties

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html

Part III - Cross-validation and hyperparameter tuning

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part3.html

ROC / AUC

ROC: Receiver Operating Characteristic

AUC: Area Under the Curve
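
As a rough illustration of what the AUC measures, the sketch below computes it directly from its ranking interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` is hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed; it assumes binary labels coded as 0 and 1.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth labels
    scores: iterable of classifier scores (higher means more positive)
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.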

Tutorials: Plotting AP and ROC curves

http://www.vlfeat.org/overview/plots-rank.html

Beautiful Properties Of The ROC Curve

http://jxieeducation.com/2016-09-27/Beautiful-Properties-Of-The-ROC-Curve/

On calculating AUC

blog: http://www.win-vector.com/blog/2016/10/on-calculating-auc/

ROC to precision-recall curve translator

https://rafalab.shinyapps.io/roc-precision-recall/

t-SNE

How to Use t-SNE Effectively

blog: http://distill.pub/2016/misread-tsne/
github: https://github.com/distillpub/post--misread-tsne

Libraries

LambdaNet: Purely functional artificial neural network library implemented in Haskell

github: https://github.com/jbarrow/LambdaNet

rustlearn: Machine learning crate for Rust

github: https://github.com/maciejkula/rustlearn

MILJS : Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

arxiv: http://arxiv.org/abs/1503.05743v1
github: https://github.com/mil-tokyo
homepage: http://mil-tokyo.github.io/

machineJS: Automated machine learning - just give it a data file!

github: https://github.com/ClimbsRocks/machineJS

Machine Learning for iOS: Tools and resources to create really smart iOS applications

homepage: http://alexsosn.github.io/ml/2015/11/05/iOS-ML.html

DynaML: Scala Library/REPL for Machine Learning Research

homepage: http://mandar2812.github.io/DynaML/
github: https://github.com/mandar2812/DynaML/

Smile - Statistical Machine Intelligence and Learning Engine

intro: Smile is a fast and comprehensive machine learning system.
homepage: http://haifengl.github.io/smile/index.html
github: https://github.com/haifengl/smile

benchm-ml

intro: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
github: https://github.com/szilard/benchm-ml

KeystoneML: Simplifying robust end-to-end machine learning on Apache Spark

intro: a software framework, written in Scala, from the UC Berkeley AMPLab designed to simplify the construction of large scale, end-to-end, machine learning pipelines with Apache Spark.
homepage: http://keystone-ml.org/
github: https://github.com/amplab/keystone

Talisman: A straightforward & modular NLP, machine learning & fuzzy matching library for JavaScript

homepage: http://yomguithereal.github.io/talisman/
github: https://github.com/Yomguithereal/talisman

PRMLT: Pattern Recognition and Machine Learning Toolbox

homepage: http://prml.github.io/
github: https://github.com/PRML/PRMLT

The Fido Project: An open source C++ machine learning library targeted towards embedded electronics and robotics

homepage: https://fidoproject.github.io/
github: https://github.com/FidoProject/Fido

rusty-machine: Machine Learning library for Rust

homepage: https://crates.io/crates/rusty-machine/
github: https://github.com/AtheMathmo/rusty-machine

RoBO - a Robust Bayesian Optimization framework

github: https://github.com/automl/RoBO
docs: http://robo-fork.readthedocs.io/en/latest/

Dlib: A toolkit for making real world machine learning and data analysis applications in C++

intro: Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems.
homepage: http://dlib.net/
github: https://github.com/davisking/dlib

Bayesian Networks and Bayesian Classifier Software

blog: http://www.kdnuggets.com/software/bayesian.html

ML-lib: An extensive machine learning library, made from scratch (Python)

github: https://github.com/christopherjenness/ML-lib

Top Machine Learning Projects for Julia

blog: http://www.kdnuggets.com/2016/08/top-machine-learning-projects-julia.html

Helit: My machine learning/computer vision library for all of my recent papers, plus algorithms that I just like.

github: https://github.com/thaines/helit

Gorgonia: a library that helps facilitate machine learning in Go

github: https://github.com/chewxy/gorgonia
blog: http://blog.chewxy.com/2016/09/19/gorgonia

GoLearn: Machine Learning for Go

github: https://github.com/sjwhitworth/golearn

Cortex: Machine learning in Clojure

intro: Neural networks, regression and feature learning in Clojure.
github: https://github.com/thinktopic/cortex

ELI5: A library for debugging machine learning classifiers and explaining their predictions

github: https://github.com/TeamHG-Memex/eli5

PHP-ML - Machine Learning library for PHP

github: https://github.com/php-ai/php-ml
examples: https://github.com/php-ai/php-ml-examples
docs: http://php-ml.readthedocs.io/en/latest/

ml.js - Machine learning tools in JavaScript

https://github.com/mljs/ml

Propel

intro: A Machine Learning Framework for JavaScript / Differential Programming in JavaScript
homepage: http://propelml.org/
github: https://github.com/propelml/propel

Resources

Machine Learning Surveys: A list of literature surveys, reviews, and tutorials on Machine Learning and related topics

http://www.mlsurveys.com/

machine learning classifier gallery

http://home.comcast.net/~tom.fawcett/public_html/ML-gallery/pages/

Machine Learning and Computer Vision Resources

http://zhengrui.github.io/zerryland/ML-CV-Resource.html

A Huge List of Machine Learning And Statistics Repositories

http://blog.josephmisiti.com/a-huge-list-of-machine-learning-repositories/

Machine Learning in Python Course

https://www.springboard.com/learning-paths/machine-learning-python/

Machine Learning & Deep Learning Resources (Chapter 1)

https://github.com/ty4z2008/Qix/blob/master/dl.md

The Spectator: Shakir’s Machine Learning Blog

http://blog.shakirm.com/

Useful Inequalities

http://www.lkozma.net/inequalities_cheat_sheet/ineq.pdf

Math for Machine Learning

http://www.umiacs.umd.edu/~hal/courses/2013S_ML/math4ml.pdf

Cheat Sheet: Algorithms for Supervised- and Unsupervised Learning

blog: http://eferm.com/machine-learning-cheat-sheet/

Annalyzin: Analytics For Layman, with Tutorials & Experiments

https://annalyzin.wordpress.com/

ALGORITHMS: AI, Data Mining, Clustering, Data Structures, Machine Learning, Neural, NLP, …

github: https://github.com/svaksha/pythonidae/blob/master/AI.md

Awesome Machine Learning: A curated list of awesome machine learning frameworks, libraries and software (by language)

github: https://github.com/josephmisiti/awesome-machine-learning

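awesome-machine-learning-cn: Chinese edition of the Awesome Machine Learning resource list

intro: Chinese edition of the Awesome Machine Learning resource list, covering frameworks, libraries, and software in the machine learning field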
github: https://github.com/jobbole/awesome-machine-learning-cn

Machine and Deep Learning with Python

github: https://github.com/szwed/awesome-machine-learning-python

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive

homepage: http://user2016.org/tutorials/10.html
github: https://github.com/ledell/useR-machine-learning-tutorial

Top-down learning path: Machine Learning for Software Engineers

intro: A complete daily plan for studying to become a machine learning engineer.
github: https://github.com/ZuzooVn/machine-learning-for-software-engineers

30 Top Videos, Tutorials & Courses on Machine Learning & Artificial Intelligence from 2016

https://www.analyticsvidhya.com/blog/2016/12/30-top-videos-tutorials-courses-on-machine-learning-artificial-intelligence-from-2016/

Machine Learning Problem Bible (MLPB)

github: https://github.com/ben519/MLPB

The most shared Machine Learning content on Twitter from the past 7 days

intro: Based on the millions of #machinelearning tweets already processed by The Herd Locker, noise is a little over 94% of the conversation. Tracking the 8,000 daily tweets that are tagged #machineLearning, the platform filters and ranks the most popular shared content in realtime. Machine learning’s zeitgeist, you might say. It’s been running for over a year, monitoring half a billion tweets a day, and will always be free to use. No ads. No BS.
homepage: http://theherdlocker.com/tweet/popularity/machinelearning

Projects

Machine learning algorithms: Minimal and clean examples of machine learning algorithms

intro: A collection of minimal and clean implementations of machine learning algorithms.
github: https://github.com/rushter/MLAlgorithms

Plotting high-dimensional decision boundaries

github: https://github.com/tmadl/highdimensional-decision-boundary-plot

Flappy Learning: Program that learns to play Flappy Bird by machine learning (Neuroevolution)

blog: https://xviniette.github.io/FlappyLearning/
github: https://github.com/xviniette/FlappyLearning

Readings / Questions / Discussions

A Super Harsh Guide to Machine Learning

https://www.reddit.com/r/MachineLearning/comments/5z8110/d_a_super_harsh_guide_to_machine_learning/

(Quora): What are the top 10 data mining or machine learning algorithms?

https://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms/answer/Xavier-Amatriain

(Quora): What are the must read papers on data mining and machine learning?

https://www.quora.com/What-are-the-must-read-papers-on-data-mining-and-machine-learning

(Quora): What would be your advice to a software engineer who wants to learn machine learning?

https://www.quora.com/What-would-be-your-advice-to-a-software-engineer-who-wants-to-learn-machine-learning-3/answer/Alex-Smola-1

Machine Learning FAQ

homepage: http://sebastianraschka.com/faq/index.html

MLNotes: Very concise notes on machine learning and statistics

github: https://github.com/johnmyleswhite/MLNotes