Machine Learning Resources

Published: 27 Aug 2015 Category: machine_learning

Tutorials

Machine Learning for Developers

http://xyclade.github.io/MachineLearning/

Logistic Regression Vs Decision Trees Vs SVM

Machine learning: A practical introduction

Tutorials on Machine Learning (Tom Dietterich)

http://web.engr.oregonstate.edu/~tgd/projects/tutorials.html

Machine Learning Tutorials

A Visual Introduction to Machine Learning

Machine Learning – A gentle & structured introduction

A Comparison of Supervised Learning Algorithms

Statistical Learning and Kernel Methods

Getting Started with Machine Learning

https://www.infoq.com/articles/getting-started-ml

Getting Started with Machine Learning: For the absolute beginners and fifth graders

https://medium.com/@suffiyanz/getting-started-with-machine-learning-f15df1c283ea#.fqipdiyyn

Machine Learning Crash Course

Rules of Machine Learning: Best Practices for ML Engineering

http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

Machine Learning is Fun!

Machine Learning is Fun! - The world’s easiest introduction to Machine Learning

Machine Learning is Fun! Part 2 - Using Machine Learning to generate Super Mario Maker levels

Machine Learning is Fun! Part 3: Deep Learning and Convolutional Neural Networks

Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning

Machine Learning Theory

Machine Learning Theory - Part 1: Introduction

https://mostafa-samir.github.io/ml-theory-pt1/

Machine Learning Theory - Part 2: Generalization Bounds

https://mostafa-samir.github.io/ml-theory-pt2/

Boosting

Quick Introduction to Boosting Algorithms in Machine Learning

http://www.analyticsvidhya.com/blog/2015/11/quick-introduction-boosting-algorithms-machine-learning/

An Empirical Comparison of Three Boosting Algorithms on Real Data Sets with Artificial Class Noise (AdaBoost vs. LogitBoost vs. BrownBoost)

A (small) introduction to Boosting

Boosting and AdaBoost for Machine Learning

Gradient Boosting

Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

Understanding Gradient Boosting, Part 1

Gradient Boosting explained [demonstration]

A Kaggle Master Explains Gradient Boosting

http://blog.kaggle.com/2017/01/23/a-kaggle-master-explains-gradient-boosting/

Performance of various open source GBM implementations

arboretum - Gradient Boosting on GPU

Gradient Boosting from scratch

https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d

XGBoost

XGBoost: A Scalable Tree Boosting System

XGBoost: eXtreme Gradient Boosting

  • intro: Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow.
  • github: https://github.com/dmlc/xgboost

GPU Accelerated XGBoost

Awesome XGBoost

Complete Guide to Parameter Tuning in XGBoost (with codes in Python)

LinXGBoost: Extension of XGBoost to Generalized Local Linear Models

Tree Boosting With XGBoost - Why Does XGBoost Win “Every” Machine Learning Competition?

XGBoost: Scalable GPU Accelerated Learning

LightGBM

LightGBM, Light Gradient Boosting Machine

  • intro: LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
  • github: https://github.com/Microsoft/LightGBM

pyLightGBM: Python binding for Microsoft LightGBM

Benchmarking LightGBM: how fast is LightGBM vs xgboost?

GPU-acceleration for Large-scale Tree Boosting

Lessons Learned From Benchmarking Fast Machine Learning Algorithms

CatBoost

CatBoost is an open-source gradient boosting library with categorical features support

Bootstrap

Coding, Visualizing, and Animating Bootstrap Resampling

http://minimaxir.com/2015/09/bootstrap-resample/

Can we trust the bootstrap in high-dimension?

Cascades

Making faces with Haar cascades and mixed integer linear programming

Classifiers

Measuring Performance of Classifiers

Convex Optimization

Convex Optimization: Algorithms and Complexity

cvx-optim.torch: Torch library for convex optimization

Decision Tree

Soft Decision Trees

Canonical Correlation Forests

Decision Trees Tutorial

End-to-end Learning of Deterministic Decision Trees

Extremely Fast Decision Tree

Generative Models

A note on the evaluation of generative models

Markov Networks

Markov Logic Networks

Markov Chains

Evolution, Dynamical Systems and Markov Chains

http://www.offconvex.org/2016/03/07/evolution-markov-chains/

Markov Chains: Explained Visually

Matrix Computations

Randomized Numerical Linear Algebra for Large Scale Data Analysis

http://researcher.watson.ibm.com/researcher/view_group.php?id=5131

Sketching-based Matrix Computations for Machine Learning

http://xdata-skylark.github.io/libskylark/

Matrix Factorization

Neural Network Matrix Factorization

Beyond Low Rank + Sparse: Multi-scale Low Rank Matrix Decomposition

k-Means Clustering Is Matrix Factorization

CuMF_SGD: Fast and Scalable Matrix Factorization

Gaussian Processes

The Gaussian Processes Web Site

Chained Gaussian Processes

Introduction to Gaussian Processes

Multi-label Learning

Neural Network Models for Multilabel Learning

Conditional Bernoulli Mixtures for Multi-label Classification

Multi-Label Learning with Label Enhancement

https://arxiv.org/abs/1706.08323

Multi-Task Learning

Multitask Learning

Multi-Task Learning: Theory, Algorithms, and Applications (2012)

Nearest Neighbors

Annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Hidden Markov Models (HMM)

tensorflow_hmm: A TensorFlow implementation of an HMM layer

Online Learning

Lecture Notes on Online Learning

Scale-Free Online Learning

Online Learning with Expert Advice

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (v.2)

Gradient descent with Python

Stochastic Gradient Descent (SGD) with Python

Gradient Descent Learns Linear Dynamical Systems

Why is gradient descent robust to non-linearly separable data?

Boosted Regression Trees

DART: Dropouts meet Multiple Additive Regression Trees

Visualization

Visualising High-Dimensional Data

Interactive demonstrations for ML courses

Comprehensive Guide on t-SNE algorithm with implementation in R & Python

https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/

Tricks

Machine Learning Trick of the Day

Debug Machine Learning

Debugging Machine Learning Tasks

Tackle Unbalanced Classes

Classic strategies:

  1. class re-sampling
  2. cost-sensitive training
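
The two classic strategies above can be illustrated in plain Python. This is a minimal sketch, assuming a binary dataset stored as lists with labels 0 and 1; the helper names (random_oversample, balanced_class_weights, weighted_log_loss) are hypothetical and not taken from any of the linked articles. The weighting rule w_c = n_samples / (n_classes * count_c) is the same "balanced" heuristic scikit-learn documents for class_weight="balanced".

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Class re-sampling: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # indices of the rows that belong to this class
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(y[j])
    return X_out, y_out


def balanced_class_weights(y):
    """Cost-sensitive training: weight each class inversely to its frequency,
    w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Binary cross-entropy in which each example is scaled by its class
    weight, so mistakes on the rare class cost more."""
    eps = 1e-12
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(t * math.log(p) + (1 - t) * math.log(1 - p))
        total += weights[t] * loss
    return total / len(y_true)


if __name__ == "__main__":
    X = [[i / 10] for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 negatives, 2 positives
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
    print(balanced_class_weights(y))
```

Re-sampling changes the data the model sees, while class weights leave the data alone and change the loss; in practice both are usually applied only to the training split, never to the validation or test data.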

Dealing with Unbalanced Classes, SVMs, Random Forests and Decision Trees in Python

Fighting Class Unbalance Supervised ML Problem

http://www.erogol.com/fighting-class-unbalance-supervised-ml-problem/

Survey of resampling techniques for improving classification performance in unbalanced datasets

Learning from Imbalanced Classes

Towards Competitive Classifiers for Unbalanced Classification Problems: A Study on the Performance Scores

This Machine Learning Project on Imbalanced Data Can Add Value to Your Resume

https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-imbalanced-data-can-add-value-to-your-resume/

Dealing with unbalanced data: Generating additional data by jittering the original image

7 Techniques to Handle Imbalanced Data

http://www.kdnuggets.com/2017/06/7-techniques-handle-imbalanced-data.html

Mathematics

Some Notes on Applied Mathematics for Machine Learning

An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation

Probability Cheatsheet

Probability Cheatsheet v2.0

http://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf

Kalman Filter

How Kalman Filters Work

Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation

L-BFGS

Code Stylometry

De-anonymizing Programmers via Code Stylometry

Recommendation / Recommender System

Master Recommender Systems

Human Curation and Convnets: Powering Item-to-Item Recommendations on Pinterest

Top-N Recommendation with Novel Rank Approximation

On the Effectiveness of Linear Models for One-Class Collaborative Filtering

An Adaptive Matrix Factorization Approach for Personalized Recommender Systems

Implementing your own Recommender Systems in Python using Stochastic Gradient Descent

How to Write Your Own Recommendation System

Addressing Cold Start for Next-song Recommendation

Using Navigation to Improve Recommendations in Real-Time

Local Item-Item Models For Top-N Recommendation

Lessons learned from building real-life recommender systems

Algorithms Aside: Recommendation As The Lens Of Life

Pairwise Preferences Based Matrix Factorization and Nearest Neighbor Recommendation Techniques

Mendeley: Recommendations for Researchers

Past, Present and Future of Recommender Systems: an Industry Perspective

TF-recomm: Tensorflow-based Recommendation systems

List of Recommender Systems

Related Pins at Pinterest: The Evolution of a Real-World Recommender System

Lifelong Learning

Lifelong Machine Learning

NELL (Never Ending Language Learner)

Toward an Architecture for Never-Ending Language Learning

NEIL (Never Ending Image Learner)

NEIL: Extracting Visual Knowledge from Web Data

Expert Gate: Lifelong Learning with a Network of Experts

Lifelong Machine Learning and Computer Reading the Web

Lifelong Machine Learning for Natural Language Processing

Zero-Shot Learning

An embarrassingly simple approach to zero-shot learning

Zero-Shot Learning - The Good, the Bad and the Ugly

One Shot Learning

Matching Networks for One Shot Learning

Maximum Entropy

Maximum entropy probability distribution

https://www.wikiwand.com/en/Maximum_entropy_probability_distribution

Metric Learning

Distance Metric Learning: A Comprehensive Survey

Large Scale Metric Learning from Equivalence Constraints

Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval

Finance and Trading

Efficient Portfolio optimisation by Hybridised Machine Learning

Feature Selection for Portfolio Optimization

The Efficient Frontier: Markowitz portfolio optimization in Python

Self-Study Plan for Becoming a Quantitative Trader

Pyfolio – a new Python library for performance and risk analysis

Application of Machine Learning: Automated Trading Informed by Event Driven Data

Python Programming for Finance

Algorithmic trading in less than 100 lines of Python code

https://www.oreilly.com/learning/algorithmic-trading-in-less-than-100-lines-of-python-code

Designing an Algorithmic Trading Strategy with Python

https://www.youtube.com/watch?v=9XYjR6ge73M

Different Interpretations of the Same Model

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Dropout as a Bayesian Approximation: Insights and Applications

http://mlg.eng.cam.ac.uk/yarin/PDFs/Dropout_as_a_Bayesian_approximation.pdf

k-Means Clustering Is Matrix Factorization

https://arxiv.org/abs/1512.07548

Word Embedding as Matrix Factorization

Neural Word Embedding as Implicit Matrix Factorization

https://levyomer.files.wordpress.com/2014/09/neural-word-embeddings-as-implicit-matrix-factorization.pdf

Deformable Part Models are Convolutional Neural Networks

k-Means is a Variational EM Approximation of Gaussian Mixture Models

https://arxiv.org/abs/1704.04812

Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method

http://www.sciencedirect.com/science/article/pii/S0893608003001709

On the momentum term in gradient descent learning algorithms

http://www.sciencedirect.com/science/article/pii/S0893608098001166?np=y&npKey=142c3bf066ad1c36c5b4fd8713d0a8967413462675bae2f8d7b89933fa8cf228

EM as Coordinate Descent

Backprop as Functor: A compositional perspective on supervised learning

Papers

Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?

  • intro: evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today
  • intro: “The random forest is clearly the best family of classifiers”
  • paper: http://www.jmlr.org/papers/volume15/delgado14a/delgado14a.pdf

Are Random Forests Truly the Best Classifiers?

An Empirical Evaluation of Supervised Learning in High Dimensions

Machine learning: Trends, perspectives, and prospects

Debugging Machine Learning Tasks

LIME

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

Datasets

Datasets for Machine Learning

Books

Machine Learning plus Intelligent Optimization: THE LION WAY, VERSION 2.0

Level-Up Your Machine Learning

https://www.metacademy.org/roadmaps/cjrd/level-up-your-ml

An Introduction to the Science of Statistics: From Theory to Implementation (Preliminary Edition)

Python Machine Learning

Machine Learning for Hackers

A Course in Machine Learning

An Introduction to Statistical Learning: with Applications in R

Introduction to Machine Learning with Python

Introduction to Machine Learning (Second Edition)

Videos

Video resources for machine learning

http://dustintran.com/blog/video-resources-for-machine-learning/

Blogs

10 More lessons learned from building real-life Machine Learning systems — Part I

https://medium.com/@xamat/10-more-lessons-learned-from-building-real-life-ml-systems-part-i-b309cafc7b5e#.h7rh0gxlv

Machine Learning: classifier comparison using Plotly

http://nbviewer.jupyter.org/github/etpinard/plotly-misc-nbs/blob/master/ml-classifier-comp/ml-classifier-comp.ipynb

Fitting a model via closed-form equations vs. Gradient Descent vs Stochastic Gradient Descent vs Mini-Batch Learning. What is the difference?

A Friendly Introduction to Cross-Entropy Loss

https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/

How to choose algorithms for Microsoft Azure Machine Learning

New to Machine Learning? Avoid these three mistakes

Machine Learning Exercises In Python

Assessing Stability of K-Means Clusterings

Cross-Validation Gone Wrong

Probabilistic Machine Learning in PyMC3

Bias in ML, and Teaching AI

Solutions for Skilltest Machine Learning: Revealed

https://www.analyticsvidhya.com/blog/2016/11/solution-for-skilltest-machine-learning-revealed/

Machine Learning Performance Improvement Cheat Sheet

What is better: gradient-boosted trees, or a random forest?

http://fastml.com/what-is-better-gradient-boosted-trees-or-random-forest/

A Practical Guide to Tree Based Learning Algorithms

https://sadanand-singh.github.io/posts/treebasedmodels/

Model evaluation, model selection, and algorithm selection in machine learning

Part I - The basics

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part1.html

Part II - Bootstrapping and uncertainties

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html

Part III - Cross-validation and hyperparameter tuning

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part3.html

ROC / AUC

ROC: Receiver Operating Characteristic

AUC: Area Under the Curve
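
AUC has a useful probabilistic reading: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half (the normalized Mann-Whitney U statistic). The minimal sketch below, assuming binary labels in {0, 1} and illustrative function names, computes AUC both ways: by counting positive/negative pairs, and by sweeping a threshold to build the ROC curve (FPR on the x-axis, TPR on the y-axis) and integrating it with the trapezoidal rule.

```python
def auc_pairwise(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. O(n_pos * n_neg), fine for small inputs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_curve_points(y_true, scores):
    """Sweep the decision threshold from high to low and record (FPR, TPR).
    Examples with tied scores are consumed together, so each distinct
    threshold contributes exactly one point."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(scores, y_true), key=lambda st: -st[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.55, 0.1]
    print(auc_pairwise(y, s))
    print(auc_trapezoid(roc_curve_points(y, s)))
```

Processing tied scores as one block is what makes the two computations agree: a tie then produces a diagonal ROC segment whose trapezoid contributes exactly the "half credit" the pairwise version assigns.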

Tutorials: Plotting AP and ROC curves

http://www.vlfeat.org/overview/plots-rank.html

Beautiful Properties of the ROC Curve

http://jxieeducation.com/2016-09-27/Beautiful-Properties-Of-The-ROC-Curve/

On calculating AUC

ROC to precision-recall curve translator

https://rafalab.shinyapps.io/roc-precision-recall/

t-SNE

How to Use t-SNE Effectively

Libraries

LambdaNet: Purely functional artificial neural network library implemented in Haskell

rustlearn: Machine learning crate for Rust

MILJS: Brand New JavaScript Libraries for Matrix Calculation and Machine Learning

machineJS: Automated machine learning: just give it a data file!

Machine Learning for iOS: Tools and resources to create really smart iOS applications

DynaML: Scala Library/REPL for Machine Learning Research

Smile - Statistical Machine Intelligence and Learning Engine

benchm-ml

  • intro: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
  • github: https://github.com/szilard/benchm-ml

KeystoneML: Simplifying robust end-to-end machine learning on Apache Spark

Talisman: A straightforward & modular NLP, machine learning & fuzzy matching library for JavaScript

PRMLT: Pattern Recognition and Machine Learning Toolbox

The Fido Project: An open source C++ machine learning library targeted towards embedded electronics and robotics

rusty-machine: Machine Learning library for Rust

RoBO - a Robust Bayesian Optimization framework

Dlib: A toolkit for making real world machine learning and data analysis applications in C++

Bayesian Networks and Bayesian Classifier Software

ML-lib: An extensive machine learning library, made from scratch (Python)

Top Machine Learning Projects for Julia

Helit: My machine learning/computer vision library for all of my recent papers, plus algorithms that I just like.

Gorgonia: a library that helps facilitate machine learning in Go

GoLearn: Machine Learning for Go

Cortex: Machine learning in Clojure

ELI5: A library for debugging machine learning classifiers and explaining their predictions

PHP-ML - Machine Learning library for PHP

ml.js - Machine learning tools in JavaScript

https://github.com/mljs/ml

Propel

Resources

Machine Learning Surveys: A list of literature surveys, reviews, and tutorials on Machine Learning and related topics

http://www.mlsurveys.com/

Machine Learning Classifier Gallery

http://home.comcast.net/~tom.fawcett/public_html/ML-gallery/pages/

Machine Learning and Computer Vision Resources

http://zhengrui.github.io/zerryland/ML-CV-Resource.html

A Huge List of Machine Learning And Statistics Repositories

http://blog.josephmisiti.com/a-huge-list-of-machine-learning-repositories/

Machine Learning in Python Course

https://www.springboard.com/learning-paths/machine-learning-python/

Machine Learning & Deep Learning Resources (Chapter 1)

https://github.com/ty4z2008/Qix/blob/master/dl.md

The Spectator: Shakir’s Machine Learning Blog

http://blog.shakirm.com/

Useful Inequalities

http://www.lkozma.net/inequalities_cheat_sheet/ineq.pdf

Math for Machine Learning

http://www.umiacs.umd.edu/~hal/courses/2013S_ML/math4ml.pdf

Cheat Sheet: Algorithms for Supervised- and Unsupervised Learning

Annalyzin: Analytics For Layman, with Tutorials & Experiments

https://annalyzin.wordpress.com/

ALGORITHMS: AI, Data Mining, Clustering, Data Structures, Machine Learning, Neural, NLP, …

Awesome Machine Learning: A curated list of awesome machine learning frameworks, libraries and software (by language)

awesome-machine-learning-cn: Chinese edition of the comprehensive list of machine learning resources

Machine and Deep Learning with Python

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive

Top-down learning path: Machine Learning for Software Engineers

30 Top Videos, Tutorials & Courses on Machine Learning & Artificial Intelligence from 2016

https://www.analyticsvidhya.com/blog/2016/12/30-top-videos-tutorials-courses-on-machine-learning-artificial-intelligence-from-2016/

Machine Learning Problem Bible (MLPB)

The most shared Machine Learning content on Twitter from the past 7 days

  • intro: Based on the millions of #machinelearning tweets already processed by The Herd Locker, noise makes up a little over 94% of the conversation. Tracking the 8,000 daily tweets tagged #machineLearning, the platform filters and ranks the most popular shared content in real time. Machine learning’s zeitgeist, you might say. It’s been running for over a year, monitoring half a billion tweets a day, and will always be free to use. No ads. No BS.
  • web: http://theherdlocker.com/tweet/popularity/machinelearning

Projects

Machine learning algorithms: Minimal and clean examples of machine learning algorithms

Plotting high-dimensional decision boundaries

Flappy Learning: Program that learns to play Flappy Bird by machine learning (Neuroevolution)

Readings / Questions / Discussions

A Super Harsh Guide to Machine Learning

https://www.reddit.com/r/MachineLearning/comments/5z8110/d_a_super_harsh_guide_to_machine_learning/

(Quora): What are the top 10 data mining or machine learning algorithms?

https://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms/answer/Xavier-Amatriain

(Quora): What are the must read papers on data mining and machine learning?

https://www.quora.com/What-are-the-must-read-papers-on-data-mining-and-machine-learning

(Quora): What would be your advice to a software engineer who wants to learn machine learning?

https://www.quora.com/What-would-be-your-advice-to-a-software-engineer-who-wants-to-learn-machine-learning-3/answer/Alex-Smola-1

Machine Learning FAQ

MLNotes: Very concise notes on machine learning and statistics

List of machine learning concepts

What is the relation between Logistic Regression and Neural Networks and when to use which?