Big Data Resources
Courses
MIT 6.S897: Large-Scale Systems (Matei Zaharia)
- instructor: Matei Zaharia
- homepage: http://people.csail.mit.edu/matei/courses/2015/6.S897/
Papers
Learning to Hash for Indexing Big Data - A Survey
Random Forests for Big Data
Big data analytics: a survey
A Comparison of Big Data Frameworks on a Layered Dataflow Model
A survey of machine learning for big data processing
A Big Data Analysis Framework Using Apache Spark and Deep Learning
- intro: IEEE ICDM 2017 (International Conference on Data Mining) Workshop on Data Science and Big Data Analytics (DSBDA)
- intro: University of Delhi & Manav Rachna University & CMU
- arxiv: https://arxiv.org/abs/1711.09279
Projects
Open Big Data Group
- intro: This website contains a collection of libraries for processing massive datasets in highly distributed and parallel environments
- homepage: http://openbigdatagroup.github.io/
PLDA: Parallel C++ implementation of Latent Dirichlet Allocation
PSVM: Parallelizing Support Vector Machines on Distributed Computers
- homepage: http://openbigdatagroup.github.io/psvm/
- paper: http://papers.nips.cc/paper/3202-parallelizing-support-vector-machines-on-distributed-computers.pdf
- github: https://github.com/openbigdatagroup/psvm
PFP: Parallel FP-Growth for Query Recommendation
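As a rough illustration of how FP-Growth can be parallelized, here is a minimal single-process Python sketch of the group-dependent sharding idea used by PFP: count items to build a frequency-sorted item list (F-list), split that list into groups, and, for every group that occurs in a transaction, emit the transaction prefix up to that group's last item. Each group's shard then holds everything needed to mine that group's items independently. The round-robin group assignment, the function names, and the toy transactions are hypothetical illustrations, not code from the PFP project.

```python
from collections import Counter, defaultdict


def frequency_list(transactions, min_support):
    """Counting step (parallel in PFP, serial here): count each item once per
    transaction, keep items with count >= min_support, and sort them by
    descending count (ties broken by item name) to form the F-list."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return [item for item, _ in frequent]


def assign_groups(f_list, num_groups):
    """Hypothetical grouping rule: deal F-list items round-robin into groups."""
    return {item: i % num_groups for i, item in enumerate(f_list)}


def shard_transaction(transaction, rank, group_of):
    """Mapper for group-dependent sharding: returns (group_id, prefix) pairs."""
    # Keep only frequent items, ordered by their position in the F-list.
    items = sorted({i for i in transaction if i in rank}, key=rank.__getitem__)
    emitted, out = set(), []
    # Walk from the least frequent item back to the front; the first time a
    # group is seen, emit the prefix ending at that group's last item.
    for j in range(len(items) - 1, -1, -1):
        gid = group_of[items[j]]
        if gid not in emitted:
            emitted.add(gid)
            out.append((gid, items[: j + 1]))
    return out


def build_shards(transactions, min_support, num_groups):
    """Run the counting and sharding steps; each shard could then be mined
    with ordinary FP-Growth on a separate machine."""
    f_list = frequency_list(transactions, min_support)
    rank = {item: i for i, item in enumerate(f_list)}
    group_of = assign_groups(f_list, num_groups)
    shards = defaultdict(list)
    for t in transactions:
        for gid, prefix in shard_transaction(t, rank, group_of):
            shards[gid].append(prefix)
    return f_list, dict(shards)


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b"]]
f_list, shards = build_shards(transactions, min_support=2, num_groups=2)
print(f_list)     # ['a', 'b', 'c']  ("d" appears once and is dropped)
print(shards[0])  # [['a', 'b', 'c'], ['a', 'c'], ['b', 'c'], ['a']]
print(shards[1])  # [['a', 'b'], ['b'], ['a', 'b']]
```

In this toy run, group 0 owns items "a" and "c" and group 1 owns "b"; counting each owned item inside its own shard reproduces its global support (3 for each), which is why the shards can be mined without further communication.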
Pspectralclustering: A parallel C++ implementation of Parallel Spectral Clustering
- homepage: http://openbigdatagroup.github.io/pspectralclustering/
- github: https://github.com/openbigdatagroup/pspectralclustering
Speedo: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network
- homepage: http://openbigdatagroup.github.io/speedo/
- github: https://github.com/openbigdatagroup/speedo
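To illustrate the general idea behind parallelizing stochastic gradient descent, the sketch below runs synchronous data-parallel SGD on a toy linear-regression problem: each simulated worker computes a gradient on its own data shard, the gradients are averaged, and one shared update is applied. This is a generic scheme under simplifying assumptions (sequential "workers", one sample per worker per step, a linear model instead of a convolutional network, hypothetical function names), not Speedo's actual implementation.

```python
def local_gradient(w, b, x, y):
    """Gradient of the per-sample loss 0.5 * (w*x + b - y)**2 w.r.t. (w, b)."""
    err = w * x + b - y
    return err * x, err


def data_parallel_sgd(shards, lr=0.1, steps=2000):
    """Synchronous data-parallel SGD: every worker computes a gradient on its
    own shard, the gradients are averaged (standing in for an all-reduce or
    parameter-server step), and all workers apply the same update."""
    w, b = 0.0, 0.0
    for t in range(steps):
        grads = []
        for shard in shards:  # in a real cluster these would run concurrently
            # Deterministic cyclic sampling: one sample per worker per step.
            x, y = shard[t % len(shard)]
            grads.append(local_gradient(w, b, x, y))
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b


# Toy data on the line y = 2x + 1, split round-robin across 4 simulated workers.
data = [(i / 20, 2 * (i / 20) + 1) for i in range(20)]
shards = [data[k::4] for k in range(4)]
w, b = data_parallel_sgd(shards)
print(f"w={w:.4f}, b={b:.4f}")
```

Because each averaged update combines one sample from each of the four workers, this behaves like ordinary SGD with a minibatch of four; adding workers enlarges the effective batch rather than changing the update rule.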
Videos
Awesome Big Data Algorithms
Blog
Uncovering Big Bias with Big Data