Big Data is everywhere. Industries ranging from technology to finance to government want to use Big Data analysis techniques for knowledge discovery. Tools and techniques have recently been developed to handle various aspects of Big Data analysis, such as data aggregation, scalable storage, efficient retrieval, and faster analysis. It is important to be aware of these recent developments in Big Data analytics tools and techniques. This course will look at the software and algorithms designed for coping with Big Data. We will train participants to gain practical knowledge of Big Data analysis.

  • Data Collections
  • Data Science basics
  • Faster Retrieval Techniques
  • Efficient Data Analysis
  • MapReduce and Hadoop
  • Databases and operationalizing Big Data
  • Basic data models
  • Analyzing data in motion
  • Finding structures in data
  • Distributed and parallel computing using R
  • Scalable Storage
  • Advanced data models
  • Security Concerns in Big Data Applications
  • Big Data case studies
  • Big Data Applications


NOTE: Due to the overwhelming number of applications, registration is now closed.


We are excited to announce our first speakers:

Sonal Gupta
B.Tech. IIT Delhi

TITLE: Fuzzy Matching - Big Data for Analytics and Data Quality
Sonal is the founder and CEO of Nube Technologies, a startup that makes tools for big data wrangling. Nube’s product, Reifier, uses Apache Spark and machine learning to fuzzily match different mentions of an entity across sources. Reifier helps enterprises obtain a 360-degree view of customer and product data and supports data quality, fraud and security, and data management for downstream analytics. Sonal is a regular speaker at international big data and machine learning conferences. Previously, she open sourced HIHO for Hadoop ETL and Crux reporting for HBase on GitHub. Sonal holds a B.Tech. from IIT Delhi.

Shweta Gupta
Senior Research Engineer, Snapdeal

TITLE: Scala and Spark for building Machine Learning on Big Data
Shweta has 5-6 years of experience in the ML domain. She is working on deep learning (in the NLP area) with a focus on Big Data. She is interested in exploring scalable machine learning algorithms that can be applied to real-world problems. At Snapdeal, over the course of 16 months, she worked extensively on recommender systems and sentiment analysis using deep NLP. Prior to that, she worked for a year as a research assistant (RA) in the RSL lab at IIT Roorkee, working on optimization problems.

Program Information

Virtual Environment Setup

  • Virtual Box
  • Installation
  • Setting HBase & Hadoop Environment


  • CRUD Operations
  • CAP Theorem


  • HDFS File System
  • MapReduce Program
  • Java Program
  • HDFS code


  • Intro to Spark
  • Comparison
  • Filter Map

Data Collection Module

  • Web Crawler
  • Article Scraper
  • Comment Extractor
  • News Keyword Extractor
  • Twitter API

Data Processing

  • Sampling
  • Normalization
  • Transformation

Data Mining

  • Clustering
  • Classification
  • Association Rule Mining
  • WEKA

Data Stream

  • Intro to Data Stream
  • Intro to Apache Storm
  • Use Cases
  • Single Server Setup

Parallel Programming

  • Intro to Thread
  • Message Passing Interface (MPI)
  • GPU Programming
  • CUDA Programming

R Programming

  • Introduction
  • Language Constructs
  • Visualization
  • Classification and Regression


Dr. Dhaval Patel
Computer Science Department
IIT Roorkee

(01332-28) 5700