This course is divided into three modules:

Module 1: Fundamental Big Data Analysis & Science

This module provides an in-depth overview of the data science and analysis techniques that are relevant and unique to Big Data. It emphasizes how analysis and analytics need to be carried out, individually and collectively, to support the distinct characteristics, requirements and challenges of Big Data datasets. The following primary topics are covered:

  • Data Science, Data Mining & Data Modeling
  • Big Data Dataset Categories
  • Exploratory Data Analysis (EDA) (including numerical summaries, rules & data reduction)
  • EDA Types (including univariate, bivariate & multivariate)
  • Essential Statistics (including variable categories & relevant mathematics)
  • Statistical Analysis (including descriptive, inferential, correlation, covariance & hypothesis testing)
  • Data Munging & Machine Learning
  • Variables & Basic Mathematical Notations
  • Statistical Measures & Statistical Inference
  • Distributions & Data Processing Techniques
  • Data Discretization, Binning, Clustering
  • Visualization Techniques & Numerical Summaries
  • Correlation for Big Data
  • Time Series Analysis for Big Data

Module 2: Advanced Big Data Analysis & Science

This module delves into a range of advanced data analysis practices and techniques, explored within the context of Big Data. The content focuses on topics that enable participants to develop a thorough understanding of statistical, modeling, and analysis techniques for data patterns, clusters, and text analytics, as well as the identification of outliers and errors that affect the significance and accuracy of predictions made on Big Data datasets. The following primary topics are covered:

  • Statistical Models, Model Evaluation Measures (including cross-validation, bias-variance, confusion matrix & F-score)
  • Machine Learning Algorithms, Pattern Identification (including association rules & the Apriori algorithm)
  • Advanced Statistical Techniques (including parametric vs. non-parametric, clustering vs. non-clustering distance-based, supervised vs. semi-supervised)
  • Linear Regression & Logistic Regression for Big Data
  • Decision Trees for Big Data
  • Classification Rules for Big Data
  • k-Nearest Neighbors (kNN) for Big Data
  • Naïve Bayes for Big Data
  • Association Rules for Big Data
  • K-means for Big Data
  • Text Analytics for Big Data
  • Outlier Detection for Big Data

Module 3: Big Data Analysis & Science Lab

This module covers a series of exercises and problems designed to test the participant's ability to apply knowledge of the topics covered in modules 1 and 2. Completing this lab helps highlight areas that require further attention and demonstrates hands-on proficiency in Big Data analysis and science practices as they are applied and combined to solve real-world problems.

As a hands-on lab, this module incorporates a set of detailed exercises that require participants to solve various inter-related problems. The goal is to foster a comprehensive understanding of how different data analysis techniques can be applied to solve problems in Big Data environments and used to make significant, relevant predictions that offer increased business value.

For instructor-led delivery of this lab, the Certified Trainer works closely with participants to ensure that all exercises are carried out completely and accurately. Attendees may optionally have their exercises reviewed and graded as part of class completion.