Big Data and Hadoop

Overview

Apache Hadoop enables organizations to analyze massive volumes of structured and unstructured data, and it is currently one of the hottest trends in the software industry. Hadoop is being adopted as the default enterprise data hub by many enterprises, and it is widely regarded as one of the most sought-after technical skills for 2015 and the coming years.

What Participants Will Learn

This course provides an excellent starting point for building the fundamentals of developing big data solutions on the Hadoop platform and its ecosystem tools. The course is well balanced between theory and hands-on labs, built around real-world use cases such as retail data analysis, sentiment analysis, log analysis and real-time trend analysis.

Participants will cover the following topics through sessions and hands-on exercises:

  • Understanding Big Data, the Hadoop 2.0 architecture and the Hadoop ecosystem
  • A deep dive into the HDFS and YARN architectures
  • Writing MapReduce programs
  • Advanced MapReduce features and algorithms
  • Leveraging Hive and Pig for structured and unstructured data analysis
  • Importing and exporting data using Sqoop and Flume, and creating workflows using Oozie
  • Hadoop best practices, sizing and capacity planning
  • Creating reference architectures for big data solutions
  • An introduction to the world of NoSQL

Duration: 3 Days

Intended Audience

Architects and developers who wish to write, build and maintain Apache Hadoop jobs.

Prerequisites

Participants should have basic knowledge of Java, SQL and Linux. It is advisable to refresh these skills in order to obtain maximum benefit from this workshop.

Detailed Course Outline

What is Big Data & Why Hadoop?

  • Big Data characteristics and the challenges with traditional systems
  • Hadoop overview and its ecosystem
  • Anatomy of a Hadoop cluster; installing and configuring Hadoop
  • Setting up a single-node Hadoop cluster

HDFS and YARN

  • HDFS architecture: NameNode, DataNodes and the Secondary NameNode (see the client sketch below)
  • Understanding the HDFS HA and Federation architectures
  • YARN architecture: ResourceManager, NodeManager and ApplicationMaster
  • Hands-on exercise
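
Although this module focuses on architecture, the minimal sketch below shows how a Java client writes and reads a file through the HDFS FileSystem API, as a taster for the hands-on exercise. The path and file contents are invented for illustration and are not taken from the course labs.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();           // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);               // handle to the configured file system

            Path file = new Path("/tmp/hdfs-client-demo.txt");  // hypothetical path
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeBytes("hello hdfs\n");                 // data goes to DataNodes, metadata to the NameNode
            }

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file)))) {
                System.out.println(in.readLine());              // read the line back from HDFS
            }
        }
    }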

MapReduce Anatomy (MR2)

  • How MapReduce works
  • Writing the Mapper, Reducer and Driver using the Java API (see the word-count sketch below)
  • Understanding Hadoop data types, input and output formats
  • Hands-on exercises
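
To make the Mapper/Reducer/Driver anatomy concrete, the sketch below is a minimal word-count job written against the MRv2 (org.apache.hadoop.mapreduce) Java API. It illustrates the structure covered in this module rather than the exact lab code; input and output paths are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in a line of input
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts emitted for each word
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Driver: wires the job together and submits it to YARN
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }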

Developing MapReduce Programs

  • Setting up the Eclipse development environment; creating MapReduce projects; debugging and unit testing (see the test sketch below)
  • Developing a MapReduce algorithm for a real-world scenario
  • Hands-on exercises
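
For the unit-testing topic, the sketch below shows how the TokenizerMapper from the word-count sketch above could be tested with the MRUnit library and JUnit. The choice of MRUnit is an assumption made for illustration; the course labs may use a different test setup.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

    public class TokenizerMapperTest {

        @Test
        public void mapperEmitsOneCountPerWord() throws IOException {
            MapDriver<LongWritable, Text, Text, IntWritable> driver =
                    MapDriver.newMapDriver(new WordCount.TokenizerMapper());

            driver.withInput(new LongWritable(0), new Text("big data"))
                  .withOutput(new Text("big"), new IntWritable(1))
                  .withOutput(new Text("data"), new IntWritable(1))
                  .runTest();   // fails the test if the actual output differs
        }
    }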

Advanced MapReduce Concepts

  • Combiners, partitioners, counters, setup and cleanup, the distributed cache (see the sketch below)
  • Passing parameters, multiple inputs, chaining multiple jobs
  • Applying compression, speculative execution, zero reducers
  • Handling small files and bad records; handling binary data such as images and documents
  • Map-side and reduce-side joins, data partitioning
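
As a small illustration of two of the hooks listed above, the sketch below re-uses the word-count reducer as a combiner and plugs in a custom Partitioner. The class names and the number of reduce tasks are illustrative values, not lab code.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class AdvancedJobSketch {

        // Route keys that start with the same character to the same reducer
        public static class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
            @Override
            public int getPartition(Text key, IntWritable value, int numPartitions) {
                if (key.getLength() == 0) {
                    return 0;
                }
                int firstChar = key.charAt(0);                      // code point of the first character
                return (firstChar & Integer.MAX_VALUE) % numPartitions;
            }
        }

        // Called from a driver that has already created the Job
        public static void configure(Job job) {
            job.setCombinerClass(WordCount.IntSumReducer.class);    // local pre-aggregation on the map side
            job.setPartitionerClass(FirstLetterPartitioner.class);
            job.setNumReduceTasks(4);                               // arbitrary example value
        }
    }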

Sqoop & Flume

  • Importing and exporting data from RDBMSs using Sqoop
  • Ingesting data from non-RDBMS sources using Flume
  • Hands-on exercise using Sqoop

Structured Data Analysis using Hive

  • Hive architecture, internal and external tables
  • Writing queries: joins, unions, partitioning, buckets
  • Writing UDFs and reading different data formats (see the UDF sketch below)
  • Hands-on exercise
  • Hands-on exercise: tweet analysis
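
The sketch below illustrates the kind of Hive UDF written in this module, using the classic org.apache.hadoop.hive.ql.exec.UDF style; the class name and its behaviour (lower-casing a string) are invented for illustration.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class LowerCaseUDF extends UDF {

        private final Text result = new Text();

        // Hive resolves evaluate() by reflection; null input must be handled explicitly
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            result.set(input.toString().toLowerCase());
            return result;
        }
    }

Once packaged into a jar, a UDF like this is typically registered from the Hive shell with ADD JAR and CREATE TEMPORARY FUNCTION before being called in a query.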

Semi-Structured and Unstructured Data Analysis using Pig

  • Pig basics and loading data files
  • Writing queries: SPLIT, FILTER, JOIN, GROUP, SAMPLE, ILLUSTRATE, etc.
  • Writing UDFs (see the sketch below)
  • Understanding Oozie workflow definitions
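
Similarly, the sketch below shows a minimal Pig EvalFunc UDF of the kind covered in this module; the name and behaviour (upper-casing the first field of a tuple) are illustrative only.

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class ToUpper extends EvalFunc<String> {

        // Pig calls exec() once per input tuple
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

A UDF like this is registered in a Pig script with REGISTER and then invoked inside a FOREACH ... GENERATE statement.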

Hadoop Best Practices, Advanced Tips & Techniques

  • Managing HDFS and YARN
  • Hadoop cluster sizing, capacity planning and optimization
  • Hadoop deployment options

The NoSQL Movement

  • Introduction to HBase (see the client sketch below)
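
As a brief taster for the HBase introduction, the sketch below writes and reads a single cell through the HBase Java client API (assuming a 1.x or later client); the table, column family and row names are invented.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Write a single cell: row key "u1", column family "info", qualifier "name"
                Put put = new Put(Bytes.toBytes("u1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read the same cell back
                Result result = table.get(new Get(Bytes.toBytes("u1")));
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(name));
            }
        }
    }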