    Module 1

    Understanding Hadoop

        • The Three Vs of Big Data, Six Key Hadoop Data Types, Sentiment Use Case
        • Getting Twitter Feeds into Hadoop, Use HCatalog to Define a Schema, Use Hive to Determine Sentiment, View Spikes in Tweet Volume, View Sentiment by Country, Geolocation Use Case
        • The Geolocation Data, Getting the Raw Data into Hadoop, The Truck Data, Getting the Truck Data into Hadoop, HCatalog Stores a Shared Schema
        • Data Analysis, Use Hive to Compute Truck Mileage, About Hadoop, Relational Databases vs. Hadoop, About Hadoop 2.x
        • New in Hadoop 2.x, The Hadoop Ecosystem, The Hortonworks Data Platform (HDP), The Path to ROI
        • Lab: Start an HDP 2.1 Cluster

    Module 2

    The Hadoop Distributed File System (HDFS)

        • About HDFS, Hadoop vs. RDBMS, HDFS Components, The NameNode, The DataNodes, DataNode Failure, HDFS Commands

    Module 3

    Inputting Data into HDFS

        • Examples of HDFS Commands, HDFS File Permissions, Options for Data Input, The Hadoop Client, WebHDFS, A Flume Example
        • Overview of Sqoop, The Sqoop Import Tool, Importing a Table, Importing Specific Columns, Importing from a Query, The Sqoop Export Tool, Exporting to a Table
        • Lab: Importing RDBMS Data into HDFS
        • Lab: Exporting HDFS Data to an RDBMS

    Module 4

    The MapReduce Framework

        • Understanding MapReduce, The Key/Value Pairs of MapReduce, WordCount in MapReduce
        • Demo: Understanding MapReduce
        • Lab: Running a MapReduce Job

    Module 5

    Introduction to Pig

        • About Pig, Pig Latin
        • The Grunt Shell
        • Demo: Understanding Pig
        • Pig Latin Relation Names
        • Pig Latin Field Names & Data Types
        • Pig Complex Types
        • Defining a Schema
        • Lab: Getting Started with Pig
        • The GROUP Operator, GROUP ALL, Relations without a Schema, The FOREACH…GENERATE Operator, Specifying Ranges in FOREACH, Field Names in FOREACH, FOREACH with Groups, The FILTER Operator, The LIMIT Operator
        • Lab: Exploring Data with Pig

    Module 6

    Advanced Pig Programming

        • The ORDER BY Operator, The CASE Operator, Parameter Substitution, DISTINCT, PARALLEL, The FLATTEN Operator, Performing an Inner and Outer Join, Invoking a UDF, Tips for Optimizing Pig Scripts
        • Lab: Joining Datasets
        • Preparing Data for Hive

    Module 7

    Hive Programming

        • About Hive, Comparing Hive to SQL, Hive Architecture, Submitting Hive Queries, Defining a Hive-Managed Table, Defining an External Table, Defining a Table LOCATION, Loading Data into Hive, Performing Queries
        • Understanding Hive Tables, Hive Partitions, Hive Buckets, Skewed Tables, Demo: Understanding Partitions and Skew, Using Distribute By, Storing Results to a File, Specifying MapReduce Properties
        • Lab: Analyzing Big Data with Hive
        • Lab: Understanding MapReduce in Hive
        • Hive Join Strategies, Shuffle Joins, Map (Broadcast) Joins, Sort-Merge-Bucket Joins, Invoking a Hive UDF, Computing ngrams in Hive
        • Demo: Computing ngrams

    Module 8

    Using HCatalog

        • About HCatalog, HCatalog in the Ecosystem
        • Defining a New Schema
        • Using HCatLoader with Pig
        • Using HCatStorer with Pig, The Pig SQL Command
        • Lab: Using HCatalog with Pig

    Module 9

    Advanced Hive Programming

        • Performing a Multi-Table/File Insert
        • Understanding Views, Defining Views, Using Views, The TRANSFORM Clause, The OVER Clause, Using Windows, Hive Analytics Functions
        • Lab: Advanced Hive Programming
        • Hive File Formats, Hive SerDes, Hive ORC Files, Computing Table Statistics, Hive Cost-Based Optimization (CBO), Using Hive CBO, Vectorization, Using HiveServer2, Understanding Hive on Tez, Using Tez for Hive Queries
        • Demo: Hive Optimizations
        • Hive Optimization Tips, Hive Query Tuning
        • Lab: Streaming Data with Hive and Python

    Module 10

    Hadoop 2 and YARN

        • About HDFS Federation, Multiple Federated NameNodes, Multiple Namespaces
        • Overview of HDFS HA, Quorum Journal Manager, Configuring Automatic Failover
        • About YARN, Open-source YARN Use Cases
        • The Components of YARN
        • The Life Cycle of a YARN Application
        • A Cluster View Example

    Module 11

    Defining Workflow with Oozie

        • Submitting a Workflow Job, Fork and Join Nodes
        • Defining an Oozie Coordinator Job
        • Scheduling a Job Based on Time
        • Scheduling a Job Based on Data Availability
        • Lab: Defining an Oozie Workflow

     

    Upcoming Batches:

    Starts      Days
    21st Jan    Mon-Fri
    3rd Feb     Sat & Sun
    4th Feb     Mon-Fri
    16th Feb    Sat & Sun

    Contact Us for Fee
    • 1 week, 3 days
    • Toll free: 1800 3070 2228
    • Mobile: +91-9582786406 / 07
    • Email: info@tgcindia.com
    • WhatsApp: +91 98100 31162

