Big Data Hadoop

About Big Data Hadoop:-

Hadoop is 100% open source. It is a collection of frameworks for storing and processing big data. Instead of relying on expensive proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and a cluster can scale out by adding more servers. With Hadoop, very large data sets become manageable, and in today's hyper-connected world, where more and more data is created every day, Hadoop's advantages mean that businesses and organizations can now find value in data that was until recently considered useless. The Big Data Hadoop course is informative, comprehensive, concise and easy to understand, and is focused solely on turning students into experts in Big Data.

Jmicro Technology Trainings, in association with industry experts, provides complete practical Big Data Hadoop training in Chennai. We are the oldest Big Data Hadoop training institute in Chennai. Every module of the Big Data Hadoop training is designed as per current industry needs. Companies like American Express, Citibank, TCS, Paytm, Impetus and Cognizant are hiring extensively in the Big Data Hadoop domain.

 

Big Data Hadoop Course Objective:-

On completion of the Big Data Hadoop training, candidates are expected to meet the following objectives.

  • Why Hadoop and Hadoop fundamental concepts
  • Deep understanding of MapReduce and HDFS concepts
  • Hadoop cluster planning and initial configuration
  • Installation and deployment of Hadoop
  • Learn to write complex MapReduce programs
  • Complete understanding of HBase, the Hadoop database
  • Understand data loading techniques using Sqoop and Flume
  • Understanding of various Hadoop tools such as Hue, Zookeeper, Cloudera Manager, Impala etc.
  • Perform data analytics on real-time projects like Facebook and Twitter
  • Understand Data Analytics through Hive

Who should do this course?

The course is for candidates who have the passion and determination to build a career in Big Data. It is also suitable for Data Analysts, Java developers, IT Managers, and any graduate or postgraduate with an urge to learn Hadoop.

Prerequisite:-

Candidates with basic Core Java knowledge and analytical skills are ideal for this training.

 

Big Data Hadoop Training - Daily (2 Hours)
Fast Track Class (4-6 Hours)
Weekend Class (Saturday & Sunday)

Hadoop Developer Course:-

1. Introduction to Hadoop and Big Data

  • Introduction to Big Data
  • Introduction to Hadoop
  • Why Hadoop & Hadoop Fundamental Concepts
  • History of Hadoop with Hadoopable problems
  • Scenarios where Hadoop is used
  • Available versions: Hadoop 1.x & 2.x
  • Overview of batch processing and real-time data analytics using Hadoop
  • Hadoop vendors - Apache, Cloudera, Hortonworks
  • Hadoop services - HDFS, MapReduce, YARN
  • Introduction to Hadoop Ecosystem components (Hive, HBase, Pig, Sqoop, Flume, Zookeeper, Oozie, Kafka, Spark)

 

2. Cluster setup (Hadoop 1.x)

  • Linux VM installation for the Hadoop cluster using Oracle VirtualBox
  • Preparing nodes for Hadoop and VM settings
  • Install Java and configure passwordless SSH across nodes
  • Basic Linux commands
  • Hadoop 1.x single node deployment
  • Hadoop Daemons - NameNode, JobTracker, DataNode, TaskTracker, Secondary NameNode
  • Hadoop configuration files and running the daemons
  • Important web URLs and Logs for Hadoop
  • Run HDFS and Linux commands
  • Hadoop 1.x multi-node deployment
  • Run sample jobs in Hadoop single and multi-node clusters

 

3. HDFS Concepts

  • HDFS Design Goals
  • Understand Blocks and how to configure block size
  • Block replication and replication factor
  • Understand Hadoop Rack Awareness and configure racks in Hadoop
  • File read and write anatomy in HDFS
  • Enable HDFS Trash
  • Configure HDFS name quota and space quota
  • Configure and use WebHDFS (REST API for HDFS)
  • Health monitoring using FSCK command
  • Understand NameNode Safemode, File system image and edits
  • Configure Secondary NameNode and use the checkpointing process to support NameNode recovery
  • HDFS DFSAdmin and File system shell commands
  • Hadoop NameNode / DataNode directory structure
  • HDFS permissions model
  • HDFS Offline Image Viewer

 

4. MapReduce Concepts

  • Introduction to MapReduce
  • MapReduce Architecture
  • Understanding the concept of Mappers & Reducers
  • Anatomy of a MapReduce program
  • Phases of a MapReduce program
  • Data-types in Hadoop MapReduce
  • Driver, Mapper and Reducer classes
  • InputSplit and RecordReader
  • Input format and Output format in Hadoop
  • Concepts of Combiner and Partitioner
  • Running and Monitoring MapReduce jobs
  • Writing your own MapReduce job using MapReduce API

 

5. Cluster setup (Hadoop 2.x)

  • Hadoop 1.x Limitations
  • Design Goals for Hadoop 2.x
  • Introduction to Hadoop 2.x
  • Introduction to YARN
  • Components of YARN - Resource Manager, Node Manager, Application Master
  • Deprecated properties
  • Hadoop 2.x Single node deployment
  • Hadoop 2.x Multi node deployment

 

6. HDFS High Availability and Federation

  • Introduction to HDFS Federation
  • Understand Name service ID and Block pools
  • Introduction to HDFS High Availability
  • Failover mechanisms in Hadoop 1.x
  • Concept of Active and Standby NameNode
  • Configuring JournalNodes and avoiding the split-brain scenario
  • Automatic and manual failover techniques in HA using Zookeeper and ZKFC
  • HDFS haadmin commands

 

7. YARN - Yet Another Resource Negotiator

  • YARN Architecture
  • YARN Components - Resource Manager, Node Manager, Job History Server, Application Timeline Server, MR Application Master
  • YARN Application execution flow
  • Running and Monitoring YARN Applications
  • Understand and Configure Capacity / Fair Schedulers in YARN
  • Define and configure Queues
  • Job History Server / Application Timeline Server
  • YARN REST API
  • Writing and executing YARN applications

 

8. Hive

  • Problems with NoSQL databases
  • Introduction to Hive & Installation
  • Data Types & Introduction to SQL
  • Hive-SQL: DML & DDL
  • Hive-SQL: Views & Indexes
  • Hive User Defined Functions
  • Configuring Hive to work with HBase
  • Hive Thrift Service 
  • Introduction to HCatalog
  • Install and configure HCatalog services

 

9. Apache Flume 

  • Introduction to Flume
  • Flume Architecture and Installation
  • Define Flume agents - Sink, Source and Channel
  • Flume Use cases

 

10. Apache Pig

  • Introduction to Pig
  • Pig Installation
  • Accessing Pig Grunt Shell
  • Pig Data Types
  • Pig Commands
  • Pig Relational Operators
  • Pig User Defined Functions
  • Configure Pig to use HCatalog

 

11. Apache Sqoop

  • Introduction to Sqoop
  • Sqoop Architecture and installation
  • Import data into HDFS using Sqoop
  • Import all tables using Sqoop
  • Export data from HDFS

 

12. Apache Zookeeper

  • Introduction to Apache Zookeeper
  • Zookeeper standalone installation
  • Zookeeper clustered installation
  • Understand Znodes and Ephemeral nodes
  • Manage Znodes using Java API
  • Zookeeper four letter word commands

 

13. Apache Oozie

  • Introduction to Oozie
  • Oozie Architecture
  • Oozie server installation and configuration
  • Design Workflows, Coordinator Jobs, Bundle Jobs in Oozie

 

14. Apache HBase

  • Introduction to HBase
  • HBase Architecture
  • HBase components - HBase Master and Region Servers
  • HBase installation and configuration
  • Create sample tables and queries on HBase

 

15. Apache Spark / Storm / Kafka

  • Real-Time Data Analytics
  • Introduction to Spark / Storm / Kafka

 

16. Cluster Monitoring and Management tools

  • Cloudera Manager
  • Apache Ambari
  • Ganglia
  • JMX monitoring and JConsole
  • Hadoop User Experience (Hue)

 

 

Hadoop Administrator Course:-

1. The Motivation & Limitations of Hadoop

  • Problems with Traditional Large-Scale Systems
  • Why Hadoop & Hadoop Fundamental Concepts
  • History of Hadoop with Hadoopable problems
  • Motivation & Limitations of Hadoop
  • Available versions: Hadoop 1.x & 2.x
  • Available Distributions of Hadoop (Cloudera, Hortonworks)
  • Hadoop Projects & Components
  • The Hadoop Distributed File System (HDFS)

2. Hadoop Ecosystem & Cluster

Hadoop Ecosystem Projects & Components overview

  • HDFS – File System
  • HBase – The Hadoop Database
  • Cassandra – NoSQL Database
  • Hive – SQL Engine
  • Mahout

Hadoop Architecture overview: Cluster Daemons & Their Functions

  • NameNode
  • Secondary NameNode
  • DataNodes

3. Planning Hadoop Cluster & Initial Configuration

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
  • Planning for Cluster & Its Management
  • Types of Deployment
  • Cloudera Manager 

4. Installation & Deployment of Hadoop

  • Installing Hadoop (Cloudera)
  • Installation – Pig, Hive, HBase, Cassandra etc
  • Specifying the Hadoop Configuration
  • Performing Initial HDFS Configuration
  • Performing Initial YARN and MapReduce Configuration
  • Hadoop Logging & Cluster Monitoring

5. Load Data and Run Application

  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • REST Interfaces
  • Best Practices for Importing Data 

6. Manage, Maintain, Monitor, and Troubleshoot the Cluster

  • General System Monitoring
  • Monitoring Hadoop Clusters
  • Common Troubleshooting for Hadoop Clusters
  • Common Misconfigurations
  • Managing Running Jobs
  • Scheduling Hadoop Jobs

7. Upgrade, Rolling and Backup

  • Cluster Upgrading
  • Checking HDFS Status
  • Adding and Removing Cluster Nodes
  • NameNode Metadata Backup
  • Data Backup
  • Distributed Copy
  • Parallel Data Ingestion 

8. Conclusion & FAQs

Note:

  • Every topic has a practical session
  • Hadoop uses different components, which are discussed in the relevant sessions:

  • Hue
  • Cloudera Manager
  • Zookeeper
  • Impala
  • etc.

Testimonials

  • Completed Big Data Hadoop Training from Jmicro Technology Trainings and got placed in American Express. The course material and the practical knowledge on all the frameworks of Hadoop helped me a lot during my interview, and if anyone wants to start their career in Hadoop then Jmicro Technology is no doubt the best institute to join.

    Shruti
    Big Data Hadoop - Placed in American Express
  • It's a wonderful opportunity to learn internet marketing from one of the best internet marketing experts.

    Pawan Sehrawat
    Internet Marketing
  • Jmicro Technology Trainings is the best Hadoop institute in Chennai.

    Manish
    Big Data Hadoop
  • The training provided at Jmicro Technology is designed as per the current market need. After completing the training at Jmicro Technology I feel more confident. There is also a lot of focus on interview preparation, giving the students real exposure to interviews.

    Mr. Jogendra
    Software Testing Batch
  • Before joining Jmicro Technology I always thought it was very difficult to get into IT MNCs, but after joining Jmicro Technology I realized it's not that difficult if one gets proper training and guidance from industry experts.

    Ms. Daizy Teotia
    Software Testing Batch
  • Jmicro Technology provides a professional learning environment with lots of focus on practical training along with a strong theoretical base. The faculty are very cooperative and highly expert in their areas. It's great to be a student of Jmicro Technology.

    Mr. Vibhav
    Software Testing Batch