SR TECHNOLOGIES
#101, Santhinilaya, Behind HUDA Maitrivanam
PH: +91-9676126684
HADOOP Development & Admin Course
Hadoop Introduction:-
- What is Hadoop? Why Hadoop?
- Hadoop History
- Components of Hadoop
• HDFS, MapReduce, PIG, Hive, SQOOP, HBASE, OOZIE, Flume, ZooKeeper and others
- What is the scope of Hadoop?
Hadoop Distributed File System (HDFS) (for Storing the Data):-
• Introduction to HDFS
• Features of HDFS
• Daemons of Hadoop
- Name Node
- Secondary Name Node
- Job Tracker
- Data Node
- Task Tracker
• Basic Configuration for HDFS
• Data Organization and Replication
• Rack Awareness, Heartbeat Signal
• How to Store Data in HDFS
• Accessing HDFS (Introduction to Basic UNIX commands)
• CLI commands
MapReduce using Java (Processing the Data):-
• Introduction to MapReduce
• MapReduce Architecture
• Data flow in MapReduce
- Splits
- Mapper
- Partitioning
- Shuffle and sort
- Combiner
- Reducer
• Basic Configuration of MapReduce
• MapReduce life cycle
• Writing and Executing a Basic MapReduce Program using Java
• File Input Formats
• Joins
- Map-side Joins
- Reduce-side Joins
PIG:-
• Introduction to Apache PIG
• MapReduce vs PIG
• Basic PIG programming
• Modes of Execution in PIG
- Local Mode
- MapReduce Mode
• Execution Mechanisms
- Grunt Shell
- Script
- Embedded
• Operators in PIG
• PIG UDFs
SQOOP:-
• Introduction to SQOOP
• Connecting to a MySQL database
• SQOOP commands
- Import
- Export
- Eval
- Codegen, etc.
• Joins in SQOOP
HIVE:-
• Introduction to HIVE
• HIVE Architecture
• Tables in HIVE
- Managed Tables
- External Tables
• Partitions
• Joins in HIVE
• HIVE UDFs and UDAFs
HBASE:-
• Introduction to HBASE and Basic Configuration of HBASE
• HBASE Architecture
• SQL vs NoSQL
• How HBASE differs from an RDBMS
• Client-side buffering and bulk uploads
Cluster Setup:-
• Downloading and installing Hadoop
• Creating a Cluster
• Increasing and Decreasing the Cluster size
• Monitoring the Cluster Health
• Starting and Stopping the Nodes
Introduction to OOZIE, FLUME and ZOOKEEPER, with some sample programs.
Topics Covered in Hadoop Developer
- Introduction to Big Data and Hadoop
- Hadoop ecosystem concepts
- Hadoop MapReduce concepts and features
- Developing MapReduce applications
- Pig concepts
- Hive concepts
- Real-time queries with Impala
- Real life use cases
Introduction to Big Data and Hadoop
- What is Big Data?
- What is Hadoop?
- Why Hadoop?
- History of Hadoop
- Hadoop ecosystem
- HDFS
- MapReduce
Install Hadoop
- Single Node Hadoop Setup
- Test run Hadoop commands
- Hands on
Understanding the Cluster
- Writing files to HDFS
- Reading files from HDFS
- Rack awareness
- The 5 Hadoop daemons
Deep Dive into MapReduce
- Before MapReduce
- MapReduce overview
- MapReduce architecture
- Word count problem
- Word count flow and solution
- MapReduce flow
Developing the MapReduce Application
- Data Types
- File Formats
- Driver, Mapper and Reducer code explained
- Configuring the development environment (Eclipse)
- Writing unit tests
- Running locally
- Running on cluster
- Hands on
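As a rough illustration of the word count flow listed above, the following is a minimal sketch in plain Java that imitates the map, shuffle-and-sort and reduce steps in memory. It does not use the Hadoop API; the class name WordCountSketch and its method names are made up for this example, and a real job would use Hadoop's Mapper, Reducer and driver classes instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the word count flow (not the Hadoop API).
public class WordCountSketch {

    // Map step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and sort step: group the emitted values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum all values collected for one key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, then shuffle, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hadoop stores data", "Hadoop processes data");
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(lines));
    }
}
```

On a cluster the same three steps run on different machines: each mapper sees one input split, and the framework performs the grouping between map and reduce.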
Monitoring MapReduce Job Status
- Job submission
- Job initialization
- Task assignment
- Job completion
- Job scheduling
- Job failures
- Shuffle and sort
- Hands on
MapReduce Types and Formats
- MapReduce types
- Input Formats – input splits and records, text input, binary input, multiple inputs and database input
- Output Formats – text output, binary output, multiple outputs, lazy output and database output
- Hands on
MapReduce Features
- Sorting
- Joins – Map side and reduce side
- MapReduce combiner
- MapReduce partitioner
- MapReduce distributed cache
- Hands on
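To make the partitioner topic above more concrete, here is a small sketch in plain Java of hash partitioning: the key's hash code, with the sign bit masked off, is taken modulo the number of reducers. This is the same rule that Hadoop's default HashPartitioner applies, but the class HashPartitionSketch and its methods are hypothetical, and because it hashes Java strings rather than Hadoop Text keys, the actual partition numbers may differ from those on a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch of hash partitioning (not the Hadoop API).
public class HashPartitionSketch {

    // Mask off the sign bit so the value is non-negative, then take the remainder.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Place each key into the bucket of the reducer it is assigned to.
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("hive", "pig", "sqoop", "hive", "flume");
        List<List<String>> buckets = partition(keys, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " -> " + buckets.get(i));
        }
    }
}
```

The point to notice is that equal keys always land in the same bucket, which is what lets a single reducer see every value for a given key.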
Hive
- Fundamentals
- Concepts
- Hands-on
Pig
- Fundamentals
- Concepts
- Hands-on
Sqoop
- Fundamentals
- Concepts
- Hands-on
Flume
- Fundamentals
- Concepts
- Hands-on
Case Studies
- Real-time use case explanation