Description

Type

Training

Level

Beginner

Location

Bangalore

Duration

4 Days

Xebia is an official training partner of Cloudera, the leader in Apache Hadoop-based software and services.

This four-day, hands-on data analyst training focuses on Apache Pig, Apache Hive, and Cloudera Impala, and teaches you to apply traditional data analytics and business intelligence skills to Big Data.

Learn the tools data professionals need to access, manipulate, and analyze complex data sets using SQL and familiar scripting languages.

Facilities

Bangalore (Karnātaka)

Start date

On request

About this course

Course objectives

You will learn:

The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop tools,
How to apply the fundamentals of familiar scripting languages to the Hadoop cluster with Apache Pig.

You will have hands-on experience in:

Joining multiple data sets and analyzing disparate data with Pig,
Organizing data into tables, performing transformations, and simplifying complex queries with Hive,
Making multi-structured data accessible with Hive.

You will have the skills to:

Perform real-time interactive analyses on massive data sets stored in HDFS or HBase using SQL with Impala,
Pick the best analysis tool for a given task in Hadoop,
Enable real-time interactive analysis of the data stored in Hadoop via a native SQL environment with Cloudera Impala.

Who is it intended for?

This course is best suited to data analysts, business analysts, developers, and administrators who have experience with SQL and basic UNIX or Linux commands.

Prior knowledge of Java and Apache Hadoop is not required.

Subjects

Apache Hadoop
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Sqoop

Teachers and trainers (1)

Xebia

Trainer

Course programme

Course Outline

Introduction

Hadoop Fundamentals

The Motivation for Hadoop
Hadoop Overview
Data Storage: HDFS
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Data Integration: Sqoop
Other Hadoop Data Tools
Exercise Scenarios Explanation

Introduction to Pig

What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig

Basic Data Analysis with Pig

Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly-Used Functions

Processing Complex Data with Pig

Storage Formats
Complex/Nested Data Types
Grouping
Built-In Functions for Complex Data
Iterating Grouped Data

Multi-Dataset Operations with Pig

Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets

Pig Troubleshooting and Optimization

Troubleshooting Pig
Logging
Using Hadoop’s Web UI
Data Sampling and Debugging
Performance Overview
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs

Introduction to Hive and Impala

What Is Hive?
What Is Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Hive Use Cases

Querying with Hive and Impala

Databases and Tables
Basic Hive and Impala Query Language Syntax
Data Types
Differences Between Hive and Impala Query Syntax
Using Hue to Execute Queries
Using the Impala Shell

Data Management

Data Storage
Creating Databases and Tables
Loading Data
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results

Data Storage and Performance

Partitioning Tables
Choosing a File Format
Managing Metadata
Controlling Access to Data

Relational Data Analysis with Hive and Impala

Joining Datasets
Common Built-In Functions
Aggregation and Windowing

Working with Impala

How Impala Executes Queries
Extending Impala with User-Defined Functions
Improving Impala Performance

Analyzing Text and Complex Data with Hive

Complex Values in Hive
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Conclusion

Hive Optimization

Understanding Query Performance
Controlling Job Execution Plan
Bucketing
Indexing Data

Extending Hive

SerDes
Data Transformation with Custom Scripts
User-Defined Functions
Parameterized Queries

Choosing the Best Tool for the Job

Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
Which to Choose?

Authorised Cloudera Data Analyst Training | 4 Days
