Cloudera Certified Administrator of Apache Hadoop – CCAH

Hadoop Certification - CCAH - Introduction

 

CCAH is categorized into six major areas. Here are the categories and their weightage.

  • HDFS (17%)
    • Describe the function of HDFS Daemons
    • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
    • Identify current features of computing systems that motivate a system like Apache Hadoop.
    • Classify major goals of HDFS Design
    • Given a scenario, identify appropriate use case for HDFS Federation
    • Identify components and daemons of an HDFS HA-Quorum cluster
    • Analyze the role of HDFS security (Kerberos)
    • Determine the best data serialization choice for a given scenario
    • Describe file read and write paths
    • Identify the commands to manipulate files in the Hadoop File System Shell (see the shell example after this outline)
  • YARN and MapReduce version 2 (MRv2) (17%)
    • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
    • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
    • Understand basic design strategy for MapReduce v2 (MRv2)
    • Determine how YARN handles resource allocations
    • Identify the workflow of MapReduce job running on YARN
    • Determine which files you must change, and how, in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN (see the configuration sketch after this outline)
  • Hadoop Cluster Planning (16%)
    • Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
    • Analyze the choices in selecting an OS
    • Understand kernel tuning and disk swapping
    • Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
    • Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
    • Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O
    • Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
    • Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
  • Hadoop Cluster Installation and Administration (25%)
    • Given a scenario, identify how the cluster will handle disk and machine failures
    • Analyze a logging configuration and logging configuration file format
    • Understand the basics of Hadoop metrics and cluster health monitoring
    • Identify the function and purpose of available tools for cluster monitoring
    • Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
    • Identify the function and purpose of available tools for managing the Apache Hadoop file system
  • Resource Management (10%)
    • Understand the overall design goals of each of Hadoop's schedulers
    • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
    • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN (see the allocations file sketch after this outline)
    • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
  • Monitoring and Logging (15%)
    • Understand the functions and features of Hadoop’s metric collection abilities
    • Analyze the NameNode and JobTracker Web UIs
    • Understand how to monitor cluster Daemons
    • Identify and monitor CPU usage on master nodes
    • Describe how to monitor swap and memory allocation on all nodes
    • Identify how to view and manage Hadoop's log files (see the monitoring commands after this outline)
    • Interpret a log file

As part of this blog/course, we will cover all of these topics in detail. To give a feel for the material, a few quick, illustrative examples follow.
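
Here is a quick sketch of the File System Shell commands called out in the HDFS section. The paths and file names are made up for illustration:

    # Minimal sketch of common File System Shell operations (paths are illustrative)
    hdfs dfs -mkdir -p /user/demo/input            # create a directory (with parents)
    hdfs dfs -put data.txt /user/demo/input/       # copy a local file into HDFS
    hdfs dfs -ls /user/demo/input                  # list directory contents
    hdfs dfs -cat /user/demo/input/data.txt        # print a file to stdout
    hdfs dfs -get /user/demo/input/data.txt ./     # copy a file back to the local FS
    hdfs dfs -rm -r /user/demo/input               # remove a directory recursively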
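
Similarly, the MRv1-to-MRv2 migration objective largely comes down to two configuration files: pointing MapReduce at YARN and enabling the shuffle auxiliary service on the NodeManagers. A minimal sketch, with a placeholder ResourceManager hostname:

    <!-- mapred-site.xml: run MapReduce jobs on YARN instead of the MRv1 JobTracker -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

    <!-- yarn-site.xml: ResourceManager location (placeholder hostname) and the
         shuffle handler every NodeManager needs in order to serve map output -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>rm.example.com</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>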
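
For the Resource Management section, the Fair Scheduler under YARN reads its queue definitions from an allocations file (fair-scheduler.xml). A sketch with invented queue names, weights, and minimums:

    <?xml version="1.0"?>
    <!-- fair-scheduler.xml: two queues sharing cluster resources 2:1 (illustrative) -->
    <allocations>
      <queue name="production">
        <weight>2.0</weight>
        <minResources>10240 mb,10 vcores</minResources>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>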
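
And for the monitoring objectives, a few of the stock command-line checks; the application ID is a placeholder, and the last command assumes log aggregation is enabled:

    hdfs dfsadmin -report     # DataNode status, capacity, and usage
    yarn node -list           # NodeManager health across the cluster
    yarn logs -applicationId application_1472158000000_0001   # aggregated job logs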

 

5 Comments

  • sunny Posted September 7, 2016 8:36 pm

    Hi Durga,

    I have a question about external tables in Hive.
    If we drop an external table, only the metadata is lost.
    Is there any way to back up an external table?
    Can we create a table from an external table using CTAS?

    If the data for an external table is lost, how can we recover it?

    Thanks,
    Pavani.

     
    • Durga Viswanatha Raju Gadiraju Posted September 7, 2016 11:23 pm

      There is no concept of an automatic backup in Hive.
      You can CTAS from an external table.

      If you have a copy of the table's data in another location or outside HDFS, you will have to copy it again.
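
      For illustration, a minimal HiveQL sketch of the CTAS approach; the table names are hypothetical:

          -- Minimal sketch: copy an external table's data into a managed table
          -- via CREATE TABLE AS SELECT (table names are hypothetical)
          CREATE TABLE orders_backup
          AS SELECT * FROM orders_external;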

       
  • sunny Posted September 9, 2016 2:00 am

    Hi Durga,
    - When we run a Hive job, Pig job, or Sqoop job, where exactly does it run in the Hadoop cluster?
    - If you have a Hive partition on state, let's say 10 partitions for 10 states, what happens when you get an 11th state? Where does the 11th state's data go?
    - How do we update a record in Hive when loading data from an RDBMS into Hive using Sqoop?
    - Ideally, what is the maximum number of buckets we can declare?
    Thanks in advance 🙂

     
    • Durga Viswanatha Raju Gadiraju Posted September 9, 2016 2:27 am

      These are not straightforward questions to answer. You will have to complete the course; I have not uploaded the content yet. Meanwhile, you can watch the videos.

       
  • Srinatha T Posted November 4, 2016 8:24 am

    Hi Durga,

    I am a newbie to the world of big data, and our team has just started to implement Hadoop. I want to change my domain from Linux administration to Hadoop administration. I am also thinking of getting certified, so which certification should I go for: Hortonworks (HDPCA) or Cloudera (CCAH)? People say that either one will work, but I wanted to know which is more widely used in the market and would give me better job opportunities. Looking forward to your valuable feedback regarding this.

    I appreciate all your hard work in making the videos and giving us an easy way to get started with Hadoop. It is really useful, and for newbies like me it helps a lot. Thanks a ton for that.

    Thanks,
    Srinath T

     

