Are you a Business Analyst, Business Intelligence Developer, or Data Warehouse Architect who wants to understand Big Data tools while preparing for CCA Data Analyst, an industry-recognized certification conducted by Cloudera? If the answer is yes, then this course is for you.
As part of this course, we will provide step-by-step training on tools such as Sqoop, Hive, Impala, and Hue, so that you are well prepared to take the certification exam.
Let us understand the skills required for CCA Data Analyst, as listed on the official page.
Prepare the Data
Use Extract, Transform, Load (ETL) processes to prepare data for queries.
- Import data from a MySQL database into HDFS using Sqoop
- Export data to a MySQL database from HDFS using Sqoop
- Move data between tables in the metastore
- Transform values, columns, or file formats of incoming data before analysis
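To make the Sqoop items above concrete, here is a minimal sketch that only builds the command lines for an import and an export; it does not run Sqoop. The flags are standard Sqoop options, but the JDBC URL, user name, password file, table names, and HDFS paths are all hypothetical placeholders, not values from the exam or the course labs.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir, num_mappers=1):
    """Build the argument list for importing one MySQL table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


def sqoop_export_args(jdbc_url, username, password_file, table, export_dir):
    """Build the argument list for exporting an HDFS directory into a MySQL table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
    ]


# Hypothetical connection details, for illustration only.
JDBC_URL = "jdbc:mysql://db.example.com:3306/retail_db"

import_cmd = sqoop_import_args(
    JDBC_URL, "analyst", "/user/analyst/.mysql_password",
    table="orders", target_dir="/user/analyst/retail_db/orders",
)
export_cmd = sqoop_export_args(
    JDBC_URL, "analyst", "/user/analyst/.mysql_password",
    table="daily_revenue", export_dir="/user/analyst/daily_revenue",
)

# Print shell-quoted commands that could be pasted into a terminal.
for cmd in (import_cmd, export_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments in a list rather than a single string makes it easy to hand them to a process launcher later without worrying about shell quoting.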
Provide Structure to the Data
Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.
- Create tables using a variety of data types, delimiters, and file formats
- Create new tables using existing tables to define the schema
- Improve query performance by creating partitioned tables in the metastore
- Alter tables to modify existing schema
- Create views in order to simplify queries
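Some of the DDL items above can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in, not Hive or Impala, so it only illustrates the ideas of creating a table from an existing one, altering a table, and creating a view. Delimiters, file formats, and partitioned tables are specific to Hive and Impala and are not shown, and the exact syntax differs in places (for example, Hive and Impala add columns with ADD COLUMNS). The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a local stand-in for the metastore.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id     INTEGER,
        order_date   TEXT,
        order_status TEXT
    );
    INSERT INTO orders VALUES
        (1, '2014-01-01', 'COMPLETE'),
        (2, '2014-01-01', 'PENDING'),
        (3, '2014-01-02', 'COMPLETE');

    -- New table whose schema is derived from an existing table (CREATE TABLE AS SELECT).
    CREATE TABLE completed_orders AS
        SELECT order_id, order_date FROM orders WHERE order_status = 'COMPLETE';

    -- Modify the schema of an existing table by adding a column.
    ALTER TABLE completed_orders ADD COLUMN reviewed INTEGER;

    -- View that hides the status filter from downstream queries.
    CREATE VIEW pending_orders AS
        SELECT order_id, order_date FROM orders WHERE order_status = 'PENDING';
""")

# Inspect the resulting schema and the view contents.
completed_columns = [row[1] for row in conn.execute("PRAGMA table_info(completed_orders)")]
pending_ids = [row[0] for row in conn.execute("SELECT order_id FROM pending_orders")]
print(completed_columns, pending_ids)
```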
Analyze the Data
Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.
- Prepare reports using SELECT commands including unions and subqueries
- Calculate aggregate statistics, such as sums and averages, during a query
- Create queries against multiple data sources by using join commands
- Transform the output format of queries by using built-in functions
- Perform queries across a group of rows using windowing functions
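The query items above can be practiced locally before touching a cluster. The following sketch again uses Python's sqlite3 module as a stand-in for Hive and Impala (it assumes an SQLite build of 3.25 or later, which is needed for window functions). It shows a join with aggregates and a windowing function over a small, made-up data set; the tables and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);
""")

# Join two tables and compute aggregate statistics per customer.
totals = conn.execute("""
    SELECT c.customer_name,
           SUM(o.amount) AS total_amount,
           AVG(o.amount) AS average_amount
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

# Windowing function: rank each order within its customer by amount, highest first.
ranked = conn.execute("""
    SELECT order_id,
           customer_id,
           RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS amount_rank
    FROM orders
    ORDER BY customer_id, amount_rank
""").fetchall()

print(totals)
print(ranked)
```

The same SELECT statements use standard SQL constructs (JOIN, GROUP BY, RANK() OVER), so they are a reasonable starting point for the equivalent Hive or Impala queries, though function names and type handling should be checked against each engine.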
Exam delivery and cluster information
CCA159 is a hands-on, practical exam using Cloudera technologies. Each user is given their own CDH5 (currently 5.10.1) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others. In addition, the cluster also comes with Python 2.7 and 3.4, Perl 5.16, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.
Although the cluster provides all of the tools above, only some of them are relevant to the exam, chiefly the ones this course covers: Sqoop, Hive, Impala, and Hue.
If you can access Hue on the exam cluster, then providing solutions might become a bit easier.
Let us give you an overview of the approach taken in the content.
- We know that much of the target audience for Data Analyst might not be very comfortable with Linux. However, in the exam you have to use either Hue or the command line to come up with the solutions.
- The content will be very hands-on: we will connect to a Linux-based environment using a terminal and run commands and scripts.
- You will be given lab access and will practice on a multi-node cluster.
- We will also give you a certification simulator, so that you are comfortable taking the exam by connecting to a remote host.
- We will provide live support through our community-based forums. Links will be available at the bottom of each page of the course.