Several Hadoop and Spark developer certifications are available today to test developer skills in the Hadoop and Spark ecosystems. Before getting into the details of the certifications, we first need to understand the different technologies available in these ecosystems.
What are Hadoop and Spark?
Hadoop has two primary components: HDFS and Map Reduce. HDFS, which stands for Hadoop Distributed File System, is used to distribute large files across multiple nodes, and Map Reduce is used to process that data using distributed computing. Spark is an in-memory distributed computing framework. It does not have a file system of its own and can work with many file systems, such as the native OS file system, cloud based storage (e.g. S3) and HDFS.
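To make "distribute large files across multiple nodes" concrete, here is a small sketch that estimates how HDFS splits a file into blocks and how much raw disk the replicas consume. It assumes the Hadoop 2.x/3.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). Individual clusters often override both, and the helper name hdfs_footprint is hypothetical.

```python
import math

# Assumed defaults (Hadoop 2.x/3.x): dfs.blocksize = 128 MB, dfs.replication = 3.
# Real clusters may configure different values.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> tuple:
    """Return (number of blocks, total block replicas, raw storage in MB)."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Each block is stored `replication` times on different nodes.
    replicas = blocks * replication
    # A partial last block only uses as much disk as the data it holds,
    # so raw storage is file size times replication, not blocks * block size.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


if __name__ == "__main__":
    # A hypothetical 1 GB (1024 MB) file.
    blocks, replicas, raw = hdfs_footprint(1024)
    print(f"blocks={blocks}, replicas={replicas}, raw_storage_mb={raw}")
```

Spreading blocks and their replicas across nodes is what lets Map Reduce or Spark process different parts of the same file in parallel, and keeps the data available if a node fails.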
Both of these have several tools in their ecosystems.
|Ecosystem|Tool or Technology|Description|
|---|---|---|
|Hadoop|HDFS|Hadoop Distributed File System, used to store large files across the nodes of a cluster. Files are copied in and out with command line utilities such as hadoop fs or hdfs dfs|
|Hadoop|Map Reduce|Java based framework to process data at scale. Other programming languages can also be used (for example via Hadoop Streaming)|
|Hadoop|Hive|A logical database with features such as creating tables, loading data and processing data in Hadoop using SQL-like queries. It typically runs on Map Reduce and can be integrated with Spark as well.|
|Hadoop|Pig|A data flow language to process data using Map Reduce|
|Hadoop|Sqoop|A command line tool to import and export data between Hadoop and relational databases|
|Hadoop|Flume|An agent based tool to ingest streaming data into Hadoop|
|Hadoop|Oozie|A workflow scheduler that chains Hadoop jobs such as Map Reduce, Hive, Pig and Sqoop|
|Spark|Core|In-memory distributed computing framework that exposes its APIs as transformations and actions. It works with Scala, Python and Java.|
|Spark|DataFrames|An interface to define a structure on top of data and then process it with developer friendly options such as SQL or Hive queries|
|Spark|Streaming|To process data as it is being streamed into target data stores|
|Spark|MLlib|Machine learning algorithms built on Spark APIs|
|Spark|GraphX|Graph processing libraries built on Spark APIs|
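The classic way to see how Map Reduce splits work, and how Spark's transformations mirror it, is a word count. Below is a minimal single-machine sketch in plain Python that only simulates the map, shuffle and reduce phases; it does not use Hadoop or Spark, and the function names are hypothetical. On a real cluster, each phase runs in parallel on the nodes holding the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort: group all values emitted for the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Map Reduce hands keys to reducers in sorted order.
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop and Spark", "Spark runs on Hadoop"]
    print(word_count(sample))
    # {'and': 1, 'hadoop': 2, 'on': 1, 'runs': 1, 'spark': 2}
```

In PySpark the same pipeline is typically expressed with the RDD transformations flatMap, map and reduceByKey, which are lazy, followed by an action such as collect that actually triggers the distributed computation.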
Why should one get certified?
- To showcase your skills
- To demonstrate industry-recognized validation of your expertise
- To show that you meet global standards for working with Spark and Hadoop
- To stay up to date with the latest advances in Big Data technologies such as Spark and Hadoop
What certifications are available? What skills does each certification exam test?
Several certifications are offered by leading vendors in the market. Here is the list of certifications, mapped to the vendor that provides each one.
|Vendor|Certification exam|Skills tested|
|---|---|---|
|Cloudera|CCA Spark and Hadoop Developer|HDFS, Sqoop, Flume, Spark with Python and Scala, Hive, Impala, Avro|
|Cloudera|CCP Data Engineer|Sqoop, Flume, Hive, Oozie|
|Hortonworks|HDP Certified Developer – Java (HDPCD:Java)|Java Map Reduce APIs|
|Hortonworks|HDP Certified Developer|HDFS, Sqoop, Flume, Hive, Pig|
|Hortonworks|HDP Certified Developer – Spark (HDPCD:Spark)|Core Spark and DataFrames|
|MapR|MapR Certified Hadoop Developer|Map Reduce APIs – Java as well as Streaming|
|MapR|MapR Certified HBase Developer|HBase using Java|
|MapR|MapR Certified Spark Developer|Core Spark, Spark Streaming, Spark DataFrames, Spark MLlib|
|Databricks|Databricks Certified Developer|Core Spark, Spark Streaming, Spark DataFrames, Spark MLlib, GraphX|
What is the cost and duration of each certification?
Costs might change and vendors sometimes give discounts, so it is better to check the price with the respective vendor before signing up.
|Vendor|Certification exam|Cost|Duration|
|---|---|---|---|
|Cloudera|CCA Spark and Hadoop Developer|USD $295|2 hours|
|Cloudera|CCP Data Engineer|USD $400|4 hours|
|Hortonworks|HDP Certified Developer – Java (HDPCD:Java)|USD $250|2 hours|
|Hortonworks|HDP Certified Developer|USD $250|2 hours|
|Hortonworks|HDP Certified Developer – Spark (HDPCD:Spark)|USD $250|2 hours|
|MapR|MapR Certified Hadoop Developer|USD $250|2 hours|
|MapR|MapR Certified HBase Developer|USD $250|2 hours|
|MapR|MapR Certified Spark Developer|USD $250|2 hours|
|Databricks|Databricks Certified Developer|USD $300||
Format of the exam
What is the exam format? How many questions might the exam have?
The passing score for most of these certification exams is 70%. The Cloudera and Hortonworks exams are hands-on and scenario based, while the MapR and Databricks exams consist of objective questions.
|Vendor|Certification exam|Exam format and number of questions|
|---|---|---|
|Cloudera|CCA Spark and Hadoop Developer|Programming – 10 to 12 scenarios|
|Cloudera|CCP Data Engineer|Programming – 5 to 8 scenarios|
|Hortonworks|HDP Certified Developer – Java (HDPCD:Java)|Programming – 1 scenario|
|Hortonworks|HDP Certified Developer|Programming – 7 to 10 scenarios|
|Hortonworks|HDP Certified Developer – Spark (HDPCD:Spark)|Programming –|
|MapR|MapR Certified Hadoop Developer|Objective – 60 to 80 questions|
|MapR|MapR Certified HBase Developer|Objective – 60 to 80 questions|
|MapR|MapR Certified Spark Developer|Objective – 60 to 80 questions|
|Databricks|Databricks Certified Developer|Objective|
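As a rough illustration of the 70% cut-off, the sketch below computes the minimum number of correct answers for the question counts listed for the MapR objective exams. It assumes every question carries equal weight and that the score is simply the fraction answered correctly; vendors may weight or scale scores differently, so check the official exam guide. The helper name min_correct is hypothetical.

```python
PASS_PERCENT = 70  # cut-off stated above for most of these exams


def min_correct(total_questions: int, pass_percent: int = PASS_PERCENT) -> int:
    """Smallest number of correct answers that reaches pass_percent.

    Assumes equally weighted questions. Uses integer ceiling division
    instead of 0.7 * n to avoid floating point rounding surprises.
    """
    return (total_questions * pass_percent + 99) // 100


if __name__ == "__main__":
    # The MapR objective exams are listed with 60 to 80 questions.
    for n in (60, 70, 80):
        print(n, "questions ->", min_correct(n), "correct answers needed")
```

Under these assumptions, a 60-question exam needs 42 correct answers and an 80-question exam needs 56.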
Is Java required?
Java is becoming less relevant in the Big Data space, while Python and Scala are gaining a lot of momentum. Most Spark based applications are implemented in Python or Scala, as they are relatively easy to work with. That being said, Java is not required for any certification except the ones below, and these certifications are becoming less relevant compared to the Spark based ones.
- MapR Certified Hadoop Developer
- MapR Certified HBase Developer
- HDP Certified Developer – Java (HDPCD:Java)
Choosing a certification exam from the available options
- If one is already certified in HDPCD:Java or CCDH (discontinued), then CCA Spark and Hadoop Developer is the better choice. It covers Sqoop, Flume, Spark, Hive and Impala.
- If one is already certified in HDPCD (Sqoop, Flume, Pig etc.), then HDPCD:Spark, MapR Certified Spark Developer or Databricks Certified Developer are better choices than CCA Spark and Hadoop Developer, as many of its topics overlap with HDPCD. My first choice would be HDPCD:Spark, as it is a hands-on, scenario based exam rather than an objective one. However, it is relatively new, so early candidates may face some challenges.
- If one has no certifications yet, then CCA Spark and Hadoop Developer is a good choice, as it covers many Hadoop ecosystem tools as well as Core Spark.
How to register for a certification?
- Cloudera Certifications
- For all Hortonworks Certifications
- Create an account at www.examslocal.com.
- Once you are registered and logged in, select “Schedule an Exam”, and then enter “Hortonworks” in the “Search Here” field.
- Locate and select the “Hortonworks : HDP Certified Developer (HDPCD) – English” exam.
- Choose the date and time that you want to attempt the exam, and that is it!
Where can one take the certification exam?
- Most of the certifications are proctored remotely. One can take the exam from anywhere quiet, with a 64-bit computer, a webcam and a good internet connection.
- Visit the respective certification page for more details
How long does it typically take to prepare for a certification exam?
- It varies from person to person and from certification to certification
- Freshers with a degree in CS or IT might need at least 2 to 3 months for any certification
- Experienced folks might need anywhere from a week to 2 months, depending upon their background and core competencies
What resources are available from itversity?
The following free resources and guidelines from itversity can help in preparing for the exam
- Start with the published curriculum for each certification
- Take classes or prepare along the lines of the published curriculum
- Use the 1:3 model: for every 1 hour of theory, spend 3 hours on practice. The ratio might vary for some people.
- Take practice exams if they are available
What are the job opportunities for freshers after getting certified? Will certification help?
- There are no positions titled "Hadoop Developer" or "Big Data Developer" for freshers
- Typically the job description says "Software Developer" or "Software Engineer", and the company then decides which technology the candidate works on, depending upon the projects they have
- However, some small companies might emphasize Big Data or Hadoop in their job descriptions to attract good talent
- Certifications can instill confidence in freshers while facing technical interviews
- For startups, certification might help freshers to some extent