Mode: Hybrid (classroom in Hyderabad for up to 25 students and online via Zoom for up to 75 students)
Duration: 3 to 4 months
Based on feedback from attendees of the earlier kickoff session, we will give an overview of Data Engineering and Big Data so that you can make a well-thought-out decision. Please click here to fill in the form and sign up for the upcoming kickoff session on March 9th at 8 AM India time. The early-bird discount has also been extended until the end of day on March 9th.
1. Be a pro
In this module we will emphasize the skills required to be a pro in IT.
- Basics of Computer
- Windows Overview
- Editors Overview
- Presentation and Communication skills using Microsoft Office
- Emphasis on typing skills
2. Database Essentials
Database skills are key for any IT professional to excel. In this module we will cover everything from data modeling to writing advanced SQL queries.
- Overview of Relational Databases
- Creating tables and manipulating data
- Basic SQL
- Analytical Functions
- Relating RDBMS with NoSQL
- Writing queries in MongoDB
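As a small taste of the analytical functions covered in this module, the sketch below uses Python's built-in sqlite3 module (SQLite has supported window functions since version 3.25, which recent Python releases bundle) to rank orders within each customer. The `orders` table and its data are purely illustrative, not part of the course material.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("alice", 80.0), ("bob", 200.0), ("bob", 50.0)],
)

# Analytical (window) function: rank each order within its customer by amount.
rows = conn.execute(
    """
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer, rnk
    """
).fetchall()
```

The same `PARTITION BY ... ORDER BY` pattern carries over directly to analytical queries in enterprise databases and in Spark SQL.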
3. Programming Essentials
Data Engineers should have a good grasp of programming fundamentals. We will teach both Scala and Python. Object-oriented concepts are not required, but we will support anyone who wants to explore them.
- Data Types
- Basic programming constructs
- Pre-defined functions (string manipulation)
- User defined functions (including lambda functions)
- Basic I/O operations
- Database operations
- Externalizing properties
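For a flavour of what this module covers, here is a minimal sketch combining pre-defined string functions with a user-defined function, written both as a regular `def` and as an equivalent lambda. The `initials` helper is a hypothetical example, not part of any course material.

```python
# Pre-defined string functions (split, upper, join) plus a small
# user-defined function, shown as a def and as an equivalent lambda.
def initials(full_name):
    """Return upper-case initials, e.g. 'john doe' -> 'J.D.'"""
    return ".".join(part[0].upper() for part in full_name.split()) + "."

initials_lambda = lambda full_name: ".".join(
    part[0].upper() for part in full_name.split()
) + "."

print(initials("john doe"))             # J.D.
print(initials_lambda("ada lovelace"))  # A.L.
```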
4. Linux Fundamentals
After Programming and Databases, understanding the operating system on which these technologies run is very important for excelling in any IT role.
- Overview of Operating System
- Logging into Linux (including passwordless login)
- Basic Linux commands
- Editors such as vi/vim
- Regular expressions
- Processing information using awk/sed
- Basics of shell scripting
- Troubleshooting issues
5. Overview of the Big Data ecosystem
In this short but effective module we will give a brief overview of all the technologies in the Big Data ecosystem, along with their relative weightage.
- File Systems Overview
- Processing Engines Overview
- HDFS commands
6. Databases in Big Data
Now let us understand how to create databases in Big Data.
- Hive Overview
- Creating databases, tables and loading data
- Queries in Hive
- Hive based engines
- File formats
- Integration of Spark SQL with Scala/Python – Overview
7. Building applications at scale
This is the most important part of the training. In this module we will build applications with Spark, using Scala and/or Python.
- Overview of Spark
- Reading data from file systems
- Processing data using Core Spark API
- Processing data using Data Sets and/or Data Frames
- Processing data using Spark SQL
- Saving data to file systems
- Development life cycle
- Execution life cycle
- Troubleshooting and performance tuning
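To give a feel for the Core Spark API style of processing taught here, below is a plain-Python sketch of the classic word count. No Spark installation is assumed; `flatMap`, `map` and `reduceByKey` are mimicked with ordinary Python so that only the shape of the pipeline is shown.

```python
# A plain-Python stand-in for a Spark word count:
# flatMap -> map -> reduceByKey, with illustrative input lines.
lines = ["spark makes big data simple", "big data needs spark"]

# flatMap: split every line into individual words
words = [word for line in lines for word in line.split()]

# map: pair each word with an initial count of 1
pairs = [(word, 1) for word in words]

# reduceByKey: sum the counts per word
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts)
```

In actual Spark code the same three steps become `rdd.flatMap(...).map(...).reduceByKey(...)`, with the work distributed across the cluster.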
8. Data Ingestion
In this module we will see how to get data into HDFS from different sources.
- Copying data between RDBMS and HDFS using Sqoop
- Copying data between RDBMS and Hive using Sqoop
- Real-time data ingestion using Flume
- Data Ingestion using Kafka
- Copying data between RDBMS and HDFS using Spark JDBC
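Tools like Sqoop and Spark JDBC do this work at cluster scale; the sketch below only illustrates the underlying pattern (read rows over a database connection, write them out as delimited files) using Python's built-in sqlite3 and csv modules. The `employees` table is illustrative, standing in for a real RDBMS source.

```python
import csv
import io
import sqlite3

# Source "RDBMS": an in-memory SQLite table standing in for e.g. MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "asha"), (2, "ravi")])

# "Export" step: stream rows from the table into a delimited file --
# the same shape of work Sqoop or a Spark JDBC read automates at scale.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "name"])
for row in conn.execute("SELECT id, name FROM employees ORDER BY id"):
    writer.writerow(row)

exported = buf.getvalue()
print(exported)
```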
9. Streaming Analytics
In this module we will ingest data in near real time and load it into HDFS while processing it.
- Integrating data from Flume to Kafka
- Getting a golden copy of the data into HDFS using Flume
- Integration of Kafka and Spark Streaming
- Applying analytics rules to in-flight data using Spark Streaming APIs
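As a rough illustration of the micro-batch model Spark Streaming uses, the plain-Python sketch below cuts an event stream into fixed-size batches and applies a simple analytics rule (counting events per sensor) to each batch. The sensor IDs and batch size are hypothetical.

```python
# Plain-Python sketch of micro-batching: an unbounded event source is
# cut into small batches, and an analytics rule runs on each batch.
def micro_batches(events, batch_size):
    """Yield successive fixed-size batches from an event stream."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush any trailing partial batch
        yield batch

stream = ["s1", "s2", "s1", "s1", "s2", "s1"]  # illustrative sensor events
results = []
for batch in micro_batches(stream, 3):
    counts = {}
    for sensor in batch:
        counts[sensor] = counts.get(sensor, 0) + 1
    results.append(counts)

print(results)
```

In real Spark Streaming the batching is driven by a time interval rather than a count, but the per-batch processing follows the same pattern.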
10. Data Visualization
After processing the data, we need to visualize it.
- Overview of BI and Visualization tools
- Setting up Tableau Desktop
- Connecting to different data sources
- Creating reports
- Creating dashboards
11. Big Data on Cloud
For many clients, having dedicated clusters is not necessary. Quite often we can use the pay-as-you-go model of the Cloud and process data at regular intervals.
- Overview of Cloud
- Understanding AWS (Amazon Web Services)
- Setting up EC2 instances
- AWS CLI (Command Line Interface)
- Creating AWS EMR cluster using both web console as well as CLI
- Step execution
- Running Spark Jobs
- Deploying applications using Azkaban
12. Job Marketing and Social Networking tips
Towards the end of the training we will share some tips on job marketing and social networking.
- Understanding job portals
- Building LinkedIn Profile
- Using GitHub
- Answering questions on Stack Overflow
- Blogging on technology trends
- Lab access for 1 year
- Dedicated Slack group
- Lifetime access to training videos
- Practice exercises
- 24×7 community based support
Cost (15th March is the final date):
- $600 for users from the US and other users abroad
- INR 28,000 + GST for Indian users
- Student Discount:
- $300 for students from the US and other students abroad
- INR 14,000 + GST for Indian students
- To avail the ‘Student Discount’, send an email to firstname.lastname@example.org with the following details:
- Email ID
- College/University ID or a recent grade sheet
- Bulk discounts are also available for corporate clients.
Schedule and timings: 4 days a week
- US: Monday to Thursday, 9:30PM to 10:45PM Eastern Time (Some sessions may further extend by up to 15 minutes)
- India: Tuesday to Friday, corresponding AM time (Sessions could begin at 7AM or 8AM, based on US time-zone changes. Users will be informed accordingly.)
- All US and Indian holidays and long weekends will be honored.
Introduction to Data Engineering and Big Data
FAQs about the Bootcamp
- Why is it called a ‘Data Engineering’ bootcamp?
Data Engineering, by definition, is the practice of processing data for an enterprise. Through the course of this bootcamp, a user will learn this essential skill and will be equipped to process both streaming data and data in offline batches.
- Why does one need to learn Linux Fundamentals, Database Essentials, or languages like Python and Scala? What do they have to do with Data Engineering?
Linux Fundamentals, Database Essentials and Programming are key to successful careers in the world of IT. For data engineering, a good understanding of Linux and SQL commands —besides the knowledge of programming languages like Python and/or Scala— is not only valuable, but essential. Their particular benefits include:
- Linux is the standard operating system used in various enterprises across the globe.
- Understanding Linux commands and a bit of shell scripting makes one comfortable in the enterprise world.
- Linux knowledge also helps increase productivity by improving troubleshooting, debugging and the automation of validations.
- Database Essentials such as SQL are relevant for a wide range of roles in the IT industry.
- Python is the preferred programming language in Data Science and Data Engineering, having a wide array of easy-to-use libraries to work with data.
- Scala, a programming language structurally similar to Python, is necessary to implement powerful Data Engineering tools like Spark.
- Do I really need to know two programming languages, Python as well as Scala? Also, why not Java?
It is by no means mandatory, but knowledge of both Python and Scala could significantly improve an IT professional’s access to opportunities in the world of Big Data. Java, however, is no longer considered flexible enough to develop innovative data processing frameworks. Relatively difficult to use, Java is not as popular as Python or Scala for Data Engineering.
- Who’s the best suited audience for the bootcamp? Who can gain the most value from it?
- Traditional ETL and Data Warehouse developers
- Mainframe professionals hoping to switch careers to open systems. The bootcamp is sure to add great value to their prior experience in dealing with heavy volumes of data
- Testing professionals, to transition to development roles
- Entry level professionals, to learn essential skills that are relevant to their industries
- Is the bootcamp certified? Is the certification recognized elsewhere?
- The bootcamp offers a course completion certificate from ITVersity
- As of now, the certificate is not widely recognized in the corporate world
- However, ITversity offers certification-oriented content to help users prepare for industry recognized certifications in various Big Data technologies
- I am not a programmer, but have experience in the IT industry. Would I find Data Engineer training relevant?
Yes, Data Engineering can help further any career in IT, as long as one is open to learning basic programming skills, Linux Fundamentals and Database Essentials.
- How long would it take for a beginner to gain the skills to be employable?
It should take about four months, subject to one’s discipline and curiosity to learn.
- Does ITVersity offer assistance in job-seeking at the end of the bootcamp?
- Not only do we guarantee skills and industry-readiness by the end of the bootcamp, but also have experts give users specific tips to seek out relevant job opportunities
- ITVersity also engages with clients in staffing, and helps connect partnering corporates to rightly skilled individuals
- Training does not guarantee employment, but we make the best attempts we can to ensure each dedicated user gains the relevant skills, and is recognized for them
- What are the Certifications covered as part of this course?
- CCA 175 Spark and Hadoop Developer (100%)
- HDPCD:Spark (100%)
- HDPCD (70%)
- Databricks/O’reilly Certified Spark Developer (90%)
- MapR Certified Spark Developer (90%)