Set Up a Single-Node Hadoop and Spark Lab

Intro

This blog covers step-by-step instructions to set up a single-node lab for the Hadoop and Spark ecosystem using the Cloudera Distribution.

Almost all the leading vendors provide virtual machine images for learning the tools in the Big Data ecosystem, including but not limited to Hadoop and Spark. But a virtual machine requires a high-end host configuration, such as 16 GB of RAM, an i7 quad-core CPU, and an SSD. Even with 16 GB of RAM, running all the tools on a single machine may not be feasible. Many cloud providers offer infrastructure on a pay-as-you-go model, along with free credits to explore their environments, so setting up a cluster in the cloud can be a practical alternative. This exercise will also help you understand the basics of setting up clusters.

This lesson covers how to set up a single-node lab in the cloud (e.g., AWS). Apart from provisioning the instance, the rest of the steps to set up the single-node lab are the same as those provided here to install the Cloudera Distribution of Hadoop.

  • Sign up with the cloud provider
  • Provision an EC2 instance from AWS
  • Set up the MySQL database
  • Set up OS-level prerequisites for Hadoop
  • Install Cloudera Manager
  • Install the Cloudera Distribution of Hadoop
  • Validate HDFS and YARN+MR2
  • Validate Hive, Pig, Sqoop, etc.
  • Set up the retail_db database (for Sqoop)
  • Set up gen_logs (for streaming)
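The OS-level prerequisites, Cloudera Manager install, and HDFS/YARN validation steps above might look roughly like the following on a CentOS/RHEL EC2 instance. This is a minimal sketch: the installer URL, package names, and jar path are assumptions that depend on your OS and Cloudera release, so verify them against the Cloudera documentation before running.

```shell
# --- OS-level prerequisites (suitable for a throwaway lab, not production) ---
# Put SELinux in permissive mode and disable the firewall
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
sudo systemctl stop firewalld && sudo systemctl disable firewalld

# Reduce swappiness, as commonly recommended for Hadoop nodes
echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Install a JDK and a MySQL-compatible server (used by Cloudera Manager
# and the Hive metastore); package names vary by OS version
sudo yum install -y java-1.8.0-openjdk mariadb-server
sudo systemctl enable --now mariadb

# --- Install Cloudera Manager (installer URL is an assumption; pick the
# version matching the CDH release you plan to install) ---
wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod +x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin

# --- After CDH is installed, smoke-test HDFS and YARN+MR2 ---
hadoop fs -mkdir -p /user/$USER
hadoop fs -ls /
# Examples jar path differs between package and parcel installs; adjust as needed
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10
```

Once the pi job completes, the single node is confirmed to be able to schedule and run MapReduce jobs, and you can move on to validating Hive, Pig, and Sqoop.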
 

2 Comments

  • siva Posted August 3, 2016 11:23 am

    Sir, I have Hadoop 0.20 and am working on it; I can run Hive, Pig, HBase, and Sqoop on this.
    My laptop configuration is 4 GB RAM.
    I want to migrate to the latest Hadoop. How can I?
    I can't install 8 GB of RAM due to some problems. Can I upgrade Hadoop? Please help.

    • Durga Viswanatha Raju Gadiraju Posted August 3, 2016 11:35 am

      It might work, but it will waste a lot of your time in troubleshooting issues.


