Setup Spark on Windows – for development

In this blog we will see instructions for setting up Spark and Scala IDE for Eclipse on a Windows laptop. It is primarily intended for learning or development purposes.

We will cover the following topics, walking step by step through setting up Spark to develop applications.

  • Pre-requisites
  • Installing Spark on Windows
  • Setting up Scala IDE for Eclipse
  • Using build tools (sbt)
  • Develop and run sample application
  • What’s Next?

After setting up the environment, one will be able to practice Spark to learn or to develop applications. One can use a Mac, Windows or Linux environment, but this blog primarily focuses on Windows.

To learn details about setting up the environment on Mac, one can visit the course – Big Data Engineer Immersion and go through the relevant topics.

There are typically two ways to install and run Apache Spark on Windows (or other platforms)

  • Run Spark in the REPL on Windows through spark-shell
    • This approach is quick to set up, but tedious while building applications
  • Run Apache Spark through Eclipse or any other IDE
    • It is more tedious to set up, but having an IDE such as Eclipse or IntelliJ is recommended
    • Having an IDE can accelerate the development process significantly
  • Pre-requisites
    • Set up Java with JDK (1.8+ is recommended)
    • Install Eclipse with the Maven plugin (e.g. STS)
    • Add the Scala IDE plugin
    • sbt is a more popular build tool than Maven when it comes to Scala based application development
    • We will cover Maven for this blog
    • We have content about sbt on Mac – first few topics of Apache Spark with Scala
    • We can follow similar steps for Windows as well

Let us see how Spark can be set up on Windows

  • Go to http://spark.apache.org/downloads.html
  • Choose the correct Spark release, a package pre-built for Hadoop, and the correct Scala version (2.10/2.11)
  • Choose download type as Direct Download
  • Then click the Download Spark link marked by arrow

[Screenshot: downloading Spark]

  • This will download the pre-built Spark binary archive to the downloads folder
  • Extract the archive and copy the extracted folder to the C drive
  • Search for environment variables (in Windows 8 or later, using search beside the start menu)
  • Create a SPARK_HOME environment variable pointing to the extracted folder

[Screenshot: Spark environment variable]

  • Validate Spark by running spark-shell (from %SPARK_HOME%\bin, unless that folder is on PATH)

[Screenshot: launching spark-shell]

Typically we use IDEs such as Eclipse or IntelliJ with the Scala plugin to develop and validate programs before deploying them in production. Here are the steps involved

  • Develop programs or applications using Scala IDE
  • Validate the program in local or standalone mode
  • Spark can read data from different file systems because it uses the HDFS APIs
  • To read data using Spark on Windows, we need some additional configuration for the HDFS APIs to work
  • If the additional configuration is not done, we might run into this error
ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
  • Here are the steps to set up the winutils binary for the HDFS APIs to overcome the issue
    • Download winutils.exe and save it to a directory of your choice, say c:\hadoop\bin
    • Set HADOOP_HOME to the directory that contains bin (without bin itself), for example HADOOP_HOME=c:\hadoop
    • Append %HADOOP_HOME%\bin to the PATH environment variable
    • HADOOP_HOME is a new environment variable, while PATH already exists and only needs to be appended
    • Both are System Variables
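The winutils setup above can be double-checked from a Scala interpreter (spark-shell works too). The following is only a minimal sketch: the object name CheckWinutils and its helper methods are made up for illustration and are not part of Spark or Hadoop. It only looks at the HADOOP_HOME variable and at whether bin\winutils.exe exists under it.

```scala
import java.io.File

// Hypothetical helper for illustration; not part of Spark or Hadoop
object CheckWinutils {
  // Expected location of winutils.exe for a given HADOOP_HOME value
  def winutilsPath(hadoopHome: String): File =
    new File(new File(hadoopHome, "bin"), "winutils.exe")

  // Describes the state of the setup for an optional HADOOP_HOME value
  def check(hadoopHome: Option[String]): String = hadoopHome match {
    case None => "HADOOP_HOME is not set"
    case Some(home) =>
      val exe = winutilsPath(home)
      if (exe.isFile) s"Found ${exe.getPath}"
      else s"HADOOP_HOME is set, but ${exe.getPath} is missing"
  }

  def main(args: Array[String]): Unit =
    println(check(sys.env.get("HADOOP_HOME")))
}
```

After pasting it into the interpreter, call CheckWinutils.main(Array()). Environment variables are read when a process starts, so open a new Command Prompt after changing them.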

Here are the instructions to set up Scala

  • Download Scala binaries
  • Install (untar) Scala binaries
  • Update environment variable PATH
  • Launch Scala interpreter/CLI and run a simple Scala program
  • Copy below code snippet and paste in Scala interpreter/CLI, then run it by calling hw.main(Array())
object hw {
  // Entry point; prints a greeting to standard output
  def main(args: Array[String]): Unit = {
    println("Hello World!")
  }
}

Following are the steps to create a simple Scala application

  • Make sure you are in the right directory
  • Create src/main/scala using mkdir -p src/main/scala (in Windows Command Prompt, use mkdir src\main\scala)
  • Create file hw.scala under src/main/scala
  • Paste above code, save and exit
  • Run using scala src/main/scala/hw.scala
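The directory step can also be done from the Scala interpreter, which avoids differences between Unix shells and Windows Command Prompt. This is a sketch under assumptions: the object name CreateLayout is hypothetical, and it also creates src/main/resources, the folder sbt expects for resources such as conf/properties files.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper for illustration
object CreateLayout {
  // Creates the standard source directories under the given project root
  // and returns their paths. Existing directories are left untouched.
  def create(root: Path): Seq[Path] = {
    val dirs = Seq("src/main/scala", "src/main/resources").map(d => root.resolve(d))
    dirs.foreach(d => Files.createDirectories(d))
    dirs
  }

  def main(args: Array[String]): Unit =
    // "." is the current directory, which should be the application home
    create(Paths.get(".")).foreach(p => println(s"Created (or found) $p"))
}
```

Calling CreateLayout.main(Array()) from the application home directory prints the two paths it created or found.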

Here are the instructions to set up sbt

  • Download sbt
  • Install sbt
  • Go to the directory where you have the Scala source code
  • Create build.sbt
  • Package and run using sbt

Make sure you are in the right directory (application home directory)

  • Make sure you create the necessary source directories (mandatory)
    • src/main/scala for source code
    • src/main/resources for resources such as conf/properties files
  • Copy below snippet to build.sbt
name := "hw"
version := "1.0"
scalaVersion := "2.11.8"
  • Build by running sbt package
  • sbt package will download all the dependencies the first time, hence it might take a few minutes
  • A jar file will be created under target/scala-<major_version>/<application_name>_<scala_major_version>-<application_version>.jar
  • Example: target/scala-2.11/hw_2.11-1.0.jar
  • Run using sbt: sbt run
  • Run using scala: scala target/scala-2.11/hw_2.11-1.0.jar
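The jar naming rule above can be written out as a small function. This is only an illustration of the pattern for a simple lowercase application name and a stable Scala release; the object JarName is hypothetical and is not an sbt API.

```scala
// Hypothetical helper for illustration; not part of sbt
object JarName {
  // "2.11.8" -> "2.11": keep only the first two components of the version
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // Builds the path following the pattern
  // target/scala-<major>/<name>_<major>-<version>.jar
  def jarPath(name: String, version: String, scalaVersion: String): String = {
    val major = binaryVersion(scalaVersion)
    s"target/scala-${major}/${name}_${major}-${version}.jar"
  }

  def main(args: Array[String]): Unit =
    println(jarPath("hw", "1.0", "2.11.8"))
}
```

With the build.sbt values from this section, JarName.main(Array()) prints target/scala-2.11/hw_2.11-1.0.jar.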

We are in the process of setting up the development environment on our PC/Mac so that we can develop the modules assigned. By this point the following tasks should be completed.

  • Make sure Java is installed
  • Setup Eclipse (as part of Setup Eclipse with Maven)
  • Setup Scala
  • Setup sbt
  • Validate all the components
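One quick way to validate the Java and Scala parts of the setup is to ask the Scala interpreter which versions it is running on. This is a minimal sketch; the object name CheckVersions is made up for illustration, and it does not check Eclipse or sbt.

```scala
import scala.util.Properties

// Hypothetical helper for illustration
object CheckVersions {
  // Collects the versions the running interpreter reports
  def report(): Seq[(String, String)] = Seq(
    "Java version"  -> Properties.javaVersion,
    "Scala version" -> Properties.versionNumberString
  )

  def main(args: Array[String]): Unit =
    report().foreach { case (label, value) => println(s"$label: $value") }
}
```

Call CheckVersions.main(Array()) after pasting it, and compare the output with the versions you intended to install (for example JDK 1.8 and the Scala version set in build.sbt).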

Before setting up Scala IDE, let us understand the advantages of having an IDE

  • The Scala IDE for Eclipse project lets you edit Scala code in Eclipse, with
    • Syntax highlighting
    • Code completion
    • Debugging, and many other features
  • It makes Scala development in Eclipse a pleasure.

Steps to install Scala IDE for Eclipse

  • Launch Eclipse
  • Go to “Help” in top menu -> “Eclipse Marketplace”
  • Search for Scala IDE
  • Click on Install
  • Once installed restart Eclipse
  • Go to File -> New -> and see whether “Scala Application” is available or not.

Once Scala IDE is installed, it is time to validate that we can develop Scala based applications

  • Create new workspace “bigdata-spark-scala”
  • Launch “Scala Interpreter” and validate scala code snippets
  • Create simple “Scala Application”
  • Run simple “Scala Application”


4 Comments

  • Venkat Posted October 18, 2016 1:19 am

    Hi Durga, This is based on Mac OS. I’m looking for the same on Windows environment.
    Thanks.

     
    • Durga Viswanatha Raju Gadiraju Posted November 5, 2016 5:21 pm

      Except the error, rest of the steps are same for Mac and Windows.

       
  • kanthi Posted November 24, 2016 12:55 am

    I have followed same instructions for windows( Scala IDE,eclipse and sbt) which is working fine except spark.
    To access spark API’s(dependencies) which version of spark binaries should be installed?

     
