Getting Started with Spark

What is Spark?

From the Apache Spark Documentation:

  • Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.

Installation

  1. Install Spark

    • If you do not currently have the Java JDK (version 7 or higher) installed, download it and follow the steps to install it for your operating system.
    • Visit the Spark downloads page, select a pre-built package, and download Spark. Extract the archive (double-clicking it works on most operating systems) so its contents are ready for use.
    • Move the expanded folder into a location suitable for your experiments!
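    • If you prefer the terminal, the archive can also be extracted with tar; the file name below is only a placeholder for whichever package you downloaded:

      tar -xzf spark-<version>-bin-<hadoop-version>.tgz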
  2. Write some code!

    • Let's write some Scala code and run it on Spark!
    • If you still haven't written any Scala code, look at this previous blog post.
    • The example code that we will run on Spark this time is sketched below.
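    • A LetterCounter application along these lines counts how many lines of a text file contain the letters "a" and "b". This is a minimal sketch, assuming a plain SparkContext and an input file path that you will need to point at a real text file:

      package com.learning.spark

      import org.apache.spark.{SparkConf, SparkContext}

      object LetterCounter {
        def main(args: Array[String]): Unit = {
          // Placeholder path: point this at any text file on your machine.
          val textFile = "YOUR_SPARK_HOME/README.md"

          // No setMaster here: the master is passed on the spark-submit command line.
          val conf = new SparkConf().setAppName("Letter Counter")
          val sc = new SparkContext(conf)

          // Load the file once and cache it, since we traverse it twice.
          val data = sc.textFile(textFile).cache()

          val numAs = data.filter(line => line.contains("a")).count()
          val numBs = data.filter(line => line.contains("b")).count()

          println(s"Lines with a: $numAs, Lines with b: $numBs")

          sc.stop()
        }
      }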
  3. Package a jar containing your application

    • At the root of your project execute: sbt package
    • You should see something like the following in the console: Packaging ... playing-with-spark/target/scala-2.11/learning-with-spark_2.11-1.0.jar
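    • sbt needs a build.sbt at the project root to produce that jar. Here is a minimal sketch, assuming Scala 2.11 and a spark-core dependency (the Spark version shown is an assumption; match it to the release you downloaded):

      name := "learning-with-spark"

      version := "1.0"

      scalaVersion := "2.11.8"

      // The Spark version is an assumption; use the one you downloaded.
      // "provided" keeps Spark itself out of the jar, since spark-submit supplies it at runtime.
      libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"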
  4. Use spark-submit to run your application:

    YOUR_SPARK_HOME/bin/spark-submit --class "com.learning.spark.LetterCounter" --master local[4] target/scala-2.11/learning-with-spark_2.11-1.0.jar

  5. You should see in the output: Lines with a: 14, Lines with b: 9

  6. Keep on learning about the Spark API with the Spark Programming Guide

  7. For running applications on a cluster, see the deployment overview

  8. Spark includes several Scala examples in the examples directory

    • You can use YOUR_SPARK_HOME/bin/run-example EXAMPLE_NAME to run the Scala examples
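    • For instance, the classic SparkPi example, which estimates the value of π, can be run with: YOUR_SPARK_HOME/bin/run-example SparkPi 10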

References