Practical session, introduction to Spark

Apache Spark

Purpose

Getting up and running with Apache Spark
Getting experience with a non-trivial installation
Using IntelliJ IDEA
Writing and running your own first Spark program

For a general introduction, see the slides for Session 2 on Apache Spark. Here is a useful tutorial: https://www.tutorialspoint.com/spark_sql/spark_introduction.htm . For configuring the Spark dependency in IntelliJ IDEA, see: http://spark.apache.org/docs/latest/rdd-programming-guide.html

Preparations

As with Hadoop, you will run Spark standalone on your own computer (and independently of your previous Hadoop installation, to keep things simple). Running Spark on a cluster is harder to set up (and requires access to a cluster of computers), but once it is set up, you write and run code in the same way. For cluster installation, see Installing Spark Standalone to a Cluster: http://spark.apache.org/docs/latest/spark-standalone.html
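To illustrate why the code stays the same, here is a minimal sketch in which only the master URL differs between a local run and a cluster run. The object name `MasterConfig`, the helper `confFor`, and the application name are hypothetical, chosen for illustration only.

```scala
import org.apache.spark.SparkConf

object MasterConfig {
  // Takes the master URL from the first program argument.
  // Falls back to local mode, using all available cores, when none is given.
  def confFor(args: Array[String]): SparkConf = {
    val master = args.headOption.getOrElse("local[*]")
    new SparkConf()
      .setAppName("MasterConfigDemo") // hypothetical application name
      .setMaster(master)
  }
}
```

Called with no arguments, this yields a configuration for a local run on your own machine. Passing a standalone master URL of the form `spark://<host>:7077` (the host is a placeholder for your cluster's master) would point the same program at a cluster instead.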

Follow these preparations to install Spark on your Linux or Windows machine. If you are on macOS, it is built on a BSD-derived Unix, so most Linux commands should also work in a Terminal window on your Mac.

Tasks

Getting started with Apache Spark

Steps

1 - Install Spark:

https://wiki.uib.no/info310/index.php/Spark_preparations

Linux: [1]

2 - Install IntelliJ IDEA:

https://www.jetbrains.com/idea/

3 - Install the Scala plugin in IntelliJ:

https://docs.scala-lang.org/getting-started-intellij-track/getting-started-with-scala-in-intellij.html

4 - Link Spark with IntelliJ:

Scala: [2] Python: [3]

http://spark.apache.org/docs/latest/rdd-programming-guide.html
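If you create the IntelliJ project as an sbt project, the Spark dependency can be declared in `build.sbt`. The fragment below is only a sketch: the project name is hypothetical, and the Scala and Spark versions are assumptions that you should replace with the ones matching your own Spark installation (older Spark 2.x releases, for example, need a Scala 2.11 version instead).

```scala
// build.sbt
// Assumed versions: adjust both to match the Spark release you installed.
name := "first-spark-app" // hypothetical project name
version := "0.1.0"
scalaVersion := "2.12.18"

// %% appends the Scala binary version (here _2.12) to the artifact name.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1"
```

After editing the file, let IntelliJ reload the sbt project so that the Spark classes become available to the editor and the compiler.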

5 - Do some exercise tasks
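As a starting point for a first program, here is a minimal word count sketch using the RDD API. It assumes the Spark core library is already on the project's classpath. The object name `FirstSparkApp`, the helper `countWords`, and the sample sentences are hypothetical, chosen for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FirstSparkApp {
  // Counts how often each word occurs in the given lines.
  // Words are lower-cased and split on whitespace.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)                 // distribute the lines as an RDD
      .flatMap(_.toLowerCase.split("\\s+")) // one record per word
      .filter(_.nonEmpty)                 // drop empty tokens
      .map(word => (word, 1))             // pair each word with a count of 1
      .reduceByKey(_ + _)                 // sum the counts per word
      .collect()                          // bring the results to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, using all available cores.
    val conf = new SparkConf().setAppName("FirstSparkApp").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countWords(sc, Seq("to be or not to be", "that is the question"))
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop() // always release Spark resources
    }
  }
}
```

Run `main` from IntelliJ to try it. A natural next exercise is to replace the in-memory sample with `sc.textFile(...)` pointing at a text file of your own.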

Extra links:

Apache Spark with Python:

Install and Setup Apache Spark 2.2.0 Python in Windows - PySpark: [4]
Setup Jupyter Notebook for Apache Spark: [5]
