Practical session, introduction to Spark: Difference between revisions

Latest revision as of 09:22, 18 August 2020

For every practical session, I expect you to install all the software you need beforehand, so that you can focus on the tasks during the session.

Apache Spark

Purpose

Getting up and running with Apache Spark
Getting experience with a non-trivial installation
Using IntelliJ IDEA
Writing and running your own first Spark program

For a general introduction, see the slides for Session 2 on Apache Spark. Here is a useful tutorial: https://www.tutorialspoint.com/spark_sql/spark_introduction.htm . For configuring the Spark dependency in IntelliJ IDEA, see http://spark.apache.org/docs/latest/rdd-programming-guide.html

Preparations

As with Hadoop, you will run Spark standalone on your own computer (independently of your previous Hadoop installation, to keep things simple). Running Spark on a cluster of many computers is harder to set up (and you need a cluster of computers), but once that is done, writing and running code works the same way. For installing Spark Standalone on a cluster, see http://spark.apache.org/docs/latest/spark-standalone.html

Follow the installation steps below to install Spark on your Linux or Windows machine. macOS is built on BSD Unix, so most Linux commands should also work in a Terminal window on your Mac.

Tasks

Getting started with Apache Spark

Installation Steps

1 – Install Spark:

Windows: [1] or [2]

Linux: [3]; Jupyter for Apache Spark on Linux: [4]

Mac: [5]

2 – Install IntelliJ IDEA

https://www.jetbrains.com/idea/

3 – Install Scala plugin in IntelliJ

https://docs.scala-lang.org/getting-started-intellij-track/getting-started-with-scala-in-intellij.html

4 – Link Spark with IntelliJ

Scala: [6]

Python: [7]

http://spark.apache.org/docs/latest/rdd-programming-guide.html
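
Once Spark is linked to your project, a quick check like the one below can confirm that the link works before you start on the tasks. This is only a minimal sketch for the Python route, not something taken from the linked guides. It assumes that PySpark has been installed into the interpreter your project uses and that Java is available. The function names `pyspark_is_linked` and `local_spark_version` and the application name `link-check` are made up for illustration.

```python
import importlib.util


def pyspark_is_linked() -> bool:
    """Return True if the current interpreter can find the pyspark package."""
    return importlib.util.find_spec("pyspark") is not None


def local_spark_version() -> str:
    """Start a local SparkSession and return the Spark version it reports."""
    # Imported lazily so that pyspark_is_linked() still works without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")      # run inside this process, using all cores
        .appName("link-check")   # hypothetical application name
        .getOrCreate()
    )
    try:
        return spark.version
    finally:
        # Always release the local Spark resources, even on errors.
        spark.stop()
```

Call `pyspark_is_linked()` first. If it returns `True`, `local_spark_version()` should start Spark locally and return the version string of your installation. If it returns `False`, the project interpreter cannot see PySpark yet.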

5 – Do some exercise tasks
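
The exercise tasks themselves are not listed on this page. As a warm-up, a word count is a common first Spark program, and the sketch below shows one possible version using the RDD API described in the programming guide linked above. It is an assumed example, not one of the course tasks. The function names, the application name `first-word-count`, and the tokenising rule are all chosen for illustration, and it assumes a working local PySpark installation.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(line: str) -> List[str]:
    """Split a line into lowercase words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count_local(lines: Iterable[str]) -> Dict[str, int]:
    """Plain-Python reference result, useful for checking the Spark output."""
    return Counter(word for line in lines for word in tokenize(line))


def word_count_spark(lines: List[str]) -> Dict[str, int]:
    """Count words with Spark's RDD API on a local master."""
    # Imported lazily: this part needs PySpark and Java to be installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")            # standalone run on this machine
        .appName("first-word-count")   # hypothetical application name
        .getOrCreate()
    )
    try:
        counts = (
            spark.sparkContext.parallelize(lines)   # distribute the input lines
            .flatMap(tokenize)                      # one record per word
            .map(lambda word: (word, 1))            # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b)        # sum the counts per word
            .collect()                              # bring the results to the driver
        )
        return dict(counts)
    finally:
        spark.stop()
```

To try it, pass a small list of strings to `word_count_spark` and compare the result with `word_count_local` on the same input; the two should agree. Replacing `parallelize(lines)` with `textFile(...)` on a path of your own would be the natural next step for reading a real file.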

Extra links:

Apache Spark with Python:

Install and Setup Apache Spark 2.2.0 Python in Windows - PySpark: [8]
Setup Jupyter Notebook for Apache Spark: [9]
