Practical session, introduction to Spark

From info319
Latest revision as of 09:22, 18 August 2020

Before every practical session, I expect you to install all the software needed, so that we can focus on the tasks during the session.

Apache Spark

Purpose

  • Getting up and running with Apache Spark
  • Getting experience with a non-trivial installation
  • Using IntelliJ IDEA
  • Writing and running your own first Spark program

For a general introduction, see the slides to Session 2 on Apache Spark. Here is a useful tutorial: https://www.tutorialspoint.com/spark_sql/spark_introduction.htm . For configuring the Spark dependency in IntelliJ IDEA, see the "Linking with Spark" section of the RDD programming guide: http://spark.apache.org/docs/latest/rdd-programming-guide.html

Preparations

As with Hadoop, you will run Spark standalone on your own computer (and independently of your previous Hadoop installation, to keep things simple). Running Spark on a cluster of many computers is harder to set up (and requires access to a cluster), but once it is set up, writing and running code works the same way. For cluster installation, see Installing Spark Standalone to a Cluster: http://spark.apache.org/docs/latest/spark-standalone.html

Follow the installation steps below to install Spark on your Linux or Windows machine. macOS runs BSD Unix under the hood, so most Linux commands should also work in a Terminal window on a Mac.
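Once the installation is done, a quick self-check can save time in the session. The following is only a sketch, not part of the official installation guides: it assumes that your installation puts `java` and `spark-submit` on the `PATH` and sets the conventional `SPARK_HOME` environment variable, which the linked guides may or may not do on your platform. The function name `check_spark_setup` is made up for this example.

```python
import importlib.util
import os
import shutil


def check_spark_setup(env=None):
    """Return a dict mapping each check to True (found) or False (missing).

    `env` defaults to the real environment; passing a dict makes the
    function easy to try out with made-up values.
    """
    env = os.environ if env is None else env
    path = env.get("PATH", "")
    spark_home = env.get("SPARK_HOME", "")  # assumption: set by your install
    return {
        # Spark runs on the JVM, so a Java runtime must be reachable.
        "java on PATH": shutil.which("java", path=path) is not None,
        "SPARK_HOME is an existing directory":
            bool(spark_home) and os.path.isdir(spark_home),
        "spark-submit on PATH":
            shutil.which("spark-submit", path=path) is not None,
        # Only relevant if you plan to use Python (PySpark).
        "pyspark importable":
            importlib.util.find_spec("pyspark") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_spark_setup().items():
        print(f"[{'ok' if ok else 'missing'}] {name}")
```

A "missing" line does not necessarily mean the installation is broken (for example, some guides call `spark-submit` by its full path), but it tells you where to look first.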

Tasks

Installation Steps

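1 – Install Spark

Windows: https://medium.com/@ligz/installing-standalone-spark-on-windows-made-easy-with-powershell-7f7309799bc7 or https://www.knowledgehut.com/blog/big-data/how-to-install-apache-spark-on-windows

Linux: https://medium.com/devilsadvocatediwakar/installing-apache-spark-on-ubuntu-8796bfdd0861

Jupyter for Apache Spark on Linux: https://medium.com/@singhpraveen2010/install-apache-spark-and-configure-with-jupyter-notebook-in-10-minutes-ae120ebca597

Mac: https://medium.com/@roshinijohri/spark-with-jupyter-notebook-on-macos-2-0-0-and-higher-c61b971b5007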

2 – Install IntelliJ IDEA

https://www.jetbrains.com/idea/

3 – Install the Scala plugin in IntelliJ

https://docs.scala-lang.org/getting-started-intellij-track/getting-started-with-scala-in-intellij.html

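4 – Link Spark with IntelliJ

Scala: https://www.c-sharpcorner.com/article/working-with-spark-and-scala-in-intellij-idea-part-i/

Python: https://medium.com/@gauravmshah/pyspark-on-intellij-with-packages-auto-complete-5e3208504707

See also the RDD programming guide: http://spark.apache.org/docs/latest/rdd-programming-guide.html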


5 – Do some exercise tasks
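
As one possible first exercise, here is a minimal word-count sketch in PySpark. It is an illustration and not the official exercise text: the file name `first_spark_program.py`, the helper `tokenize`, and the choice of a plain text file as input are all assumptions made for this example. It hard-codes a local master, which matches the standalone setup used in this session.

```python
import re
import sys


def tokenize(line):
    """Split one line of text into lowercase words."""
    return re.findall(r"[a-z0-9']+", line.lower())


def main(path):
    # Imported here so that tokenize() can be tried without Spark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")            # run locally, using all cores
             .appName("FirstSparkProgram")  # hypothetical application name
             .getOrCreate())

    lines = spark.sparkContext.textFile(path)      # RDD with one string per line
    counts = (lines.flatMap(tokenize)              # RDD of words
                   .map(lambda word: (word, 1))    # RDD of (word, 1) pairs
                   .reduceByKey(lambda a, b: a + b))  # sum the counts per word

    # Print the ten most frequent words.
    for word, n in counts.takeOrdered(10, key=lambda pair: -pair[1]):
        print(word, n)

    spark.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: spark-submit first_spark_program.py <text-file>")
```

Assuming `spark-submit` is on your `PATH`, you could run it as `spark-submit first_spark_program.py some-text-file.txt`. The same steps (`textFile`, `flatMap`, `map`, `reduceByKey`) are described in the RDD programming guide linked above.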

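Extra links:

Apache Spark with Python:

  • Install and Setup Apache Spark 2.2.0 Python in Windows - PySpark: https://www.youtube.com/watch?v=WQErwxRTiW0
  • Setup Jupyter Notebook for Apache Spark: https://bigdata-madesimple.com/guide-to-install-spark-and-use-pyspark-from-jupyter-in-windows/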