Practical session, introduction to Spark
Revision as of 11:39, 16 September 2019
Apache Spark
Purpose
- Getting up and running with Apache Spark
- Getting experience with non-trivial installation
- Using IntelliJ IDEA.
- Writing and running your own first Spark program
For a general introduction, see the slides for Session 2 on Apache Spark. Here is a useful tutorial: https://www.tutorialspoint.com/spark_sql/spark_introduction.htm . Configuring the Spark dependency in IntelliJ IDEA: http://spark.apache.org/docs/latest/rdd-programming-guide.html
Preparations
As with Hadoop, you will run Spark standalone on your own computer, independently of your previous Hadoop installation to keep things simple. Running Spark on a cluster is harder to set up (and you need a cluster of computers), but once that is done, writing and running code works the same way. Installing Spark Standalone to a Cluster: http://spark.apache.org/docs/latest/spark-standalone.html
Follow the steps below to install Spark on your Linux or Windows machine. macOS runs BSD Unix under the hood, so most Linux commands should also work in a Terminal window on your Mac.
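To illustrate why the code stays the same in both setups: in a typical Spark program, only the master URL differs between a local standalone run and a cluster run. The following is a minimal sketch in Scala, assuming Spark is on the classpath. The object name, the app name, the `SPARK_MASTER` environment variable and the host in the comment are made up for illustration and are not part of the course material.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MasterUrlSketch {
  // Hypothetical helper: read the master URL from an environment variable,
  // falling back to local mode with one worker thread per CPU core.
  def masterUrl(env: Map[String, String]): String =
    env.getOrElse("SPARK_MASTER", "local[*]")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("master-url-sketch")   // illustrative name
      .setMaster(masterUrl(sys.env))      // e.g. "spark://some-host:7077" on a cluster
    val sc = new SparkContext(conf)

    // The job itself is identical in local and cluster mode.
    val total = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"Sum of 1..100 = $total")

    sc.stop()
  }
}
```

Keeping the master URL out of the job logic means the same program can be moved to a cluster later without code changes.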
Tasks
Installation Steps
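1 – Install Spark
Windows: https://medium.com/@ligz/installing-standalone-spark-on-windows-made-easy-with-powershell-7f7309799bc7 or https://www.knowledgehut.com/blog/big-data/how-to-install-apache-spark-on-windows
Linux: https://medium.com/devilsadvocatediwakar/installing-apache-spark-on-ubuntu-8796bfdd0861
2 – Install IntelliJ IDEA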
https://www.jetbrains.com/idea/
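3 – Install Scala plugin in IntelliJ
https://docs.scala-lang.org/getting-started-intellij-track/getting-started-with-scala-in-intellij.html
4 – Linking Spark with IntelliJ
Scala: https://www.c-sharpcorner.com/article/working-with-spark-and-scala-in-intellij-idea-part-i/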
http://spark.apache.org/docs/latest/rdd-programming-guide.html
5 – Do some exercise tasks
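As a possible starting point for the exercises, here is a minimal word-count sketch in Scala. It is not taken from the course material. It assumes the project already has the Spark core library on its classpath, for example through an sbt line such as `libraryDependencies += "org.apache.spark" %% "spark-core" % "<your Spark version>"`, where the version is a placeholder to fill in. The object name, the app name and the sample sentences are chosen for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountSketch {
  // Split a line into lower-case words, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM, using all available cores.
    val conf = new SparkConf().setAppName("word-count-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // A tiny in-memory data set; sc.textFile(path) would read a real file instead.
    val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))

    val counts = lines
      .flatMap(tokenize)          // one record per word
      .map(word => (word, 1))     // pair each word with a count of 1
      .reduceByKey(_ + _)         // sum the counts per word

    // Bring the (small) result to the driver and print it, most frequent first.
    counts.collect().sortBy(-_._2).foreach { case (w, n) => println(s"$w\t$n") }

    sc.stop()
  }
}
```

Running `main` from IntelliJ should be enough to check that the Spark dependency is linked correctly. Replacing the in-memory sequence with `sc.textFile` on a file of your choice turns it into a first real exercise.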
Extra links:
Apache Spark with Python:
