Practical session, introduction to Spark

From info319
Revision as of 08:28, 24 August 2018 by Vimala

Apache Spark

Purpose

  • Getting up and running with Spark
  • Getting experience with a non-trivial installation
  • Using IntelliJ IDEA
  • Writing and running your own first Spark program

For a general introduction, see the slides for Session 2 on Apache Spark. A useful tutorial is available at https://www.tutorialspoint.com/spark_sql/spark_introduction.htm. For configuring the Spark dependency in IntelliJ IDEA, see http://spark.apache.org/docs/latest/sql-programming-guide.html.
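If your IntelliJ IDEA project is an sbt project, the Spark dependency can be declared in its build.sbt file. The sketch below assumes Spark 2.3.1 (the current release in August 2018), which is built for Scala 2.11. The project name and both version numbers are assumptions, so adjust them to the Spark version you actually install.

```scala
// build.sbt: minimal sbt build for a local Spark project (names and versions are assumptions)
name := "spark-intro"

version := "0.1"

// Spark 2.3.x is built for Scala 2.11, so the project's Scala version must match it
scalaVersion := "2.11.12"

// %% appends the Scala binary version (here: spark-sql_2.11);
// spark-sql depends on spark-core and provides SparkSession
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"
```

After saving the file, let IntelliJ IDEA re-import the sbt project so that it downloads the Spark libraries.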

Preparations

As with Hadoop, you will run Spark standalone on your own computer, independently of your previous Hadoop installation, to keep things simple. Running Spark on a cluster of many computers is harder to set up, and you need access to a cluster to do it. Once the cluster is set up, however, you write and run code the same way. For installing Spark Standalone on a cluster, see http://spark.apache.org/docs/latest/spark-standalone.html.
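In practice, the part that changes between a local run and a cluster run is usually the "master" setting that tells Spark where to execute. The sketch below is a hypothetical helper: `MasterConfig` and `masterUrl` are names introduced here and are not part of Spark. It relies only on the documented master URL forms `local[*]` (run in-process on all cores) and `spark://HOST:PORT`, where 7077 is the standalone master's default port.

```scala
// Hypothetical helper (not part of Spark): chooses the master URL so the
// same program can run locally or against a standalone cluster.
object MasterConfig {
  // "local[*]" runs Spark inside this JVM, using all available cores.
  val Local: String = "local[*]"

  // Standalone cluster masters are addressed as spark://HOST:PORT;
  // 7077 is the standalone master's default port.
  def masterUrl(clusterHost: Option[String], port: Int = 7077): String =
    clusterHost match {
      case Some(host) => s"spark://$host:$port"
      case None       => Local
    }

  def main(args: Array[String]): Unit = {
    // With no argument, run locally; otherwise treat args(0) as the master host.
    val url = masterUrl(args.headOption)
    println(s"Using master: $url")
    // In a real Spark program, this string would be passed to
    // SparkSession.builder().appName("...").master(url).getOrCreate()
  }
}
```

A common alternative is to leave the master out of the code entirely and pass it to `spark-submit` with `--master`. This illustrates the same point: the program itself does not change.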

Follow these preparations to install Spark on your Linux or Windows machine. If you are on macOS, which is built on a BSD-derived Unix, most Linux commands should also work in a Terminal window on your Mac.

Tasks