Practical session, using Spark for emergency data sources

'''Practical session:'''
 
In every practical session, I expect you to install all the needed software beforehand, so that you can focus on the tasks.
===Tasks:===


* '''1) Running Apache Hadoop and MapReduce:'''
**[[Running Hadoop | Getting started with Hadoop]] (a minimal MapReduce word-count sketch follows after this task list)


* '''2) Querying and analyzing an open data source with Apache Spark (Databricks).'''
**'''Steps:'''
*** 1) Download the data from https://data.sfgov.org/Public-Safety/Fire-Department-Calls-for-Service/nuek-vuh3 or use the session file: [[:File:data.zip]].
*** 2) Set up an account in Databricks: https://databricks.com/try-databricks.
*** 3) Create a cluster in Databricks.
*** 4) Import the files from the zip folder into your workspace.
*** 5) Open the '''Fire incidents exploration - RunMe''' file in the cloud.databricks browser (a PySpark query sketch for this data set follows after the task list).

* '''3) Apache Spark cluster setup in Azure HDInsight and data processing.'''
** Apache Spark cluster setup [https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters]
** Load data and run queries on sensor data [https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-load-data-run-query] (a Spark SQL sketch follows after the task list)
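'''Sketch for task 1 (MapReduce word count):''' the actual instructions are on the [[Running Hadoop | Getting started with Hadoop]] page; the sketch below is only a rough illustration of the map and reduce steps, written for Hadoop Streaming. The script name <code>wordcount.py</code> is a placeholder and not part of the course material.

<syntaxhighlight lang="python">
#!/usr/bin/env python3
"""Minimal word-count sketch for Hadoop Streaming (illustrative only).

Run as mapper:  wordcount.py map
Run as reducer: wordcount.py reduce
"""
import sys


def mapper():
    # Emit "<word> TAB 1" for every word read from standard input.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word.lower()}\t1")


def reducer():
    # Hadoop sorts by key, so all counts for one word arrive together.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
</syntaxhighlight>

The same file would be passed to Hadoop Streaming as both mapper and reducer, along the lines of <code>hadoop jar .../hadoop-streaming-*.jar -files wordcount.py -mapper "wordcount.py map" -reducer "wordcount.py reduce" -input ... -output ...</code> (paths abbreviated).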

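'''Sketch for task 2 (exploring the fire calls data with PySpark):''' the real exercises are in the '''Fire incidents exploration - RunMe''' notebook. The sketch below only illustrates the general pattern of loading the downloaded CSV and running an aggregation; the file name <code>Fire_Department_Calls_for_Service.csv</code> and the <code>Call Type</code> column are assumptions about the data set and may need adjusting.

<syntaxhighlight lang="python">
# Rough PySpark sketch (not the course notebook): load the downloaded
# fire-calls CSV and count incidents per call type.
# The file name and the "Call Type" column are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Databricks notebook the `spark` session already exists; this line
# only matters when running the sketch as a stand-alone script.
spark = SparkSession.builder.appName("fire-calls-exploration").getOrCreate()

calls = (spark.read
         .option("header", True)        # first line holds the column names
         .option("inferSchema", True)   # let Spark guess the column types
         .csv("Fire_Department_Calls_for_Service.csv"))

# Most frequent call types first.
(calls.groupBy("Call Type")
      .count()
      .orderBy(F.desc("count"))
      .show(10, truncate=False))
</syntaxhighlight>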

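'''Sketch for task 3 (load data and query in HDInsight):''' the linked Microsoft guides cover cluster provisioning and running queries from a Jupyter notebook on the cluster. The sketch below shows the same load-and-query pattern with Spark SQL; the <code>wasbs://</code> sample path and the HVAC column names are assumptions based on the HDInsight sample data and should be replaced with your own storage path and schema.

<syntaxhighlight lang="python">
# Rough sketch of the "load data and run query" step on an HDInsight
# Spark cluster. The wasbs:// path and the column names are assumptions
# based on the cluster's bundled sensor sample data; adjust as needed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdinsight-sensor-query").getOrCreate()

hvac = (spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv"))

# Register the DataFrame as a temporary view so it can be queried with SQL.
hvac.createOrReplaceTempView("hvac")

# Example query: readings where the actual temperature exceeds the target.
spark.sql("""
    SELECT BuildingID, Date, Time, TargetTemp, ActualTemp
    FROM hvac
    WHERE ActualTemp > TargetTemp
""").show(10, truncate=False)
</syntaxhighlight>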
'''Solve Variables error:''' [[:File:Variables.pdf]]
