Apache Hadoop and MapReduce
Latest revision as of 12:44, 17 September 2018
Hadoop and MapReduce
Purpose
- Getting up and running with Apache Hadoop and MapReduce
- Getting experience with a non-trivial installation
- Writing and running your own first program
For a general introduction, see the slides for Session 3. We will follow this tutorial closely: https://hadoop.apache.org/docs/r2.8.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
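The linked tutorial builds a word count job. As a rough illustration of the programming model only, the sketch below mimics the map, shuffle and reduce steps in plain Python, in memory and on a single machine. It does not use Hadoop or its Java API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical and chosen just for this example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token in a line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key. In a real framework this step happens
    between the map and reduce phases; here it is a plain dictionary."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse the list of counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Illustrative input only; any list of strings works.
    sample = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
    print(word_count(sample))
```

The point of the split is that map_phase and reduce_phase only ever see one line or one key at a time, which is what lets a framework distribute the same logic over many machines.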
Preparations
You will run Hadoop in standalone mode on your own computer. Running Hadoop on a cluster is harder to set up (and requires access to a cluster), but once it is set up, writing and running code works the same way. The jobs we run on small datasets on a single computer will therefore scale to large datasets on clusters of many powerful computers.
Follow these preparations to install Hadoop on your Linux or Windows machine. macOS is built on BSD Unix, so most Linux commands should also work in a Terminal window on a Mac.