Apache Hadoop and MapReduce

Hadoop and MapReduce

Purpose

Getting up and running with Apache Hadoop and MapReduce
Gaining experience with a non-trivial installation
Writing and running your first program

For a general introduction, see the slides for Session 3. We will follow this tutorial closely: https://hadoop.apache.org/docs/r2.8.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Preparations

You will run Hadoop in standalone mode on your own computer. Running Hadoop on a cluster of many computers is harder to set up (and requires access to such a cluster), but once it is set up, writing and running code works the same way. The jobs we run on small datasets on a single computer will therefore scale to large datasets on clusters of many powerful computers.
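
To make it concrete why the same job can run on one machine or on many, here is a minimal sketch of the map, shuffle and reduce phases for a word count. It is plain Python and does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. Real Hadoop jobs are usually written in Java, as in the tutorial linked above.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, mimicking what a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum all counts collected for a single word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Illustrative input only; any list of strings works.
    print(word_count(["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]))
```

Each call to map_phase looks at one line only, and each call to reduce_phase looks at one word only. That independence is what lets a framework spread the calls over many machines without changing the job itself.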

Follow these preparations to install Hadoop on your Linux or Windows machine. macOS is built on BSD Unix under the hood, so most Linux commands should also work in a Terminal window on a Mac.
