<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://info319.wiki.uib.no/index.php?action=history&amp;feed=atom&amp;title=Hadoop_preparations</id>
	<title>Hadoop preparations - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://info319.wiki.uib.no/index.php?action=history&amp;feed=atom&amp;title=Hadoop_preparations"/>
	<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Hadoop_preparations&amp;action=history"/>
	<updated>2026-04-28T08:28:24Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.44.2</generator>
	<entry>
		<id>http://info319.wiki.uib.no/index.php?title=Hadoop_preparations&amp;diff=161&amp;oldid=prev</id>
		<title>Vimala: Created page with &quot;===Downloading=== Create a Hadoop folder on your computer. * &#039;&#039;&#039;Linux:&#039;&#039;&#039; Anywhere should do, but keep it out of the folders Linux already use for special purposes. I have cre...&quot;</title>
		<link rel="alternate" type="text/html" href="http://info319.wiki.uib.no/index.php?title=Hadoop_preparations&amp;diff=161&amp;oldid=prev"/>
		<updated>2018-09-17T12:01:08Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;===Downloading=== Create a Hadoop folder on your computer. * &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Anywhere should do, but keep it out of the folders Linux already use for special purposes. I have cre...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;===Downloading===&lt;br /&gt;
Create a Hadoop folder on your computer.&lt;br /&gt;
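* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Anywhere should do, but keep it out of the folders Linux already uses for special purposes. I have created a root folder called &amp;#039;&amp;#039;/opt&amp;#039;&amp;#039; and made myself its owner, which gives me full permission (the &amp;#039;&amp;#039;-p&amp;#039;&amp;#039; keeps &amp;#039;&amp;#039;mkdir&amp;#039;&amp;#039; from failing if &amp;#039;&amp;#039;/opt&amp;#039;&amp;#039; already exists):&lt;br /&gt;
    sudo mkdir -p /opt&lt;br /&gt;
    sudo chown $USER /opt&lt;br /&gt;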
* &amp;#039;&amp;#039;&amp;#039;Windows&amp;#039;&amp;#039;&amp;#039; has limits on file path lengths and Hadoop does not like spaces in paths, so &amp;#039;&amp;#039;C:\Program Files&amp;#039;&amp;#039; will not work. I created a root folder called &amp;#039;&amp;#039;C:\Programs&amp;#039;&amp;#039; and gave myself full rights to it (must be done as &amp;#039;&amp;#039;Administrator&amp;#039;&amp;#039;: open &amp;#039;&amp;#039;File Explorer&amp;#039;&amp;#039;, go to the root folder &amp;#039;&amp;#039;C:\&amp;#039;&amp;#039;, right click the &amp;#039;&amp;#039;Programs&amp;#039;&amp;#039; icon, and choose &amp;#039;&amp;#039;Run as administrator&amp;#039;&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
Download an Apache Hadoop archive from:&lt;br /&gt;
    http://hadoop.apache.org/releases.html&lt;br /&gt;
for example this one (I tried a 3.0.x beta version first, but I ran into trouble):&lt;br /&gt;
    http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz&lt;br /&gt;
We will not need the source code archive, but it can be useful to look at some of the code examples, so download it if you want.&lt;br /&gt;
&lt;br /&gt;
===Unpacking===&lt;br /&gt;
Unpack the archive into your &amp;#039;&amp;#039;Hadoop installation folder&amp;#039;&amp;#039;, which should be a sub-folder of the one you just created:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; I unpacked the archive into &amp;#039;&amp;#039;C:\Programs\hadoop-2.8.1&amp;#039;&amp;#039;.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Copy the &amp;#039;&amp;#039;hadoop-2.8.1.tar.gz&amp;#039;&amp;#039; file into your new folder (e.g., &amp;#039;&amp;#039;/opt&amp;#039;&amp;#039;), and unpack it into, e.g., &amp;#039;&amp;#039;/opt/hadoop-2.8.1&amp;#039;&amp;#039;:&lt;br /&gt;
    cd /opt&lt;br /&gt;
    tar zxf hadoop-2.8.1.tar.gz&lt;br /&gt;
&lt;br /&gt;
On &amp;#039;&amp;#039;&amp;#039;Windows&amp;#039;&amp;#039;&amp;#039; you will need two additional executable files: &amp;#039;&amp;#039;hadoop.dll&amp;#039;&amp;#039; and &amp;#039;&amp;#039;winutils.exe&amp;#039;&amp;#039; (for an explanation see https://wiki.apache.org/hadoop/WindowsProblems). Downloading executables is always risky, so continue at your own peril. I downloaded them from here: https://github.com/steveloughran/winutils/tree/master/hadoop-2.8.1 and put them in the &amp;#039;&amp;#039;.../bin&amp;#039;&amp;#039; subfolder of my Hadoop installation folder (i.e., under &amp;#039;&amp;#039;C:\Programs\hadoop-2.8.1\bin&amp;#039;&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;(Apparently, there are both 32- and 64-bit versions of &amp;#039;&amp;#039;winutils.exe&amp;#039;&amp;#039;, according to https://hernandezpaul.wordpress.com/2016/01/24/apache-spark-installation-on-windows-10/ . The file I have linked above works on x64.)&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Linux tip&amp;#039;&amp;#039;&amp;#039;: Over time, you may make many installations into your &amp;#039;&amp;#039;/opt&amp;#039;&amp;#039; folder, sometimes installing different versions of the same package (&amp;#039;&amp;#039;hadoop-2.8.1&amp;#039;&amp;#039;, &amp;#039;&amp;#039;hadoop-2.8.2&amp;#039;&amp;#039;, &amp;#039;&amp;#039;hadoop-2.8.3&amp;#039;&amp;#039;, etc.). To make this easier to manage, you can create an &amp;#039;&amp;#039;/opt/latest&amp;#039;&amp;#039; folder with links to the most recent version of each package:&lt;br /&gt;
    mkdir /opt/latest&lt;br /&gt;
    ln -s /opt/hadoop-2.8.3 /opt/latest/hadoop&lt;br /&gt;
Now you can use &amp;#039;&amp;#039;/opt/latest/hadoop&amp;#039;&amp;#039; in all your scripts and environment variables, so that upgrading to new versions becomes faster (and the names become shorter too).&lt;br /&gt;
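&lt;br /&gt;
When a newer version is unpacked later, the existing link has to be replaced rather than created. A minimal sketch of how that could be done on Linux follows; the function name &amp;#039;&amp;#039;relink_latest&amp;#039;&amp;#039; is hypothetical and not part of Hadoop:&lt;br /&gt;

```shell
# Hypothetical helper: point a "latest" link at a versioned installation folder.
# Arguments: the versioned folder, then the path of the link.
relink_latest() {
    relink_target=$1
    relink_link=$2
    # Refuse to create a dangling link.
    if [ ! -d "$relink_target" ]; then
        echo "relink_latest: no such folder: $relink_target"
        return 1
    fi
    # Make sure the folder that holds the link exists (e.g. /opt/latest).
    mkdir -p "$(dirname "$relink_link")"
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a folder as the link itself.
    ln -sfn "$relink_target" "$relink_link"
}
```

After unpacking, for example, &amp;#039;&amp;#039;hadoop-2.8.3&amp;#039;&amp;#039;, it could be called like this:&lt;br /&gt;
    relink_latest /opt/hadoop-2.8.3 /opt/latest/hadoop&lt;br /&gt;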
&lt;br /&gt;
===Java===&lt;br /&gt;
You need Java and a Java Development Kit (JDK). I have used a recent version of Java 8. To check if you have a JDK and which version it is, do:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
    which javac&lt;br /&gt;
    javac -version&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; In a &amp;#039;&amp;#039;Command Prompt&amp;#039;&amp;#039; window, do&lt;br /&gt;
    javac -version&lt;br /&gt;
&lt;br /&gt;
To install a recent Java 8 Development Kit:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
    sudo apt install openjdk-8-jdk&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; Download an installer from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html . It is best to install Java, too, into a folder with no spaces in its name, like &amp;#039;&amp;#039;C:\Programs\jdk1.8.0_121&amp;#039;&amp;#039;.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;MacOS:&amp;#039;&amp;#039;&amp;#039; Use an online tutorial for this. (The above link has installers for MacOS too, but you want a Java that is as tightly integrated into your operating system as possible.)&lt;br /&gt;
&lt;br /&gt;
In a console (or command prompt, or terminal) window, check that it works:&lt;br /&gt;
    javac -version&lt;br /&gt;
&lt;br /&gt;
===Environment variables===&lt;br /&gt;
You need to have three environment variables correctly set: JAVA_HOME, PATH, and HADOOP_CLASSPATH. To see if JAVA_HOME is set:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;echo $JAVA_HOME&amp;#039;&amp;#039;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;echo %JAVA_HOME%&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
If it is set correctly, the JAVA_HOME folder will have a &amp;#039;&amp;#039;lib/&amp;#039;&amp;#039; subfolder that contains a file &amp;#039;&amp;#039;tools.jar&amp;#039;&amp;#039;. The JAVA_HOME path should not contain the substring &amp;#039;&amp;#039;/jre&amp;#039;&amp;#039; anywhere (although the JDK folder itself contains a subfolder called &amp;#039;&amp;#039;jre&amp;#039;&amp;#039;). If JAVA_HOME is not set, you need to find out where the JDK has been installed:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Check &amp;#039;&amp;#039;/usr/java/default&amp;#039;&amp;#039; or &amp;#039;&amp;#039;/usr/lib/jvm&amp;#039;&amp;#039;.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; Check &amp;#039;&amp;#039;C:\Programs-or-Program Files\jdk-or-java-something...&amp;#039;&amp;#039;.&lt;br /&gt;
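&lt;br /&gt;
On Linux, the two conditions described above can be sketched as a small shell function. The name &amp;#039;&amp;#039;check_jdk_home&amp;#039;&amp;#039; is hypothetical, and the test for &amp;#039;&amp;#039;lib/tools.jar&amp;#039;&amp;#039; assumes a Java 8 JDK like the one used on this page:&lt;br /&gt;

```shell
# Hypothetical helper: report whether a folder looks like a usable Java 8 JDK home.
# Argument: the candidate folder, for example the value of JAVA_HOME.
check_jdk_home() {
    jdk_home=$1
    # The path must not contain the substring /jre anywhere.
    case "$jdk_home" in
        */jre*)
            echo "contains /jre: $jdk_home"
            return 1
            ;;
    esac
    # A JDK home has a lib/ subfolder that contains tools.jar.
    if [ ! -f "$jdk_home/lib/tools.jar" ]; then
        echo "no lib/tools.jar under: $jdk_home"
        return 1
    fi
    echo "looks like a JDK: $jdk_home"
}
```

It can be tried on the current setting, or on a candidate folder found under &amp;#039;&amp;#039;/usr/lib/jvm&amp;#039;&amp;#039;:&lt;br /&gt;
    check_jdk_home $JAVA_HOME&lt;br /&gt;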
&lt;br /&gt;
To set JAVA_HOME:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Add this line to your &amp;#039;&amp;#039;~/.bashrc&amp;#039;&amp;#039;-file:&lt;br /&gt;
    export JAVA_HOME=/path/to/your/jdk/installation/folder&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; Here it is hidden away. On Windows 10, in the &amp;#039;&amp;#039;Start&amp;#039;&amp;#039; menu, open &amp;#039;&amp;#039;Settings&amp;#039;&amp;#039; (the cog wheel), go to &amp;#039;&amp;#039;System -&amp;gt; About -&amp;gt; System info -&amp;gt; Advanced system settings -&amp;gt; Environment Variables&amp;#039;&amp;#039;. Here you can add and edit environment variables.&lt;br /&gt;
&lt;br /&gt;
On &amp;#039;&amp;#039;&amp;#039;Windows&amp;#039;&amp;#039;&amp;#039; there is one more problem: Because Hadoop does not like spaces in paths, you should not set &amp;#039;&amp;#039;JAVA_HOME&amp;#039;&amp;#039; to, for example, &amp;#039;&amp;#039;C:\Program Files\jdk1.8.0_121&amp;#039;&amp;#039;. Instead, Windows offers an alternative way to write paths like this: &amp;#039;&amp;#039;C:\Progra~1\jdk1.8.0_121&amp;#039;&amp;#039;. Here, &amp;#039;&amp;#039;Progra&amp;#039;&amp;#039; is the first six characters of the name of your folder, and the &amp;#039;&amp;#039;1&amp;#039;&amp;#039; is used to distinguish between them if you have several folders in &amp;#039;&amp;#039;C:&amp;#039;&amp;#039; that start with &amp;#039;&amp;#039;Progra&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
Although you strictly do not need it, you may also want to set HADOOP_HOME in the same way, pointing to your &amp;#039;&amp;#039;Hadoop installation folder&amp;#039;&amp;#039;. It is good practice to always set environment variables like JAVA_HOME, HADOOP_HOME, etc. even when you do not need them immediately: other well-behaved packages you install later may be able to use them if they are set, thus saving you time and avoiding errors. Each such variable should point to an installation folder with a bin folder inside it, but not to the inner bin-folder itself. &lt;br /&gt;
&lt;br /&gt;
===Modifying your PATH===&lt;br /&gt;
You now need to change PATH to include JAVA_HOME/bin, and possibly also HADOOP_HOME/bin.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Add this line to the end of ~/.bashrc:&lt;br /&gt;
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; You must go into the &amp;#039;&amp;#039;Environment variables&amp;#039;&amp;#039; tool again and edit &amp;#039;&amp;#039;PATH&amp;#039;&amp;#039;.&lt;br /&gt;
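&lt;br /&gt;
On Linux, every time &amp;#039;&amp;#039;~/.bashrc&amp;#039;&amp;#039; is sourced again, the export line above puts the same folders in front of PATH once more. A minimal sketch of a guard against that follows; the helper name &amp;#039;&amp;#039;path_prepend&amp;#039;&amp;#039; is hypothetical:&lt;br /&gt;

```shell
# Hypothetical helper: put a folder first in PATH unless it is already listed.
# Argument: the folder to add, for example $JAVA_HOME/bin.
path_prepend() {
    # Wrapping PATH and the folder in colons makes whole entries match,
    # so /opt/x/bin does not match /opt/x/bin2.
    case ":$PATH:" in
        *":$1:"*)
            # Already present: leave PATH unchanged.
            ;;
        *)
            PATH="$1:$PATH"
            ;;
    esac
}
```

In &amp;#039;&amp;#039;~/.bashrc&amp;#039;&amp;#039; it could be used in place of the single export line:&lt;br /&gt;
    path_prepend $HADOOP_HOME/bin&lt;br /&gt;
    path_prepend $JAVA_HOME/bin&lt;br /&gt;
    export PATH&lt;br /&gt;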
&lt;br /&gt;
Finally, you need to set HADOOP_CLASSPATH:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; Add this line to the end of ~/.bashrc:&lt;br /&gt;
    export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; You must go into the &amp;#039;&amp;#039;Environment variables&amp;#039;&amp;#039; tool and create a new HADOOP_CLASSPATH variable.&lt;br /&gt;
&lt;br /&gt;
To put the new environment variables into effect:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
    source ~/.bashrc&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039; Close the Command Prompt window and open a new one.&lt;br /&gt;
&lt;br /&gt;
If things are right, these two commands should give similar output:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
    echo $HADOOP_CLASSPATH&lt;br /&gt;
    ls $HADOOP_CLASSPATH&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
    echo %HADOOP_CLASSPATH%&lt;br /&gt;
    dir %HADOOP_CLASSPATH%&lt;br /&gt;
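&lt;br /&gt;
If HADOOP_CLASSPATH later grows to hold several entries separated by colons, &amp;#039;&amp;#039;ls&amp;#039;&amp;#039; can no longer check it in one go. A minimal sketch for Linux that checks each entry separately follows; the function name &amp;#039;&amp;#039;check_classpath&amp;#039;&amp;#039; is hypothetical:&lt;br /&gt;

```shell
# Hypothetical helper: list each entry of a colon-separated classpath
# and say whether it exists on disk.
# The body runs in a subshell (the parentheses), so the changes to IFS
# and to filename expansion do not leak into the calling shell.
check_classpath() (
    IFS=:      # split the argument on colons
    set -f     # keep entries such as lib/* from being expanded
    status=0
    for entry in $1; do
        if [ -e "$entry" ]; then
            echo "ok       $entry"
        else
            echo "missing  $entry"
            status=1
        fi
    done
    # The exit status is 0 only if every entry exists.
    [ "$status" -eq 0 ]
)
```

It takes the classpath as its only argument:&lt;br /&gt;
    check_classpath $HADOOP_CLASSPATH&lt;br /&gt;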
&lt;br /&gt;
==Running Hadoop==&lt;br /&gt;
Go to the Hadoop installation folder and check that it works:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Linux:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
    cd /opt/hadoop-2.8.1&lt;br /&gt;
    bin/hadoop&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Windows:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
    cd C:\Programs\hadoop-2.8.1&lt;br /&gt;
    bin\hadoop&lt;br /&gt;
These commands should give you a list of possible uses of Hadoop.&lt;br /&gt;
&lt;br /&gt;
(If you added HADOOP_HOME/bin to your PATH you can run &amp;#039;&amp;#039;hadoop&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hdfs&amp;#039;&amp;#039; from anywhere on your computer, without the &amp;#039;&amp;#039;bin/&amp;#039;&amp;#039;.)&lt;/div&gt;</summary>
		<author><name>Vimala</name></author>
	</entry>
</feed>