Hands-on Hadoop Tutorial

Chris Sosa and Wolfgang Richter
May 23, 2008

General Information

• Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem
• The HDFS architecture divides files into large chunks (~64 MB) distributed across data servers
• HDFS has a global namespace

General Information (cont’d)

• A script is provided for your convenience
  – Run source /localtmp/hadoop/setupVars from centurion064
  – Changes all uses of {somePath}/command to just command
• Go to http://www.cs.virginia.edu/~cbs6n/hadoop for web access. These slides and more information are also available there.
• Once you use the DFS (put something in it), relative paths are resolved from /usr/{your usr id}. E.g., if your id is tb28, your “home dir” is /usr/tb28
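To make the relative-path behavior concrete, a minimal sketch (assuming your id is tb28 and you have run setupVars; data.txt is a hypothetical local file):

  hadoop dfs -mkdir input          # creates /usr/tb28/input
  hadoop dfs -put data.txt input   # copies local data.txt into /usr/tb28/input
  hadoop dfs -ls                   # lists the contents of /usr/tb28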

Master Node

• Hadoop is currently configured with centurion064 as the master node
• The master node
  – Keeps track of the namespace and metadata about items
  – Keeps track of MapReduce jobs in the system

Slave Nodes

• centurion064 also acts as a slave node
• Slave nodes
  – Manage blocks of data sent from the master node
  – In terms of GFS, these are the chunkservers
• Currently, centurion060 is also a slave node
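To check which slaves are currently serving blocks, a hedged sketch using the stock DFS admin tool (its report format varies by release):

  hadoop dfsadmin -report   # prints each datanode with its capacity and usage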

Hadoop Paths

• Hadoop is locally “installed” on each machine
  – The installed location is /localtmp/hadoop/hadoop-0.15.3
  – Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is automatically created by the DFS)
  – /localtmp/hadoop is owned by group gbg (someone in this group, or a CS admin, must administer it)
• Files are divided into 64 MB chunks (this is configurable)
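For instance, the chunk size can be changed through the dfs.block.size property in conf/hadoop-site.xml; a minimal sketch (the 128 MB value below is purely illustrative):

  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value>   <!-- 128 MB, in bytes; the default is 64 MB -->
    </property>
  </configuration>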

Starting / Stopping Hadoop

• For the purposes of this tutorial, we assume you have run setupVars from earlier
• start-all.sh
  – starts all slave nodes and the master node
• stop-all.sh
  – stops all slave nodes and the master node
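A typical session, as a sketch (jps is the standard JDK process lister, not part of Hadoop; the daemon names are those used by Hadoop of this generation):

  start-all.sh   # launches the NameNode, JobTracker, DataNodes, and TaskTrackers
  jps            # confirm the Java daemons are up
  stop-all.sh    # shuts them all down again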

Using HDFS (1/2)

• hadoop dfs
  – [-ls <path>]
  – [-du <path>]
  – [-cp <src> <dst>]
  – [-rm <path>]
  – [-put <localsrc> <dst>]
  – [-copyFromLocal <localsrc> <dst>]
  – [-moveFromLocal <localsrc> <dst>]
  – [-get [-crc] <src> <localdst>]
  – [-cat <src>]
  – [-copyToLocal [-crc] <src> <localdst>]
  – [-moveToLocal [-crc] <src> <localdst>]
  – [-mkdir <path>]
  – [-touchz <path>]
  – [-test -[ezd] <path>]
  – [-stat [format] <path>]
  – [-help [cmd]]
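A few of these commands in action, as a sketch (all file names are hypothetical):

  hadoop dfs -mkdir books
  hadoop dfs -put alice.txt books           # local file into the DFS
  hadoop dfs -cat books/alice.txt           # print it from the DFS
  hadoop dfs -get books/alice.txt copy.txt  # copy it back to the local disk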

Using HDFS (2/2)

• Want to reformat?
• Easy
  – hadoop namenode -format
• Basically, most commands look similar
  – hadoop “some command” options
  – If you just type hadoop, you get all possible commands (including undocumented ones; hooray)
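Since reformatting destroys everything in the DFS, a cautious sketch of the full sequence (run from the master; older releases may prompt for confirmation):

  stop-all.sh               # stop all daemons first
  hadoop namenode -format   # wipes the DFS
  start-all.sh              # bring the cluster back up, now empty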

To Add Another Slave

• This adds another data node / job execution site to the pool
  – Hadoop dynamically uses the filesystem underneath it
  – If more space is available on the HDD, HDFS will try to use it when it needs to
• Modify the slaves file (see the sketch after this list)
  – It is in centurion064:/localtmp/hadoop/hadoop-0.15.3/conf
  – Copy the code installation dir to newMachine:/localtmp/hadoop/hadoop-0.15.3 (it is very small)
  – Restart Hadoop
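A sketch of those steps run from the master (newMachine is a placeholder hostname; this assumes passwordless ssh to the new slave, which Hadoop’s scripts need anyway):

  echo newMachine >> /localtmp/hadoop/hadoop-0.15.3/conf/slaves
  scp -r /localtmp/hadoop/hadoop-0.15.3 newMachine:/localtmp/hadoop/
  stop-all.sh && start-all.sh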

Configure Hadoop

• You can configure Hadoop in {$installation dir}/conf
  – hadoop-default.xml for global defaults
  – hadoop-site.xml for site-specific settings (overrides global)
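As an illustration, a minimal hadoop-site.xml pointing clients at the master (these property names are the ones this Hadoop generation uses; the port numbers are assumptions, not taken from the slides):

  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>centurion064:9000</value>   <!-- NameNode address -->
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>centurion064:9001</value>   <!-- JobTracker address -->
    </property>
  </configuration>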

That’s it for Configuration!

Real-time Access
