@kyanny's blog

My life. Opinions are my own.

Installing hadoop-0.19.0 on a MacBook

First, install Java 1.6.

Download the Java 1.6 installer from the Apple Developer downloads page (Apple - Support - Downloads) and double-click your way through the installer.

Where does the installed Java 1.6 end up? As described in "Mac OSX に Java(JDK)1.6 と NetBeans 6.5 インストール方法" on SEのネタ帳, it lands in /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home, so add ${JAVA_HOME}/bin to your PATH. /usr/bin/java is still 1.5, so this step appears to be required.
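A minimal sketch of that PATH setup, assuming the 1.6 install location above (put it in something like ~/.bash_profile):

```shell
# Assumes the Java 1.6 location above; /usr/bin/java stays at 1.5 regardless.
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export PATH="${JAVA_HOME}/bin:$PATH"
# After this, `java -version` should report 1.6 instead of 1.5.
```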

Grab Hadoop from http://hadoop.apache.org/core/releases.html#Download by following one of the mirror links. No installer seems to be needed: extract the tarball and it is ready to use almost immediately.
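The download-and-unpack step boils down to something like this (the exact mirror URL is an assumption; pick whichever mirror the releases page offers):

```shell
# Mirror URL is an assumption -- substitute the mirror you were given.
curl -O http://archive.apache.org/dist/hadoop/core/hadoop-0.19.0/hadoop-0.19.0.tar.gz
tar xzf hadoop-0.19.0.tar.gz
cd hadoop-0.19.0
# Everything below runs from inside this directory.
```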

Edit conf/hadoop-env.sh and set only $JAVA_HOME:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home

Run the command described at http://hadoop.apache.org/core/docs/current/quickstart.html#Local, and a stream of log output scrolls past:

$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Picked up _JAVA_OPTIONS: -Duser.language=en
Picked up _JAVA_OPTIONS: -Duser.language=en
09/01/07 01:07:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/01/07 01:07:54 INFO mapred.FileInputFormat: Total input paths to process : 3
09/01/07 01:07:54 INFO mapred.JobClient: Running job: job_local_0001
09/01/07 01:07:54 INFO mapred.FileInputFormat: Total input paths to process : 3
09/01/07 01:07:55 INFO mapred.MapTask: numReduceTasks: 1
09/01/07 01:07:55 INFO mapred.MapTask: io.sort.mb = 100
09/01/07 01:07:55 INFO mapred.MapTask: data buffer = 79691776/99614720
09/01/07 01:07:55 INFO mapred.MapTask: record buffer = 262144/327680
09/01/07 01:07:55 INFO mapred.MapTask: Starting flush of map output
09/01/07 01:07:55 INFO mapred.MapTask: Index: (0, 2, 6)
09/01/07 01:07:55 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
09/01/07 01:07:55 INFO mapred.LocalJobRunner: file:/Users/kyanny/hadoop-0.19.0/input/capacity-scheduler.xml:0+2065
09/01/07 01:07:55 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
09/01/07 01:07:55 INFO mapred.MapTask: numReduceTasks: 1
09/01/07 01:07:55 INFO mapred.MapTask: io.sort.mb = 100
09/01/07 01:07:55 INFO mapred.MapTask: data buffer = 79691776/99614720
09/01/07 01:07:55 INFO mapred.MapTask: record buffer = 262144/327680
09/01/07 01:07:55 INFO mapred.MapTask: Starting flush of map output
09/01/07 01:07:55 INFO mapred.MapTask: Finished spill 0
09/01/07 01:07:55 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
09/01/07 01:07:55 INFO mapred.LocalJobRunner: file:/Users/kyanny/hadoop-0.19.0/input/hadoop-default.xml:0+49456
09/01/07 01:07:55 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
09/01/07 01:07:55 INFO mapred.MapTask: numReduceTasks: 1
09/01/07 01:07:55 INFO mapred.MapTask: io.sort.mb = 100
09/01/07 01:07:55 INFO mapred.MapTask: data buffer = 79691776/99614720
09/01/07 01:07:55 INFO mapred.MapTask: record buffer = 262144/327680
09/01/07 01:07:55 INFO mapred.MapTask: Starting flush of map output
09/01/07 01:07:55 INFO mapred.MapTask: Index: (0, 2, 6)
09/01/07 01:07:55 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
09/01/07 01:07:55 INFO mapred.LocalJobRunner: file:/Users/kyanny/hadoop-0.19.0/input/hadoop-site.xml:0+178
09/01/07 01:07:55 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
09/01/07 01:07:55 INFO mapred.Merger: Merging 3 sorted segments
09/01/07 01:07:55 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1324 bytes
09/01/07 01:07:55 INFO mapred.JobClient:  map 100% reduce 0%
09/01/07 01:07:56 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
09/01/07 01:07:56 INFO mapred.LocalJobRunner: 
09/01/07 01:07:56 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
09/01/07 01:07:56 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/Users/kyanny/hadoop-0.19.0/grep-temp-1486019891
09/01/07 01:07:56 INFO mapred.LocalJobRunner: reduce > reduce
09/01/07 01:07:56 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
09/01/07 01:07:56 INFO mapred.JobClient: Job complete: job_local_0001
09/01/07 01:07:56 INFO mapred.JobClient: Counters: 11
09/01/07 01:07:56 INFO mapred.JobClient:   File Systems
09/01/07 01:07:56 INFO mapred.JobClient:     Local bytes read=758456
09/01/07 01:07:56 INFO mapred.JobClient:     Local bytes written=677328
09/01/07 01:07:56 INFO mapred.JobClient:   Map-Reduce Framework
09/01/07 01:07:56 INFO mapred.JobClient:     Reduce input groups=42
09/01/07 01:07:56 INFO mapred.JobClient:     Combine output records=42
09/01/07 01:07:56 INFO mapred.JobClient:     Map input records=1585
09/01/07 01:07:56 INFO mapred.JobClient:     Reduce output records=42
09/01/07 01:07:56 INFO mapred.JobClient:     Map output bytes=1306
09/01/07 01:07:56 INFO mapred.JobClient:     Map input bytes=51699
09/01/07 01:07:56 INFO mapred.JobClient:     Combine input records=46
09/01/07 01:07:56 INFO mapred.JobClient:     Map output records=46
09/01/07 01:07:56 INFO mapred.JobClient:     Reduce input records=42
09/01/07 01:07:57 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
09/01/07 01:07:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/01/07 01:07:57 INFO mapred.FileInputFormat: Total input paths to process : 1
09/01/07 01:07:57 INFO mapred.JobClient: Running job: job_local_0002
09/01/07 01:07:57 INFO mapred.FileInputFormat: Total input paths to process : 1
09/01/07 01:07:57 INFO mapred.MapTask: numReduceTasks: 1
09/01/07 01:07:57 INFO mapred.MapTask: io.sort.mb = 100
09/01/07 01:07:57 INFO mapred.MapTask: data buffer = 79691776/99614720
09/01/07 01:07:57 INFO mapred.MapTask: record buffer = 262144/327680
09/01/07 01:07:57 INFO mapred.MapTask: Starting flush of map output
09/01/07 01:07:57 INFO mapred.MapTask: Finished spill 0
09/01/07 01:07:57 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
09/01/07 01:07:57 INFO mapred.LocalJobRunner: file:/Users/kyanny/hadoop-0.19.0/grep-temp-1486019891/part-00000:0+1660
09/01/07 01:07:57 INFO mapred.TaskRunner: Task 'attempt_local_0002_m_000000_0' done.
09/01/07 01:07:57 INFO mapred.Merger: Merging 1 sorted segments
09/01/07 01:07:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1324 bytes
09/01/07 01:07:58 INFO mapred.TaskRunner: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
09/01/07 01:07:58 INFO mapred.LocalJobRunner: 
09/01/07 01:07:58 INFO mapred.TaskRunner: Task attempt_local_0002_r_000000_0 is allowed to commit now
09/01/07 01:07:58 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/kyanny/hadoop-0.19.0/output
09/01/07 01:07:58 INFO mapred.LocalJobRunner: reduce > reduce
09/01/07 01:07:58 INFO mapred.TaskRunner: Task 'attempt_local_0002_r_000000_0' done.
09/01/07 01:07:58 INFO mapred.JobClient: Job complete: job_local_0002
09/01/07 01:07:58 INFO mapred.JobClient: Counters: 11
09/01/07 01:07:58 INFO mapred.JobClient:   File Systems
09/01/07 01:07:58 INFO mapred.JobClient:     Local bytes read=709976
09/01/07 01:07:58 INFO mapred.JobClient:     Local bytes written=678528
09/01/07 01:07:58 INFO mapred.JobClient:   Map-Reduce Framework
09/01/07 01:07:58 INFO mapred.JobClient:     Reduce input groups=2
09/01/07 01:07:58 INFO mapred.JobClient:     Combine output records=0
09/01/07 01:07:58 INFO mapred.JobClient:     Map input records=42
09/01/07 01:07:58 INFO mapred.JobClient:     Reduce output records=42
09/01/07 01:07:58 INFO mapred.JobClient:     Map output bytes=1238
09/01/07 01:07:58 INFO mapred.JobClient:     Map input bytes=1574
09/01/07 01:07:58 INFO mapred.JobClient:     Combine input records=0
09/01/07 01:07:58 INFO mapred.JobClient:     Map output records=42
09/01/07 01:07:58 INFO mapred.JobClient:     Reduce input records=42
$ cat output/*
3	dfs.
3	dfs.name.dir
1	dfs.http.address
1	dfs.access.time.precision
1	dfs.balance.bandwidth
1	dfs.block.size
1	dfs.blockreport.initial
1	dfs.blockreport.interval
1	dfs.client.block.write.retries
1	dfs.data.dir
1	dfs.datanode.address
1	dfs.datanode.dns.interface
1	dfs.datanode.dns.nameserver
1	dfs.datanode.du.reserved
1	dfs.datanode.handler.count
1	dfs.datanode.http.address
1	dfs.datanode.https.address
1	dfs.datanode.ipc.address
1	dfs.default.chunk.view.size
1	dfs.df.interval
1	dfs.heartbeat.interval
1	dfs.hosts
1	dfs.hosts.exclude
1	dfs.https.address
1	dfs.impl
1	dfs.max.objects
1	dfs.name.edits.dir
1	dfs.namenode.decommission.interval
1	dfs.namenode.handler.count
1	dfs.namenode.logging.level
1	dfs.permissions
1	dfs.permissions.supergroup
1	dfs.replication
1	dfs.replication.consider
1	dfs.replication.interval
1	dfs.replication.max
1	dfs.replication.min
1	dfs.replication.min.
1	dfs.safemode.extension
1	dfs.safemode.threshold.pct
1	dfs.secondary.http.address
1	dfs.web.ugi

I have no idea what it is actually doing, but it ran without a hitch. This is neither GNU/Linux nor Windows, yet it works fine. Then again, it is Java, so that is to be expected.
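For what it's worth, the example job is just counting every match of the regex 'dfs[a-z.]+' across the input files and sorting by frequency, roughly equivalent to this shell pipeline (using a throwaway sample file here, since the real job reads input/*.xml):

```shell
# Throwaway sample standing in for input/*.xml (hypothetical contents).
printf '<name>dfs.name.dir</name>\n<name>dfs.replication</name>\n<name>dfs.name.dir</name>\n' > sample.xml
# Count each regex match, most frequent first -- same shape as output/* above.
grep -ohE 'dfs[a-z.]+' sample.xml | sort | uniq -c | sort -rn
```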