Installing Pig on Ubuntu
Preface: this article was compiled by the editors of 小常识网 (cha138.com) and covers installing Pig on Ubuntu.
Reposted from: http://blog.csdn.net/a925907195/article/details/42325579
1 Installation
Pig only needs to be installed on the namenode.
1.1 Download and extract
Download: from http://pig.apache.org/releases.html, download the Pig 0.12.1 release, pig-0.12.1.tar.gz
Save location: /home/hadoop/
Extract: tar -zxvf pig-0.12.1.tar.gz, rename: mv pig-0.12.1 pig, then move it under /usr/local/hadoop (mv pig /usr/local/hadoop/)
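The extract, rename and move steps above can be wrapped in one shell function. This is a sketch, not the author's script: install_pig is a hypothetical name, and it assumes the tarball unpacks to a single top-level directory named after the archive (pig-0.12.1), as the commands above imply.

```shell
#!/usr/bin/env bash
# Sketch only: install_pig is a hypothetical helper, not part of Pig.
# Assumes the tarball unpacks to one directory named like the archive
# (pig-0.12.1.tar.gz -> pig-0.12.1).
install_pig() {
  local tarball="$1"
  local parent="${2:-/usr/local/hadoop}"   # target parent directory
  local name workdir
  name="$(basename "$tarball" .tar.gz)"    # e.g. pig-0.12.1
  # Refuse to continue if a "pig" directory already exists: mv would
  # otherwise move the new release *into* it instead of replacing it.
  if [ -e "$parent/pig" ]; then
    echo "$parent/pig already exists" >&2
    return 1
  fi
  workdir="$(mktemp -d)"
  tar -zxf "$tarball" -C "$workdir" || return 1   # extract
  mv "$workdir/$name" "$parent/pig" || return 1   # rename and move in one step
  rmdir "$workdir"                                # remove the now-empty temp dir
}
```

Run it as a user allowed to write to /usr/local/hadoop, for example install_pig /home/hadoop/pig-0.12.1.tar.gz, and then apply the ownership change from step 1.2.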
1.2 Change the owner of pig
chown -R hadoop:hadoop /usr/local/hadoop/pig (run as root)
1.3 Edit the configuration file
Add Pig to PATH: open /etc/profile (vi /etc/profile) and append the following at the end:
#pig path
export PATH=$PATH:/usr/local/hadoop/pig/bin
Apply the change: source /etc/profile
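Editing /etc/profile by hand is fine once, but re-running an append command would add the same line again. Below is a hedged sketch of an idempotent version; add_pig_to_path is a hypothetical helper, and its default paths are the ones used in this article.

```shell
#!/usr/bin/env bash
# Sketch only: add_pig_to_path is a hypothetical helper.
# Appends "#pig path" plus the export line unless that exact line is present.
add_pig_to_path() {
  local pig_home="${1:-/usr/local/hadoop/pig}"
  local profile="${2:-/etc/profile}"
  # \$PATH is escaped so the file receives the literal text $PATH,
  # which is expanded later, when the profile is sourced.
  local line="export PATH=\$PATH:$pig_home/bin"
  # -F: fixed string, -x: whole-line match, so re-running adds nothing.
  if ! grep -qFx -- "$line" "$profile" 2>/dev/null; then
    printf '#pig path\n%s\n' "$line" >> "$profile"
  fi
}
```

As root, add_pig_to_path followed by source /etc/profile reproduces step 1.3.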
1.4 Verify the installation
Run pig -x local. If the grunt> prompt appears, Pig has been installed successfully:
pig -x local
1.5 Configure Pig's MapReduce mode
Edit /etc/profile and add the Hadoop conf directory:
vim /etc/profile
export PATH=$PATH:/usr/local/hadoop/pig/bin
export PIG_CLASSPATH=/usr/local/hadoop/conf
Apply the changes:
source /etc/profile
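After source /etc/profile, a quick check can confirm that both settings took effect before trying MapReduce mode. This sketch assumes a Hadoop 1.x layout in which the conf directory holds core-site.xml; check_pig_env is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: check_pig_env is a hypothetical helper.
# Assumes a Hadoop 1.x conf directory containing core-site.xml.
check_pig_env() {
  if [ -z "${PIG_CLASSPATH:-}" ]; then
    echo "PIG_CLASSPATH is not set" >&2
    return 1
  fi
  # PIG_CLASSPATH should point at the directory with the Hadoop site files
  if [ ! -f "$PIG_CLASSPATH/core-site.xml" ]; then
    echo "no core-site.xml under $PIG_CLASSPATH" >&2
    return 1
  fi
  # the PATH entry added in /etc/profile should make "pig" resolvable
  if ! command -v pig >/dev/null 2>&1; then
    echo "pig is not on PATH" >&2
    return 1
  fi
  echo "ok"
}
```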
1.6 Verify Pig's MapReduce mode
Simply run pig; the grunt> prompt should appear (Hadoop must be started first).
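Since MapReduce mode needs Hadoop running, one way to check first is to look at jps output. This sketch assumes Hadoop 1.x, where the master daemons are named NameNode and JobTracker and both run on the node where Pig is installed; hadoop_masters_up is a hypothetical name. On Hadoop 2.x (YARN) the process to look for would be ResourceManager rather than JobTracker.

```shell
#!/usr/bin/env bash
# Sketch only: hadoop_masters_up is a hypothetical helper.
# Reads jps-style lines ("<pid> <name>") on stdin and succeeds only if
# both Hadoop 1.x master daemons are listed.
hadoop_masters_up() {
  local out d
  out="$(cat)"
  for d in NameNode JobTracker; do
    # exact match on the second column, so SecondaryNameNode does not count
    if ! printf '%s\n' "$out" | awk -v n="$d" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$d is not running" >&2
      return 1
    fi
  done
}
```

Usage: jps | hadoop_masters_up && pig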
1.7 Change Pig's log directory
By default Pig writes its log file to the current working directory, which makes the logs hard to analyze and manage, so change the log directory as follows:
1) Create a logs folder under /usr/local/hadoop/pig
mkdir /usr/local/hadoop/pig/logs
2) Open /usr/local/hadoop/pig/conf/pig.properties, find pig.logfile and change it to:
pig.logfile=/usr/local/hadoop/pig/logs
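Step 1.7 can also be scripted. The sketch below creates the logs directory and rewrites the pig.logfile line in pig.properties, whether that line is present, commented out with #, or missing (in which case it is appended). set_pig_logfile is a hypothetical helper; it does not rely on the exact default contents of pig.properties, which may differ between Pig versions.

```shell
#!/usr/bin/env bash
# Sketch only: set_pig_logfile is a hypothetical helper.
# Usage: set_pig_logfile <pig.properties> <log directory>
set_pig_logfile() {
  local props="$1" logdir="$2" tmp
  mkdir -p "$logdir" || return 1
  tmp="$(mktemp)"
  if grep -Eq '^#? *pig\.logfile=' "$props"; then
    # Replace the first (possibly commented-out) pig.logfile line only.
    awk -v v="pig.logfile=$logdir" '
      !done && /^#? *pig\.logfile=/ { print v; done = 1; next }
      { print }
    ' "$props" > "$tmp"
  else
    # No existing entry: keep the file as-is and append one.
    cat "$props" > "$tmp"
    printf 'pig.logfile=%s\n' "$logdir" >> "$tmp"
  fi
  # Write back via cat so the original file keeps its permissions.
  cat "$tmp" > "$props" && rm -f "$tmp"
}
```

Usage: set_pig_logfile /usr/local/hadoop/pig/conf/pig.properties /usr/local/hadoop/pig/logs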
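1.8 Common Pig commands
pig -x local enters Pig in local mode
pig enters Pig in HDFS (MapReduce) mode
Testing Pig Latin statements
Common statements:
LOAD: specifies how data is loaded
FOREACH: processes the data row by row
FILTER: filters rows
DUMP: prints the results to the screen
STORE: saves the results to a file
Typical order of statements:
LOAD --> FOREACH --> STORE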
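A minimal script following the LOAD --> FOREACH --> STORE order might look like the one generated below. The shell function only writes the file; the $input and $output placeholders rely on Pig's parameter substitution, and the names extract_ids.pig and write_extract_ids_script are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: write_extract_ids_script is a hypothetical helper that writes
# a Pig Latin script; the quoted heredoc keeps $input/$output/$0 literal.
write_extract_ids_script() {
  local out="${1:-extract_ids.pig}"
  cat > "$out" <<'EOF'
-- LOAD: read colon-separated records
A = LOAD '$input' USING PigStorage(':');
-- FOREACH: keep only the first field of each record
B = FOREACH A GENERATE $0 AS id;
-- STORE: write the result to an output directory
STORE B INTO '$output';
EOF
}
```

It could then be run as pig -x local -param input=/home/hadoop/fjshtest/passwd.txt -param output=/home/hadoop/ids_out extract_ids.pig. STORE normally fails if the output directory already exists, so choose a fresh one for each run.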
1.9 Testing job execution in MapReduce mode
Step 1: upload passwd.txt to HDFS
cat /home/hadoop/fjshtest/passwd.txt
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:MailingList Manager:/var/list:/bin/sh
bin/hadoop fs -put /home/hadoop/fjshtest/passwd.txt /user/hadoop/in
bin/hadoop fs -ls /user/hadoop/in
-rw-r--r-- 2 hadoop supergroup 1705 2015-01-01 22:46 /user/hadoop/in/passwd.txt
-rw-r--r-- 2 hadoop supergroup 1026 2015-01-01 22:23 /user/hadoop/in/pigtest
-rw-r--r-- 2 hadoop supergroup 12 2014-11-14 23:18 /user/hadoop/in/test1.txt
-rw-r--r-- 2 hadoop supergroup 13 2014-11-14 23:18 /user/hadoop/in/test2.txt
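The upload above works because /user/hadoop/in already exists as a directory, as the listing shows. If it did not, -put would create a file named in rather than a directory. Below is a hedged sketch that creates the directory first. upload_to_hdfs and the HADOOP_CMD override are hypothetical; the sketch uses the Hadoop 1.x fs -test -d and fs -mkdir commands (on Hadoop 2.x and later, fs -mkdir -p is needed to create parent directories).

```shell
#!/usr/bin/env bash
# Sketch only: upload_to_hdfs is a hypothetical helper. HADOOP_CMD is a
# hypothetical override; the default bin/hadoop matches the article and
# assumes the current directory is the Hadoop home.
upload_to_hdfs() {
  local src="$1" dest_dir="$2"
  local hadoop="${HADOOP_CMD:-bin/hadoop}"
  # Without an existing directory, -put would create a *file* at dest_dir.
  if ! "$hadoop" fs -test -d "$dest_dir"; then
    # Hadoop 1.x -mkdir also creates missing parents; 2.x+ needs -mkdir -p.
    "$hadoop" fs -mkdir "$dest_dir" || return 1
  fi
  "$hadoop" fs -put "$src" "$dest_dir" || return 1
  "$hadoop" fs -ls "$dest_dir"
}
```

Usage, from the Hadoop home directory: upload_to_hdfs /home/hadoop/fjshtest/passwd.txt /user/hadoop/in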
Step 2: in the Grunt shell, run the following commands in order
A = load '/user/hadoop/in/passwd.txt' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
The result is printed directly on the screen:
- Total input paths to process : 1
(root)
(daemon)
(bin)
(sys)
(sync)
(games)
(man)
(lp)
(mail)
(news)
(uucp)
(proxy)
(www-data)
(backup)
(list)
(irc)
(gnats)
(nobody)
(libuuid)
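To cross-check the dump without Hadoop or Pig, the same projection (field $0 of each ':'-separated line) can be reproduced locally with standard tools. This is only a sanity check, not a Pig feature; preview_ids is a hypothetical name, and the parentheses mimic how DUMP prints a one-field tuple.

```shell
#!/usr/bin/env bash
# Sketch only: preview_ids is a hypothetical helper.
# Prints the first ':'-separated field of each line, wrapped as "(value)".
preview_ids() {
  cut -d: -f1 "$1" | sed 's/.*/(&)/'
}
```

Usage: preview_ids /home/hadoop/fjshtest/passwd.txt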
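Common errors:
Problem 1: a Pig statement needs spaces on both sides of the equals sign, otherwise an error is reported
A=load 'test.txt' as {ip:chararray ,other:chararray} using PigStorage(' ');
--> error
grunt>A=load 'test.txt' as {ip:chararray,other:chararray} using PigStorage(' ');
2014-07-04 16:05:35,935 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "A=load "" at line 2, column 1.
(This example also puts the schema in braces instead of parentheses and AS before USING; see problem 2.)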
Problem 2: when loading data with LOAD, USING PigStorage(' ') must be written before AS
A = LOAD 'test.txt' AS (ip:chararray ,other:chararray) using PigStorage(' ');
--> error
grunt>A = LOAD 'test.txt' AS (ip:chararray,other:chararray) using PigStorage(' ');
2014-07-04 16:03:35,421 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 54> mismatched input 'using' expecting SEMI_COLON
Correct:
A = LOAD 'test.txt' USING PigStorage(' ') AS (ip:chararray, other:chararray);
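Problem 3: function names such as COUNT and PigStorage are case-sensitive; in the wrong case Pig reports that they cannot be resolved (keywords such as LOAD and FOREACH are case-insensitive, but aliases, field names and function names are not)
C = foreach B {generate ip,count(ip);};
--> error
grunt>C = foreach B {generate ip,count(ip);};
2014-07-04 16:19:40,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /app01/pig-0.13.0/pig_1404460981802.log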
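Problem 4: when referring to a field, specify which relation it belongs to (A.ip)
C = foreach B {generate ip,COUNT(ip);};
--> error
grunt>C = foreach B {generate ip,COUNT(ip);};
2014-07-04 16:18:54,919 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 4, column 24> Invalid field projection. Projected field [ip] does not exist in schema: group:chararray,A:bag{:tuple(ip:chararray,other:chararray)}.
Correct (after grouping, the key field is named group, and ip is reached through the bag A):
C = foreach B generate group as ip, COUNT(A.ip);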
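Putting the fixes for problems 1 to 4 together, a complete script for the test.txt example might look like the one generated below. The GROUP step is inferred from the schema in the problem 4 error message (group plus a bag A), not taken from the original article; test.txt, ip_count.pig and write_ip_count_script are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: write_ip_count_script is a hypothetical helper that writes a
# Pig Latin script with all four fixes applied.
write_ip_count_script() {
  cat > "${1:-ip_count.pig}" <<'EOF'
-- Problems 1 and 2: spaces around '=', USING before AS, schema in ( )
A = LOAD 'test.txt' USING PigStorage(' ') AS (ip:chararray, other:chararray);
-- Assumed: group by ip, matching the schema group:chararray, A:bag{...}
B = GROUP A BY ip;
-- Problems 3 and 4: COUNT in upper case, field reached through the bag A
C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS cnt;
DUMP C;
EOF
}
```

Usage: write_ip_count_script, then pig -x local ip_count.pig in the directory containing test.txt.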