PIG latin - DUMP 命令不显示

Posted

技术标签:

【中文标题】PIG latin - DUMP 命令不显示【英文标题】:PIG latin - DUMP command not displaying 【发布时间】:2016-02-22 01:51:10 【问题描述】:

我只是尝试使用 DUMP 显示 GROUPed 记录的结果,但没有显示数据,而是有很多日志数据。我只是在玩 10 条记录。

详情:

grunt> DUMP grouped_records;
2016-02-21 17:34:24,338 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2016-02-21 17:34:24,339 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]
2016-02-21 17:34:24,354 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-02-21 17:34:24,374 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-21 17:34:24,374 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-02-21 17:34:24,434 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-02-21 17:34:24,440 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2016-02-21 17:34:24,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-02-21 17:34:24,530 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2016-02-21 17:34:24,534 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2016-02-21 17:34:24,541 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=142
2016-02-21 17:34:24,541 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2016-02-21 17:34:25,128 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job662989067023626482.jar
2016-02-21 17:34:31,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job662989067023626482.jar created
2016-02-21 17:34:31,335 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-02-21 17:34:31,338 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2016-02-21 17:34:31,338 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2016-02-21 17:34:31,338 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2016-02-21 17:34:31,549 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-02-21 17:34:31,550 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-02-21 17:34:31,556 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-02-21 17:34:31,607 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-21 17:34:31,918 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-21 17:34:31,918 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2016-02-21 17:34:31,921 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-02-21 17:34:31,979 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2016-02-21 17:34:32,092 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1454294818944_0034
2016-02-21 17:34:32,192 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1454294818944_0034
2016-02-21 17:34:32,198 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://quickstart.cloudera:8088/proxy/application_1454294818944_0034/
2016-02-21 17:34:32,198 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1454294818944_0034
2016-02-21 17:34:32,198 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases filtered_records,grouped_records,records
2016-02-21 17:34:32,198 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C:  R: 
2016-02-21 17:34:32,198 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1454294818944_0034
2016-02-21 17:34:32,428 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-02-21 17:35:02,623 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-02-21 17:35:23,469 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-02-21 17:35:23,470 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.6.0-cdh5.5.0  0.12.0-cdh5.5.0 cloudera    2016-02-21 17:34:24 2016-02-21 17:35:23 GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias   Feature Outputs
job_1454294818944_0034  1   1   12  12  12  12  16  16  16  16  filtered_records,grouped_records,records    GROUP_BY    hdfs://quickstart.cloudera:8020/tmp/temp-1703423271/tmp-988597361,

Input(s):
Successfully read 10 records (525 bytes) from: "/user/hduser/input/maxtemppig.tsv"

Output(s):
Successfully stored 0 records in: "hdfs://quickstart.cloudera:8020/tmp/temp-1703423271/tmp-988597361"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1454294818944_0034


2016-02-21 17:35:23,646 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-02-21 17:35:23,648 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-21 17:35:23,648 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2016-02-21 17:35:23,649 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-02-21 17:35:23,660 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-21 17:35:23,660 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

我尝试过的命令: 记录 = LOAD '/user/hduser/input/maxtemppig.tsv' AS (year:chararray, temperature:int, quality:int); 过滤记录 = 按温度输入 (-10,19) 和质量输入 (0,1,4,5,9) 过滤记录; DUMP 过滤记录;

grouped_records = 按年份分组过滤记录; DUMP grouped_records;

max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature); 转储最大温度;

我的输入 tsv 文件...

1950 32 01459 1951 33 01459 1950 21 01459 1940 24 01459 1950 33 01459 2000 30 01459 2010 44 01459 2014-10 01459 2016-20 01459 2011 19 01459

我错过了什么?

【问题讨论】:

请用您的文件和其余的 PIG 代码编辑您的问题。有些人可能想尝试重现问题 请在您的问题下方找到相应的“编辑”链接。请不要将 cmets 用于代码。另外,添加 tsv 文件的内容。 再次。请删除这些cmets和edit你的问题 【参考方案1】:

很有可能解析不工作,您正在过滤所有记录。

试试

records = LOAD '/user/hduser/input/maxtemppig.tsv' USING PigStorage('\t') AS (year:chararray, temperature:int, quality:int);

【讨论】:

感谢您的回复...我会检查解析部分。

以上是关于PIG latin - DUMP 命令不显示的主要内容,如果未能解决你的问题,请参考以下文章

何时不使用 Pig Latin

Pig Latin 中不区分大小写的搜索

Store 命令中的 Pig Latin 参数

Apache Pig Latin 参考手册 [关闭]

聚合的 Pig Latin 逻辑测试

在 Pig Latin 中选择不同的行