How to periodically import daily logs into Hive
Reference answer A: Spark is not a cure-all. One approach: tail MySQL's binlog directly, process the binlog records, and land them on HDFS; Sqoop can then export Hive table data into MySQL.
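To make the daily load actually periodic, a common pattern is a small shell script driven by cron. The sketch below is a minimal example under assumed names: the log path, database, and table names are illustrative, not from the original post.

```shell
#!/bin/sh
# Hypothetical daily loader -- all paths and table names are assumptions.
DT=$(date -d "yesterday" +%Y-%m-%d)   # partition key, e.g. 2015-05-25
LOG=/var/log/app/access.log.$DT       # yesterday's rotated log (assumed path)

# 1. Upload the raw log to a dated staging directory on HDFS
hadoop fs -mkdir -p /staging/access/$DT
hadoop fs -put "$LOG" /staging/access/$DT/

# 2. Move it into a date-partitioned Hive table
hive -e "LOAD DATA INPATH '/staging/access/$DT'
         INTO TABLE bbs.access_log PARTITION (log_date='$DT');"
```

Registering it in cron, e.g. `30 0 * * * /usr/local/bin/load_daily_log.sh`, runs it shortly after midnight, once the previous day's log has been rotated.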
Usage:
SQOOP is a tool for bulk data import and export.
(1) Import data from relational databases such as MySQL and Oracle into HDFS, Hive, or HBase.
(2) Export data from HDFS, Hive, or HBase into MySQL, Oracle, and other relational databases.
1. Import data from MySQL into HDFS (default target directory: /user/<username>)
Full import:
sqoop import --connect jdbc:mysql://hadoop0:3306/hive --username root --password admin --table TBLS --fields-terminated-by '\t' --null-string '**' -m 1 --append --hive-import
Incremental import (appends only rows whose TBL_ID is greater than the recorded last value, here 6):
sqoop import --connect jdbc:mysql://hadoop0:3306/hive --username root --password admin --table TBLS --fields-terminated-by '\t' --null-string '**' -m 1 --append --hive-import --check-column 'TBL_ID' --incremental append --last-value 6
2. Export data from HDFS to MySQL:
sqoop export --connect jdbc:mysql://hadoop0:3306/hive --username root --password admin --table ids --fields-terminated-by '\t' --export-dir '/ids'
3. Save the import as a named job, then run the job:
sqoop job --create myjob -- import --connect jdbc:mysql://hadoop0:3306/hive --username root --password admin --table TBLS --fields-terminated-by '\t' --null-string '**' -m 1 --append --hive-import
4. Import/export transactions are committed per Mapper task, so a failed task can leave a partial export behind.
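Once a job has been saved with `sqoop job --create`, it can be re-run on a schedule; with `--incremental append`, Sqoop records the latest `--last-value` in its metastore after each run, so each execution picks up only new rows. A minimal sketch (job name `myjob` from the command above; the cron schedule below is an assumption):

```shell
# Execute the saved job; Sqoop prompts for the password unless one is stored.
sqoop job --exec myjob

# Inspect what is saved, including the incremental last-value:
sqoop job --list
sqoop job --show myjob
```

A crontab entry such as `0 1 * * * /usr/local/sqoop/bin/sqoop job --exec myjob >> /var/log/sqoop-myjob.log 2>&1` then runs the incremental import nightly.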
Worked example:
1. Unpack Sqoop:
[root@i-love-you local]# tar -zxvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
2. Copy mysql-connector-java-5.1.33-bin.jar into Sqoop's lib directory.
3. Export the Hive table's data into MySQL:
[root@i-love-you sqoop]# bin/sqoop export --connect jdbc:mysql://192.168.1.1:3306/bbs --username root --password mysqladmin --table bbs_info --fields-terminated-by '\001' --export-dir '/hive/bs_info';
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/05/26 19:46:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/05/26 19:46:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/05/26 19:46:24 INFO tool.CodeGenTool: Beginning code generation
15/05/26 19:46:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_info` AS t LIMIT 1
15/05/26 19:46:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_info` AS t LIMIT 1
15/05/26 19:46:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/8c90be046e8284e1d849b412095d161f/bbs_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/05/26 19:46:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8c90be046e8284e1d849b412095d161f/bbs_info.jar
15/05/26 19:46:32 INFO mapreduce.ExportJobBase: Beginning export of bbs_info
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-0.99.2/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/26 19:46:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/05/26 19:46:38 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/05/26 19:46:38 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/05/26 19:46:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/05/26 19:46:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/05/26 19:46:47 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 19:46:47 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 19:46:47 INFO mapreduce.JobSubmitter: number of splits:4
15/05/26 19:46:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432620593569_0014
15/05/26 19:46:50 INFO impl.YarnClientImpl: Submitted application application_1432620593569_0014
15/05/26 19:46:50 INFO mapreduce.Job: The url to track the job: http://i-love-you:8088/proxy/application_1432620593569_0014/
15/05/26 19:46:50 INFO mapreduce.Job: Running job: job_1432620593569_0014
15/05/26 19:47:57 INFO mapreduce.Job: Job job_1432620593569_0014 running in uber mode : false
15/05/26 19:47:57 INFO mapreduce.Job: map 0% reduce 0%
15/05/26 19:53:08 INFO mapreduce.Job: map 100% reduce 0%
15/05/26 19:54:07 INFO mapreduce.Job: Job job_1432620593569_0014 completed successfully
15/05/26 19:54:24 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=455144
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=673
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=1360849
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1360849
Total vcore-seconds taken by all map tasks=1360849
Total megabyte-seconds taken by all map tasks=1393509376
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=571
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=20255
CPU time spent (ms)=10140
Physical memory (bytes) snapshot=214138880
Virtual memory (bytes) snapshot=3361767424
Total committed heap usage (bytes)=62914560
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/05/26 19:54:25 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52381. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 19:54:26 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52381. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 19:54:27 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52381. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 19:54:28 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
15/05/26 19:55:04 INFO mapreduce.ExportJobBase: Transferred 673 bytes in 505.5613 seconds (1.3312 bytes/sec)
15/05/26 19:55:05 INFO mapreduce.ExportJobBase: Exported 1 records.
[root@i-love-you sqoop]#
Transfer complete. Checking the data in MySQL on the Windows machine:
The target table was created before the transfer:
mysql> create table bbs_info(log_date varchar(10), pv varchar(10), register varchar(10), ip varchar(10), jumper varchar(10));
Query OK, 0 rows affected (0.17 sec)
Querying the transferred data:
mysql> select * from bbs_info;
+------------+--------+----------+------+--------+
| log_date | pv | register | ip | jumper |
+------------+--------+----------+------+--------+
| 2015-05-25 | 170647 | 28 | 9584 | 3367 |
+------------+--------+----------+------+--------+
1 row in set (0.00 sec)
mysql>
---------------------------------------------Exporting the second table:
[root@i-love-you sqoop]# bin/sqoop export --connect jdbc:mysql://192.168.1.1:3306/bbs --username root --password mysqladmin --table bbs_forum --fields-terminated-by '\001' --export-dir '/hive/bs_forum';
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/05/26 20:01:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/05/26 20:01:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/05/26 20:01:10 INFO tool.CodeGenTool: Beginning code generation
15/05/26 20:01:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_forum` AS t LIMIT 1
15/05/26 20:01:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_forum` AS t LIMIT 1
15/05/26 20:01:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/179d8b142d44860aa72cdc89c80f4355/bbs_forum.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/05/26 20:01:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/179d8b142d44860aa72cdc89c80f4355/bbs_forum.jar
15/05/26 20:01:17 INFO mapreduce.ExportJobBase: Beginning export of bbs_forum
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-0.99.2/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/26 20:01:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/05/26 20:01:23 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/05/26 20:01:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/05/26 20:01:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/05/26 20:01:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/05/26 20:01:35 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 20:01:35 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 20:01:35 INFO mapreduce.JobSubmitter: number of splits:4
15/05/26 20:01:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432620593569_0015
15/05/26 20:01:38 INFO impl.YarnClientImpl: Submitted application application_1432620593569_0015
15/05/26 20:01:38 INFO mapreduce.Job: The url to track the job: http://i-love-you:8088/proxy/application_1432620593569_0015/
15/05/26 20:01:38 INFO mapreduce.Job: Running job: job_1432620593569_0015
15/05/26 20:02:06 INFO mapreduce.Job: Job job_1432620593569_0015 running in uber mode : false
15/05/26 20:02:07 INFO mapreduce.Job: map 0% reduce 0%
15/05/26 20:07:20 INFO mapreduce.Job: map 100% reduce 0%
15/05/26 20:08:31 INFO mapreduce.Job: Job job_1432620593569_0015 completed successfully
15/05/26 20:08:45 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=455120
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5913
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=1384052
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1384052
Total vcore-seconds taken by all map tasks=1384052
Total megabyte-seconds taken by all map tasks=1417269248
Map-Reduce Framework
Map input records=105
Map output records=105
Input split bytes=576
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=4363
CPU time spent (ms)=10000
Physical memory (bytes) snapshot=242823168
Virtual memory (bytes) snapshot=3363196928
Total committed heap usage (bytes)=62914560
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/05/26 20:08:46 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52245. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 20:08:47 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52245. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 20:08:48 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52245. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 20:08:49 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
15/05/26 20:09:01 INFO mapreduce.ExportJobBase: Transferred 5.7744 KB in 457.5735 seconds (12.9225 bytes/sec)
15/05/26 20:09:04 INFO mapreduce.ExportJobBase: Exported 105 records.
[root@i-love-you sqoop]#
----------------------------------The MySQL table created in advance, and the data after transfer:
mysql> create table bbs_forum(log_date varchar(10), forum varchar(10), ip varchar(10), pv varchar(10));
Query OK, 0 rows affected (0.17 sec)
mysql> select * from bbs_forum;
+------------+-------+------+------+
| log_date | forum | ip | pv |
+------------+-------+------+------+
| 2015-05-25 | 10 | 7 | 7 |
| 2015-05-25 | 100 | 21 | 17 |
| 2015-05-25 | 101 | 46 | 37 |
| 2015-05-25 | 102 | 69 | 45 |
| 2015-05-25 | 103 | 29 | 25 |
| 2015-05-25 | 104 | 15 | 13 |
| 2015-05-25 | 105 | 11 | 11 |
| 2015-05-25 | 106 | 23 | 21 |
| 2015-05-25 | 107 | 345 | 169 |
| 2015-05-25 | 108 | 100 | 69 |
| 2015-05-25 | 109 | 104 | 61 |
| 2015-05-25 | 11 | 4 | 3 |
| 2015-05-25 | 111 | 415 | 194 |
| 2015-05-25 | 112 | 29 | 23 |
| 2015-05-25 | 113 | 3 | 2 |
| 2015-05-25 | 114 | 11 | 10 |
| 2015-05-25 | 115 | 6 | 6 |
| 2015-05-25 | 116 | 7 | 7 |
| 2015-05-25 | 117 | 6 | 6 |
| 2015-05-25 | 118 | 8 | 8 |
| 2015-05-25 | 119 | 12 | 5 |
| 2015-05-25 | 121 | 4 | 3 |
| 2015-05-25 | 122 | 562 | 88 |
| 2015-05-25 | 123 | 15 | 13 |
| 2015-05-25 | 124 | 3 | 2 |
| 2015-05-25 | 34 | 3 | 3 |
| 2015-05-25 | 36 | 7 | 7 |
| 2015-05-25 | 37 | 6 | 6 |
| 2015-05-25 | 39 | 1 | 1 |
| 2015-05-25 | 4 | 5 | 4 |
| 2015-05-25 | 40 | 5 | 5 |
| 2015-05-25 | 41 | 4 | 4 |
| 2015-05-25 | 43 | 3 | 3 |
| 2015-05-25 | 44 | 2 | 2 |
| 2015-05-25 | 46 | 25 | 18 |
| 2015-05-25 | 47 | 348 | 280 |
| 2015-05-25 | 51 | 2 | 2 |
| 2015-05-25 | 52 | 7 | 6 |
| 2015-05-25 | 53 | 47 | 33 |
| 2015-05-25 | 54 | 54 | 43 |
| 2015-05-25 | 55 | 44 | 22 |
| 2015-05-25 | 56 | 19 | 16 |
| 2015-05-25 | 57 | 27 | 25 |
| 2015-05-25 | 58 | 10 | 10 |
| 2015-05-25 | 59 | 5 | 5 |
| 2015-05-25 | 60 | 60 | 51 |
| 2015-05-25 | 61 | 46 | 36 |
| 2015-05-25 | 62 | 15 | 12 |
| 2015-05-25 | 63 | 2 | 2 |
| 2015-05-25 | 64 | 4 | 4 |
| 2015-05-25 | 65 | 48 | 37 |
| 2015-05-25 | 66 | 22 | 22 |
| 2015-05-25 | 125 | 7 | 7 |
| 2015-05-25 | 126 | 31 | 17 |
| 2015-05-25 | 128 | 14 | 7 |
| 2015-05-25 | 129 | 25 | 23 |
| 2015-05-25 | 130 | 42 | 31 |
| 2015-05-25 | 131 | 54 | 36 |
| 2015-05-25 | 132 | 13 | 10 |
| 2015-05-25 | 133 | 18 | 11 |
| 2015-05-25 | 136 | 3 | 2 |
| 2015-05-25 | 138 | 4 | 1 |
| 2015-05-25 | 14 | 6 | 5 |
| 2015-05-25 | 141 | 6 | 5 |
| 2015-05-25 | 142 | 68 | 35 |
| 2015-05-25 | 143 | 7 | 7 |
| 2015-05-25 | 144 | 10 | 10 |
| 2015-05-25 | 145 | 108 | 39 |
| 2015-05-25 | 15 | 1 | 1 |
| 2015-05-25 | 16 | 6 | 5 |
| 2015-05-25 | 17 | 1 | 1 |
| 2015-05-25 | 18 | 1 | 1 |
| 2015-05-25 | 19 | 8 | 8 |
| 2015-05-25 | 2 | 4 | 4 |
| 2015-05-25 | 21 | 1 | 1 |
| 2015-05-25 | 26 | 3 | 3 |
| 2015-05-25 | 31 | 3 | 3 |
| 2015-05-25 | 32 | 6 | 6 |
| 2015-05-25 | 67 | 2 | 2 |
| 2015-05-25 | 68 | 2 | 2 |
| 2015-05-25 | 69 | 2 | 2 |
| 2015-05-25 | 70 | 4 | 3 |
| 2015-05-25 | 71 | 5 | 4 |
| 2015-05-25 | 72 | 9 | 8 |
| 2015-05-25 | 73 | 4 | 3 |
| 2015-05-25 | 76 | 4 | 4 |
| 2015-05-25 | 78 | 2 | 2 |
| 2015-05-25 | 79 | 2 | 2 |
| 2015-05-25 | 8 | 2 | 2 |
| 2015-05-25 | 82 | 1 | 1 |
| 2015-05-25 | 83 | 3 | 3 |
| 2015-05-25 | 84 | 1 | 1 |
| 2015-05-25 | 85 | 1 | 1 |
| 2015-05-25 | 86 | 1 | 1 |
| 2015-05-25 | 9 | 1 | 1 |
| 2015-05-25 | 90 | 37 | 27 |
| 2015-05-25 | 91 | 43 | 35 |
| 2015-05-25 | 92 | 31 | 25 |
| 2015-05-25 | 93 | 38 | 25 |
| 2015-05-25 | 94 | 14 | 14 |
| 2015-05-25 | 95 | 13 | 10 |
| 2015-05-25 | 96 | 3 | 3 |
| 2015-05-25 | 97 | 22 | 14 |
| 2015-05-25 | 98 | 6 | 6 |
| 2015-05-25 | 99 | 41 | 37 |
+------------+-------+------+------+
105 rows in set (0.00 sec)
mysql>