How to periodically import daily logs into Hive


Reference answer A: Spark is not a cure-all here. One approach: listen to MySQL's binlog directly, then process the binlog entries onto HDFS.
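For the periodic part of the question, the common pattern is a daily cron job that uploads the previous day's log file to HDFS and attaches it to a date-partitioned Hive table. Below is a minimal sketch under stated assumptions: the paths `/data/logs/access_log.YYYY-MM-DD` and `/hive/logs`, and the table name `logs`, are all illustrative, and a partitioned table is assumed to exist already.

```shell
#!/bin/sh
# Hypothetical daily loader: push yesterday's log into HDFS and register it
# as a new date partition of an existing Hive table `logs`.
day=$(date -d "1 day ago" +%Y-%m-%d)     # yesterday, e.g. 2015-05-25
src="/data/logs/access_log.${day}"       # where the raw log lands (assumed)
dest="/hive/logs/dt=${day}"              # HDFS directory backing the partition

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p "$dest"
  hadoop fs -put "$src" "$dest"
  hive -e "ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='${day}') LOCATION '${dest}'"
else
  # Degrade gracefully on machines without a Hadoop client.
  echo "hadoop not on PATH; would load ${src} into partition dt=${day}"
fi
```

Scheduled from cron (e.g. `0 1 * * * /usr/local/scripts/load_logs.sh`), this picks up each day's log shortly after midnight; because the partition key is the date, re-running a day is idempotent thanks to `IF NOT EXISTS`.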

Exporting Hive table data to MySQL with Sqoop

Overview:

Sqoop is a tool for importing and exporting data:
    (1) import data from relational databases such as MySQL and Oracle into HDFS, Hive, or HBase
    (2) export data from HDFS, Hive, or HBase back into relational databases such as MySQL and Oracle


1. Import data from MySQL into HDFS (by default under /user/<username>):
  sqoop import --connect jdbc:mysql://hadoop0:3306/hive  --username root --password admin --table TBLS --fields-terminated-by '\t'  --null-string '**'  -m 1 --append  --hive-import
  sqoop import --connect jdbc:mysql://hadoop0:3306/hive  --username root --password admin --table TBLS --fields-terminated-by '\t'  --null-string '**'  -m 1 --append  --hive-import  --check-column 'TBL_ID' --incremental append --last-value 6
  
2. Export data from HDFS to MySQL:
  sqoop export --connect jdbc:mysql://hadoop0:3306/hive  --username root --password admin --table ids --fields-terminated-by '\t' --export-dir '/ids'
  
3. Save the import as a job, then run the job:
  sqoop job --create myjob -- import --connect jdbc:mysql://hadoop0:3306/hive  --username root --password admin --table TBLS --fields-terminated-by '\t'  --null-string '**'  -m 1 --append  --hive-import
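A saved job can be executed repeatedly; with `--incremental append`, Sqoop's metastore remembers the last imported `TBL_ID`, so each run picks up only new rows. A minimal cron wrapper might look like this (the guard and log path are assumptions, not part of Sqoop itself):

```shell
#!/bin/sh
# Re-run the saved incremental import job. Guarded so the script degrades
# gracefully on machines where Sqoop is not installed.
cmd="sqoop job --exec myjob"
if command -v sqoop >/dev/null 2>&1; then
  $cmd
else
  echo "sqoop not on PATH; would run: $cmd"
fi
```

Scheduled via `crontab -e`, e.g. `0 1 * * * /usr/local/sqoop/run_myjob.sh >> /var/log/myjob.log 2>&1`, this gives a periodic, incremental import without tracking the last value by hand.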
  
4. Import and export transactions are committed per mapper task.
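Because each mapper commits independently, a mid-export failure can leave the target table partially written. Sqoop's export supports a staging table to make the whole export all-or-nothing; a sketch, assuming a staging table `bbs_info_stg` has been created with the same schema as the target:

```shell
#!/bin/sh
# Rows are written to bbs_info_stg first; only if every mapper succeeds are
# they moved into bbs_info in a single transaction. --clear-staging-table
# empties the staging table before the export starts.
cmd="sqoop export \
  --connect jdbc:mysql://hadoop0:3306/hive \
  --username root -P \
  --table bbs_info \
  --staging-table bbs_info_stg \
  --clear-staging-table \
  --export-dir /hive/bs_info \
  --fields-terminated-by '\001'"
if command -v sqoop >/dev/null 2>&1; then
  eval "$cmd"
else
  echo "sqoop not on PATH; would run: $cmd"
fi
```

Note `-P` prompts for the password instead of putting it on the command line, which is what the `BaseSqoopTool` warning in the logs below recommends.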


Example:

1. Unpack Sqoop:

[root@i-love-you local]# tar -zxvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz


2. Copy mysql-connector-java-5.1.33-bin.jar into Sqoop's lib directory.


3. Export the Hive table data to MySQL:
[root@i-love-you sqoop]# bin/sqoop export --connect jdbc:mysql://192.168.1.1:3306/bbs --username root --password mysqladmin --table bbs_info --fields-terminated-by '\001' --export-dir '/hive/bs_info';
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/05/26 19:46:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/05/26 19:46:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/05/26 19:46:24 INFO tool.CodeGenTool: Beginning code generation
15/05/26 19:46:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_info` AS t LIMIT 1
15/05/26 19:46:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_info` AS t LIMIT 1
15/05/26 19:46:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/8c90be046e8284e1d849b412095d161f/bbs_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/05/26 19:46:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8c90be046e8284e1d849b412095d161f/bbs_info.jar
15/05/26 19:46:32 INFO mapreduce.ExportJobBase: Beginning export of bbs_info
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-0.99.2/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/26 19:46:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/05/26 19:46:38 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/05/26 19:46:38 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/05/26 19:46:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/05/26 19:46:38 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/05/26 19:46:47 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 19:46:47 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 19:46:47 INFO mapreduce.JobSubmitter: number of splits:4
15/05/26 19:46:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432620593569_0014
15/05/26 19:46:50 INFO impl.YarnClientImpl: Submitted application application_1432620593569_0014
15/05/26 19:46:50 INFO mapreduce.Job: The url to track the job: http://i-love-you:8088/proxy/application_1432620593569_0014/
15/05/26 19:46:50 INFO mapreduce.Job: Running job: job_1432620593569_0014
15/05/26 19:47:57 INFO mapreduce.Job: Job job_1432620593569_0014 running in uber mode : false
15/05/26 19:47:57 INFO mapreduce.Job:  map 0% reduce 0%
15/05/26 19:53:08 INFO mapreduce.Job:  map 100% reduce 0%
15/05/26 19:54:07 INFO mapreduce.Job: Job job_1432620593569_0014 completed successfully
15/05/26 19:54:24 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=455144
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=673
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=19
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Launched map tasks=4
                Data-local map tasks=4
                Total time spent by all maps in occupied slots (ms)=1360849
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=1360849
                Total vcore-seconds taken by all map tasks=1360849
                Total megabyte-seconds taken by all map tasks=1393509376
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Input split bytes=571
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=20255
                CPU time spent (ms)=10140
                Physical memory (bytes) snapshot=214138880
                Virtual memory (bytes) snapshot=3361767424
                Total committed heap usage (bytes)=62914560
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
15/05/26 19:54:25 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52381. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 19:54:26 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52381. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 19:54:27 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52381. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 19:54:28 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
15/05/26 19:55:04 INFO mapreduce.ExportJobBase: Transferred 673 bytes in 505.5613 seconds (1.3312 bytes/sec)
15/05/26 19:55:05 INFO mapreduce.ExportJobBase: Exported 1 records.
[root@i-love-you sqoop]#




Once the transfer finishes, the data appears in MySQL (running on Windows):


The target table was created before the transfer:
mysql> create table bbs_info(log_date varchar(10) , pv varchar(10) , register varchar(10) , ip varchar(10) , jumper varchar(10));
Query OK, 0 rows affected (0.17 sec)
Query the transferred data:


mysql> select * from bbs_info;
+------------+--------+----------+------+--------+
| log_date   | pv     | register | ip   | jumper |
+------------+--------+----------+------+--------+
| 2015-05-25 | 170647 | 28       | 9584 | 3367   |
+------------+--------+----------+------+--------+
1 row in set (0.00 sec)


mysql>








---------------------------------------------Exporting the second table:


[root@i-love-you sqoop]# bin/sqoop export --connect jdbc:mysql://192.168.1.1:3306/bbs --username root --password mysqladmin --table bbs_forum --fields-terminated-by '\001' --export-dir '/hive/bs_forum';
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/05/26 20:01:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/05/26 20:01:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/05/26 20:01:10 INFO tool.CodeGenTool: Beginning code generation
15/05/26 20:01:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_forum` AS t LIMIT 1
15/05/26 20:01:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bbs_forum` AS t LIMIT 1
15/05/26 20:01:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/179d8b142d44860aa72cdc89c80f4355/bbs_forum.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/05/26 20:01:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/179d8b142d44860aa72cdc89c80f4355/bbs_forum.jar
15/05/26 20:01:17 INFO mapreduce.ExportJobBase: Beginning export of bbs_forum
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-0.99.2/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/26 20:01:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/05/26 20:01:23 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/05/26 20:01:23 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/05/26 20:01:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/05/26 20:01:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/05/26 20:01:35 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 20:01:35 INFO input.FileInputFormat: Total input paths to process : 1
15/05/26 20:01:35 INFO mapreduce.JobSubmitter: number of splits:4
15/05/26 20:01:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432620593569_0015
15/05/26 20:01:38 INFO impl.YarnClientImpl: Submitted application application_1432620593569_0015
15/05/26 20:01:38 INFO mapreduce.Job: The url to track the job: http://i-love-you:8088/proxy/application_1432620593569_0015/
15/05/26 20:01:38 INFO mapreduce.Job: Running job: job_1432620593569_0015
15/05/26 20:02:06 INFO mapreduce.Job: Job job_1432620593569_0015 running in uber mode : false
15/05/26 20:02:07 INFO mapreduce.Job:  map 0% reduce 0%
15/05/26 20:07:20 INFO mapreduce.Job:  map 100% reduce 0%
15/05/26 20:08:31 INFO mapreduce.Job: Job job_1432620593569_0015 completed successfully
15/05/26 20:08:45 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=455120
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=5913
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=19
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Launched map tasks=4
                Data-local map tasks=4
                Total time spent by all maps in occupied slots (ms)=1384052
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=1384052
                Total vcore-seconds taken by all map tasks=1384052
                Total megabyte-seconds taken by all map tasks=1417269248
        Map-Reduce Framework
                Map input records=105
                Map output records=105
                Input split bytes=576
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=4363
                CPU time spent (ms)=10000
                Physical memory (bytes) snapshot=242823168
                Virtual memory (bytes) snapshot=3363196928
                Total committed heap usage (bytes)=62914560
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
15/05/26 20:08:46 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52245. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 20:08:47 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52245. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 20:08:48 INFO ipc.Client: Retrying connect to server: i-love-you/192.168.1.10:52245. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/05/26 20:08:49 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
15/05/26 20:09:01 INFO mapreduce.ExportJobBase: Transferred 5.7744 KB in 457.5735 seconds (12.9225 bytes/sec)
15/05/26 20:09:04 INFO mapreduce.ExportJobBase: Exported 105 records.
[root@i-love-you sqoop]#






----------------------------------The table created in MySQL beforehand, and the data after the transfer:
mysql> create table bbs_forum(log_date varchar(10) , forum varchar(10) , ip varchar(10) , pv varchar(10));
Query OK, 0 rows affected (0.17 sec)


mysql> select * from bbs_forum;
+------------+-------+------+------+
| log_date   | forum | ip   | pv   |
+------------+-------+------+------+
| 2015-05-25 | 10    | 7    | 7    |
| 2015-05-25 | 100   | 21   | 17   |
| 2015-05-25 | 101   | 46   | 37   |
| 2015-05-25 | 102   | 69   | 45   |
| 2015-05-25 | 103   | 29   | 25   |
| 2015-05-25 | 104   | 15   | 13   |
| 2015-05-25 | 105   | 11   | 11   |
| 2015-05-25 | 106   | 23   | 21   |
| 2015-05-25 | 107   | 345  | 169  |
| 2015-05-25 | 108   | 100  | 69   |
| 2015-05-25 | 109   | 104  | 61   |
| 2015-05-25 | 11    | 4    | 3    |
| 2015-05-25 | 111   | 415  | 194  |
| 2015-05-25 | 112   | 29   | 23   |
| 2015-05-25 | 113   | 3    | 2    |
| 2015-05-25 | 114   | 11   | 10   |
| 2015-05-25 | 115   | 6    | 6    |
| 2015-05-25 | 116   | 7    | 7    |
| 2015-05-25 | 117   | 6    | 6    |
| 2015-05-25 | 118   | 8    | 8    |
| 2015-05-25 | 119   | 12   | 5    |
| 2015-05-25 | 121   | 4    | 3    |
| 2015-05-25 | 122   | 562  | 88   |
| 2015-05-25 | 123   | 15   | 13   |
| 2015-05-25 | 124   | 3    | 2    |
| 2015-05-25 | 34    | 3    | 3    |
| 2015-05-25 | 36    | 7    | 7    |
| 2015-05-25 | 37    | 6    | 6    |
| 2015-05-25 | 39    | 1    | 1    |
| 2015-05-25 | 4     | 5    | 4    |
| 2015-05-25 | 40    | 5    | 5    |
| 2015-05-25 | 41    | 4    | 4    |
| 2015-05-25 | 43    | 3    | 3    |
| 2015-05-25 | 44    | 2    | 2    |
| 2015-05-25 | 46    | 25   | 18   |
| 2015-05-25 | 47    | 348  | 280  |
| 2015-05-25 | 51    | 2    | 2    |
| 2015-05-25 | 52    | 7    | 6    |
| 2015-05-25 | 53    | 47   | 33   |
| 2015-05-25 | 54    | 54   | 43   |
| 2015-05-25 | 55    | 44   | 22   |
| 2015-05-25 | 56    | 19   | 16   |
| 2015-05-25 | 57    | 27   | 25   |
| 2015-05-25 | 58    | 10   | 10   |
| 2015-05-25 | 59    | 5    | 5    |
| 2015-05-25 | 60    | 60   | 51   |
| 2015-05-25 | 61    | 46   | 36   |
| 2015-05-25 | 62    | 15   | 12   |
| 2015-05-25 | 63    | 2    | 2    |
| 2015-05-25 | 64    | 4    | 4    |
| 2015-05-25 | 65    | 48   | 37   |
| 2015-05-25 | 66    | 22   | 22   |
| 2015-05-25 | 125   | 7    | 7    |
| 2015-05-25 | 126   | 31   | 17   |
| 2015-05-25 | 128   | 14   | 7    |
| 2015-05-25 | 129   | 25   | 23   |
| 2015-05-25 | 130   | 42   | 31   |
| 2015-05-25 | 131   | 54   | 36   |
| 2015-05-25 | 132   | 13   | 10   |
| 2015-05-25 | 133   | 18   | 11   |
| 2015-05-25 | 136   | 3    | 2    |
| 2015-05-25 | 138   | 4    | 1    |
| 2015-05-25 | 14    | 6    | 5    |
| 2015-05-25 | 141   | 6    | 5    |
| 2015-05-25 | 142   | 68   | 35   |
| 2015-05-25 | 143   | 7    | 7    |
| 2015-05-25 | 144   | 10   | 10   |
| 2015-05-25 | 145   | 108  | 39   |
| 2015-05-25 | 15    | 1    | 1    |
| 2015-05-25 | 16    | 6    | 5    |
| 2015-05-25 | 17    | 1    | 1    |
| 2015-05-25 | 18    | 1    | 1    |
| 2015-05-25 | 19    | 8    | 8    |
| 2015-05-25 | 2     | 4    | 4    |
| 2015-05-25 | 21    | 1    | 1    |
| 2015-05-25 | 26    | 3    | 3    |
| 2015-05-25 | 31    | 3    | 3    |
| 2015-05-25 | 32    | 6    | 6    |
| 2015-05-25 | 67    | 2    | 2    |
| 2015-05-25 | 68    | 2    | 2    |
| 2015-05-25 | 69    | 2    | 2    |
| 2015-05-25 | 70    | 4    | 3    |
| 2015-05-25 | 71    | 5    | 4    |
| 2015-05-25 | 72    | 9    | 8    |
| 2015-05-25 | 73    | 4    | 3    |
| 2015-05-25 | 76    | 4    | 4    |
| 2015-05-25 | 78    | 2    | 2    |
| 2015-05-25 | 79    | 2    | 2    |
| 2015-05-25 | 8     | 2    | 2    |
| 2015-05-25 | 82    | 1    | 1    |
| 2015-05-25 | 83    | 3    | 3    |
| 2015-05-25 | 84    | 1    | 1    |
| 2015-05-25 | 85    | 1    | 1    |
| 2015-05-25 | 86    | 1    | 1    |
| 2015-05-25 | 9     | 1    | 1    |
| 2015-05-25 | 90    | 37   | 27   |
| 2015-05-25 | 91    | 43   | 35   |
| 2015-05-25 | 92    | 31   | 25   |
| 2015-05-25 | 93    | 38   | 25   |
| 2015-05-25 | 94    | 14   | 14   |
| 2015-05-25 | 95    | 13   | 10   |
| 2015-05-25 | 96    | 3    | 3    |
| 2015-05-25 | 97    | 22   | 14   |
| 2015-05-25 | 98    | 6    | 6    |
| 2015-05-25 | 99    | 41   | 37   |
+------------+-------+------+------+
105 rows in set (0.00 sec)

mysql>


