Extracting Data with a Python Script
Posted by sgqhappy
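Copyright notice: this is the blogger's original article. When reposting, please cite the source: https://www.cnblogs.com/sgqhappy/p/9956956.html
We often have to write Hive SQL for data extraction, and every extraction job means writing the Hive commands all over again. To make these highly repetitive commands simpler, automated, and easier to use, I wrote a Python script that handles data cleaning, data processing, counting and delivery, file reading and writing, log saving, and more.
1. Imports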
    #!/usr/bin/python
    #coding:utf-8

    '''
    Made by sgqhappy
    Date: 20181113
    function: data extract
    '''

    from subprocess import Popen,PIPE
    import os
    import sys
    import io
    import re
    import commands  # Python 2 only; this module was removed in Python 3
    import logging
    from logging import handlers
    from re import match
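The `commands` module imported above exists only in Python 2. As a minimal sketch, assuming the script might also need to run under Python 3, the import can fall back to `subprocess.getstatusoutput`, which likewise returns a status and the command's output. The `echo hello` command is only an illustration and is not part of the original script.

```python
# Compatibility sketch (an assumption, not part of the original script):
# prefer the Python 2 module, fall back to the Python 3 equivalent.
try:
    from commands import getstatusoutput  # Python 2
except ImportError:
    from subprocess import getstatusoutput  # Python 3

# Run a harmless shell command and capture its exit status and output.
status, output = getstatusoutput("echo hello")
print("status=%s output=%s" % (status, output))
```

With this import, later calls would be written as `getstatusoutput(cmd)` rather than `commands.getstatusoutput(cmd)`.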
2. Define a class for printing the script's run log
The log can be printed to the console and also written to a log file.
    class Logger(object):
        def __init__(self, log_file_name, log_level, logger_name):
            self.__logger = logging.getLogger(logger_name)
            self.__logger.setLevel(log_level)
            file_handler = logging.FileHandler(log_file_name)
            console_handler = logging.StreamHandler()

            # Set the log format and show the log on the console and in the log file.
            LOG_FORMAT = "%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s : %(message)s"
            formatter = logging.Formatter(LOG_FORMAT)

            file_handler.setFormatter(formatter)
            console_handler.setFormatter(formatter)

            self.__logger.addHandler(file_handler)
            self.__logger.addHandler(console_handler)

        def get_log(self):
            return self.__logger
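One thing to watch with the class above: `logging.getLogger` returns the same logger object for the same name, so constructing `Logger` twice with one `logger_name` attaches the handlers twice and every message is then printed twice. A minimal sketch of a guard follows. The helper name `get_logger`, the logger name `demoLogger`, and the temporary log file are all hypothetical and chosen only for illustration.

```python
import logging
import os
import tempfile

LOG_FORMAT = "%(asctime)s - %(pathname)s[line:%(lineno)d] - %(levelname)s : %(message)s"


def get_logger(log_file_name, log_level, logger_name):
    """Return a logger that writes to a file and the console (hypothetical helper)."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(log_level)
    # Attach handlers only the first time this logger name is configured.
    if not logger.handlers:
        formatter = logging.Formatter(LOG_FORMAT)
        for handler in (logging.FileHandler(log_file_name), logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


# Usage: the second call reuses the handlers added by the first one.
log_file = os.path.join(tempfile.gettempdir(), "demo.info.log")
logger = get_logger(log_file, logging.DEBUG, "demoLogger")
logger = get_logger(log_file, logging.DEBUG, "demoLogger")
logger.info("hello")
```

Whether the guard is needed depends on how often the logger is created; in a script that builds it exactly once, the original class behaves the same way.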
3. Define the file names and file paths
    # This is the file name.
    file_name = "%s_%s_%s" % (sys.argv[2], sys.argv[4], sys.argv[11])
    info_log_path = '/python_test/%s.info.log' % (file_name)

    # This is the record name and path.
    record_name = "data_extract_record.txt"
    record_path = "/python_test/"

    logger = Logger(log_file_name="%s" % (info_log_path), log_level=logging.DEBUG, logger_name="myLogger").get_log()

    # This is the log path.
    path = '/python_test/%s.desc.log' % (file_name)
    logger.info(" ")
    logger.info("log path: %s" % (path))
    logger.info(" ")
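The code above reads `sys.argv[2]`, `sys.argv[4]`, and `sys.argv[11]` directly, so a call with too few arguments fails with an `IndexError` before any log is written. A minimal sketch of a check that could run first is shown here. The helper name `build_paths`, its `log_dir` parameter, and the sample argument values are hypothetical; only the argument positions and the path patterns come from the script.

```python
def build_paths(argv, log_dir="/python_test"):
    """Build the log file paths from the argument list (hypothetical helper)."""
    # The script indexes up to argv[11], so at least 11 arguments are required.
    if len(argv) < 12:
        raise ValueError("expected at least 11 arguments, got %d" % (len(argv) - 1))
    file_name = "%s_%s_%s" % (argv[2], argv[4], argv[11])
    return {
        "file_name": file_name,
        "info_log": "%s/%s.info.log" % (log_dir, file_name),
        "desc_log": "%s/%s.desc.log" % (log_dir, file_name),
    }


# Usage with placeholder arguments standing in for sys.argv.
sample_argv = ["data_extract.py"] + ["arg%d" % i for i in range(1, 12)]
paths = build_paths(sample_argv)
print(paths["info_log"])
print(paths["desc_log"])
```

In the real script, `build_paths(sys.argv)` would replace the three separate string-formatting lines.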
4. Extract and save the field information
    # function: write all fields to the log file.
    hive_cmd_desc = 'beeline -u ip -n username -e "desc %s.%s" >> %s' % (sys.argv[1], sys.argv[2], path)
    logger.info(hive_cmd_desc)
    logger.info(" ")
    status, output = commands.getstatusoutput(hive_cmd_desc)
    logger.info(output)
    logger.info(" ")

    # Log whether the command succeeded or failed.
    if status == 0:
        logger.info("desc %s to %s successful!" % (sys.argv[2], path))
    else:
        # set color: '\033[;31;40m' + ... + '\033[0m'
        logger.error('