The Way of Big Data Processing (Learn Python in Ten Minutes)

Posted 2020-09-27

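(0) Contents

Learn Python Quickly and Common Mistakes (Text Processing)

Python Text Processing Compared with Java/C

Learn Python's Basic Types in Ten Minutes

Learn Python Quickly (Hands-On)

The Way of Big Data Processing (Learn Python in Ten Minutes)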

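I. A Brief Introduction to Python

(1) The origins of Python

Python (pronounced /ˈpaɪθən/) is an object-oriented, interpreted programming language. Guido van Rossum created it in late 1989, and the first public release appeared in 1991. Python's syntax is concise and clear, and the language comes with a rich and powerful set of libraries. It is often nicknamed a glue language because it can easily join modules written in other languages, especially C/C++. A common use is to prototype a program quickly in Python (sometimes even the program's final interface) and then rewrite the parts with special requirements in a more suitable language. For example, the graphics rendering module of a 3D game has very high performance requirements, so it can be rewritten in C++.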

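(2) Python syntax overview ---- type conversion

int(x [,base])           convert x to an integer
long(x [,base])          convert x to a long integer (Python 2 only)
float(x)                 convert x to a floating-point number
complex(real [,imag])    create a complex number
str(x)                   convert object x to a string
repr(x)                  convert object x to an expression string
eval(str)                evaluate a valid Python expression held in a string and return the resulting object
tuple(s)                 convert sequence s to a tuple
list(s)                  convert sequence s to a list
chr(x)                   convert an integer to a character
unichr(x)                convert an integer to a Unicode character (Python 2 only)
ord(x)                   convert a character to its integer value
hex(x)                   convert an integer to a hexadecimal string
oct(x)                   convert an integer to an octal string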
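
The conversions in the table can be tried out directly. The following is a minimal sketch, assuming Python 3 rather than the Python 2 used in the article's own listings, so long() and unichr() are left out (Python 3 folds them into int() and chr()). All variable names are made up for illustration.

```python
# Python 3 sketch of the conversion functions listed above.
as_int = int("42")             # string -> integer
from_hex = int("ff", 16)       # string written in base 16 -> integer
as_float = float("3.5")        # string -> floating-point number
as_complex = complex(1, 2)     # real and imaginary parts -> complex number
as_text = str(3.5)             # object -> readable string
as_repr = repr("hi")           # object -> expression string, quotes included
evaluated = eval("1 + 2")      # evaluate an expression string (never use on untrusted input)
as_tuple = tuple([1, 2, 3])    # sequence -> tuple
as_list = list("abc")          # sequence -> list of its items
char = chr(65)                 # integer -> character
code_point = ord("A")          # character -> integer
hex_text = hex(255)            # integer -> hexadecimal string
oct_text = oct(8)              # integer -> octal string (Python 3 uses the 0o prefix)

print(as_int, from_hex, as_float, as_complex, as_text, as_repr, evaluated)
print(as_tuple, as_list, char, code_point, hex_text, oct_text)
```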

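(3) Python syntax overview ---- sequence operations

s + r                  sequence concatenation
s * n , n * s          n copies of s, where n is an integer
s % d                  string formatting (strings only)
s[i]                   indexing
s[i:j]                 slicing
x in s , x not in s    membership test
for x in s:            iteration
len(s)                 length
min(s)                 smallest item
max(s)                 largest item
s[i] = x               assign a new value to s[i]
s[i:j] = r             assign a new value to a list slice
del s[i]               delete one item from a list
del s[i:j]             delete a slice from a list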
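
To make the sequence operations concrete, here is a small sketch on plain lists and one format string. It assumes Python 3, and the names (s, r, items and so on) are illustrative only.

```python
# Python 3 sketch of the sequence operations listed above.
s = [3, 1, 2]
r = [9, 8]

joined = s + r                      # concatenation -> [3, 1, 2, 9, 8]
repeated = r * 2                    # two copies of r -> [9, 8, 9, 8]
formatted = "%d items" % len(s)     # string formatting with the % operator
first = s[0]                        # indexing
middle = joined[1:3]                # slicing -> items at positions 1 and 2
has_two = 2 in s                    # membership test
smallest, largest = min(s), max(s)  # smallest and largest item

items = list(s)                     # work on a copy so s stays unchanged
items[0] = 7                        # reassign one item      -> [7, 1, 2]
items[1:3] = [5, 6, 4]              # reassign a slice       -> [7, 5, 6, 4]
del items[0]                        # delete one item        -> [5, 6, 4]
del items[0:2]                      # delete a slice         -> [4]

print(joined, repeated, formatted, first, middle, has_two, smallest, largest, items)
```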

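(4) Python syntax overview ---- operators and numeric functions

x >> y                   right shift
x & y                    bitwise AND
x | y                    bitwise OR
x ^ y                    bitwise XOR (exclusive or)
~x                       bitwise inversion; ~x == -(x+1)
x + y                    addition
x - y                    subtraction
x * y                    multiplication
x / y                    regular division
x // y                   floor division
x ** y                   exponentiation (x to the power y)
x % y                    modulo (x mod y)
-x                       negation (flips the sign of the operand)
+x                       unary plus (leaves the operand unchanged)
abs(x)                   absolute value
divmod(x, y)             returns (x // y, x % y)
pow(x, y [,modulo])      returns (x ** y) % modulo
round(x [,n])            rounds x; n is the number of decimal places
x < y                    less than
x > y                    greater than
x == y                   equal
x != y                   not equal (same as <> in Python 2)
x >= y                   greater than or equal
x <= y                   less than or equal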
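
A few of these operators behave in ways that are easy to get wrong, in particular floor division on negative numbers and the three-argument form of pow(). The sketch below assumes Python 3, where x / y on integers returns a float (in Python 2 it truncates, as the article's later listing shows for 21 / 10). The variable names are illustrative.

```python
# Python 3 sketch of the operators and numeric functions listed above.
shifted = 8 >> 1                     # right shift by one bit -> 4
bit_and = 6 & 3                      # 0b110 & 0b011 -> 2
bit_or = 6 | 3                       # 0b110 | 0b011 -> 7
bit_xor = 6 ^ 3                      # 0b110 ^ 0b011 -> 5
inverted = ~5                        # same as -(5 + 1) -> -6

true_div = 7 / 2                     # regular division -> 3.5 in Python 3
floor_div = 7 // 2                   # floor division -> 3
neg_floor = -7 // 2                  # floors toward negative infinity -> -4
power = 2 ** 10                      # exponentiation -> 1024
quotient, remainder = divmod(-7, 2)  # (x // y, x % y) -> (-4, 1)
mod_power = pow(2, 10, 1000)         # (2 ** 10) % 1000 -> 24
rounded = round(3.14159, 2)          # two decimal places -> 3.14

print(shifted, bit_and, bit_or, bit_xor, inverted)
print(true_div, floor_div, neg_floor, power, quotient, remainder, mod_power, rounded)
```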

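II. Python in Practice

(1) File handling

filename = raw_input('Enter your file name: ')  # path and name of the file to read (Python 2)
infile = open(filename, 'r')
done = 0
while not done:
    aLine = infile.readline()
    if aLine != '':
        print aLine,   # trailing comma: the line already ends with '\n'
    else:
        done = 1       # readline() returned '', so the end of the file was reached
infile.close()   # close the file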

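Explanation:

The difference between .readline() and .readlines() is that the latter reads the whole file at once. .readlines() automatically splits the file content into a list of lines, which can then be processed with Python's for ... in ... construct. .readline(), on the other hand, reads only one line at a time and is usually much slower than .readlines().

Use .readline() only when there is not enough memory to read the whole file at once.

When Python reaches the end of the file, readline() returns an empty string ''. When it reads a blank line, it returns '\n'.

readline() keeps the newline character '\n' at the end of every line. If the last line of the file does not end with '\n', it is returned without one.

readlines() returns a list, whereas readline() returns a string.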
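
The behaviour described above (the empty string at end of file, '\n' for a blank line, and a last line without a newline) can be checked with a throwaway file. This is a minimal sketch assuming Python 3. The sample content and all names are made up for illustration, and the third variant, iterating over the file object directly, is an addition that the article does not discuss.

```python
import os
import tempfile

# Create a small sample file; the last line deliberately has no trailing newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\n\nlast")
    sample_path = tmp.name

# readline(): one line per call; the empty string '' signals end of file.
lines_one_by_one = []
with open(sample_path, "r") as handle:
    while True:
        line = handle.readline()
        if line == "":
            break
        lines_one_by_one.append(line)

# readlines(): reads the whole file at once and returns a list of lines.
with open(sample_path, "r") as handle:
    all_lines = handle.readlines()

# Iterating over the file object also yields one line at a time.
with open(sample_path, "r") as handle:
    iterated = [line for line in handle]

os.remove(sample_path)  # clean up the temporary file

print(lines_one_by_one)
print(all_lines)
print(iterated)
```

All three lists hold the same three strings: the first line with its '\n', the blank line as a bare '\n', and the last line without a newline.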

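(2) Error handling

Python reports TypeError: 'str' object is not callable (or 'int' object is not callable, depending on the type of the value that was assigned).
This error can appear when the name of a built-in function has been used as a variable name. For example:
range=1
for i in range(0,1):
………
raises this error, here with 'int' in the message because range now holds an integer.
The error is reported on the for line, but the actual cause is the range=1 line. If the two lines are far apart, the cause is hard to spot, so take particular care not to reuse the names of built-in variables and functions for your own variables. The same thing happens if str has been redefined beforehand:
str=10
for i in range(1,10):
    print str(i)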
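
The failure can be reproduced and caught in a few lines. The sketch below assumes Python 3. The function names broken_loop and safe_loop are hypothetical, and reaching the original function through the builtins module is shown as one possible workaround, not as something the article prescribes.

```python
import builtins


def broken_loop():
    range = 1                 # shadows the built-in range inside this function
    for i in range(0, 1):     # the TypeError is raised here, not on the line above
        pass


try:
    broken_loop()
    message = None
except TypeError as error:
    message = str(error)      # text of the error raised on the for line


def safe_loop(count):
    # A different variable name avoids the clash; builtins.range always
    # refers to the original function even if "range" is shadowed elsewhere.
    return [i for i in builtins.range(0, count)]


print(message)
print(safe_loop(3))
```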

(3) Putting it together: file reading, console input, time conversion, and encoding conversion

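#!/usr/bin/python
# -*- coding: cp936 -*-
# Python 2 script. The shebang and the coding declaration only take effect on the first two lines.
import sys
import time
from time import strftime

# Python 2 only: make utf8 the default encoding for implicit str/unicode conversion
reload(sys)
sys.setdefaultencoding('utf8')

print("Hello, Python!")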
a = 21
b = 10
c = 0

c = a + b
print "Line 1 - Value of c is ", c

c = a - b
print "Line 2 - Value of c is ", c 

c = a * b
print "Line 3 - Value of c is ", c 

c = a / b
print "Line 4 - Value of c is ", c 

c = a % b
print "Line 5 - Value of c is ", c

a = 2
b = 3
c = a**b 
print "Line 6 - Value of c is ", c

a = 10
b = 5
c = a//b 
print "Line 7 - Value of c is ", c
# loop over a list and add up its items
numbers = [2, 4, 6, 8]   # not named "list", to avoid shadowing the built-in
total = 0                # not named "sum", for the same reason
for num in numbers:
    total = total + num
print "The sum is:", total
# print, input and assignment
print("Hello, I'm Python!")

name = raw_input('What is your name?\n')
print('Hi, %s.' % name)

# list comprehension
fruits = ['Banana', 'Apple', 'Lime']
loud_fruits = [fruit.upper() for fruit in fruits]
print(loud_fruits)

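# open a file, write to it and close it (the ./tmp directory must already exist)
fo = open("./tmp/foo.txt", "w+")
fo.write("Python is a great language.\nYeah its great!!\nI am zhang yapeng, who are you?\n")
t_str = u'我是张燕鹏，您是什么货色？'
print(t_str)
fo.write(t_str)  # the unicode string is encoded with the default encoding set above
fo.close()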

# read one file line by line and copy it into another
fr = open("./tmp/foo1.txt", "r+")   # this file must already exist
fw = open("foo_rw.txt", "wb")
done = 0
localtime = time.asctime(time.localtime(time.time()))
print "Local current time : ", localtime
fw.write(localtime + "\n")
while not done:
    t_str = fr.readline()
    if t_str != '':
        print "Read String is : ", t_str
        fw.write(t_str)
    else:
        done = 1   # readline() returned '', so the end of the file was reached
fr.close()
fw.close()

# test time (import)
localtime = time.localtime(time.time())
print "Local current time : ", localtime
# format the time with strftime (imported from the time module)
t_time = strftime('%Y-%m-%d %H:%M:%S', localtime)
print "formatting local current time : ", t_time
# build the time string by hand
year = str(localtime.tm_year)
mon = str(localtime.tm_mon)
day = str(localtime.tm_mday)
hour = str(localtime.tm_hour)
mins = str(localtime.tm_min)
sec = str(localtime.tm_sec)
# every non-ASCII literal is a unicode literal, so the concatenation needs no implicit decoding
newtime = u"时间是： " + year + u"年" + mon + u"月" + day + u"日 " + hour + ":" + mins + ":" + sec
print "Local current time : ", newtime
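
The time formatting and encoding steps of the listing can also be tried in isolation. The sketch below assumes Python 3, where every str is already Unicode, so the reload(sys) and sys.setdefaultencoding() calls are neither needed nor available. It uses a fixed timestamp (the Unix epoch in UTC) instead of the current time so that the result is reproducible. All names are illustrative.

```python
import time

# A fixed point in time keeps the output reproducible; use time.localtime() for "now".
fixed = time.gmtime(0)  # 1970-01-01 00:00:00 UTC as a struct_time

# strftime() pads every field with zeros according to the format codes.
formatted = time.strftime("%Y-%m-%d %H:%M:%S", fixed)

# Building the string by hand from the struct_time fields, as the listing does,
# gives unpadded numbers unless the padding is requested explicitly.
by_hand = "{}-{}-{} {}:{}:{}".format(
    fixed.tm_year, fixed.tm_mon, fixed.tm_mday,
    fixed.tm_hour, fixed.tm_min, fixed.tm_sec,
)
padded = "{:04d}-{:02d}-{:02d}".format(fixed.tm_year, fixed.tm_mon, fixed.tm_mday)

# Encoding conversion: the same text as UTF-8 bytes and as GBK bytes.
# "cp936" (the codec named in the listing's coding line) is an alias of "gbk" in Python.
text = "时间"
as_utf8 = text.encode("utf-8")
as_gbk = text.encode("gbk")
round_trip = as_gbk.decode("cp936")

print(formatted)
print(by_hand)
print(padded)
print(len(as_utf8), len(as_gbk), round_trip == text)
```

The two Chinese characters take six bytes in UTF-8 and four in GBK, which is why a file written in one encoding and read in the other shows up as garbled text.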
