python6

Posted 2023-04-11 林木森3


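Part 1

Code 1

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://root:102011@localhost/test?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)
'''
create_engine sets up the connection. The URL reads, in order: database dialect (mysql) + driver (pymysql) + user:password@host:port/database name (test), with the character set given last as utf8.
all_gzdata is the table name, engine is the database connection, and chunksize makes read_sql return 10,000 rows at a time. At this point sql is only an iterator over chunks; no data has actually been read yet.
'''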
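
Because the chunked reader is a one-shot iterator, it has to be recreated before every new pass over the table, which is why read_sql is called again in each listing. The sketch below shows the effect with Python's built-in sqlite3 and an in-memory table standing in for the MySQL table; the rows and the helper name read_in_chunks are made up for illustration.

```python
import sqlite3


def read_in_chunks(conn, query, chunksize):
    """Yield lists of rows, at most `chunksize` rows each (hypothetical helper)."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield rows


# In-memory stand-in for the real table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_gzdata (fullURLId TEXT)")
conn.executemany(
    "INSERT INTO all_gzdata VALUES (?)",
    [("101003",), ("107001",), ("101003",), ("1999001",), ("107001",)],
)

chunks = read_in_chunks(conn, "SELECT fullURLId FROM all_gzdata", chunksize=2)
first_pass = [len(chunk) for chunk in chunks]   # consumes the generator
second_pass = [len(chunk) for chunk in chunks]  # nothing is left to read

print(first_pass)   # [2, 2, 1]
print(second_pass)  # []
```

The second pass is empty, so any later analysis needs a fresh reader.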

Code 2

counts = [i['fullURLId'].value_counts() for i in sql]  # count each 10,000-row chunk separately
counts = counts.copy()
counts = pd.concat(counts).groupby(level=0).sum()  # merge the per-chunk counts: group by index and sum
counts = counts.reset_index()  # turn the old index into an ordinary column
counts.columns = ['index', 'num']  # set both column names explicitly
counts['type'] = counts['index'].str.extract(r'(\d{3})', expand=False)  # first three digits = category ID
counts_ = counts[['type', 'num']].groupby('type').sum()  # total per category
counts_.sort_values('num', ascending=False)  # returns a sorted copy that is not kept, so the table below stays in index order
counts_['percentage'] = (counts_['num'] / counts_['num'].sum()) * 100
print(counts_)

type      num            percentage
101      411665         49.156965
102       17357          2.072601
103        1715          0.204788
106        3957          0.472506
107      182900         21.840110
199      201426         24.052302
301       18430          2.200728
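
The merge step (count each chunk, add the partial counts, then group by the three-digit prefix) can be sketched without pandas. This is only an illustration with made-up IDs, using collections.Counter in place of value_counts and groupby.

```python
import re
from collections import Counter

# Two illustrative chunks of fullURLId values (made-up data).
chunks = [
    ["101003", "107001", "101003"],
    ["1999001", "107001", "301001"],
]

# Count each chunk separately, then add the partial counts together.
partial_counts = [Counter(chunk) for chunk in chunks]
total = sum(partial_counts, Counter())

# Group by the first three digits of the ID, as the regex (\d{3}) does.
by_type = Counter()
for url_id, num in total.items():
    match = re.search(r"(\d{3})", url_id)
    if match:
        by_type[match.group(1)] += num

grand_total = sum(by_type.values())
percentage = {t: 100 * n / grand_total for t, n in by_type.items()}

print(dict(by_type))
print(percentage)
```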

Code 3

# Break down category 107
def count107(i):  # per-chunk counting function
    j = i.loc[i['fullURLId'].str.contains('107'), ['fullURL']].copy()  # URLs whose category ID contains 107
    j['type'] = None  # empty column to hold the page type
    # Later rules overwrite earlier ones, so the most specific pattern comes last
    j.loc[j['fullURL'].str.contains(r'info/.+?/'), 'type'] = '知识首页'  # knowledge home page: info/<topic>/
    j.loc[j['fullURL'].str.contains(r'info/.+?/.+?'), 'type'] = '知识列表页'  # knowledge list page
    j.loc[j['fullURL'].str.contains(r'/\d+?_*\d+?\.html'), 'type'] = '知识内容页'  # knowledge content page
    return j['type'].value_counts()

engine = create_engine('mysql+pymysql://root:102011@localhost/test?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

counts2 = [count107(i) for i in sql]  # count chunk by chunk
counts2 = pd.concat(counts2).groupby(level=0).sum()  # merge the per-chunk counts
print(counts2)
# Share of each page type
res107 = pd.DataFrame(counts2)
# res107.reset_index(inplace=True)
res107.index.name = '107类型'  # "category 107 page type"
res107.columns = ['num']
res107['比例'] = (res107['num'] / res107['num'].sum()) * 100  # share in percent
res107.reset_index(inplace=True)
print(res107)

知识内容页    164243
知识列表页      9656
知识首页       9001
Name: type, dtype: int64
   107类型     num         比例
0  知识内容页  164243     89.799344
1  知识列表页    9656      5.279388
2   知识首页     9001      4.921268
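
The order of the three assignments in count107 matters: each later pattern overwrites the label set by an earlier one. A minimal sketch of that rule ordering with the re module follows; the example URLs are hypothetical and only chosen to hit one rule each.

```python
import re

# Rules are applied in order; a later match overwrites an earlier one,
# mirroring the successive assignments in count107.
RULES = [
    (re.compile(r"info/.+?/"), "知识首页"),
    (re.compile(r"info/.+?/.+?"), "知识列表页"),
    (re.compile(r"/\d+?_*\d+?\.html"), "知识内容页"),
]


def classify(url):
    """Return the label of the last rule that matches, or None."""
    label = None
    for pattern, name in RULES:
        if pattern.search(url):
            label = name
    return label


# Hypothetical URLs for illustration.
examples = [
    "http://www.lawtime.cn/info/hunyin/",
    "http://www.lawtime.cn/info/hunyin/example/",
    "http://www.lawtime.cn/info/hunyin/example/12345.html",
]
for url in examples:
    print(classify(url), url)
```

A content page also matches the first two patterns, so it is labelled correctly only because its rule runs last.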

Code 4

def countquestion(i):  # per-chunk function
    j = i.loc[i['fullURL'].str.contains(r'\?'), ['fullURLId']].copy()  # category IDs of URLs that contain "?"
    return j

# engine = create_engine('mysql+pymysql://root:123456@127.0.0.1:3306/test?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)

counts3 = [countquestion(i)['fullURLId'].value_counts() for i in sql]
counts3 = pd.concat(counts3).groupby(level=0).sum()
print(counts3)

# Share of each category
df1 = counts3.rename_axis(None).to_frame(name='fullURLId')
df1['perc'] = df1['fullURLId'] / df1['fullURLId'].sum() * 100
df1.sort_values(by='fullURLId', ascending=False, inplace=True)
print(df1.round(4))

101003        47
102002        25
107001       346
1999001    64718
301001       356
Name: fullURLId, dtype: int64
         fullURLId     perc
1999001      64718  98.8182
301001         356   0.5436
107001         346   0.5283
101003          47   0.0718
102002          25   0.0382

Code 5

def page199(i):  # per-chunk function
    mask = i['fullURLId'].str.contains('199') & i['fullURL'].str.contains(r'\?')
    j = i.loc[mask, ['fullURL', 'pageTitle']].copy()
    j['pageTitle'] = j['pageTitle'].fillna('空')  # mark missing titles as '空' (empty)
    j['type'] = '其他'  # default label: '其他' (other)
    # Label each row by the keyword found in its page title; later rules override earlier ones
    j.loc[j['pageTitle'].str.contains('法律快车-律师助手'), 'type'] = '法律快车-律师助手'
    j.loc[j['pageTitle'].str.contains('咨询发布成功'), 'type'] = '咨询发布成功'
    j.loc[j['pageTitle'].str.contains('免费发布法律咨询'), 'type'] = '免费发布法律咨询'
    j.loc[j['pageTitle'].str.contains('法律快搜'), 'type'] = '快搜'
    j.loc[j['pageTitle'].str.contains('法律快车法律经验'), 'type'] = '法律快车法律经验'
    j.loc[j['pageTitle'].str.contains('法律快车法律咨询'), 'type'] = '法律快车法律咨询'
    j.loc[j['pageTitle'].str.contains('_法律快车') |
          j['pageTitle'].str.contains('-法律快车'), 'type'] = '法律快车'
    j.loc[j['pageTitle'].str.contains('空'), 'type'] = '空'

    return j


# Note: the chunk iterator is used up after one pass, so read_sql must be called again each time
# engine = create_engine('mysql+pymysql://root:123456@127.0.0.1:3306/test?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)  # read the table in chunks
# sql = pd.read_sql_query('select * from all_gzdata limit 10000', con=engine)

counts4 = [page199(i) for i in sql]  # process chunk by chunk
counts4 = pd.concat(counts4)
d1 = counts4['type'].value_counts()
print(d1)
d2 = counts4[counts4['type'] == '其他']
print(d2)
# Share of each label
df1_ = d1.rename_axis(None).to_frame(name='type')
df1_['perc'] = df1_['type'] / df1_['type'].sum() * 100
df1_.sort_values(by='type', ascending=False, inplace=True)
print(df1_)

法律快车-律师助手    49894
法律快车法律咨询      6421
咨询发布成功         5220
快搜                1943
法律快车              818
其他                 359
法律快车法律经验        59
空                     4
Name: type, dtype: int64
                                                fullURL  \
2631  http://www.lawtime.cn/spelawyer/index.php?py=g...   
2632  http://www.lawtime.cn/spelawyer/index.php?py=g...   
1677  http://m.baidu.com/from=844b/bd_page_type=1/ss...   
4303  http://m.baidu.com/from=0/bd_page_type=1/ssid=...   
3673  http://www.lawtime.cn/lawyer/lll25879862593080...   
...                                                 ...   
4829  http://www.lawtime.cn/spelawyer/index.php?m=se...   
4837  http://www.lawtime.cn/spelawyer/index.php?m=se...   
4842  http://www.lawtime.cn/spelawyer/index.php?m=se...   
8302  http://www.lawtime.cn/spelawyer/index.php?m=se...   
5034  http://www.baidu.com/link?url=O7iBD2KmoJdkHWTZ...   

                                     pageTitle type  
2631   个旧律师成功案例 - 法律快车提供个旧知名律师、优秀律师、专业律师的咨询和推荐   其他  
2632   个旧律师成功案例 - 法律快车提供个旧知名律师、优秀律师、专业律师的咨询和推荐   其他  
1677                          婚姻法论文 - 法律快车法律论文                  其他  
4303                什么是机动车？什么是非机动车？ - 法律快车交通事故            其他  
3673                          404错误提示页面 - 法律快车                     其他  
...                                        ...  ...  
4829  律师搜索,律师查找 - 法律快车提供全国知名律师、优秀律师、专业律师的咨询和推荐   其他  
4837  律师搜索,律师查找 - 法律快车提供全国知名律师、优秀律师、专业律师的咨询和推荐   其他  
4842  律师搜索,律师查找 - 法律快车提供全国知名律师、优秀律师、专业律师的咨询和推荐   其他  
8302  律师搜索,律师查找 - 法律快车提供全国知名律师、优秀律师、专业律师的咨询和推荐   其他  
5034                 离婚协议书范本（2015年版） - 法律快车婚姻法               其他  

[359 rows x 3 columns]
     type         1999001 总数       perc
法律快车-律师助手       49894        77.094471
法律快车法律咨询         6421         9.921506
咨询发布成功            5220         8.065762
快搜                   1943         3.002256
法律快车                 818         1.263945
其他                    359         0.554714
法律快车法律经验           59         0.091165
空                        4         0.006181

Code 6

# Users browsing without a clear goal
def xiaguang(i):  # per-chunk function
    j = i.loc[~i['fullURL'].str.contains(r'\.html'),
              ['fullURL', 'fullURLId', 'pageTitle']]  # rows whose URL has no .html
    return j

# Note: read_sql must be called again for each new pass over the table
engine = create_engine('mysql+pymysql://root:102011@127.0.0.1:3306/test?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)  # read the table in chunks

counts5 = [xiaguang(i) for i in sql]
counts5 = pd.concat(counts5)

xg1 = counts5['fullURLId'].value_counts()
print(xg1)
# Share of each page ID
xg_ = pd.DataFrame(xg1)
xg_.reset_index(inplace=True)
xg_.columns = ['index', 'num']
xg_['perc'] = xg_['num'] / xg_['num'].sum() * 100
xg_.sort_values(by='num', ascending=False, inplace=True)

xg_['type'] = xg_['index'].str.extract(r'(\d{3})', expand=False)  # first three digits = category ID

xgs_ = xg_[['type', 'num']].groupby('type').sum()  # total per category
xgs_.sort_values(by='num', ascending=False, inplace=True)  # descending
xgs_['percentage'] = xgs_['num'] / xgs_['num'].sum() * 100

print(xgs_.round(4))

1999001    117124
107001      17843
102002      12021
101001       5603
106001       3957
102001       2129
102003       1235
301001       1018
101009        854
102007        538
102008        404
101008        378
102004        361
102005        271
102009        214
102006        184
101004        125
101006        107
101005         63
Name: fullURLId, dtype: int64
type     num     percentage                   
199    117124     71.2307
107     17843     10.8515
102     17357     10.5559
101      7130      4.3362
106      3957      2.4065
301      1018      0.6191

Code 7

# Analyse page clicks
# Count clicks per IP
engine = create_engine('mysql+pymysql://root:102011@127.0.0.1:3306/test?charset=utf8')
sql = pd.read_sql('all_gzdata', engine, chunksize=10000)  # read the table in chunks

counts1 = [i['realIP'].value_counts() for i in sql]  # per-chunk number of records for each IP
counts1 = pd.concat(counts1).groupby(level=0).sum()  # merge the chunks; level=0 groups by the index (the IP)
print(counts1)

counts1_ = counts1.to_frame(name='clicks')  # one row per IP, holding its click count

counts1_[1] = 1  # helper column of ones
hit_count = counts1_.groupby('clicks').sum()  # number of IPs for each distinct click count
# counts1_['clicks'].value_counts() gives the same numbers
hit_count.columns = ['用户数']  # number of users
hit_count.index.name = '点击次数'  # number of clicks

# Users with 1 to 7 clicks, and with more than 7
hit_count.sort_index(inplace=True)
hit_count_7 = hit_count.iloc[:7, :]
time = hit_count.iloc[7:, 0].sum()  # users with more than 7 clicks
hit_count_7 = pd.concat([hit_count_7, pd.DataFrame([{'用户数': time}])], ignore_index=True)
hit_count_7.index = ['1', '2', '3', '4', '5', '6', '7', '7次以上']  # '7次以上' = more than 7
hit_count_7['用户比例'] = hit_count_7['用户数'] / hit_count_7['用户数'].sum()  # share of users
print(hit_count_7)
82033         2
95502         1
103182        1
116010        2
136206        1
             ..
4294809358    2
4294811150    1
4294852154    3
4294865422    2
4294917690    1
Name: realIP, Length: 230149, dtype: int64
         用户数      用户比例
1       132119    0.574059
2        44175    0.191941
3        17573    0.076355
4        10156    0.044128
5         5952    0.025862
6         4132    0.017954
7         2632    0.011436
7次以上   13410    0.058267
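
The table above is a "count of counts": first clicks per IP, then the number of IPs for each click total, with everything above 7 folded into one bucket. A small sketch with collections.Counter and a made-up click log shows the same two-stage idea; the key ">7" plays the role of the 7次以上 row.

```python
from collections import Counter

# Made-up click log: one entry per page view, keyed by visitor IP.
clicks = ["a", "b", "a", "c", "a", "d", "d", "e"] + ["f"] * 9

clicks_per_ip = Counter(clicks)                    # IP -> number of clicks
users_per_count = Counter(clicks_per_ip.values())  # clicks -> number of IPs

threshold = 7
table = {str(n): users_per_count.get(n, 0) for n in range(1, threshold + 1)}
table[">7"] = sum(u for n, u in users_per_count.items() if n > threshold)

total_users = sum(table.values())
ratios = {k: v / total_users for k, v in table.items()}

print(table)
print(ratios)
```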

Code 8

# Behaviour of users who viewed only one page

engine = create_engine('mysql+pymysql://root:102011@127.0.0.1:3306/test?charset=utf8')
all_gzdata = pd.read_sql_table('all_gzdata', con=engine)  # load the whole all_gzdata table

# Count records per realIP
# and keep the IPs with exactly one page view
real_count = pd.DataFrame(all_gzdata.groupby("realIP")["realIP"].count())
real_count.columns = ["count"]
real_count["realIP"] = real_count.index.tolist()
user_one = real_count[(real_count["count"] == 1)]  # users who appear only once
# Join back to the raw data on realIP
real_one = pd.merge(user_one[['count']], all_gzdata, right_on='realIP', left_index=True, how='left')

# Page types seen by single-view users
URL_count = pd.DataFrame(real_one.groupby("fullURLId")["fullURLId"].count())
URL_count.columns = ["count"]
URL_count.sort_values(by='count', ascending=False, inplace=True)  # descending
# Top 4 page types plus everything else
URL_count_4 = URL_count.iloc[:4, :]
# The remaining types are summed from URL_count; summing hit_count.iloc[4:, 0] here instead
# counts users with 5 or more clicks, which is where the 26126 in the listing below comes from
time = URL_count.iloc[4:, 0].sum()
URLindex = URL_count_4.index.values
URL_count_4 = pd.concat([URL_count_4, pd.DataFrame([{'count': time}])], ignore_index=True)
URL_count_4.index = [URLindex[0], URLindex[1], URLindex[2], URLindex[3],
                     '其他']  # '其他' = other
URL_count_4['比例'] = URL_count_4['count'] / URL_count_4['count'].sum()  # share
print(URL_count_4)
count        比例
101003   102560     0.649011
107001    19443     0.123037
1999001    9381     0.059364
301001      515     0.003259
其他       26126     0.165328
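
The "top 4 plus other" summary only makes sense when the "other" row is taken from the same counts as the top rows, so that the shares add up to 1. A minimal sketch with made-up page-type IDs, using Counter.most_common in place of the sorted DataFrame:

```python
from collections import Counter

# Made-up page-type IDs for single-view users.
page_types = (
    ["101003"] * 6 + ["107001"] * 4 + ["1999001"] * 3 + ["301001"] * 2
    + ["102002", "106001"]
)

counts = Counter(page_types)
top = counts.most_common(4)
# Everything outside the top 4 goes into one "other" bucket.
other = sum(counts.values()) - sum(n for _, n in top)

rows = top + [("其他", other)]
total = sum(n for _, n in rows)
shares = {name: n / total for name, n in rows}

for name, n in rows:
    print(name, n, round(shares[name], 4))
```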

Code 9

# Among single-view users, how many times each URL was viewed
fullURL_count = pd.DataFrame(real_one.groupby("fullURL")["fullURL"].count())
fullURL_count.columns = ["count"]
fullURL_count["fullURL"] = fullURL_count.index.tolist()
fullURL_count.sort_values(by='count', ascending=False, inplace=True)  # descending
print(fullURL_count.head(10))
count  \
fullURL                                                     
http://www.lawtime.cn/info/shuifa/slb/201211197...   1013   
http://www.lawtime.cn/info/hunyin/lhlawlhxy/201...    501   
http://www.lawtime.cn/ask/question_925675.html        423   
http://www.lawtime.cn/info/shuifa/slb/201211197...    367   
http://www.lawtime.cn/ask/exp/13655.html              301   
http://www.lawtime.cn/ask/exp/8495.html               241   
http://www.lawtime.cn/ask/exp/13445.html              199   
http://www.lawtime.cn/guangzhou                       177   
http://www.lawtime.cn/ask/exp/17357.html              171   
http://www.lawtime.cn/citylist.php                    117   

                                                                                              fullURL  
fullURL                                                                                                
http://www.lawtime.cn/info/shuifa/slb/201211197...  http://www.lawtime.cn/info/shuifa/slb/20121119...  
http://www.lawtime.cn/info/hunyin/lhlawlhxy/201...  http://www.lawtime.cn/info/hunyin/lhlawlhxy/20...  
http://www.lawtime.cn/ask/question_925675.html         http://www.lawtime.cn/ask/question_925675.html  
http://www.lawtime.cn/info/shuifa/slb/201211197...  http://www.lawtime.cn/info/shuifa/slb/20121119...  
http://www.lawtime.cn/ask/exp/13655.html                     http://www.lawtime.cn/ask/exp/13655.html  
http://www.lawtime.cn/ask/exp/8495.html                       http://www.lawtime.cn/ask/exp/8495.html  
http://www.lawtime.cn/ask/exp/13445.html                     http://www.lawtime.cn/ask/exp/13445.html  
http://www.lawtime.cn/guangzhou                                       http://www.lawtime.cn/guangzhou  
http://www.lawtime.cn/ask/exp/17357.html                     http://www.lawtime.cn/ask/exp/17357.html  
http://www.lawtime.cn/citylist.php                                 http://www.lawtime.cn/citylist.php

Part 2

Code 10

import os
import re
import pandas as pd
import pymysql as pm
from random import sample

# Change the working directory
os.chdir("D:/python123")

# Load the data
con = pm.connect(host='localhost', user='root', password='102011', database='test', charset='utf8')
data = pd.read_sql('select * from all_gzdata', con=con)
con.close()  # close the connection

# Keep category 107 records
index107 = [re.search('107', str(i)) is not None for i in data.loc[:, 'fullURLId']]
data_107 = data.loc[index107, :]

# Within category 107, keep the marriage-law (hunyin) pages
index = [re.search('hunyin', str(i)) is not None for i in data_107.loc[:, 'fullURL']]
data_hunyin = data_107.loc[index, :]

# Keep only the fields needed (realIP, fullURL)
info = data_hunyin.loc[:, ['realIP', 'fullURL']].copy()

# Strip "?" and everything after it from each URL
da = [re.sub(r'\?.*', '', str(i)) for i in info.loc[:, 'fullURL']]
info.loc[:, 'fullURL'] = da  # replace the fullURL column with the cleaned URLs
# Drop URLs without .html
index = [re.search(r'\.html', str(i)) is not None for i in info.loc[:, 'fullURL']]
index.count(True)  # number of URLs kept (True counts as 1, False as 0)
info1 = info.loc[index, :]
print(info1.head())
realIP                                            fullURL
0   2683657840  http://www.lawtime.cn/info/hunyin/hunyinfagui/...
4   2683657840  http://www.lawtime.cn/info/hunyin/hunyinfagui/...
9   1275347569  http://www.lawtime.cn/info/hunyin/lhlawlhxy/20...
62  1531496412  http://www.lawtime.cn/info/hunyin/hunyinfagui/...
86   838215995  http://www.lawtime.cn/info/hunyin/lhlawlhxy/20...
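
The two cleaning steps (drop the query string, then keep only .html pages) can be tried on a few strings without touching the database. The URLs below are hypothetical and only illustrate the two regular expressions.

```python
import re

# Hypothetical URLs for illustration.
urls = [
    "http://www.lawtime.cn/info/hunyin/example/12345.html?from=search",
    "http://www.lawtime.cn/info/hunyin/",
    "http://www.lawtime.cn/info/hunyin/example/67890_2.html",
]

# Remove "?" and everything after it.
stripped = [re.sub(r"\?.*", "", u) for u in urls]
# Keep only URLs that still contain ".html".
html_only = [u for u in stripped if re.search(r"\.html", u)]

for u in html_only:
    print(u)
```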

Code 11

# Separate paginated URLs from the rest
index = [re.search(r'/\d+_\d+\.html', i) is not None for i in info1.loc[:, 'fullURL']]
index1 = [not i for i in index]
info1_1 = info1.loc[index, :].copy()  # paginated URLs
info1_2 = info1.loc[index1, :]  # non-paginated URLs
# Map each paginated URL back to its first page
da = [re.sub(r'_\d+\.html', '.html', str(i)) for i in info1_1.loc[:, 'fullURL']]
info1_1.loc[:, 'fullURL'] = da
# Recombine the two sets
frames = [info1_1, info1_2]
info2 = pd.concat(frames)
# or, equivalently
info2 = pd.concat([info1_1, info1_2], axis=0)  # axis=0 (the default) stacks rows
# Drop duplicate (realIP, fullURL) pairs
info3 = info2.drop_duplicates()
# Store both columns as strings
info3['realIP'] = info3['realIP'].astype(str)
info3['fullURL'] = info3['fullURL'].astype(str)
len(info3)
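
Restoring paginated URLs before deduplicating means that a user who read pages 1 and 2 of the same article counts as one view of that article. A minimal sketch with made-up (IP, URL) pairs; restore_page is a hypothetical helper name.

```python
import re


def restore_page(url):
    """Map a paginated URL such as 12345_2.html back to 12345.html."""
    return re.sub(r"_\d+\.html", ".html", url)


# Made-up (IP, URL) records for illustration.
records = [
    ("1001", "http://www.lawtime.cn/info/hunyin/example/12345.html"),
    ("1001", "http://www.lawtime.cn/info/hunyin/example/12345_2.html"),
    ("1002", "http://www.lawtime.cn/info/hunyin/example/12345_3.html"),
]

# Normalise the URL, then drop duplicate (IP, URL) pairs while keeping order.
seen = set()
unique = []
for ip, url in records:
    key = (ip, restore_page(url))
    if key not in seen:
        seen.add(key)
        unique.append(key)

print(unique)
```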

Code 12

# Keep IPs with enough page views
IP_count = info3['realIP'].value_counts()
# List of IPs and their view counts
IP = list(IP_count.index)
count = list(IP_count.values)
# Build a DataFrame: first column IP, second column view count
IP_count = pd.DataFrame({'IP': IP, 'count': count})
# Keep the IPs that viewed more than n URLs
n = 2
index = IP_count.loc[:, 'count'] > n
IP_index = IP_count.loc[index, 'IP']
print(IP_index.head())
0    2609113527
1    3812410744
2     225896631
3     242673847
4    1190924814
Name: IP, dtype: object

Code 13

# Split the IPs into a training set and a test set
index_tr = sample(range(0, len(IP_index)), int(len(IP_index) * 0.8))  # or np.random.choice(..., replace=False)
chosen = set(index_tr)
index_te = [i for i in range(0, len(IP_index)) if i not in chosen]
IP_tr = IP_index.iloc[index_tr]
IP_te = IP_index.iloc[index_te]
# Split the records the same way
index_tr = info3['realIP'].isin(IP_tr)
index_te = info3['realIP'].isin(IP_te)
data_tr = info3.loc[index_tr, :]
data_te = info3.loc[index_te, :]
print(len(data_tr))
IP_tr = data_tr.iloc[:, 0]  # training IPs
url_tr = data_tr.iloc[:, 1]  # training URLs
IP_tr = list(set(IP_tr))  # unique values
url_tr = list(set(url_tr))  # unique values
len(url_tr)
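
The split is made over IPs rather than over rows, so all records of one visitor land on the same side. A small stdlib sketch of that idea follows; the records and the fixed seed are assumptions added for reproducibility and are not part of the original code.

```python
import random

# Made-up (IP, URL) records for illustration.
records = [
    ("ip1", "a.html"), ("ip1", "b.html"), ("ip2", "a.html"),
    ("ip3", "c.html"), ("ip4", "a.html"), ("ip5", "b.html"),
]

ips = sorted({ip for ip, _ in records})
rng = random.Random(0)  # fixed seed, assumed here so the split is repeatable
train_ips = set(rng.sample(ips, int(len(ips) * 0.8)))
test_ips = set(ips) - train_ips

# Every record follows its IP into the training or the test set.
train = [r for r in records if r[0] in train_ips]
test = [r for r in records if r[0] in test_ips]

print(len(train_ips), len(test_ips))
print(len(train), len(test))
```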

Code 14

# Build the model from the training set
UI_matrix_tr = pd.DataFrame(0, index=IP_tr, columns=url_tr)
# Fill the user-item matrix: 1 where the IP viewed the URL
for i in data_tr.index:
    UI_matrix_tr.loc[data_tr.loc[i, 'realIP'], data_tr.loc[i, 'fullURL']] = 1
# sum(UI_matrix_tr.sum(axis=1))

# Item-item similarity matrix (slow: one pass over all users for every pair of URLs)
Item_matrix_tr = pd.DataFrame(0.0, index=url_tr, columns=url_tr)
for i in Item_matrix_tr.index:
    for j in Item_matrix_tr.index:
        a = sum(UI_matrix_tr.loc[:, [i, j]].sum(axis=1) == 2)  # users who viewed both URLs
        b = sum(UI_matrix_tr.loc[:, [i, j]].sum(axis=1) != 0)  # users who viewed at least one
        Item_matrix_tr.loc[i, j] = a / b  # Jaccard similarity

# Zero the diagonal so a URL is never recommended for itself
for i in Item_matrix_tr.index:
    Item_matrix_tr.loc[i, i] = 0

# Evaluate the model on the test set
IP_te = data_te.iloc[:, 0]
url_te = data_te.iloc[:, 1]
IP_te = list(set(IP_te))
url_te = list(set(url_te))

# User-item matrix for the test set
UI_matrix_te = pd.DataFrame(0, index=IP_te, columns=url_te)
for i in data_te.index:
    UI_matrix_te.loc[data_te.loc[i, 'realIP'], data_te.loc[i, 'fullURL']] = 1

# Result table for the test IPs: IP, viewed URL (已浏览网址), recommended URL (推荐网址), hit flag (T/F)
Res = pd.DataFrame('NaN', index=data_te.index, columns=['IP', '已浏览网址', '推荐网址', 'T/F'])
Res.loc[:, 'IP'] = list(data_te.iloc[:, 0])
Res.loc[:, '已浏览网址'] = list(data_te.iloc[:, 1])

# Recommend the most similar URL for each viewed URL
for i in Res.index:
    if Res.loc[i, '已浏览网址'] in list(Item_matrix_tr.index):
        Res.loc[i, '推荐网址'] = Item_matrix_tr.loc[Res.loc[i, '已浏览网址'], :].idxmax()
        if Res.loc[i, '推荐网址'] in url_te:
            Res.loc[i, 'T/F'] = UI_matrix_te.loc[Res.loc[i, 'IP'], Res.loc[i, '推荐网址']] == 1
        else:
            Res.loc[i, 'T/F'] = False

# Save the recommendations
Res.to_csv('D:/python123/Res.csv', index=False, encoding='utf8')
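
The similarity a/b computed in the double loop is the Jaccard index: users who viewed both URLs divided by users who viewed at least one. The sketch below reproduces that measure with plain sets on made-up data; it is an illustration of the formula, not a replacement for the DataFrame code, and the helper names are hypothetical.

```python
from itertools import combinations

# Made-up training data: which URLs each IP viewed.
views = {
    "ip1": {"a.html", "b.html"},
    "ip2": {"a.html", "b.html", "c.html"},
    "ip3": {"b.html", "c.html"},
    "ip4": {"a.html"},
}

# Invert to URL -> set of IPs that viewed it.
viewers = {}
for ip, urls in views.items():
    for url in urls:
        viewers.setdefault(url, set()).add(ip)


def jaccard(u, v):
    """Users who viewed both URLs over users who viewed either."""
    return len(viewers[u] & viewers[v]) / len(viewers[u] | viewers[v])


# Similarity for every pair of distinct URLs (the diagonal is simply left out).
similarity = {u: {} for u in viewers}
for u, v in combinations(sorted(viewers), 2):
    similarity[u][v] = similarity[v][u] = jaccard(u, v)


def recommend(url):
    """Return the URL most similar to the one just viewed."""
    return max(similarity[url], key=similarity[url].get)


print(recommend("a.html"), recommend("b.html"), recommend("c.html"))
```

Working from per-URL viewer sets avoids rescanning the whole user-item matrix for every pair, which is the slow part of the loop above.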

Code 15

# Load the saved recommendations (dtype=str keeps the T/F column as text)
Res = pd.read_csv('D:/python123/Res.csv', keep_default_na=False, encoding='utf8', dtype=str)

# Precision: hits among the rows that received a recommendation
Pre = round(sum(Res.loc[:, 'T/F'] == 'True') / (len(Res.index) - sum(Res.loc[:, 'T/F'] == 'NaN')), 3)
print('Precision:', Pre)

# Recall as defined here: hits over hits plus rows with no recommendation
Rec = round(sum(Res.loc[:, 'T/F'] == 'True') / (sum(Res.loc[:, 'T/F'] == 'True') + sum(Res.loc[:, 'T/F'] == 'NaN')), 3)
print('Recall:', Rec)

# F1 score
F1 = round(2 * Pre * Rec / (Pre + Rec), 3)
print('F1:', F1)
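
The three metrics depend only on how many rows are 'True', 'False' and 'NaN', so they can be checked on a short list. The sketch below follows the same definitions but adds guards against division by zero and computes F1 from the unrounded values; both are assumptions added here, and the flags are made up.

```python
def evaluate(flags):
    """flags: list of 'True', 'False' or 'NaN' strings, as stored in the T/F column."""
    hits = flags.count("True")
    no_rec = flags.count("NaN")          # rows that received no recommendation
    recommended = len(flags) - no_rec
    precision = hits / recommended if recommended else 0.0
    recall = hits / (hits + no_rec) if hits + no_rec else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return round(precision, 3), round(recall, 3), round(f1, 3)


# Made-up results: 2 hits, 4 misses, 2 rows without a recommendation.
flags = ["True", "False", "NaN", "True", "False", "False", "NaN", "False"]
precision, recall, f1 = evaluate(flags)
print(precision, recall, f1)  # 0.333 0.5 0.4
```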

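Learning Python: Setting Up a Python Environment

Setting up the Python environment

Python is a cross-platform programming language that runs on Windows, Linux, and Mac OS X. To check whether Python is installed locally, and which version, type "python" in a terminal window.

Downloading Python

You can download the version you want from the official Python site (http://www.python.org/). The Python documentation can also be downloaded (www.python.org/doc/).

Installing Python

Python can be installed on several platforms. The steps for each platform are given below.

Installing Python on Windows:

The basic steps on Windows are:

Open http://www.python.org/download/ in a browser.
In the download list, choose the Windows installer. The file is named python-XYZ.msi, where XYZ is the version number you want to install.
To use the python-XYZ.msi installer, Windows must support Microsoft Installer 2.0. Save the installer to your computer and run it to find out whether your machine supports MSI. Windows XP and later already include MSI, and many older machines can install it.
After downloading, double-click the package to start the Python setup wizard. Installation is simple: keep the default settings and click "Next" until it finishes.

Installing Python on Unix & Linux:

The basic steps on Unix & Linux are:

Open http://www.python.org/download/ in a browser.
In the download list, choose the source archive for Unix/Linux.
Download and unpack the archive.
If you need to customise any build options, edit Modules/Setup.
Run ./configure
make
make install

Once this completes, Python is installed in /usr/local/bin and its libraries in /usr/local/lib/pythonXX, where XX is the Python version number.

Installing Python on Mac:

Recent Mac systems ship with a Python environment. You can also download the latest version from http://www.python.org/download/.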

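Configuring environment variables

Programs and executables can live in many directories, and those directories are not necessarily on the operating system's search path for executables. The path is stored in an environment variable, a named string maintained by the operating system; such variables hold information that the command-line shell and other programs can use. On Unix and Windows the path variable is called PATH (case-sensitive on Unix, case-insensitive on Windows). On Mac OS, the installer changes the Python install path. To invoke Python from any other directory, you must add the Python directory to PATH.

Setting environment variables on Windows

Set them as follows:

Right-click "Computer" and choose "Properties".
Then click "Advanced system settings".
Under "System variables", select "Path" and double-click it.
Append the Python install directory to the "Path" value (C:\Python27 in my case). Note: entries are separated by a semicolon ";".
Once this is set, typing "python" at the cmd prompt should start the interpreter.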
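
Whether the PATH change took effect can also be checked from Python itself. The short sketch below uses only the standard library; it prints what the current process sees, so the result depends on the machine it runs on.

```python
import os
import shutil
import sys

# PATH is one string; os.pathsep is ";" on Windows and ":" on Unix.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

# shutil.which searches PATH the same way the shell does.
found = shutil.which("python") or shutil.which("python3")

print("Interpreter running this script:", sys.executable)
print("First python found on PATH:", found)
print("Number of PATH entries:", len(path_dirs))
```

If the second line prints None, the Python directory is not yet on PATH for this session.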


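Running Python

There are three ways to run Python:

1. Interactive interpreter

Open a command-line window, start python, and write Python code in the interactive interpreter.

E:\>python

2. Command-line script

E:\Python>python hello.py

3. Integrated development environment (IDE)

You can write and run Python code in a graphical (GUI) environment. Recommended IDEs by platform:

Unix: IDLE was the earliest Python IDE on Unix.
Windows: PyCharm is a powerful Python IDE.
Macintosh: IDLE is also available on the Mac; you can download the Mac version from the website.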
