How to use a Python list comprehension/dictionary to print each column as a unique variable
【Title】: how to use python list comprehension/dictionary to print each column as a unique variable
【Posted】: 2017-07-01 03:31:39
【Question】: Suppose we have a CSV:
PROPERTY_ID,CLIENT_ID,FROM_YEAR
1,5,2015
2,6,2015
3,9,2015
4,9,2015
I am trying to pass each unique combination of CLIENT_ID, PROPERTY_ID, and FROM_YEAR into a dictionary or list, so that I can put each "PROPERTY_ID, CLIENT_ID, FROM_YEAR" combination into a MySQL query:
SELECT * FROM client_5 WHERE PROPERTY_ID = 1 and FROM_YEAR = 2015;
SELECT * FROM client_6 WHERE PROPERTY_ID = 2 and FROM_YEAR = 2015;
SELECT * FROM client_9 WHERE PROPERTY_ID = 3 and FROM_YEAR = 2015;
SELECT * FROM client_9 WHERE PROPERTY_ID = 4 and FROM_YEAR = 2015;
In terms of variables:
1st round:
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=5,1,2015
2nd round
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=6,2,2015
3rd round
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,3,2015
4th round
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,4,2015
I have tried using list comprehensions:
df = pd.read_csv("test.csv")
df2=df.apply(tuple, 1).unique().tolist()
for CLIENT_ID in [x[0] for x in df2]:
    CLIENT_ID=CLIENT_ID.astype('str')
    print "SELECT * FROM client"+CLIENT_ID
    for PROPERTY_CODE in [y[1] for y in df2]:
        PROPERTY_CODE=PROPERTY_CODE.astype('str')
        print "WHERE PROPERTY_ID = "+PROPERTY_CODE
It returns the following, which is not what we are looking for:
SELECT * FROM client_5
WHERE FK_PROPERTY_ID = 1
WHERE FK_PROPERTY_ID = 2
WHERE FK_PROPERTY_ID = 3
WHERE FK_PROPERTY_ID = 4
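Can anyone help clear this up? Thanks.
【Comments】:
Why are you using pandas? Just to parse the CSV?
Just iterate over the dataframe, build your queries, and add them to a pre-built set. Once you have finished creating the queries, execute them. The set eliminates duplicates.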
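The suggestion in the second comment could look roughly like the sketch below. It uses the standard csv module instead of a dataframe to stay dependency-free, and reads from an in-memory string in place of test.csv. The name build_queries and the extra duplicated row are assumptions added for illustration; they do not come from the thread.

```python
import csv
import io

# Stand-in for test.csv from the question. The last row is a duplicate
# added here (an assumption) so the de-duplication is visible.
CSV_TEXT = """PROPERTY_ID,CLIENT_ID,FROM_YEAR
1,5,2015
2,6,2015
3,9,2015
4,9,2015
4,9,2015
"""


def build_queries(csv_file):
    """Return the set of distinct SELECT statements for the rows in csv_file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    queries = set()
    for property_id, client_id, from_year in reader:
        queries.add(
            "SELECT * FROM client_{} WHERE PROPERTY_ID = {} and FROM_YEAR = {};".format(
                client_id, property_id, from_year
            )
        )
    return queries


queries = build_queries(io.StringIO(CSV_TEXT))
for query in sorted(queries):  # sorted only to make the printed order stable
    print(query)
```

Because a set is unordered, the queries are sorted before printing; executing them would happen in a separate step once the set is complete.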
【Answer 1】:
I would use format:
fstr = '$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR={CLIENT_ID},{PROPERTY_ID},{FROM_YEAR}'
df.drop_duplicates().apply(lambda x: fstr.format(**x), 1)
0 $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=5,1,2015
1 $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=6,2,2015
2 $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,3,2015
3 $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,4,2015
dtype: object
【Discussion】:
【Answer 2】: I think you can use apply together with set and list:
L = list(set(df.apply(lambda x: 'SELECT * FROM client_{} WHERE PROPERTY_ID = {} and FROM_YEAR = {};'.format(x['CLIENT_ID'], x['PROPERTY_ID'], x['FROM_YEAR']), axis=1)))
print (L)
['SELECT * FROM client_5 WHERE PROPERTY_ID = 1 and FROM_YEAR = 2015;',
'SELECT * FROM client_9 WHERE PROPERTY_ID = 3 and FROM_YEAR = 2015;',
'SELECT * FROM client_9 WHERE PROPERTY_ID = 4 and FROM_YEAR = 2015;',
'SELECT * FROM client_6 WHERE PROPERTY_ID = 2 and FROM_YEAR = 2015;']
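Note that the output above is not in CSV row order, because a set does not preserve ordering. If the original order matters, one possible alternative (not part of this answer) is to de-duplicate with dict.fromkeys, which keeps the first occurrence of each item in insertion order on Python 3.7+. The list below is a hand-written stand-in with one duplicate added for illustration.

```python
# Hand-written sample queries; the third entry is a deliberate duplicate.
queries = [
    "SELECT * FROM client_5 WHERE PROPERTY_ID = 1 and FROM_YEAR = 2015;",
    "SELECT * FROM client_6 WHERE PROPERTY_ID = 2 and FROM_YEAR = 2015;",
    "SELECT * FROM client_5 WHERE PROPERTY_ID = 1 and FROM_YEAR = 2015;",
    "SELECT * FROM client_9 WHERE PROPERTY_ID = 3 and FROM_YEAR = 2015;",
]

# dict keys are unique and keep insertion order, so this drops repeats
# while leaving the first occurrence of each query where it was.
unique_in_order = list(dict.fromkeys(queries))

for query in unique_in_order:
    print(query)
```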
【Discussion】:
【Answer 3】: This will work for you:
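import csv

# Read the CSV and print one query per data row.
with open('fileName.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for row in reader:
        # Columns are PROPERTY_ID, CLIENT_ID, FROM_YEAR, so row[1] is the client id.
        print("SELECT * FROM client_%s WHERE PROPERTY_ID = %s and FROM_YEAR = %s;" % (row[1], row[0], row[2]))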
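【Discussion】:
It is better to capture the header manually with head = next(row) before the for loop, rather than adding a hook inside the loop that slows down every iteration. In any case, I think using the csv module is the best option.
The csv module is easier in this case. I guess I was too fixated on pandas here and it turned into a rabbit hole, haha. The answer is actually quite simple.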
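As a variation on the csv approach discussed here, csv.DictReader consumes the header row once, before iteration starts, and exposes each row by column name, so the query template does not depend on column positions. This is only a sketch: the function name queries_from_csv is hypothetical, and an in-memory string stands in for the CSV file.

```python
import csv
import io

# Stand-in for the CSV file from the question.
CSV_TEXT = """PROPERTY_ID,CLIENT_ID,FROM_YEAR
1,5,2015
2,6,2015
3,9,2015
4,9,2015
"""

# Placeholders are named after the CSV columns.
TEMPLATE = (
    "SELECT * FROM client_{CLIENT_ID} "
    "WHERE PROPERTY_ID = {PROPERTY_ID} and FROM_YEAR = {FROM_YEAR};"
)


def queries_from_csv(csv_file):
    """Build one query string per data row, looking columns up by name."""
    reader = csv.DictReader(csv_file)  # reads the header row itself
    return [TEMPLATE.format(**row) for row in reader]


for query in queries_from_csv(io.StringIO(CSV_TEXT)):
    print(query)
```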
【Answer 4】:
This is easy to do with the .format method:
import pandas as pd

df = pd.read_csv('test.csv')
# One tuple per row, de-duplicated; columns are PROPERTY_ID, CLIENT_ID, FROM_YEAR.
rows = df.apply(tuple, 1).unique().tolist()
for (prop_id, client_id, year) in rows:
    print("SELECT * FROM client_{client_id} WHERE property_id = {prop_id} AND from_year = {year}".format(
        prop_id=prop_id,
        client_id=client_id,
        year=year
    ))
In Python 3.6, you can use string interpolation (f-strings):
for (prop_id, client_id, year) in rows:
    print(f"SELECT * FROM client_{client_id} WHERE property_id = {prop_id} AND from_year = {year}")
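If these statements are going to be executed rather than just printed, it may be safer to pass the values as query parameters instead of formatting them into the string. The sketch below is not from the answer; it assumes a driver that uses the %s placeholder style (as common MySQL drivers do), and the helper name build_parameterized_query is hypothetical. Table names cannot be bound as parameters, so the client id is forced through int() before it is put into the table name.

```python
def build_parameterized_query(client_id, prop_id, year):
    """Return (sql, params) for one row; raises ValueError on a non-integer id."""
    # A table name cannot be a bound parameter, so validate it via int().
    table = "client_{}".format(int(client_id))
    sql = "SELECT * FROM {} WHERE PROPERTY_ID = %s AND FROM_YEAR = %s".format(table)
    return sql, (int(prop_id), int(year))


# Same (PROPERTY_ID, CLIENT_ID, FROM_YEAR) tuples as the CSV in the question.
rows = [(1, 5, 2015), (2, 6, 2015), (3, 9, 2015), (4, 9, 2015)]

for prop_id, client_id, year in rows:
    sql, params = build_parameterized_query(client_id, prop_id, year)
    print(sql, params)
    # With a DB-API cursor this would be run as: cursor.execute(sql, params)
```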
【Discussion】: