How to use Python list comprehension/dictionary to print each column as a unique variable


Posted: 2017-07-01 03:31:39

Question:

Suppose we have a CSV:

PROPERTY_ID,CLIENT_ID,FROM_YEAR
1,5,2015
2,6,2015
3,9,2015
4,9,2015

I'm trying to pass each unique combination of CLIENT_ID, PROPERTY_ID, FROM_YEAR into a dictionary or list, so that I can put each "PROPERTY_ID, CLIENT_ID, FROM_YEAR" combination into a MySQL query:

SELECT * FROM client_5 WHERE PROPERTY_ID = 1 and FROM_YEAR = 2015;

SELECT * FROM client_6 WHERE PROPERTY_ID = 2 and FROM_YEAR = 2015;

SELECT * FROM client_9 WHERE PROPERTY_ID = 3 and FROM_YEAR = 2015;

SELECT * FROM client_9 WHERE PROPERTY_ID = 4 and FROM_YEAR = 2015;

In terms of variables:

1st round:
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=5,1,2015

2nd round:
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=6,2,2015

3rd round:
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,3,2015

4th round:
$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,4,2015

I tried using a list comprehension:

import pandas as pd

df = pd.read_csv("test.csv")
df2 = df.apply(tuple, 1).unique().tolist()

for CLIENT_ID in [x[0] for x in df2]:
    CLIENT_ID = CLIENT_ID.astype('str')
    print "SELECT * FROM client"+CLIENT_ID
    for PROPERTY_CODE in [y[1] for y in df2]:
        PROPERTY_CODE = PROPERTY_CODE.astype('str')
        print "WHERE PROPERTY_ID = "+PROPERTY_CODE

It returns the following, which is not what we're looking for:

SELECT * FROM client_5
WHERE FK_PROPERTY_ID = 1
WHERE FK_PROPERTY_ID = 2
WHERE FK_PROPERTY_ID = 3
WHERE FK_PROPERTY_ID = 4

Can anyone shed some light on this? Thanks.

Comments on the question:

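Why are you using pandas? Just to parse the CSV? Simply iterate over the dataframe, build your queries and add them to a pre-built set. Once you've finished creating the queries, execute them. The set eliminates duplicates.

Answer 1: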

I would use format:

fstr = '$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR={CLIENT_ID},{PROPERTY_ID},{FROM_YEAR}'
df.drop_duplicates().apply(lambda x: fstr.format(**x), 1)

0    $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=5,1,2015
1    $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=6,2,2015
2    $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,3,2015
3    $CLIENT_ID,$PROPERTY_ID,$FROM_YEAR=9,4,2015
dtype: object
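fstr.format(**x) works here because each row handed to the lambda by apply(..., 1) is a Series, which ** unpacks like a mapping from column name to value. The same template also works on the plain dicts produced by the standard library's csv.DictReader. A minimal sketch, assuming the question's data is inlined as a string (CSV_TEXT and TEMPLATE are hypothetical names standing in for test.csv and fstr):

```python
import csv
import io

# The question's CSV, inlined instead of reading test.csv (assumption for the sketch).
CSV_TEXT = "PROPERTY_ID,CLIENT_ID,FROM_YEAR\n1,5,2015\n2,6,2015\n3,9,2015\n4,9,2015\n"

# Named placeholders are filled from the keys of each row mapping.
TEMPLATE = "$CLIENT_ID,$PROPERTY_ID,$FROM_YEAR={CLIENT_ID},{PROPERTY_ID},{FROM_YEAR}"

# Each DictReader row maps header names to that row's (string) values.
lines = [TEMPLATE.format(**row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]

for line in lines:
    print(line)
```

Unlike the pandas version, this sketch does not drop duplicate rows; every CSV row produces one line.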


Answer 2:

I think you can use apply together with set and list:

L = list(set(df.apply(lambda x: 'SELECT * FROM client_{} WHERE PROPERTY_ID = {} and FROM_YEAR = {};'.format(x['CLIENT_ID'], x['PROPERTY_ID'], x['FROM_YEAR']), 1)))

print (L)
['SELECT * FROM client_5 WHERE PROPERTY_ID = 1 and FROM_YEAR = 2015;', 
 'SELECT * FROM client_9 WHERE PROPERTY_ID = 3 and FROM_YEAR = 2015;',
 'SELECT * FROM client_9 WHERE PROPERTY_ID = 4 and FROM_YEAR = 2015;', 
 'SELECT * FROM client_6 WHERE PROPERTY_ID = 2 and FROM_YEAR = 2015;']
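As the printed list shows, a set removes duplicates but does not keep the original row order. If order matters, dict.fromkeys also removes duplicates while keeping first-seen order (guaranteed for dicts since Python 3.7). A minimal sketch with the question's rows hard-coded as (PROPERTY_ID, CLIENT_ID, FROM_YEAR) tuples plus one deliberately duplicated row (an assumption for illustration):

```python
# The question's rows, plus a repeat of the first row to show de-duplication.
rows = [(1, 5, 2015), (2, 6, 2015), (3, 9, 2015), (4, 9, 2015), (1, 5, 2015)]

queries = [
    "SELECT * FROM client_{} WHERE PROPERTY_ID = {} and FROM_YEAR = {};".format(c, p, y)
    for (p, c, y) in rows
]

# A set drops the duplicate query, but its iteration order is arbitrary.
unique_unordered = set(queries)

# dict.fromkeys drops the duplicate too and keeps the order rows were first seen in.
unique_ordered = list(dict.fromkeys(queries))

print(len(queries), len(unique_unordered))
for q in unique_ordered:
    print(q)
```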


Answer 3:

This should work for you:

import csv

with open('fileName.csv') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for row in reader:
        # columns are PROPERTY_ID, CLIENT_ID, FROM_YEAR
        print("SELECT * FROM client_%s WHERE PROPERTY_ID = %s and FROM_YEAR = %s;" % (row[1], row[0], row[2]))
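A variant of the same csv-module approach that reads the header row once with next(reader) before the loop and then looks columns up by name instead of by fixed position, so the queries stay correct if the column order in the file changes. This is a sketch under stated assumptions: build_queries is a hypothetical helper, and the file contents are inlined with io.StringIO instead of opening fileName.csv.

```python
import csv
import io

# Inlined stand-in for fileName.csv (assumption for the sketch).
CSV_TEXT = "PROPERTY_ID,CLIENT_ID,FROM_YEAR\n1,5,2015\n2,6,2015\n3,9,2015\n4,9,2015\n"


def build_queries(f):
    """Hypothetical helper: return one query string per data row of a CSV file object."""
    reader = csv.reader(f)
    header = next(reader)  # consume the header once, before the loop
    # Map each column name to its position in a row.
    idx = {name: i for i, name in enumerate(header)}
    queries = []
    for row in reader:
        queries.append(
            "SELECT * FROM client_%s WHERE PROPERTY_ID = %s and FROM_YEAR = %s;"
            % (row[idx["CLIENT_ID"]], row[idx["PROPERTY_ID"]], row[idx["FROM_YEAR"]])
        )
    return queries


for q in build_queries(io.StringIO(CSV_TEXT)):
    print(q)
```

With a real file, the call would be build_queries(f) inside the same with open(...) as f: block used above.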

Comments:

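It's better to grab the header manually with head = next(reader) before the for loop, rather than adding a check inside the loop that slows down every iteration. Anyway, I think using the csv module is the best option.

The csv module is easier in this case.

I guess I had my head too deep in pandas here and it turned into a rabbit hole, haha. The answer was actually really simple.

Answer 4: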

This is easy to do with the .format method:

import pandas as pd

df = pd.read_csv('test.csv')
rows = df.apply(tuple, 1).unique().tolist()

for (prop_id, client_id, year) in rows:
    print("SELECT * FROM client_{client_id} WHERE property_id = {prop_id} AND from_year = {year}".format(
        prop_id=prop_id,
        client_id=client_id,
        year=year
    ))

In Python 3.6+, you can use string interpolation (f-strings):

for (prop_id, client_id, year) in rows:
    print(f"SELECT * FROM client_{client_id} WHERE property_id = {prop_id} AND from_year = {year}")
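Iterating over the rows themselves, as this answer does, is what fixes the original loop. In the question's code the inner loop runs over an entire column once for every value of the outer loop, so values from different rows get combined (note also that x[0] is the first CSV column, PROPERTY_ID). A small standard-library sketch of the difference, with the question's rows hard-coded as (PROPERTY_ID, CLIENT_ID, FROM_YEAR) tuples so it runs without pandas (an assumption):

```python
# The question's rows as (PROPERTY_ID, CLIENT_ID, FROM_YEAR) tuples.
rows = [(1, 5, 2015), (2, 6, 2015), (3, 9, 2015), (4, 9, 2015)]

# Nested loops over two separate columns: every client ID is paired with
# every property ID, giving 4 * 4 = 16 pairs (a cross product).
nested = [(c, p) for c in [r[1] for r in rows] for p in [r[0] for r in rows]]

# Unpacking each row keeps that row's values together: 4 pairs.
per_row = [(c, p) for (p, c, y) in rows]

# zip over the two columns pairs them position by position, same as per_row.
zipped = list(zip([r[1] for r in rows], [r[0] for r in rows]))

print(len(nested), len(per_row))
print(per_row == zipped)
```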
