Removing for loop - using dictionary instead of pandas
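【Title】: Removing for loop - using dictionary instead of pandas 【Posted】: 2022-01-05 12:15:00
【Question】: I have 2 lists:
customer_ids
recommendations (a list of lists, each list containing 6000 shop_ids)
Each list in recommendations represents the shops recommended to the corresponding customer in customer_ids.
I have to filter each list down to 20 shop_ids, keeping only shops located in the customer's city.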
Desired output:
recommendations (a list of lists, each list containing 20 shop_ids)
customer_ids = ['1','2','3',...]
recommendations = [['110','589','865'...], ['422','378','224'...],['198','974','546'...]]
Filter: shop's city == customer's city.
To extract the city for customers and shops, I have 2 SQL queries:
df_cust_city = pd.read_sql_query("SELECT id, city_id FROM customer_table")
df_shop_city = pd.read_sql_query("SELECT shop_id, city FROM shop_table")
Code using lists
from itertools import islice

filtered_list = []
for cust_id, shop_id in zip(customer_ids, recommendations):
    cust_city = df_cust_city.loc[df_cust_city['id'] == cust_id, 'city_id'].iloc[0]  # get customer city
    df_city_filter = (df_shop_city.where(df_shop_city['city'] == cust_city)).dropna()  # get all shops in customer city
    df_city_filter = df_city_filter.astype(int)
    filter_shop = df_city_filter['shop_id'].astype(str).values.tolist()  # make a list of shop_ids in customer city
    filtered = [x for x in shop_id if x in filter_shop]  # filter recommended shop_ids based on city-filtered list (was filter_rest, an undefined name)
    shop_filtered = list(islice(filtered, 20))
    filtered_list.append(shop_filtered)  # create recommendation list of lists with only 20 filtered shop_ids
Code using pandas
filtered_list = []
for cust_id, shop_id in zip(customer_ids, recommendations):
    cust_city = df_cust_city.loc[df_cust_city['id'] == cust_id, 'city_id'].iloc[0]  # get customer city
    df_city_filter = (df_shop_city.where(df_shop_city['city'] == cust_city)).dropna()
    recommended_shop = pd.DataFrame(shop_id, columns=['id'])
    recommended_shop['id'] = recommended_shop['id'].astype(int)
    # df_shop_city has a 'shop_id' column (see the query above), so expose it as 'id' for the merge
    shop_city_filter = pd.DataFrame({'id': df_city_filter['shop_id'].astype(int)})
    # merge against the city-filtered shops (the posted code passed the raw shop_id list here)
    shops_common = recommended_shop.merge(shop_city_filter, how='inner', on='id')
    shops_common.drop_duplicates(subset="id", keep=False, inplace=True)
    filtered = shops_common.head(20)
    shop_filtered = filtered['id'].values.tolist()
    filtered_list.append(shop_filtered)
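Time taken for the for loop to finish:
Using lists: ~8000 seconds
Using pandas: ~3000 seconds
I have to run the for loop 22 times.
Is there a way to get rid of the for loop completely? Any tips/pointers on how to accomplish this, so that it takes less time for 50000 customers at once, would be appreciated. I am giving it a try with dictionaries.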
df_cust_city:
id city_id
00919245 1
02220205 2
02221669 2
02223750 2
02304202 2
df_shop_city:
shop_id city
28 1
29 1
30 1
31 1
32 1
【Discussion】:
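【Answer 1】: This doesn't get rid of the for loop, but how about grouping the customers by city first?
That way, the operations leading to filter_shop only need to be performed N_cities times instead of N_customers times. In addition, computing the filtered variable might be significantly faster.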
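The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the function name `filter_by_city`, the `top_n` parameter, and the toy DataFrames are assumptions made up for the example, and the column names follow the two queries in the question. The per-city shop collection is built once (as a set, for fast membership tests) and then reused for every customer in that city.

```python
from collections import defaultdict
from itertools import islice

import pandas as pd

# Toy stand-ins for the two query results (values are made up for illustration).
df_cust_city = pd.DataFrame({"id": ["1", "2", "3"], "city_id": [1, 2, 1]})
df_shop_city = pd.DataFrame({"shop_id": [110, 589, 865, 422, 378],
                             "city": [1, 1, 2, 2, 1]})

customer_ids = ["1", "2", "3"]
recommendations = [["110", "865", "589"],
                   ["422", "378", "865"],
                   ["378", "422", "110"]]


def filter_by_city(customer_ids, recommendations, df_cust_city, df_shop_city, top_n=20):
    """Hypothetical helper: keep the first top_n recommended shops in each customer's city."""
    # customer id -> city, built once instead of one DataFrame lookup per customer
    cust_city = dict(zip(df_cust_city["id"].astype(str), df_cust_city["city_id"]))

    # city -> set of shop ids (as strings), built once per city
    shops_by_city = defaultdict(set)
    for shop, city in zip(df_shop_city["shop_id"].astype(str), df_shop_city["city"]):
        shops_by_city[city].add(shop)

    filtered_list = []
    for cust_id, shops in zip(customer_ids, recommendations):
        # unknown customers or cities without shops fall back to an empty set
        allowed = shops_by_city.get(cust_city.get(cust_id), set())
        # iterate the recommendation list itself so its ranking order is kept
        filtered_list.append(list(islice((s for s in shops if s in allowed), top_n)))
    return filtered_list


filtered_list = filter_by_city(customer_ids, recommendations, df_cust_city, df_shop_city)
```

A loop over customers remains, but each iteration only does dictionary and set lookups rather than filtering a DataFrame, which is where most of the time in the posted versions is likely spent.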
【Discussion】:
There is a 0.002 second difference when using sets, but order is not preserved when using sets.
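On the ordering concern in this comment: one possible way around it, sketched here with made-up shop ids, is to use the set only for membership tests while still iterating over the ranked recommendation list. A plain set intersection gives the same elements but with no guaranteed order.

```python
# Ranked recommendations for one customer and the shops in that customer's city
# (both are illustrative values, not data from the question).
recommended = ["865", "110", "589", "378"]
shops_in_city = {"110", "378", "865"}

# Set intersection: correct elements, but the ranking is lost.
unordered = set(recommended) & shops_in_city

# Iterating the list and testing membership in the set keeps the ranking.
ordered = [shop for shop in recommended if shop in shops_in_city]
```

Here `ordered` follows the order of `recommended`, so taking its first 20 entries still yields the top-ranked shops.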