How do I return multiple pandas dataframes with unique names from a for loop?
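【Title】: How do I return multiple pandas dataframes with unique names from a for loop? 【Posted】: 2020-08-06 21:32:12 【Question】: I am trying to use a for loop to create multiple dataframes. If I print() the output inside the for loop, I see the output I want. However, when I change it to return the output so that I can access the dataframes from outside the function, I only get the first dataframe. I'm confused about what is happening and would greatly appreciate your help!
Code: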
# Global libraries
from datetime import datetime
import pandas as pd
# Local libraries
from instruments import metadata

# Get the data
def data():
    # Get list of instruments
    nifty = metadata()
    # Specify the date range for the data pull
    from_date = datetime.today().strftime('%Y-%m-%d 09:00:00')
    to_date = datetime.today().strftime('%Y-%m-%d 15:30:00')
    # Interval is the candle interval (minute, day, 5 minute etc.).
    interval = 'minute'
    # Iterate through the metadata
    for stock in nifty:
        # Set the instrument token for the data pull
        instrument_token = stock['instrument_token']
        # Call the api (I've removed login information for kite from this script)
        hist = kite.historical_data(instrument_token,
                                    from_date,
                                    to_date,
                                    interval,
                                    continuous=False,
                                    oi=False)
        # Put the data in a pandas dataframe as a timeseries
        data = pd.DataFrame(hist).set_index('date')
        # Format data
        data.columns = map(str.capitalize, data.columns)
        data['Expiry'] = stock['expiry']
        data = data[['Open','High','Low','Close','Volume','Expiry']]
        data.index.names = ['Datetime']
        # return data as a print statement - this works and I see the correct output
        print(data)
        # return data - this does not work as intended. If I call the function, I only get the first dataframe
        return data
Sample data for the variable nifty:
[{'tick_size': 0.05, 'expiry': datetime.date(2020, 4, 30), 'exchange_token': '56059', 'instrument_type': 'FUT', 'segment': 'NFO-FUT', 'strike': 0.0, 'last_price': 0.0, 'name': 'NIFTY', 'lot_size': 75, 'tradingsymbol': 'NIFTY20APRFUT', 'exchange': 'NFO', 'instrument_token': 14351106},
 {'tick_size': 0.05, 'expiry': datetime.date(2020, 6, 25), 'exchange_token': '95734', 'instrument_type': 'FUT', 'segment': 'NFO-FUT', 'strike': 0.0, 'last_price': 0.0, 'name': 'NIFTY', 'lot_size': 75, 'tradingsymbol': 'NIFTY20JUNFUT', 'exchange': 'NFO', 'instrument_token': 24507906}]
Sample data in the variable hist:
[{'volume': 220650, 'low': 9158.45, 'close': 9173.7, 'date': datetime.datetime(2020, 4, 23, 9, 15, tzinfo=tzoffset(None, 19800)), 'high': 9200, 'open': 9200},
 {'volume': 92475, 'low': 9173, 'close': 9176.75, 'date': datetime.datetime(2020, 4, 23, 9, 16, tzinfo=tzoffset(None, 19800)), 'high': 9180, 'open': 9173}]
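I have been reading about giving dataframes dynamic names to make them unique, but I can't figure out how to make it work.
Thanks in advance for your help!
【Comments】:
Append your dataframes to an empty list or dict created outside the for loop and return that instead of data... Also, your return is inside your for loop; it should be outside.
But won't that give me one list containing all the data? I'm trying to get multiple dataframes, one per stock (hopefully with unique names).
That's when you use a dictionary. Add to an empty dict, assigning the stock name as the key and the dataframe as the value.
I haven't tried that; I'll read up on it. Thanks! Is there any chance you could show me a sample of how it works? Assume the key is stock['tradingsymbol'].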
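To illustrate the point made in the comments, here is a minimal sketch that needs neither pandas nor the API. The function names first_only and all_items are hypothetical, made up for this illustration. A return statement ends the function immediately, so a return inside the loop body hands back only the first item, while collecting into a dict and returning after the loop hands back everything.

```python
def first_only(items):
    for item in items:
        # return exits the function during the first iteration
        return item


def all_items(items):
    collected = {}
    for item in items:
        # store each result under its own key instead of returning
        collected[item] = item.upper()
    # return once, after the loop has finished
    return collected


symbols = ['nifty20aprfut', 'nifty20junfut']
print(first_only(symbols))
print(all_items(symbols))
```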
【Answer 1】:
Here is a simple example:
import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame(np.random.rand(10, 3), columns=list('abc'))

def data(df):
    # empty dict
    d = {}
    # iterate over the column names
    for col in df.columns:
        # assign key and value
        d[col] = df[col]
    # return d once the loop has finished
    return d

new_dict = data(df)
# look up each individual item (here, one column per key) by the key you assigned
# print(new_dict['a'])
# print(new_dict['b'])
# bonus dict comprehension
# new_dict1 = {k: v for k, v in df.items()}
Here is how your function would look (untested):
# Relies on the imports and the kite session from the question
def data():
    # Get list of instruments
    nifty = metadata()
    # Specify the date range for the data pull
    from_date = datetime.today().strftime('%Y-%m-%d 09:00:00')
    to_date = datetime.today().strftime('%Y-%m-%d 15:30:00')
    # Interval is the candle interval (minute, day, 5 minute etc.).
    interval = 'minute'
    d = {}  # empty dict <----------------
    # Iterate through the metadata
    for stock in nifty:
        # Set the instrument token for the data pull
        instrument_token = stock['instrument_token']
        # Call the api (I've removed login information for kite from this script)
        hist = kite.historical_data(instrument_token,
                                    from_date,
                                    to_date,
                                    interval,
                                    continuous=False,
                                    oi=False)
        # Put the data in a pandas dataframe as a timeseries
        data = pd.DataFrame(hist).set_index('date')
        # Format data
        data.columns = map(str.capitalize, data.columns)
        data['Expiry'] = stock['expiry']
        data = data[['Open','High','Low','Close','Volume','Expiry']]
        data.index.names = ['Datetime']
        d[stock['tradingsymbol']] = data  # assign key and value to the dict <--------------------
        # print each dataframe as it is built (optional)
        print(data)
    # return the dict of all dataframes, keyed by tradingsymbol
    return d  # return outside of the for loop <-----------
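A sketch of how the returned dict might then be used. Since kite and metadata() are not available here, the hypothetical helper make_frame builds small stand-in dataframes; the keys mirror the tradingsymbol values from the question, and the Close values are illustrative only. Passing a dict to pd.concat is one option if a single combined frame is ever wanted, because the dict keys become the outer index level.

```python
import pandas as pd


def make_frame(close_values):
    # Hypothetical stand-in for a dataframe built from kite.historical_data
    index = pd.date_range('2020-04-23 09:15', periods=len(close_values), freq='min')
    frame = pd.DataFrame({'Close': close_values}, index=index)
    frame.index.names = ['Datetime']
    return frame


# Stand-in for the dict returned by data(); values are illustrative
frames = {
    'NIFTY20APRFUT': make_frame([9173.7, 9176.75]),
    'NIFTY20JUNFUT': make_frame([9180.0, 9182.5]),
}

# Access one dataframe by its unique name
april = frames['NIFTY20APRFUT']
print(april)

# Loop over every name/dataframe pair
for symbol, frame in frames.items():
    print(symbol, frame['Close'].iloc[-1])

# Optionally combine everything; the dict keys become the outer index level
combined = pd.concat(frames, names=['Symbol'])
print(combined.loc['NIFTY20JUNFUT'])
```

Keeping the dataframes in a dict avoids creating variable names dynamically: the key plays the role of the unique name.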
【Comments】:
You're awesome! Thank you for your help :)