How do I return multiple pandas dataframes with unique names from a for loop?

Posted: 2020-08-06 21:32:12

Question:

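I'm trying to use a for loop to create multiple dataframes. If I put a print() of the output inside the for loop, I see the output I want. However, when I change it to return the output so I can access the dataframes from outside the function, I only get the first dataframe. I'm confused about what is happening and would really appreciate your help!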

Code:

# Global libraries
from datetime import datetime
import pandas as pd

# Local libraries
from instruments import metadata    

# Get the data
def data():

    # Get list of instruments
    nifty = metadata()

    # Specify the date range for the data pull
    from_date = datetime.today().strftime('%Y-%m-%d 09:00:00')
    to_date = datetime.today().strftime('%Y-%m-%d 15:30:00')

    # Interval is the candle interval (minute, day, 5 minute etc.).
    interval = 'minute'

    # Iterate through the metadata
    for stock in nifty:

        # Set the instrument token for the data pull
        instrument_token = stock['instrument_token']

        # Call the api (I've removed login information for kite from this script) 
        hist =  kite.historical_data(instrument_token,
                                    from_date,
                                    to_date,
                                    interval,
                                    continuous=False,
                                    oi=False)


        # Put the data in a pandas dataframe as a timeseries
        data = pd.DataFrame(hist).set_index('date')


        # Format data
        data.columns = map(str.capitalize, data.columns)
        data['Expiry'] = stock['expiry']
        data = data[['Open','High','Low','Close','Volume','Expiry']]
        data.index.names = ['Datetime']

        # return data as a print statement - this works and I see the correct output
        print(data)

        # return data - this does not work as intended. If I call the function, I only get the first dataframe 
        return data
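A minimal sketch of the behavior described above, with plain strings standing in for the dataframes (the two symbols are taken from the question's sample nifty data; the function names are made up for illustration). print() lets the loop keep running, while return exits the whole function on the first iteration:

```python
def with_print(symbols):
    for symbol in symbols:
        # print() only writes output; the loop continues to the next item
        print(symbol)

def with_return(symbols):
    for symbol in symbols:
        # return leaves the function immediately, so later items are never reached
        return symbol

symbols = ['NIFTY20APRFUT', 'NIFTY20JUNFUT']
with_print(symbols)            # prints both symbols, returns None
first = with_return(symbols)   # only the first symbol comes back
print(first)
```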

Sample data for the variable nifty:

[{'tick_size': 0.05, 'expiry': datetime.date(2020, 4, 30), 'exchange_token': '56059',
  'instrument_type': 'FUT', 'segment': 'NFO-FUT', 'strike': 0.0, 'last_price': 0.0,
  'name': 'NIFTY', 'lot_size': 75, 'tradingsymbol': 'NIFTY20APRFUT', 'exchange': 'NFO',
  'instrument_token': 14351106},
 {'tick_size': 0.05, 'expiry': datetime.date(2020, 6, 25), 'exchange_token': '95734',
  'instrument_type': 'FUT', 'segment': 'NFO-FUT', 'strike': 0.0, 'last_price': 0.0,
  'name': 'NIFTY', 'lot_size': 75, 'tradingsymbol': 'NIFTY20JUNFUT', 'exchange': 'NFO',
  'instrument_token': 24507906}]

Sample data in the variable hist:

[{'volume': 220650, 'low': 9158.45, 'close': 9173.7,
  'date': datetime.datetime(2020, 4, 23, 9, 15, tzinfo=tzoffset(None, 19800)),
  'high': 9200, 'open': 9200},
 {'volume': 92475, 'low': 9173, 'close': 9176.75,
  'date': datetime.datetime(2020, 4, 23, 9, 16, tzinfo=tzoffset(None, 19800)),
  'high': 9180, 'open': 9173}]
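To see how the sample hist records become the dataframe the question builds, here is a minimal offline sketch that runs the same formatting steps on those two records (no API call is made; the expiry is taken from the first sample nifty entry, and tzoffset comes from python-dateutil, which pandas depends on). Each dict becomes one row and its keys become the column names:

```python
import datetime
import pandas as pd
from dateutil.tz import tzoffset

# The two sample records from the question (19800 s = UTC+05:30)
hist = [
    {'volume': 220650, 'low': 9158.45, 'close': 9173.7,
     'date': datetime.datetime(2020, 4, 23, 9, 15, tzinfo=tzoffset(None, 19800)),
     'high': 9200, 'open': 9200},
    {'volume': 92475, 'low': 9173, 'close': 9176.75,
     'date': datetime.datetime(2020, 4, 23, 9, 16, tzinfo=tzoffset(None, 19800)),
     'high': 9180, 'open': 9173},
]

# One row per dict, indexed by the candle timestamp
data = pd.DataFrame(hist).set_index('date')

# Same formatting steps as the question's loop body
data.columns = list(map(str.capitalize, data.columns))
data['Expiry'] = datetime.date(2020, 4, 30)  # expiry of the first sample instrument
data = data[['Open', 'High', 'Low', 'Close', 'Volume', 'Expiry']]
data.index.names = ['Datetime']
print(data)
```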

I've been reading about giving dataframes dynamic names to make them unique, but I can't figure out how to make that work.

Thanks in advance for your help!

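Comments on the question:

Append your dataframes to an empty list or dict outside the for loop and return that instead of data. Your return is also inside your for loop; it should be outside.

But won't that give me a single list with all the data? I'm trying to get multiple dataframes, one per stock (ideally with unique names).

That's when you use a dictionary. Add to an empty dict, using the stock name as the key and the dataframe as the value.

I haven't tried that; I'll read up on it. Thanks! Any chance you could show me a sample of how it works? Let's say the key is stock['tradingsymbol'].

Answer 1: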

Here is a simple example:

import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame(np.random.rand(10, 3), columns=list('abc'))

def data(df):
    # empty dict
    d = {}
    # iterate over the column names
    for col in df.columns:
        # assign key (column name) and value (that column, a Series)
        d[col] = df[col]
    # return the dict once, after the loop has finished
    return d

new_dict = data(df)
# look up each stored object from the dict by the key you assigned
# print(new_dict['a'])
# print(new_dict['b'])

# bonus: the same dict built with a dict comprehension
# new_dict1 = {k: v for k, v in df.items()}
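One detail in this example: df[col] returns a pandas Series, not a DataFrame. If you want each dict value to be a one-column DataFrame, a minimal variant (same random sample data; the name frames is just illustrative) is to index with a one-element list, df[[col]]:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=list('abc'))

# df[[col]] (a list with one label) returns a one-column DataFrame;
# df[col] (a bare label) would return a Series instead
frames = {col: df[[col]] for col in df.columns}

# every value can be looked up, or iterated over, by its key
for name, frame in frames.items():
    print(name, type(frame).__name__, frame.shape)
```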

Here is what your function could look like (untested):

def data():

    # Get list of instruments
    nifty = metadata()

    # Specify the date range for the data pull
    from_date = datetime.today().strftime('%Y-%m-%d 09:00:00')
    to_date = datetime.today().strftime('%Y-%m-%d 15:30:00')

    # Interval is the candle interval (minute, day, 5 minute etc.).
    interval = 'minute'

    d = {}  # empty dict <----------------

    # Iterate through the metadata
    for stock in nifty:

        # Set the instrument token for the data pull
        instrument_token = stock['instrument_token']

        # Call the api (I've removed login information for kite from this script) 
        hist =  kite.historical_data(instrument_token,
                                    from_date,
                                    to_date,
                                    interval,
                                    continuous=False,
                                    oi=False)


        # Put the data in a pandas dataframe as a timeseries
        data = pd.DataFrame(hist).set_index('date')


        # Format data
        data.columns = map(str.capitalize, data.columns)
        data['Expiry'] = stock['expiry']
        data = data[['Open','High','Low','Close','Volume','Expiry']]
        data.index.names = ['Datetime']

        d[stock['tradingsymbol']] = data  # assign key and value to the dict <--------------------

        # optional: print each dataframe as it is built
        print(data)

    return d  # return outside of the for loop <-----------
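Because the function above needs a Kite login, here is a minimal offline sketch of the same dict-building pattern that can be run as-is. fake_metadata() and fake_historical_data() are hypothetical stand-ins for metadata() and kite.historical_data(); they only return the question's sample records (the same two candles for every instrument), so the prices are not real per-instrument data:

```python
import datetime
import pandas as pd
from dateutil.tz import tzoffset

IST = tzoffset(None, 19800)  # UTC+05:30, as in the sample hist data

# Hypothetical stand-in for metadata(): only the fields the loop uses,
# copied from the question's sample nifty data
def fake_metadata():
    return [
        {'instrument_token': 14351106, 'tradingsymbol': 'NIFTY20APRFUT',
         'expiry': datetime.date(2020, 4, 30)},
        {'instrument_token': 24507906, 'tradingsymbol': 'NIFTY20JUNFUT',
         'expiry': datetime.date(2020, 6, 25)},
    ]

# Hypothetical stand-in for kite.historical_data(): ignores its arguments
# and returns the question's two sample candles
def fake_historical_data(instrument_token, from_date, to_date, interval):
    return [
        {'volume': 220650, 'low': 9158.45, 'close': 9173.7,
         'date': datetime.datetime(2020, 4, 23, 9, 15, tzinfo=IST),
         'high': 9200, 'open': 9200},
        {'volume': 92475, 'low': 9173, 'close': 9176.75,
         'date': datetime.datetime(2020, 4, 23, 9, 16, tzinfo=IST),
         'high': 9180, 'open': 9173},
    ]

def data():
    today = datetime.datetime.today()
    from_date = today.strftime('%Y-%m-%d 09:00:00')
    to_date = today.strftime('%Y-%m-%d 15:30:00')
    d = {}  # one entry per instrument, keyed by trading symbol
    for stock in fake_metadata():
        hist = fake_historical_data(stock['instrument_token'],
                                    from_date, to_date, 'minute')
        frame = pd.DataFrame(hist).set_index('date')
        frame.columns = list(map(str.capitalize, frame.columns))
        frame['Expiry'] = stock['expiry']
        frame = frame[['Open', 'High', 'Low', 'Close', 'Volume', 'Expiry']]
        frame.index.names = ['Datetime']
        d[stock['tradingsymbol']] = frame
    return d  # only after every instrument has been processed

frames = data()
print(list(frames))  # ['NIFTY20APRFUT', 'NIFTY20JUNFUT']
print(frames['NIFTY20JUNFUT'])
```

The local dataframe is called frame here rather than data so that it does not share a name with the function itself; reusing data inside the function is legal Python, just easy to confuse.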

Comments on the answer:

You're awesome! Thanks for your help :)
