.diff() function is only returning NaN values in pandas data frame

Posted: 2021-08-01 07:50:22

Question:

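I want to use the .diff() function on the log_price column inside a for loop. What I'm after is the old log price value minus the new log price value from the df_DC_product data frame. When I try to use .diff() inside the for loop, it only returns NaN values. Any idea why this is happening? Thanks for your help.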

import numpy as np
import pandas as pd

DC_list = data4['Geography'].drop_duplicates().tolist()
Product_List = data4['Product'].drop_duplicates().tolist()

# create multiple empty lists to store values in:
my_dict = {
    "Product" : [],
    "Geography" : [],
    "Base Dollar Sales": [],
    "Base Unit Sales" :[],
    "Price Numerator" : [],
    "Price Denominator": [],
    "Demand Numerator" : [],
    "Demand Denominator" : [],
    "% Change in Price" : [],
    "% Change in Demand": [],
    "Price Elasticity of Demand" : []
}
dc_product_ped_with_metrics_all = []

for DC in DC_list:
    
    df_DC = data4.copy()
    # # Filtering to the loop's current DC
    df_DC = df_DC.loc[(df_DC['Geography'] == DC)]
    df_DC = df_DC.copy()
    # Making a list of all of the current DC's Product to loop through
    Product_list = df_DC['Product'].drop_duplicates().tolist()
    
    for Product in Product_list:
        
        df_DC_product = df_DC.copy()
        # # Filtering to the Product
        df_DC_product = df_DC_product.loc[(df_DC_product['Product'] == Product)]
        df_DC_product = df_DC_product.copy()
        
        # create container:
        df_DC_product['pn'] = df_DC_product.iloc[:,5].diff()
        df_DC_product['price_d'] = np.divide(df_DC_product.iloc[:,5].cumsum(),2)
        df_DC_product['dn'] = df_DC_product.iloc[:,6].diff()
        df_DC_product['dd'] = np.divide(df_DC_product.iloc[:,6].cumsum(),2)
        df_DC_product['% Change in Demand'] = np.divide(df_DC_product['dn'],df_DC_product['dd'])*100
        df_DC_product['% Change in Price'] = np.divide(df_DC_product['pn'],df_DC_product['price_d'])*100
        df_DC_product['ped']= np.divide(df_DC_product['% Change in Demand'], df_DC_product['% Change in Price'])
        
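        # Product and DC already hold the current loop values; no reassignment is needed.
        # (Writing `Product = Product,` with a trailing comma would make Product a 1-tuple.)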
        sales = df_DC_product['Base_Dollar_Sales'].sum()
        qty = df_DC_product['Base_Unit_Sales'].sum()
        price = df_DC_product['Price'].mean()
        log_price = df_DC_product['log_price'].mean()
        log_units = df_DC_product['log_units'].sum()
        price_numerator = df_DC_product['pn'].mean()
        price_denominator = df_DC_product['price_d'].sum()
        demand_numerator = df_DC_product['dn'].mean()
        demand_denominator = df_DC_product['dd'].sum()
        delta_demand = df_DC_product['% Change in Demand'].sum()
        delta_price = df_DC_product['% Change in Price'].mean()
        ped = df_DC_product['ped'].mean()
        
        dc_product_ped_with_metrics = [
            Product,
            DC,
            sales,
            qty,
            price,
            price_numerator,
            price_denominator,
            demand_numerator,
            demand_denominator,
            delta_demand,
            delta_price,
            ped
        ]
        
        dc_product_ped_with_metrics_all.append(dc_product_ped_with_metrics)
        
columns = [
    'Product',
    'Geography',
    'Sales',
    'Qty',
    'Price',
    'Price Numerator',
    'Price Denominator',
    'Demand Numerator',
    'Demand Denominator',
    '% Change in Demand',
    '% Change in Price',
    'Price Elasticity of Demand'
]

dc_product_ped_with_metrics_all = pd.DataFrame(data=dc_product_ped_with_metrics_all, columns=columns)
dc_product_ped_with_metrics_all
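
For reference, here is a minimal sketch of how .diff() behaves on a filtered frame. The frame `toy` and its values are invented for illustration and are not the asker's data4. It shows two things: the first row of any .diff() result is always NaN, so a Geography/Product subset that contains a single row yields nothing but NaN, and .diff() computes "new minus old", so the "old minus new" difference described in the question is its negation. Whether single-row subsets are the actual cause here cannot be confirmed from the post.

```python
import pandas as pd

# Hypothetical stand-in for data4; all values are invented for illustration.
toy = pd.DataFrame({
    "Geography": ["DC1", "DC1", "DC1", "DC2"],
    "Product": ["A", "A", "A", "A"],
    "log_price": [1.0, 1.5, 1.25, 2.0],
})

# Subset with three rows: only the first difference is NaN.
multi = toy.loc[toy["Geography"] == "DC1", "log_price"]
new_minus_old = multi.diff()    # current row minus previous row
old_minus_new = -multi.diff()   # previous row minus current row

# Subset with one row: there is no previous row, so every value is NaN.
single = toy.loc[toy["Geography"] == "DC2", "log_price"]
single_diff = single.diff()

# The same difference for every Geography/Product pair, without nested loops.
toy["pn"] = toy.groupby(["Geography", "Product"])["log_price"].diff()

print(new_minus_old.tolist())
print(single_diff.isna().all())
```

Checking `df_DC_product.shape` inside the inner loop is a quick way to see whether any subset has fewer than two rows.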

Comments:

Could you edit the question to focus on your problem?

@CeliusStingher Let me know if this is better, thanks.

Answer 1:

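.append() on a DataFrame does not update the data frame in place; it returns a new object, so you need to assign the result back to the data frame. This applies only when dc_product_ped_with_metrics_all is a DataFrame, and DataFrame.append() was removed in pandas 2.0 in favor of pd.concat(). In the question's code the variable is a Python list, and list.append() does modify the list in place and returns None, so reassigning there would replace the list with None.

for DC in DC_list:
    # your code
    for Product in Product_list:
        # your code
        # Valid only when dc_product_ped_with_metrics_all is a DataFrame (pandas < 2.0)
        dc_product_ped_with_metrics_all = dc_product_ped_with_metrics_all.append(dc_product_ped_with_metrics)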
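
The difference between appending to a list and appending to a DataFrame can be sketched as follows. The rows and the shortened column list are made up for illustration. Python's list.append() mutates the list and returns None, while pd.concat() returns a new DataFrame that has to be assigned back.

```python
import pandas as pd

columns = ["Product", "Geography", "Sales"]

# Python list: append() mutates in place and returns None, so do not reassign.
rows = []
result = rows.append(["A", "DC1", 10.0])   # result is None, rows has one entry
rows.append(["B", "DC1", 20.0])
summary_from_list = pd.DataFrame(rows, columns=columns)

# DataFrame: concatenation returns a new object, so the result must be reassigned.
summary = pd.DataFrame([["A", "DC1", 10.0]], columns=columns)
new_row = pd.DataFrame([["B", "DC1", 20.0]], columns=columns)
summary = pd.concat([summary, new_row], ignore_index=True)
```

Collecting plain lists and building the DataFrame once at the end, as the question already does, avoids copying the frame on every iteration.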
