How to fill missing values so that the start date is the first day of the month?

Posted: 2021-11-03 10:00:02

Question:

I have a dataframe like this:

tst=
Date    % on Merchant   % on Customer   Merchants   Location    
2021-08-04  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-05  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-06  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-01  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-02  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-03  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-04  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-05  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-06  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    

uni_ind= ['% on Merchant','% on Customer','Merchants','Location']

The output I'm looking for:

Date    % on Merchant   % on Customer   Merchants   Location    
2021-08-01  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-02  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-03  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-04  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-05  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-06  0.0 0.10    Zwarma - The Shawarma Maker Palani  
2021-08-01  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-02  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-03  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-04  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-05  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    
2021-08-06  0.0 0.12    Zwarma - The Shawarma Maker Pollachi    

I tried:

tst.groupby(uni_ind).resample('D').bfill().reset_index(level=(0,1,2,3), drop=True).reset_index()
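The `groupby(...).resample('D')` call above only fills gaps between dates a group already has: each group's daily range runs from that group's own first date to its own last date, so a group whose first row is 2021-08-04 never gets 2021-08-01 to 2021-08-03. A small sketch on a hypothetical four-row slice of `tst` (grouping by Merchants and Location only, and resampling a single value column to keep the output simple) shows the per-group spans:

```python
import pandas as pd

# Hypothetical miniature version of `tst`; Date is the index, as resample() requires
tst = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-08-04", "2021-08-06", "2021-08-01", "2021-08-03"]),
        "Merchants": ["Zwarma - The Shawarma Maker"] * 4,
        "Location": ["Palani", "Palani", "Pollachi", "Pollachi"],
        "% on Customer": [0.10, 0.10, 0.12, 0.12],
    }
).set_index("Date")

# Upsample each (Merchants, Location) group to daily frequency
daily = tst.groupby(["Merchants", "Location"])["% on Customer"].resample("D").bfill()

# Each group's daily range starts at its own first date, not at the 1st of the month
spans = daily.reset_index().groupby("Location")["Date"].agg(["min", "max"])
print(spans)
```

Palani still starts on 2021-08-04, which is exactly the gap the question is about.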

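Answer 1:

For each merchant/location whose data does not start on the 1st, build the date range from the start of the month up to the day before its first date, outer-join it to the original dataframe, then back-fill the missing values.
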
import io

import pandas as pd

# Sample data; columns are separated by two or more spaces
df = pd.read_csv(io.StringIO("""Date    % on Merchant   % on Customer   Merchants   Location
2021-08-04  0.0  0.10    Zwarma - The Shawarma Maker  Palani
2021-08-05  0.0  0.10    Zwarma - The Shawarma Maker  Palani
2021-08-06  0.0  0.10    Zwarma - The Shawarma Maker  Palani
2021-08-01  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi
2021-08-02  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi
2021-08-03  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi
2021-08-04  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi
2021-08-05  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi
2021-08-06  0.0  0.12    Zwarma - The Shawarma Maker  Pollachi"""), sep=r"\s\s+", engine="python")
df["Date"] = pd.to_datetime(df["Date"])

df = (
    df.merge(
        # First date present for every (year, month, Merchants, Location) group
        df.groupby(
            [df["Date"].dt.year, df["Date"].dt.month, "Merchants", "Location"], as_index=False
        )
        .agg({"Date": "min"})
        # Keep only groups that do not already start on the 1st
        .loc[lambda d: d["Date"].dt.day.gt(1)]
        # List the missing dates: the 1st up to the day before the first date
        .apply(
            lambda r: pd.Series(
                {
                    "Date": list(
                        pd.date_range(
                            r["Date"] - pd.offsets.MonthBegin(1),
                            r["Date"] - pd.Timedelta(days=1),
                        )
                    ),
                    "Merchants": r["Merchants"],
                    "Location": r["Location"],
                }
            ),
            axis=1,
        )
        .explode("Date")
        # explode() leaves an object column; cast back so the merge key dtypes match
        .astype({"Date": "datetime64[ns]"}),
        on=["Date", "Merchants", "Location"],
        how="outer",
    )
    .sort_values(["Merchants", "Location", "Date"])
    # New rows have NaN values; take them from the next (first real) row of the group
    .bfill()
)

df

Date % on Merchant % on Customer Merchants Location
9 2021-08-01 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
10 2021-08-02 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
11 2021-08-03 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
0 2021-08-04 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
1 2021-08-05 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
2 2021-08-06 00:00:00 0 0.1 Zwarma - The Shawarma Maker Palani
3 2021-08-01 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
4 2021-08-02 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
5 2021-08-03 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
6 2021-08-04 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
7 2021-08-05 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
8 2021-08-06 00:00:00 0 0.12 Zwarma - The Shawarma Maker Pollachi
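A different technique, not the one used in this answer, is to reindex each (Merchants, Location) group onto a daily calendar that starts on the 1st of its first month and then fill from the nearest real row. The sketch below is a hedged illustration on a hypothetical reconstruction of the sample data; `fill_from_month_start` and its `through_month_end` flag are illustrative names, not pandas API. It assumes each date appears at most once per group, since `reindex` needs a unique index. One of the comments below also asks about the days missing at the end of the month, which the flag covers.

```python
import pandas as pd

# Hypothetical reconstruction of the question's sample data
dates = ["2021-08-04", "2021-08-05", "2021-08-06"] + [f"2021-08-{d:02d}" for d in range(1, 7)]
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(dates),
        "% on Merchant": 0.0,
        "% on Customer": [0.10] * 3 + [0.12] * 6,
        "Merchants": "Zwarma - The Shawarma Maker",
        "Location": ["Palani"] * 3 + ["Pollachi"] * 6,
    }
)
keys = ["Merchants", "Location"]


def fill_from_month_start(group: pd.DataFrame, through_month_end: bool = False) -> pd.DataFrame:
    """Hypothetical helper: reindex one group onto a daily calendar from the 1st of its month."""
    g = group.set_index("Date").sort_index()
    first, last = g.index.min(), g.index.max()
    start = first.to_period("M").start_time  # 1st day of the first month
    end = last.to_period("M").end_time.normalize() if through_month_end else last
    calendar = pd.date_range(start, end, freq="D", name="Date")
    # bfill fills leading (and inner) gaps from the next real row; ffill fills trailing days
    return g.reindex(calendar).bfill().ffill()


out = pd.concat([fill_from_month_start(g) for _, g in df.groupby(keys)]).reset_index()
print(out)
```

With `through_month_end=True`, every group runs to the last day of its final month, carrying the latest available values forward.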

Comments:

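When the data is read in with io, the Location column is dropped and merged into Merchants. The solution works for that, but I don't want the Merchants and Location columns merged. Please let me know. It would also be great if the same solution could fill the missing dates at the end of the month by taking the latest available value.

From your sample data I can't see how to tell Merchants and Location apart. Is it the last space? The solution really is the same: add Location to the groupby and to the Series construction.

Updated to include Location as well; it is simply added everywhere Merchants is used.

Answer 2: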

Here is a simpler approach.

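Step 1: Get the first date of each month by resampling to month start ('MS'):
tst1 = tst.groupby(uni_ind).resample('MS').bfill().reset_index(level=list(range(len(uni_ind))), drop=True).reset_index()

Step 2: Append the month-start rows to the original df:
tst3 = pd.concat([tst.reset_index(), tst1])

Step 3: Drop duplicates, since some groups already start on the first day of the month:
tst3.drop_duplicates(inplace=True, ignore_index=False, keep='first')

Step 4: Set Date as the index so the resample function can be used:
tst3 = tst3.set_index('Date').sort_index()

Step 5: Resample the df to daily frequency:
tst3.groupby(uni_ind, dropna=False).resample('D').ffill().reset_index(level=list(range(len(uni_ind))), drop=True).reset_index()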
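A self-contained sketch of the same five steps, assuming a hypothetical reconstruction of the question's `tst` with Date as the index. It uses `pd.concat`, since `DataFrame.append` was removed in pandas 2.0. Because every non-date column is a grouping key here, and pandas versions differ in whether `groupby(...).resample(...)` keeps grouping columns in the result (recent releases deprecate it), the sketch resamples a hypothetical helper column `_row` and drops it afterwards.

```python
import pandas as pd

# Hypothetical reconstruction of the question's `tst`, with Date as the index
dates = ["2021-08-04", "2021-08-05", "2021-08-06"] + [f"2021-08-{d:02d}" for d in range(1, 7)]
tst = pd.DataFrame(
    {
        "Date": pd.to_datetime(dates),
        "% on Merchant": 0.0,
        "% on Customer": [0.10] * 3 + [0.12] * 6,
        "Merchants": "Zwarma - The Shawarma Maker",
        "Location": ["Palani"] * 3 + ["Pollachi"] * 6,
    }
).set_index("Date")
uni_ind = ["% on Merchant", "% on Customer", "Merchants", "Location"]

# Step 1: one row per group labelled with the 1st of the month.
# The grouping keys already carry all values; "_row" is only a hypothetical helper.
starts = (
    tst.assign(_row=1)
    .groupby(uni_ind)["_row"]
    .resample("MS")
    .bfill()
    .reset_index()
    .drop(columns="_row")
)

# Steps 2-3: append to the original rows and drop month starts that already existed
combined = pd.concat([tst.reset_index(), starts], ignore_index=True).drop_duplicates(keep="first")

# Steps 4-5: index by Date, then upsample each group to daily frequency with ffill
daily = (
    combined.assign(_row=1)
    .set_index("Date")
    .sort_index()
    .groupby(uni_ind, dropna=False)["_row"]
    .resample("D")
    .ffill()
    .reset_index()
    .drop(columns="_row")
    .sort_values(["Location", "Date"], ignore_index=True)
)
print(daily)
```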
