Splitting a coordinate string into X and Y columns with a pandas data frame

Posted 2023-03-12

[Posted]: 2020-12-13 07:46:01
[Question]:

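I created a pandas data frame that shows the coordinates of events and how many times each coordinate occurs. The coordinates are displayed as strings like this: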

      Coordinates  Occurrences   x
0     (76.0, -8.0)           1   0
1   (-41.0, -24.0)           1   1
2     (69.0, -1.0)           1   2
3     (37.0, 30.0)           1   3
4     (-60.0, 1.0)           1   4
..             ...         ...  ..
63  (-45.0, -11.0)           1  63
64    (80.0, -1.0)           1  64
65    (84.0, 24.0)           1  65
66     (76.0, 7.0)           1  66
67   (-81.0, -5.0)           1  67

I want to create a new data frame that shows the x and y coordinates separately, along with their occurrences, like this:

 x    Occurrences y  Occurrences
    76      ...     -8       ...
   -41      ...     -24      ...
    69      ...     -1       ...
    37      ...     -30      ...
    60      ...      1       ...

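I tried splitting the string, but I don't think I did it correctly, and I don't know how to add the result to the table. I suspect I will need something like a for loop later in my code. I am pulling the data from an API; here is the code that builds the data frame shown above.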

import pandas as pd

# `contents` is the parsed JSON response from the API (not shown in the question).
# The two dictionaries are assumed to be initialized before the loop.
shots = {}  # (x, y) -> number of shots at that location
goals = {}  # (x, y) -> number of goals at that location

for key in contents['liveData']['plays']['allPlays']:
    if key['result']['event'] == "Shot":
        scoordinates = (key['coordinates']['x'], key['coordinates']['y'])
        if scoordinates not in shots:
            shots[scoordinates] = 1
        else:
            shots[scoordinates] += 1
    if key['result']['event'] == "Goal":
        gcoordinates = (key['coordinates']['x'], key['coordinates']['y'])
        if gcoordinates not in goals:
            goals[gcoordinates] = 1
        else:
            goals[gcoordinates] += 1

# create data frames using pandas
gdf = pd.DataFrame(list(goals.items()), columns=['Coordinates', 'Occurrences'])
print(gdf)
sdf = pd.DataFrame(list(shots.items()), columns=['Coordinates', 'Occurrences'])
print(sdf)

[Answer 1]:

Try this:

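import re
import pandas as pd
# findall returns the two numbers as strings, so cast the new columns to float afterwards
df[['x', 'y']] = df.Coordinates.apply(lambda c: pd.Series(dict(zip(['x', 'y'], re.findall(r'[-]?[0-9]+\.[0-9]+', c.strip()))))).astype(float)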
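
The same regex idea can be tried end to end with a small, self-contained sketch. It assumes the Coordinates column really holds strings such as "(76.0, -8.0)"; the sample frame below is made up from the first rows shown in the question, and the pattern is only one possible way to write it. It uses `Series.str.extract` with named groups instead of `re.findall`, so the group names become the new column names.

```python
import pandas as pd

# Sample rows copied from the question's printout (assumed to be strings).
df = pd.DataFrame({
    "Coordinates": ["(76.0, -8.0)", "(-41.0, -24.0)", "(69.0, -1.0)"],
    "Occurrences": [1, 1, 1],
})

# One named group per coordinate; the decimal part is optional.
pattern = r"\(\s*(?P<x>-?\d+(?:\.\d+)?)\s*,\s*(?P<y>-?\d+(?:\.\d+)?)\s*\)"

# str.extract returns one column per group; the matches are strings, so cast to float.
df[["x", "y"]] = df["Coordinates"].str.extract(pattern).astype(float)
print(df)
```

Rows that do not match the pattern would end up as NaN in both new columns, which makes malformed input easy to spot.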

[Answer 2]:

Using the built-in string methods for this should perform well:

df[["x", "y"]] = df["Coordinates"].str.strip("()").str.split(",", expand=True).astype(float)

(This also converts x and y to floats, which may not be needed.)
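
One caveat: in the question's code the dictionary keys are `(x, y)` tuples, so the Coordinates column may actually hold tuples rather than strings, in which case the `.str` string methods will not split them. The sketch below assumes that situation. The `goals` dictionary here is hypothetical sample data, and summing Occurrences per x and per y value is only one reading of the output the question asks for.

```python
import pandas as pd

# Hypothetical counts shaped like the `goals` dictionary in the question.
goals = {(76.0, -8.0): 1, (-41.0, -24.0): 1, (76.0, 7.0): 2}

gdf = pd.DataFrame(list(goals.items()), columns=["Coordinates", "Occurrences"])

# Each cell is a tuple, so expand the tuples into two numeric columns directly.
gdf[["x", "y"]] = pd.DataFrame(gdf["Coordinates"].tolist(), index=gdf.index)

# Total occurrences for each distinct x value and each distinct y value.
x_counts = gdf.groupby("x")["Occurrences"].sum()
y_counts = gdf.groupby("y")["Occurrences"].sum()

print(gdf)
print(x_counts)
print(y_counts)
```

Checking `type(df["Coordinates"].iloc[0])` first shows which case applies to a given frame.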
