DataFrame & Series

Posted 蔡普光Blogs

Preface: This article was compiled by the editors of 小常识网 (cha138.com). It mainly introduces DataFrame & Series, and we hope it serves as a useful reference.

DataFrames

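A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

The features of a DataFrame are as follows.

  • Columns can be of different types
  • Size – mutable
  • Labeled axes (rows and columns)
  • Arithmetic operations can be performed on rows and columns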
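The four features listed above can be checked directly. The following is a minimal sketch. The column names and values are made up for illustration.

```python
import pandas as pd

# Columns of different types: strings, integers and floats in one table
df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34], 'Score': [88.5, 92.0]})
print(df.dtypes)

# Size is mutable: add a column, then a row with the new label 2
df['Passed'] = [True, True]
df.loc[2] = ['Steve', 29, 75.0, False]

# Labeled axes: df.index labels the rows, df.columns labels the columns
print(list(df.index))    # [0, 1, 2]
print(list(df.columns))  # ['Name', 'Age', 'Score', 'Passed']

# Arithmetic on a column is element-wise
df['AgeNextYear'] = df['Age'] + 1
print(df)
```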

Structure

Let us assume that we are creating a DataFrame with student data.

Structure Table

You can think of it as an SQL table or a spreadsheet representation of data.
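The structure table itself is an image. As a stand-in, here is a minimal sketch of such a student DataFrame. The names, ages, genders and ratings are hypothetical values chosen for illustration. The boolean filter at the end plays roughly the role of an SQL `WHERE` clause.

```python
import pandas as pd

# Hypothetical student records: each row is one student, each column one attribute
students = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Carol'],
        'Age': [20, 21, 19],
        'Gender': ['F', 'M', 'F'],
        'Rating': [3.45, 4.10, 3.78],
    }
)
print(students)
print(students.shape)  # (3, 4): 3 rows, 4 columns

# Similar to: SELECT * FROM students WHERE Age >= 20
older = students[students['Age'] >= 20]
print(older)
```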

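pandas.DataFrame

  • A pandas DataFrame can be created using the following constructor −

    pandas.DataFrame(data, index, columns, dtype, copy)

  • The parameters of the constructor are as follows −

    Sr.No  Parameter & Description
    1      data: data takes various forms, such as ndarray, Series, map, lists, dict, constants, and also another DataFrame.
    2      index: the row labels. Optional; if no index is passed, the resulting frame uses the default np.arange(n).
    3      columns: the column labels. Optional; the default is np.arange(n). This applies only when no column labels are passed and the data does not supply its own (for example, dict keys).
    4      dtype: the data type to force. Only a single dtype is allowed, and it applies to all columns; if omitted, the type of each column is inferred.
    5      copy: whether to copy the data from the inputs. In recent pandas versions the default (None) behaves like True for dict input and like False for DataFrame or 2D ndarray input.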
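To show how index, columns and dtype interact, here is a small sketch. The data and labels are illustrative only. Passing columns with dict input both selects the keys and sets their order. Omitting index and columns gives the default integer labels.

```python
import numpy as np
import pandas as pd

data = {'x': [1, 2, 3], 'y': [4, 5, 6]}

# index: row labels; columns: which dict keys to use, in this order; dtype: one type for all columns
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'], columns=['y', 'x'], dtype=float)
print(df)
print(df.dtypes)

# Without index/columns, pandas generates default integer labels 0..n-1
arr = np.array([[1, 2], [3, 4]])
df2 = pd.DataFrame(arr)
print(list(df2.index), list(df2.columns))  # [0, 1] [0, 1]
```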

Create an Empty DataFrame

The most basic DataFrame that can be created is an empty DataFrame.

Example

# Import the pandas library, aliased as pd
import pandas as pd
df = pd.DataFrame()
print(df)

Its output is as follows −

Empty DataFrame
Columns: []
Index: []
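A quick way to test for an empty DataFrame is its `empty` attribute. You can also create an empty frame that already has column labels. The following is a short sketch; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame()
print(df.empty)   # True
print(df.shape)   # (0, 0)

# An empty DataFrame can also be given column labels up front
df_cols = pd.DataFrame(columns=['Name', 'Age'])
print(df_cols.empty)          # True
print(list(df_cols.columns))  # ['Name', 'Age']
```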

Create a DataFrame from Lists

A DataFrame can be created using a single list or a list of lists.

Example 1

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

Its output is as follows −

     0
0    1
1    2
2    3
3    4
4    5

Example 2

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

Its output is as follows −

     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13

Example 3

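import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
# dtype= would force one type onto every column, which fails for 'Name',
# so convert only the Age column with astype()
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)

Its output is as follows −

     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0

Note − Observe, the Age column is now of type float. The constructor's dtype parameter is not used here because it applies a single type to all columns: passing dtype=float would also try to convert Name, which recent pandas versions reject with a ValueError. astype() with a dict converts only the columns it names.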

Create a DataFrame from Dict of ndarrays / Lists

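All the ndarrays must be of the same length. If an index is passed, the length of the index must equal the length of the arrays.

If no index is passed, then by default the index will be range(n), where n is the array length.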
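Both length rules can be seen by triggering them. Below is a minimal sketch. The helper `raises_value_error` is a hypothetical name introduced only for this illustration. The exact error messages vary between pandas versions, so only the exception type is relied on.

```python
import pandas as pd


def raises_value_error(build):
    """Hypothetical helper: return True if calling build() raises ValueError."""
    try:
        build()
    except ValueError as err:
        print('ValueError:', err)
        return True
    return False


# Arrays of unequal length are rejected
unequal = raises_value_error(lambda: pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2]}))

# A passed index must have the same length as the arrays
bad_index = raises_value_error(lambda: pd.DataFrame({'a': [1, 2, 3]}, index=['x', 'y']))

# Without an index, the default labels are those of range(n)
df = pd.DataFrame({'a': [1, 2, 3]})
print(list(df.index))  # [0, 1, 2]
```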

Example 1

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

Its output is as follows −

    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42

Note − Observe the values 0, 1, 2, 3. They are the default index labels assigned to the rows using range(n). The columns appear in the order of the dictionary keys.

Example 2

Now let us create an indexed DataFrame using arrays.

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)

Its output is as follows −

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42

Note − Observe, the index parameter assigns an index label to each row.

Create a DataFrame from List of Dicts

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are taken as the column names.

Example 1

The following example shows how to create a DataFrame by passing a list of dictionaries.

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Its output is as follows −

    a    b      c
0   1   2     NaN
1   5   10   20.0

Note − Observe, NaN (Not a Number) is filled in wherever a key is missing.
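The NaN entries can be detected and replaced afterwards. The sketch below uses the same data as the example. Filling with 0 is just one possible choice of default value. A column that contains NaN is stored as a float column, which is why 20 is shown as 20.0.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

# Missing keys become NaN; a column holding NaN is stored as float64
print(df['c'].dtype)  # float64
print(df.isna())      # True where a value was missing

# Fill the gaps with a default value if NaN is not wanted
filled = df.fillna(0)
print(filled)
```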

Example 2

The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

Its output is as follows −

        a   b       c
first   1   2     NaN
second  5   10   20.0

Example 3

The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

# With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

# With two column indices, one of which does not match any dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

Its output is as follows −

#df1 output
         a  b
first    1  2
second   5  10

#df2 output
         a  b1
first    1  NaN
second   5  NaN

Note − Observe, df2 is created with a column label ('b1') that is not a dictionary key, so that column is filled with NaN. df1 is created with column labels that match dictionary keys, so no NaN appears. In both cases the key 'c' is left out because it is not listed in columns.

Create a DataFrame from Dict of Series

A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of the indexes of all the Series passed.

Example

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df)

Its output is as follows −

      one    two
a     1.0    1
b     2.0    2
c     3.0    3
d     NaN    4

Note − Observe, Series one has no label 'd', but in the result NaN is filled in for the d label.
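The union rule is easiest to see when the Series share no labels at all. The following is a minimal sketch with illustrative labels. Every label from either Series becomes a row. Each column gets NaN for the labels its own Series lacked.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['c', 'd'])
df = pd.DataFrame({'first': s1, 'second': s2})

# The row index is the union of both Series indexes
print(list(df.index))  # ['a', 'b', 'c', 'd']
# Each column has NaN for the labels its Series did not have
print(df)
```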

Let us now understand column selection, addition, and deletion through examples.

Column Selection

Selecting a column from the DataFrame.

Example

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])

Its output is as follows −

a     1.0
b     2.0
c     3.0
d     NaN
Name: one, dtype: float64
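Passing a list of labels instead of a single label selects several columns at once. The sketch below reuses the example's data. A single label returns a Series, while a list returns a DataFrame whose columns follow the list order.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# A single label returns a Series; a list of labels returns a DataFrame
one = df['one']
both = df[['two', 'one']]   # also reorders the columns
print(type(one).__name__)   # Series
print(type(both).__name__)  # DataFrame
print(both)
```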

Column Addition

Adding a new column to an existing DataFrame.

Example

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label by passing new series

print("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)

print("Adding a new column using the existing columns in DataFrame:")
df['four']=df['one']+df['three']

print(df)

Its output is as follows −

Adding a new column by passing as Series:
     one   two   three
a    1.0    1    10.0
b    2.0    2    20.0
c    3.0    3    30.0
d    NaN    4    NaN

Adding a new column using the existing columns in DataFrame:
      one   two   three    four
a     1.0    1    10.0     11.0
b     2.0    2    20.0     22.0
c     3.0    3    30.0     33.0
d     NaN    4     NaN     NaN
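Besides `df['name'] = ...`, pandas has two other common ways to add columns. `insert()` puts a column at a chosen position and changes the frame in place. `assign()` returns a new DataFrame and leaves the original untouched. The following is a short sketch; the column names 'half' and 'total' are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [1, 2, 3]}, index=['a', 'b', 'c'])

# insert() places a column at position 1 (modifies df in place)
df.insert(1, 'half', df['one'] / 2)

# assign() returns a new DataFrame with the extra column; df itself is unchanged
df2 = df.assign(total=df['one'] + df['two'])

print(list(df.columns))   # ['one', 'half', 'two']
print(list(df2.columns))  # ['one', 'half', 'two', 'total']
```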

Column Deletion

Columns can be deleted with del or removed with pop().

Example

# Using the previous DataFrame, we will delete columns
# with del and pop()
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
   'three' : pd.Series([10,20,30], index=['a','b','c'])}

df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)

# using del
print("Deleting the first column using DEL function:")
del df['one']
print(df)

# using pop(), which also returns the removed column
print("Deleting another column using POP function:")
df.pop('two')
print(df)

Its output is as follows −

Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN

Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN

Deleting another column using POP function:
   three
a  10.0
b  20.0
c  30.0
d  NaN
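The difference between the approaches is what they return and whether they modify the frame in place. The sketch below uses made-up data. `pop()` changes the frame and hands back the removed column. `drop(columns=...)` leaves the frame alone and returns a reduced copy.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop() removes a column in place and returns it as a Series
popped = df.pop('two')

# drop(columns=...) returns a new DataFrame and leaves df unchanged
smaller = df.drop(columns=['one'])

print(popped.tolist())        # [3, 4]
print(list(df.columns))       # ['one', 'three']
print(list(smaller.columns))  # ['three']
```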

Row Selection, Addition, and Deletion

We will now understand row selection, addition and deletion through examples.

Selection by Label

Rows can be selected by passing a row label to the loc indexer.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.loc['b'])

Its output is as follows −

one 2.0
two 2.0
Name: b, dtype: float64

The result is a Series whose labels are the column names of the DataFrame, and whose name is the label with which it was retrieved.
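`loc` also accepts a second, column argument, a list of row labels, or a boolean mask. The following sketch reuses the example's data.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# loc[rows, columns]: both are given by label
print(df.loc[['a', 'c']])     # rows a and c, all columns
print(df.loc['b', 'two'])     # a single value
print(df.loc[df['two'] > 2])  # boolean mask: rows c and d
```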

Selection by integer location

Rows can be selected by passing an integer position to the iloc indexer.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.iloc[2])

Its output is as follows −

one   3.0
two   3.0
Name: c, dtype: float64
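`iloc` works purely by position. It accepts negative positions, slices (end excluded), position lists, and a second argument for columns. The following sketch reuses the example's data.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# iloc[rows, columns]: both by integer position, end of a slice excluded
print(df.iloc[-1])      # last row (label 'd')
print(df.iloc[0:2, 1])  # first two rows of the second column
print(df.iloc[[0, 3]])  # rows at positions 0 and 3
```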

Slice Rows

Multiple rows can be selected using the ':' operator.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df[2:4])

Its output is as follows −

   one  two
c  3.0    3
d  NaN    4
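A positional slice like `df[2:4]` excludes its stop position. A label slice through `loc` includes its stop label. The following is a small sketch showing that both forms select the same two rows here.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Positional slice: stop position 4 is excluded
by_position = df[2:4]
# Label slice with loc: the stop label 'd' is included
by_label = df.loc['c':'d']

print(by_position.equals(by_label))  # True: both hold rows c and d
```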

Addition of Rows

Add new rows to a DataFrame with pd.concat(), which appends the rows at the end. (The older DataFrame.append method was deprecated in pandas 1.4 and removed in pandas 2.0.)

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])
print(df)

Its output is as follows −

   a  b
0  1  2
1  3  4
0  5  6
1  7  8
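The output above keeps each frame's original labels, so 0 and 1 appear twice. If unique row numbers are wanted instead, pd.concat() can renumber the rows. The following is a short sketch using the same two frames.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels and numbers the rows 0..n-1
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```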

Deletion of Rows

Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, multiple rows will be dropped.

If you observe, in the above example, the labels are duplicated. Let us drop a label and see how many rows get dropped.

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)

print(df)

Its output is as follows −

  a b
1 3 4
1 7 8
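To remove just one of several rows that share a label, one option is to renumber the rows first. The following is a minimal sketch of that approach, using the same data as above. `reset_index(drop=True)` replaces the duplicated labels with 0..n-1, after which `drop()` removes exactly one row.

```python
import pandas as pd

df = pd.concat([pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b']),
                pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])])

# drop(0) removes every row labelled 0: two rows here, because the label is duplicated
by_label = df.drop(0)

# Renumber the rows first so that every label is unique, then drop exactly one row
renumbered = df.reset_index(drop=True)
one_row_removed = renumbered.drop(2)

print(by_label)
print(one_row_removed)
```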

Series
