Python Data Analysis Notes #6.2.3 Pandas: Indexing and Selection

Yuan's Study Notes 2021-10-16

Contents

6.1 => Pandas data structures
6.2 => Pandas essential functionality
--------> reindex: reindexing
--------> drop: dropping data
--------> Indexing and selection
6.3 => Math and statistical methods

pandas offers several ways to index, select, and combine data. Let's take a look!

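Indexing

For Series

Indexing a Series works much like indexing a NumPy array (recent pandas versions deprecate integer keys as positions on a label-indexed Series, so obj.iloc[1] is the explicit positional form of obj[1]):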

In [3]: obj = pd.Series(np.arange(4.), index = ['a', 'b', 'c', 'd'])

In [4]: obj
Out[4]:
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

In [5]: obj[1]
Out[5]: 1.0

In [6]: obj[2:4]
Out[6]:
c    2.0
d    3.0
dtype: float64

A Series, however, also accepts index values other than integers, such as a label, a list of labels, or a boolean mask:

In [7]: obj['a']
Out[7]: 0.0

In [8]: obj[['b', 'a', 'd']]
Out[8]:
b    1.0
a    0.0
d    3.0
dtype: float64

In [9]: obj[[1, 3]]
Out[9]:
b    1.0
d    3.0
dtype: float64

In [10]: obj[obj < 2]
Out[10]:
a    0.0
b    1.0
dtype: float64

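Since an integer key can be read either as a label or as a position, the same selections can be spelled out with explicit accessors. This is only a minimal sketch that rebuilds the same obj as above; the result names (second, picked, and so on) are made up for illustration:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Position-based selection: keys are always integer positions
second = obj.iloc[1]             # value at position 1
picked = obj.iloc[[1, 3]]        # values at positions 1 and 3

# Label-based selection: keys are always index labels
first = obj.loc['a']
reordered = obj.loc[['b', 'a', 'd']]

# Boolean mask: keeps the elements where the condition is True
small = obj[obj < 2]
```

Using .iloc for positions and .loc for labels keeps the intent clear even when the index itself holds integers.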

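Remember Python slicing? Slicing by label in pandas differs from ordinary Python slicing: a Python slice excludes its end point, whereas a pandas label slice includes it. Integer position slices such as obj[2:4] above still exclude the end point.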

In [11]: obj['b':'c']
Out[11]:
b    1.0
c    2.0
dtype: float64

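The difference between the two kinds of slices is easy to check side by side. A small sketch, assuming the original obj before any assignment; by_label, by_position, and single are illustrative names:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

by_label = obj.loc['b':'c']     # label slice: both 'b' and 'c' are included
by_position = obj.iloc[1:3]     # position slice: the stop position 3 is excluded
single = obj.iloc[1:2]          # only position 1, because position 2 is excluded

# Both slices pick the rows labeled 'b' and 'c'
same_rows = by_label.equals(by_position)
```

Here same_rows should be True, while single holds just one element.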

Indexing and slicing can also be used to assign to the corresponding part of a Series:

In [12]: obj['a'] = 5

In [13]: obj['b':'c'] = 6

In [14]: obj
Out[14]:
a    5.0
b    6.0
c    6.0
d    3.0
dtype: float64

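Assignment through an index changes the Series in place. If the original values are still needed, one option is to work on a copy. A minimal sketch, where updated is just an illustrative name:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Work on a copy so that obj keeps its original values
updated = obj.copy()
updated.loc['a'] = 5            # assign to a single label
updated.loc['b':'c'] = 6        # assign to an inclusive label slice
```

After this, updated holds 5, 6, 6, 3 while obj is unchanged.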

For DataFrame

Now let's look at indexing a DataFrame. First, create one:

In [15]: data = pd.DataFrame(np.arange(16).reshape((4, 4)), index=['Beijing', 'Shanghai', 'Tokyo', 'New York'], columns=['one', 'two', 'three', 'four'])

In [16]: data
Out[16]:
          one  two  three  four
Beijing     0    1      2     3
Shanghai    4    5      6     7
Tokyo       8    9     10    11
New York   12   13     14    15

Indexing a DataFrame with a single value or a sequence retrieves one or more columns:

In [17]: data['two']
Out[17]:
Beijing      1
Shanghai     5
Tokyo        9
New York    13
Name: two, dtype: int32

In [18]: data[['three', 'one']]
Out[18]:
          three  one
Beijing       2    0
Shanghai      6    4
Tokyo        10    8
New York     14   12

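One detail worth checking is the type of the result: a single column name gives a Series, while a list of names gives a DataFrame, even when the list has only one entry. A short sketch that rebuilds the same data; one_col, as_frame, and subset are illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Beijing', 'Shanghai', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

one_col = data['two']               # a single name returns a Series
as_frame = data[['two']]            # a one-element list returns a DataFrame
subset = data[['three', 'one']]     # columns come back in the requested order

print(type(one_col).__name__, type(as_frame).__name__)
```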

Slicing, by contrast, selects rows of a DataFrame:

In [19]: data[:2]
Out[19]:
          one  two  three  four
Beijing     0    1      2     3
Shanghai    4    5      6     7

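Because data[...] selects columns for names but rows for slices, it can help to write row slices explicitly. A small sketch on the same data, showing that the positional and label forms select the same two rows; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Beijing', 'Shanghai', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

first_two = data[:2]                           # shorthand row slice
by_position = data.iloc[:2]                    # explicit: positions 0 and 1
by_label = data.loc['Beijing':'Shanghai']      # explicit: inclusive label slice
```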

We can also select with a boolean DataFrame built from a condition. For example, to find all elements less than 5:

In [20]: data < 5
Out[20]:
            one    two  three   four
Beijing    True   True   True   True
Shanghai   True  False  False  False
Tokyo     False  False  False  False
New York  False  False  False  False

Every element that satisfies the condition is True, and every other element is False.

Now pass this boolean DataFrame as the index and set every element at a True position to 0:

In [21]: data[data < 5] = 0

In [22]: data
Out[22]:
          one  two  three  four
Beijing     0    0      0     0
Shanghai    0    5      6     7
Tokyo       8    9     10    11
New York   12   13     14    15

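The assignment above modifies data in place. When the original frame should stay untouched, DataFrame.mask and DataFrame.where return a new object instead. A sketch under that assumption, rebuilding the unmodified data; mask, replaced, and kept are illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Beijing', 'Shanghai', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

mask = data < 5                         # boolean DataFrame, True where value < 5
replaced = data.mask(mask, 0)           # replace values where mask is True
kept = data.where(data >= 5, 0)         # keep values where condition is True, else 0

# data itself is not modified by mask() or where()
n_small = int(mask.sum().sum())         # number of elements below 5
```

mask replaces where the condition holds and where replaces where it does not, so the two calls here produce the same values.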

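Selecting with loc and iloc

For DataFrame, pandas provides the special indexing operators loc and iloc, which let us select subsets of rows and columns with NumPy-like notation.

The difference between them is that iloc indexes by integer position, while loc indexes by axis labels.

Use loc with labels to select one row and several columns: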

In [23]: data.loc['Shanghai', ['two', 'three']]
Out[23]:
two      5
three    6
Name: Shanghai, dtype: int32

loc also accepts a single label or a slice of labels (it is an indexer used with square brackets, not a function):

In [27]: data.loc[:'Beijing', 'two']
Out[27]:
Beijing    0
Name: two, dtype: int32

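loc can take label slices on both axes, and it also accepts a boolean condition for the rows. A minimal sketch, assuming the data as originally created (before the values below 5 were set to 0); block and rows are illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Beijing', 'Shanghai', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label slices on both axes; both end points are included
block = data.loc['Shanghai':'Tokyo', 'two':'three']

# Boolean condition for rows combined with a list of column labels
rows = data.loc[data['three'] > 5, ['one', 'four']]
```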

Use iloc with integer positions to select:

In [24]: data.iloc[2, [3, 0, 1]]
Out[24]:
four    11
one      8
two      9
Name: Tokyo, dtype: int32

In [25]: data.iloc[2]
Out[25]:
one       8
two       9
three    10
four     11
Name: Tokyo, dtype: int32

In [26]: data.iloc[[1, 2], [3, 0, 1]]
Out[26]:
          four  one  two
Shanghai     7    0    5
Tokyo       11    8    9

iloc also accepts integer slices:

In [28]: data.iloc[:, :3]
Out[28]:
          one  two  three
Beijing     0    0      0
Shanghai    0    5      6
Tokyo       8    9     10
New York   12   13     14

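Integer slices in iloc follow the usual Python rules, including negative positions and steps. A short sketch on the data as originally created; last_row, corner, and every_other are illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Beijing', 'Shanghai', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

last_row = data.iloc[-1]            # negative positions count from the end
corner = data.iloc[:2, 1:3]         # rows 0-1 and columns 1-2; stops are excluded
every_other = data.iloc[::2]        # every second row
```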

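at and iat

df.at and df.iat are similar to loc and iloc, except that they select a single value: at takes a row label and a column label, and iat takes an integer row position and column position: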

In [29]: data.at['New York', 'three']
Out[29]: 14

In [30]: data.iat[3, 2]
Out[30]: 14

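at and iat can also write a single cell, not just read it. A minimal sketch on the data as originally created; value and same are illustrative names:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Beijing', 'Shanghai', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

value = data.at['New York', 'three']    # read by row label and column label
same = data.iat[3, 2]                   # read the same cell by integer positions

data.at['Tokyo', 'one'] = 100           # write one cell by label
data.iat[0, 0] = -1                     # write one cell by position
```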

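loc and iloc can do the same thing, so at and iat are used less often; their main advantage is faster access when reading or writing a single scalar.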

stay hungry, stay foolish.
