Python Data Analysis Notes #6.2.6 Pandas: Sorting

Yuan的学习笔记 (Yuan's Study Notes) 2021-10-31


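"Contents"

6.1 => Pandas data structures
6.2 => Essential Pandas functionality
--------> reindex: reindexing
--------> drop: dropping entries
--------> Indexing and selection
--------> Arithmetic operations
--------> Sorting
6.3 => Mathematical and statistical methods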

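Sorting

The sort_index method

Sorting is an unavoidable part of everyday data processing. To sort by the row index or the column index, use the sort_index method, which returns a new, sorted object. Here is an example of sorting a Series: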

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: obj = pd.Series(range(4), index=['d', 'a', 'c', 'b'])

In [4]: obj
Out[4]:
d    0
a    1
c    2
b    3
dtype: int64

In [5]: obj.sort_index()
Out[5]:
a    1
b    3
c    2
d    0
dtype: int64
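
As a quick check of the "returns a new object" point, here is a minimal sketch. The name sorted_obj is only illustrative and does not come from the original notes.

```python
import pandas as pd

obj = pd.Series(range(4), index=['d', 'a', 'c', 'b'])

# sort_index returns a new Series ordered by index label
sorted_obj = obj.sort_index()

# The original Series keeps its index order
print(list(sorted_obj.index))
print(list(obj.index))
```

If you want the sorted result to replace the original, the simplest approach is to reassign it: obj = obj.sort_index().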

For a DataFrame, you can choose which axis (dimension) to sort along:

In [6]: frame = pd.DataFrame(np.arange(8).reshape(2, 4), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])

In [7]: frame
Out[7]:
       d  a  b  c
three  0  1  2  3
one    4  5  6  7

By default, the sort is along the index axis (axis=0):

In [8]: frame.sort_index()
Out[8]:
       d  a  b  c
one    4  5  6  7
three  0  1  2  3

Passing axis=1 sorts along the columns axis instead:

In [9]: frame.sort_index(axis=1)
Out[9]:
       a  b  c  d
three  1  2  3  0
one    5  6  7  4

Sorting is ascending by default; pass ascending=False to sort in descending order:

In [10]: frame.sort_index(axis=1, ascending=False)
Out[10]:
       d  c  b  a
three  0  3  2  1
one    4  7  6  5
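
Because each sort_index call returns a new DataFrame, calls can be chained to sort both axes. The following is a small sketch under that assumption; the name both is made up for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape(2, 4),
                     index=['three', 'one'],
                     columns=['d', 'a', 'b', 'c'])

# Sort the row labels first, then the column labels
both = frame.sort_index().sort_index(axis=1)
print(both)
```

The values move together with their labels, so only the order of rows and columns changes, not which value belongs to which label.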

The sort_values method

sort_index sorts by index; to sort by value, use the sort_values method:

In [11]: obj = pd.Series([4, 7, -3, 2])

In [12]: obj.sort_values()
Out[12]:
2   -3
3    2
0    4
1    7
dtype: int64
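
sort_values accepts ascending=False as well. A minimal sketch with the same data (the name desc is illustrative):

```python
import pandas as pd

obj = pd.Series([4, 7, -3, 2])

# Largest value first; each value keeps its original index label
desc = obj.sort_values(ascending=False)
print(desc)
```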

When sorting a DataFrame, you may want to sort by the values in one or more columns.

To do so, pass one or more column names to the by option of sort_values:

In [13]: frame = pd.DataFrame({'b':[4, 7, -3, 2], 'a':[0, 1, 9, 6]})

In [14]: frame
Out[14]:
   b  a
0  4  0
1  7  1
2 -3  9
3  2  6

In [15]: frame.sort_values(by='b')
Out[15]:
   b  a
2 -3  9
3  2  6
0  4  0
1  7  1

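Passing a list of names sorts by multiple columns. The first column is the primary key, and later columns only break ties: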

In [16]: frame.sort_values(by=['a', 'b'])
Out[16]:
   b  a
0  4  0
1  7  1
3  2  6
2 -3  9
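
In the frame above, column 'a' has no repeated values, so column 'b' never gets to decide anything. The sketch below uses hypothetical data with repeated values in 'a' to make the effect of the second key visible.

```python
import pandas as pd

# Hypothetical data: column 'a' contains repeated values
frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

# Rows are ordered by 'a' first; 'b' decides the order only within equal 'a'
result = frame.sort_values(by=['a', 'b'])
print(result)
```

Within a == 0 the row with b == -3 comes before the row with b == 4, and within a == 1 the row with b == 2 comes before the row with b == 7.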

rank

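Remember the grade sheet in your teacher's hands? Once the teacher has the sheet, how do they find out where each student placed? sort_values only rearranges the scores into order (ascending by default), whereas the rank method returns each student's position in that order.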

In [17]: obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

In [18]: obj
Out[18]:
0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64

In [19]: obj.sort_values()
Out[19]:
1   -5
5    0
4    2
3    4
6    4
0    7
2    7
dtype: int64

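By default, rank breaks ties by assigning each tied element the average of the ranks its group occupies.

In the Series above, the element at index 0 is 7. In ascending order, the two 7s occupy positions 6 and 7, so their average rank is 6.5. The element at index 1 is -5, the smallest value, so it comes first in ascending order and its rank is 1.0.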

In [20]: obj.rank()
Out[20]:
0    6.5
1    1.0
2    6.5
3    4.5
4    3.0
5    2.0
6    4.5
dtype: float64
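
To make the averaging concrete, here is a rough sketch that rebuilds the default ranks by hand. The helper names ordered, positions, avg_by_value and manual are invented for this illustration, and this is not how pandas computes ranks internally.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Put the values in ascending order and number the positions 1..n
ordered = obj.sort_values()
positions = pd.Series(range(1, len(ordered) + 1), index=ordered.index)

# Average the positions occupied by each distinct value
avg_by_value = positions.groupby(ordered).mean()

# Look up the averaged position for every original element
manual = obj.map(avg_by_value)

print(manual.tolist() == obj.rank().tolist())
```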

Passing method='first' ranks equal values by the order in which they appear in the original data, so the element that appears earlier receives the lower rank:

In [21]: obj.rank(method='first')
Out[21]:
0    6.0
1    1.0
2    7.0
3    4.0
4    3.0
5    2.0
6    5.0
dtype: float64

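The elements at index 0 and index 2 are both 7, but the element at index 0 appears before the one at index 2 (hence "first"), so although the values are equal, index 0 is ranked 6 and index 2 is ranked 7.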
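
Coming back to the grade sheet: on a real ranking list the highest score is usually first place, and tied students share a place. One possible way to get that, sketched with hypothetical names and scores, is to combine ascending=False with method='min' (another tie-breaking option of rank):

```python
import pandas as pd

# Hypothetical scores for five students
scores = pd.Series([90, 85, 90, 70, 85],
                   index=['Ann', 'Bob', 'Cid', 'Dee', 'Eve'])

# ascending=False ranks the highest score first;
# method='min' gives every tied student the best place in their group
places = scores.rank(ascending=False, method='min').astype(int)
print(places)
```

Here the two students with 90 both get place 1, the two with 85 both get place 3, and the student with 70 gets place 5.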

To be continued...


stay hungry, stay foolish.


