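To draw a frequency distribution chart, first build a frequency distribution table. The steps are:
1) Find the maximum and minimum values in the data;
2) Group the data: divide the range from the minimum to the maximum into groups (classes), usually of equal width;
3) Choose a "class value" for each group, usually the middle value of the group;
4) Count how many data points fall into each group: this count is the "frequency";
5) Compute the "relative frequency", i.e. each group's frequency as a share of the total; the relative frequencies add up to 1;
6) Compute the "cumulative frequency", i.e. the running total of the frequencies up to and including each group; the cumulative frequency of the last group equals the total number of data points.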
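The six steps above can be sketched in plain Python. This is a minimal illustration, not the method used later in this exercise: the helper `frequency_table` and its `start`/`width` parameters are hypothetical, and it assumes integer data grouped into equal-width inclusive ranges such as 36~40, 41~45, and so on.

```python
from itertools import accumulate

def frequency_table(data, start, width):
    # Hypothetical helper: equal-width groups of integers, each group an
    # inclusive range [a, a + width - 1]; `start` must not exceed min(data).
    lo, hi = min(data), max(data)                    # step 1: minimum and maximum
    if start > lo:
        raise ValueError("start must not exceed the minimum value")
    groups = [(a, a + width - 1)                     # step 2: groups covering [start, hi]
              for a in range(start, hi + 1, width)]
    counts = [sum(a <= x <= b for x in data)         # step 4: frequency of each group
              for a, b in groups]
    n = len(data)
    table = []
    for (a, b), f, cum in zip(groups, counts, accumulate(counts)):
        table.append({
            "group": f"{a}~{b}",
            "class_value": (a + b) / 2,              # step 3: middle value of the group
            "frequency": f,
            "relative": f / n,                       # step 5: share of all data points
            "cumulative": cum,                       # step 6: running total of frequencies
        })
    return table

# a small made-up sample, only to exercise the helper
sample = [38, 40, 43, 45, 47, 50, 50, 52, 58, 61]
for row in frequency_table(sample, start=36, width=5):
    print(row)
```

The last row's cumulative frequency equals `len(sample)`, and the relative frequencies sum to 1 (up to floating-point rounding), matching steps 5 and 6.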
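To draw a histogram:
1) Place the class values at equal intervals along the horizontal axis;
2) Over each class value, draw a bar whose height is the frequency of the group that class value belongs to.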
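A minimal matplotlib sketch of these two steps, assuming the class values and frequencies are already known. The helper `draw_histogram` is hypothetical; the numbers passed to it are the class values 38, 43, …, 68 and the group frequencies that the weight exercise below arrives at.

```python
import matplotlib.pyplot as plt

def draw_histogram(class_values, frequencies, width):
    # Hypothetical helper.
    # Step 1: class values sit at equal spacing on the horizontal axis.
    # Step 2: over each class value, a bar whose height is that group's frequency.
    fig, ax = plt.subplots()
    # a bar width equal to the group width makes neighbouring bars touch
    bars = ax.bar(class_values, frequencies, width=width, edgecolor="black")
    ax.set_xticks(class_values)
    ax.set_xlabel("class value")
    ax.set_ylabel("frequency")
    return fig, bars

# class values of the groups 36~40 ... 66~70 and their frequencies
fig, bars = draw_histogram([38, 43, 48, 53, 58, 63, 68],
                           [3, 11, 33, 19, 7, 5, 2], width=5)
```

With a width of 5, the first bar spans 35.5 to 40.5, which covers exactly the integers 36 to 40 in that group.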
Exercise
Build a frequency distribution table and a histogram for the following student weight data:
48, 54, 47, 50, 53, 43, 45, 43, 44, 47,
58, 46, 46, 63, 49, 50, 48, 43, 46, 45,
50, 53, 51, 58, 52, 53, 47, 49, 45, 42,
51, 49, 58, 54, 45, 53, 50, 69, 44, 50,
58, 64, 40, 57, 51, 69, 58, 47, 62, 47,
40, 60, 48, 47, 53, 47, 52, 61, 55, 55,
48, 48, 46, 52, 45, 38, 62, 47, 55, 50,
46, 47, 55, 48, 50, 50, 54, 55, 48, 50
Computing the frequency distribution
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic: show plots inside the notebook (remove when running a plain .py script)
%matplotlib inline
weights = np.array([ 48, 54, 47, 50, 53, 43, 45, 43, 44, 47, 58, 46, 46, 63, 49, 50, 48, 43, 46, 45, 50, 53, 51, 58, 52, 53, 47, 49, 45, 42, 51, 49, 58, 54, 45, 53, 50, 69, 44, 50, 58, 64, 40, 57, 51, 69, 58, 47, 62, 47, 40, 60, 48, 47, 53, 47, 52, 61, 55, 55, 48, 48, 46, 52, 45, 38, 62, 47, 55, 50, 46, 47, 55, 48, 50, 50, 54, 55, 48, 50])
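# bin edges; pd.cut closes intervals on the right by default: (35, 40], (40, 45], ..., (65, 70]
sections = [35, 40, 45, 50, 55, 60, 65, 70]
# one label per interval; for integer weights, (35, 40] holds 36 to 40
group_names = ['36~40', '41~45', '46~50', '51~55', '56~60', '61~65', '66~70']
# assign each weight to its group (the result is a pandas Categorical)
cuts = pd.cut(weights, sections, labels=group_names)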
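The labels only line up with the data because `pd.cut` closes intervals on the right (`right=True` is its default). A quick check on values that sit on or next to the bin edges; the probe values are made up for illustration:

```python
import pandas as pd

sections = [35, 40, 45, 50, 55, 60, 65, 70]
group_names = ['36~40', '41~45', '46~50', '51~55', '56~60', '61~65', '66~70']

# values on or just past the bin edges
probe_values = [40, 41, 45, 46, 70]

# default right-closed bins: 40 falls in (35, 40], labelled '36~40'
probe = pd.cut(probe_values, sections, labels=group_names)
print(list(probe))

# left-closed bins [35, 40), [40, 45), ... would put 40 under '41~45'
# and leave 70 outside every bin (NaN), so these labels would be wrong
probe_left = pd.cut(probe_values, sections, labels=group_names, right=False)
print(list(probe_left))
```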
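Count the frequencies:
# Categorical.value_counts() keeps the groups in interval order (pd.value_counts is deprecated since pandas 2.1)
counts = cuts.value_counts()
counts.to_dict()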
{'36~40': 3,
'41~45': 11,
'46~50': 33,
'51~55': 19,
'56~60': 7,
'61~65': 5,
'66~70': 2}
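The code above stops at step 4 (frequencies). A sketch of the remaining columns of the frequency distribution table (class value, relative frequency, cumulative frequency) with pandas, reusing the same data and bins; the column names are illustrative choices, and the class value assumes integer weights, so each group covers the integers from the left edge + 1 up to the right edge:

```python
import numpy as np
import pandas as pd

weights = np.array([48, 54, 47, 50, 53, 43, 45, 43, 44, 47, 58, 46, 46, 63, 49, 50, 48, 43, 46, 45,
                    50, 53, 51, 58, 52, 53, 47, 49, 45, 42, 51, 49, 58, 54, 45, 53, 50, 69, 44, 50,
                    58, 64, 40, 57, 51, 69, 58, 47, 62, 47, 40, 60, 48, 47, 53, 47, 52, 61, 55, 55,
                    48, 48, 46, 52, 45, 38, 62, 47, 55, 50, 46, 47, 55, 48, 50, 50, 54, 55, 48, 50])
sections = [35, 40, 45, 50, 55, 60, 65, 70]
group_names = ['36~40', '41~45', '46~50', '51~55', '56~60', '61~65', '66~70']
cuts = pd.cut(weights, sections, labels=group_names)

counts = cuts.value_counts()          # frequencies, in group order
table = pd.DataFrame({
    # middle value of the integers in each group, e.g. (36 + 40) / 2 = 38
    'class_value': [(lo + 1 + hi) / 2 for lo, hi in zip(sections[:-1], sections[1:])],
    'frequency': counts.to_numpy(),
    # share of all data points; these add up to 1
    'relative_frequency': (counts / counts.sum()).to_numpy(),
    # running total; the last entry equals the number of data points
    'cumulative_frequency': counts.cumsum().to_numpy(),
}, index=group_names)
print(table)
```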
Histogram
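# bar height = frequency of each group; width=1.0 makes neighbouring bars touch, as in a histogram
cuts.value_counts().plot(kind='bar', width=1.0, edgecolor='black')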
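As a cross-check, NumPy's `np.histogram` and matplotlib's `Axes.hist` can bin the raw weights directly. Their bins are left-closed, [a, b), with only the last bin closed on both ends, unlike the right-closed default of `pd.cut`, so passing `[35, 40, ...]` unchanged would move a weight of exactly 40 into the next bar. A sketch assuming integer weights, where shifting every edge by 0.5 sidesteps the boundary question:

```python
import numpy as np
import matplotlib.pyplot as plt

weights = np.array([48, 54, 47, 50, 53, 43, 45, 43, 44, 47, 58, 46, 46, 63, 49, 50, 48, 43, 46, 45,
                    50, 53, 51, 58, 52, 53, 47, 49, 45, 42, 51, 49, 58, 54, 45, 53, 50, 69, 44, 50,
                    58, 64, 40, 57, 51, 69, 58, 47, 62, 47, 40, 60, 48, 47, 53, 47, 52, 61, 55, 55,
                    48, 48, 46, 52, 45, 38, 62, 47, 55, 50, 46, 47, 55, 48, 50, 50, 54, 55, 48, 50])

# half-integer edges 35.5, 40.5, ..., 70.5: [35.5, 40.5) holds 36..40, and so on
edges = np.arange(35.5, 71, 5)
freq, _ = np.histogram(weights, bins=edges)
print(freq)

# the same bins drawn directly; hist bars are adjacent by construction
fig, ax = plt.subplots()
n, _, _ = ax.hist(weights, bins=edges, edgecolor='black')
ax.set_xticks(edges[:-1] + 2.5)   # one tick at each class value: 38, 43, ..., 68
ax.set_xlabel('weight')
ax.set_ylabel('frequency')
```

The bin counts should agree with the `pd.cut` frequencies above.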