参考文献:
一篇绘制累积经验分布函数图像的博客
博客原文
# 这也是借鉴别人博客的代码,我再补充点自己的看法import numpy as npimport pandas as pdimport matplotlib.pyplot as pltdatas = np.array([64.3, 65.0, 65.0, 67.2, 67.3, 67.3, 67.3, 67.3, 68.0, 68.0, 68.8, 68.8, 68.8, 69.7, 69.7, 69.7, 70.3,70.4, 70.4, 70.4, 70.4, 70.4,70.4, 70.4, 71.2, 71.2, 71.2, 71.2, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 72.7, 73.5, 73.5, 73.5, 73.5, 73.5, 73.5, 73.5, 73.5, 73.5,73.5, 73.5, 74.3, 74.3, 74.3, 74.3, 74.3, 74.3, 74.3, 74.3, 74.7, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.4, 75.6, 75.8, 75.8, 75.8, 75.8, 75.8, 76.5, 76.5, 76.5, 76.5, 76.5, 76.5, 76.5, 77.2, 77.2,77.6, 78.0, 78.8, 78.8, 78.8, 79.5, 79.5, 79.5, 80.3, 80.5, 80.5, 81.2, 81.6, 81.6, 84.3]) #数据特征计算s = np.std(datas, ddof=1)#样本标准差xbar = np.mean(datas)#样本均值#数据可视化 画数据经验分布曲线图nt, bins, patches = plt.hist(datas, bins=10, histtype='step', cumulative=True, density=True, color='darkcyan')#datas是数据,bins是分组数plt.title('bins = 10')plt.savefig('经验函数分布图1.jpg', dpi=200)plt.show()#数据可视化 画数据经验分布曲线图nt, bins, patches = plt.hist(datas, bins=15, histtype='step', cumulative=True, density=True, color='darkcyan')#datas是数据,bins是分组数plt.title('bins = 15')#正态分布函数曲线拟合y = (1 / (np.sqrt(2 * np.pi) * s)) * np.exp(-0.5 * ((bins - xbar) ** 2 / s ** 2))y = y.cumsum()y = y / y[-1]plt.plot(bins, y, 'tomato', linewidth = 1.5, label = 'Theoretical')plt.savefig('经验函数分布图2.jpg', dpi=200)plt.show()# 这种方法有点奇怪,指定bins参数之后,就不能说绘制出来的经验分布函数是原样本的经验分布函数了# 自己试试看,图太多了不方便一一往上贴,看看返回数组plt.hist([3,3,1,4],histtype='step',cumulative=True,density=True)Out[25]: (array([0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 1. ]), array([1. , 1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1, 3.4, 3.7, 4. ]), [<matplotlib.patches.Polygon at 0x14fb356be20>])plt.hist([3,3,1,4],histtype='step',cumulative=True) # 没有density参数表示计数,而非计算频率Out[26]: (array([1., 1., 1., 1., 1., 1., 3., 3., 3., 4.]), array([1. , 1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1, 3.4, 3.7, 4. ]), [<matplotlib.patches.Polygon at 0x14fb4c47d90>]) plt.hist([3,3,1,4],histtype='step',cumulative=True,bins=4,density=True)Out[28]: (array([0.25, 0.25, 0.75, 1. ]), array([1. , 1.75, 2.5 , 3.25, 4. ]), [<matplotlib.patches.Polygon at 0x14fb6e5bb20>])plt.hist([3,3,1,4],histtype='step',cumulative=True,bins=3,density=True)Out[29]: (array([0.25, 0.25, 1. ]), array([1., 2., 3., 4.]), [<matplotlib.patches.Polygon at 0x14fb4c31850>])plt.hist([3,3,1,4],histtype='step',cumulative=True,bins=5,density=True)Out[30]: (array([0.25, 0.25, 0.25, 0.75, 1. ]), array([1. , 1.6, 2.2, 2.8, 3.4, 4. ]), [<matplotlib.patches.Polygon at 0x14fb4c37820>]) # 也就是说,bins参数是平均分原数据的份数。由原数据的最大值和最小值以及bins参数共同决定各个子区间的范围。 方法二: from statsmodels.distributions.empirical_distribution import ECDFx = [3,3,1,4]ecdf = ECDF([3,3,1,4])type(ecdf)Out[37]: statsmodels.distributions.empirical_distribution.ECDFy = ecdf(x) # 计算分布函数值x.sort()xOut[43]: [1, 3, 3, 4]y.sort()# 画阶梯函数之前一定要记得排序,不然就是乱七八糟的回字形plt.step(x,y)Out[45]: [<matplotlib.lines.Line2D at 0x14fc03b6850>]这就是我想要的了