Plot functions in Python: how to draw a Q-Q plot

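A Q-Q plot is mainly used to answer the following questions:

Do the two data sets come from the same distribution?

PS: This can also be checked with a KS test. In Python, the scipy.stats.ks_2samp function returns the two-sample KS statistic and the p-value on which to base the judgment.

Do the two data sets have the same scale?

Do the two data sets have similar distribution shapes?

The first question can be judged by how far the sample points on the Q-Q plot lie from the reference line; the other two are judged from the slope of the line fitted through the points.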
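The KS check mentioned above can be sketched as follows. The data here are synthetic and the variable names (`ref`, `same`, `shifted`) are made up for illustration; only `scipy.stats.ks_2samp` comes from the text. A small p-value suggests the two samples come from different distributions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: one reference set and two candidate sample sets
ref = rng.normal(loc=0.0, scale=1.0, size=500)
same = rng.normal(loc=0.0, scale=1.0, size=300)     # same distribution as ref
shifted = rng.normal(loc=1.0, scale=1.0, size=300)  # mean shifted by 1

# ks_2samp returns the KS statistic and the p-value
stat_same, p_same = ks_2samp(ref, same)
stat_shifted, p_shifted = ks_2samp(ref, shifted)

print("same:    KS statistic=%.3f, p-value=%.3g" % (stat_same, p_same))
print("shifted: KS statistic=%.3f, p-value=%.3g" % (stat_shifted, p_shifted))
```

Note that the two sets passed to `ks_2samp` do not need to have the same size.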

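What are the advantages of analyzing distributions with a Q-Q plot?

The two data sets can differ in size.

It can answer the latter two questions above, which give deeper information about how the data are distributed.

So how is a Q-Q plot drawn?

Take one data set as the reference and treat the other as the sample. For each value in the sample data set, its percentile when placed in the reference data set is the x-coordinate on the Q-Q plot, and its percentile within the sample data set itself is the y-coordinate. A 45-degree reference line is usually drawn on the plot. If the two data sets come from the same distribution, all sample points should lie close to the reference line; if instead they lie far from it, the two data sets very likely come from different distributions.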
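The construction can be sketched on a tiny example. The helper name `percentile_pairs` and the toy data are made up for illustration. The axis convention follows the code in this article: x is the percentile of each sample value within the reference set, and y is its percentile within the sample set itself.

```python
import numpy as np
from scipy.stats import percentileofscore

def percentile_pairs(ref, samp):
    """Return (x, y): percentile of each sample value in ref and in samp."""
    ref = np.asarray(ref, dtype=float).ravel()
    samp = np.asarray(samp, dtype=float).ravel()
    # x: where each sample value falls within the reference set
    x = np.array([percentileofscore(ref, v) for v in samp])
    # y: where each sample value falls within the sample set itself
    y = np.array([percentileofscore(samp, v) for v in samp])
    return x, y

ref = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # reference set, 10 values
samp = [2, 4, 6, 8]                    # sample set, 4 values (sizes may differ)

x, y = percentile_pairs(ref, samp)
for v, px, py in zip(samp, x, y):
    print("value=%s  pct in ref=%.1f  pct in sample=%.1f" % (v, px, py))
```

Each `(x, y)` pair is one point on the plot, so the number of points equals the size of the sample set, whatever the size of the reference set.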

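In Python, the scipy.stats.percentileofscore function makes it easy to compute the percentiles needed above, and the numpy.polyfit function and the sklearn.linear_model.LinearRegression class can be used to fit a regression line through the sample points.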

from scipy.stats import percentileofscore
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# df_samp, df_clu are two DataFrames with the input data sets
# (flattened to 1-D here, which assumes each DataFrame holds a single column)
ref = np.asarray(df_clu).ravel()
samp = np.asarray(df_samp).ravel()
ref_id = df_clu.columns
samp_id = df_samp.columns

# theoretical quantiles: percentile of each sample value within the reference data
samp_pct_x = np.asarray([percentileofscore(ref, x) for x in samp])

# sample quantiles: percentile of each sample value within the sample data
samp_pct_y = np.asarray([percentileofscore(samp, x) for x in samp])

# estimated linear regression model
p = np.polyfit(samp_pct_x, samp_pct_y, 1)
regr = LinearRegression()
model_x = samp_pct_x.reshape(len(samp_pct_x), 1)
model_y = samp_pct_y.reshape(len(samp_pct_y), 1)
regr.fit(model_x, model_y)
r2 = regr.score(model_x, model_y)

# get the fitted regression line as text
if p[1] > 0:
    p_function = 'y = %s x + %s, r-square = %s' % (str(p[0]), str(p[1]), str(r2))
elif p[1] < 0:
    p_function = 'y = %s x - %s, r-square = %s' % (str(p[0]), str(-p[1]), str(r2))
else:
    p_function = 'y = %s x, r-square = %s' % (str(p[0]), str(r2))

print('The fitted linear regression model in Q-Q plot using data from enterprises %s and cluster %s is %s' % (str(samp_id), str(ref_id), p_function))

# plot the Q-Q plot
x_ticks = np.arange(0, 100, 20)
y_ticks = np.arange(0, 100, 20)
plt.scatter(x=samp_pct_x, y=samp_pct_y, color='blue')
plt.xlim((0, 100))
plt.ylim((0, 100))

# add the fitted regression line
plt.plot(samp_pct_x, regr.predict(model_x), color='red', linewidth=2)

# add the 45-degree reference line
plt.plot([0, 100], [0, 100], linewidth=2)
plt.text(10, 70, p_function)
plt.xticks(x_ticks, x_ticks)
plt.yticks(y_ticks, y_ticks)
plt.xlabel('cluster quantiles - id: %s' % str(ref_id))
plt.ylabel('sample quantiles - id: %s' % str(samp_id))
plt.title('%s VS %s Q-Q plot' % (str(ref_id), str(samp_id)))
plt.show()

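In the plot produced for this example, the sample points are sparse in the lower left and concentrated in the upper right, and they are shifted upward overall. This indicates that the sample's distribution differs from that of the reference data (the distribution shapes differ), which the KS test result, ks-statistic :0 p_value: 0.000000, also confirms. However, the slope is about 1 and the overall deviation is small, which indicates that the two data sets have similar scales.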
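The reading above (an upward shift together with a slope near 1) can also be expressed in numbers. The following is a minimal sketch: the helper `summarize_fit` and the percentile pairs are hypothetical and are not the data from this example. It uses only `numpy.polyfit` and computes R-squared by hand.

```python
import numpy as np

def summarize_fit(pct_x, pct_y):
    """Slope, intercept and R^2 of the fitted line, plus mean offset from y = x."""
    pct_x = np.asarray(pct_x, dtype=float)
    pct_y = np.asarray(pct_y, dtype=float)
    slope, intercept = np.polyfit(pct_x, pct_y, 1)
    fitted = slope * pct_x + intercept
    ss_res = np.sum((pct_y - fitted) ** 2)        # residual sum of squares
    ss_tot = np.sum((pct_y - pct_y.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mean_offset = np.mean(pct_y - pct_x)  # > 0: points sit above the 45-degree line
    return slope, intercept, r2, mean_offset

# Hypothetical percentile pairs: shifted upward by 10 with slope 1
pct_x = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
pct_y = pct_x + 10.0

slope, intercept, r2, mean_offset = summarize_fit(pct_x, pct_y)
print("slope=%.3f intercept=%.3f r2=%.3f mean offset=%.3f"
      % (slope, intercept, r2, mean_offset))
```

A slope close to 1 with a clearly positive mean offset corresponds to the pattern described above: points running roughly parallel to the reference line but lying above it.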

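PS: The method here applies when the distribution of the data is unknown. To check whether the data fit a known distribution, such as the normal distribution, use the scipy.stats.probplot function directly.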
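A minimal sketch of the `scipy.stats.probplot` route, using synthetic data that are assumed here purely for illustration. Without a `plot` argument the function only returns the quantile pairs and the least-squares fit, so nothing is drawn.

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
data = rng.normal(loc=5.0, scale=2.0, size=500)  # hypothetical data

# dist="norm" compares the data against a normal distribution.
# osm: theoretical quantiles, osr: ordered sample values,
# r: correlation coefficient of the least-squares fit.
(osm, osr), (slope, intercept, r) = probplot(data, dist="norm")

print("slope=%.3f, intercept=%.3f, r=%.4f" % (slope, intercept, r))
```

An `r` close to 1 suggests the data are consistent with the chosen distribution. To draw the plot as well, pass `plot=plt` after `import matplotlib.pyplot as plt`.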
