


1. scipy.stats.shapiro: the Shapiro-Wilk test, a function dedicated to testing normality. Its null hypothesis is that the sample data follow a normal distribution.



def shapiro(x):
  Perform the Shapiro-Wilk test for normality.

  The Shapiro-Wilk test tests the null hypothesis that the
  data was drawn from a normal distribution.

  Parameters
  ----------
  x : array_like
    Array of sample data.

  Returns
  -------
  W : float
    The test statistic.
  p-value : float
    The p-value for the hypothesis test.
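
As a quick illustration of the null hypothesis described above, the following sketch runs stats.shapiro on one normal sample and one clearly skewed sample. The seed, the sample sizes, and the 0.05 significance level are assumptions made only for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
normal_sample = rng.normal(loc=0.0, scale=2.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

# shapiro returns the test statistic W and the p-value
w_norm, p_norm = stats.shapiro(normal_sample)
w_skew, p_skew = stats.shapiro(skewed_sample)

alpha = 0.05  # assumed significance level
print(f"normal sample: W={w_norm:.4f}, p={p_norm:.4f}, reject H0: {p_norm < alpha}")
print(f"skewed sample: W={w_skew:.4f}, p={p_skew:.4f}, reject H0: {p_skew < alpha}")
```

A small p-value means the null hypothesis of normality is rejected. A large p-value only means the test found no evidence against normality.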




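2. scipy.stats.kstest: the Kolmogorov-Smirnov (K-S) test, which checks whether a sample follows a given continuous distribution.

def kstest(rvs, cdf, args=(), N=20, alternative='two-sided', mode='approx'):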
  Perform the Kolmogorov-Smirnov test for goodness of fit.

  This performs a test of the distribution G(x) of an observed
  random variable against a given distribution F(x). Under the null
  hypothesis the two distributions are identical, G(x)=F(x). The
  alternative hypothesis can be either 'two-sided' (default), 'less'
  or 'greater'. The KS test is only valid for continuous distributions.

  Parameters
  ----------
  rvs : str, array or callable
    If a string, it should be the name of a distribution in `scipy.stats`.
    If an array, it should be a 1-D array of observations of random
    variables.
    If a callable, it should be a function to generate random variables;
    it is required to have a keyword argument `size`.
  cdf : str or callable
    If a string, it should be the name of a distribution in `scipy.stats`.
    If `rvs` is a string then `cdf` can be False or the same as `rvs`.
    If a callable, that callable is used to calculate the cdf.
  args : tuple, sequence, optional
    Distribution parameters, used if `rvs` or `cdf` are strings.
  N : int, optional
    Sample size if `rvs` is string or callable. Default is 20.
  alternative : {'two-sided', 'less','greater'}, optional
    Defines the alternative hypothesis (see explanation above).
    Default is 'two-sided'.
  mode : 'approx' (default) or 'asymp', optional
    Defines the distribution used for calculating the p-value.

     - 'approx' : use approximation to exact distribution of test statistic
     - 'asymp' : use asymptotic distribution of test statistic

  Returns
  -------
  statistic : float
    KS test statistic, either D, D+ or D-.
  pvalue : float
    One-tailed or two-tailed p-value.
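
The role of `args` is easy to overlook: with cdf='norm' and no `args`, the sample is compared against the standard normal N(0, 1). The sketch below, with an assumed seed and sample size, compares the same sample against N(0, 1) and against the distribution it was actually drawn from.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility
sample = rng.normal(loc=0.0, scale=2.0, size=500)

# Compared against N(0, 1): the scale does not match, so D is large
d_std, p_std = stats.kstest(sample, 'norm')

# Compared against N(0, 2): loc and scale are passed through `args`
d_true, p_true = stats.kstest(sample, 'norm', args=(0.0, 2.0))

print(f"vs N(0, 1): D={d_std:.4f}, p={p_std:.4g}")
print(f"vs N(0, 2): D={d_true:.4f}, p={p_true:.4g}")
```

Here loc and scale are known in advance. If they are estimated from the same sample being tested, the p-value from the plain K-S test is no longer exact and tends to be too large.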









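3. scipy.stats.normaltest: an omnibus test of normality based on D'Agostino and Pearson's test, which combines skew and kurtosis.

def normaltest(a, axis=0, nan_policy='propagate'):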
  Test whether a sample differs from a normal distribution.

  This function tests the null hypothesis that a sample comes
  from a normal distribution. It is based on D'Agostino and
  Pearson's [1]_, [2]_ test that combines skew and kurtosis to
  produce an omnibus test of normality.

  Parameters
  ----------
  a : array_like
    The array containing the sample to be tested.
  axis : int or None, optional
    Axis along which to compute test. Default is 0. If None,
    compute over the whole array `a`.
  nan_policy : {'propagate', 'raise', 'omit'}, optional
    Defines how to handle when input contains nan. 'propagate' returns nan,
    'raise' throws an error, 'omit' performs the calculations ignoring nan
    values. Default is 'propagate'.

  Returns
  -------
  statistic : float or array
    ``s^2 + k^2``, where ``s`` is the z-score returned by `skewtest` and
    ``k`` is the z-score returned by `kurtosistest`.
  pvalue : float or array
    A 2-sided chi squared probability for the hypothesis test.


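axis: the default is 0, which runs the test along the first axis (one test per column of a 2-D array). axis=None runs a single test over the whole array.

nan_policy: controls what happens when the input contains nan. 'propagate' returns nan, 'raise' raises an error, and 'omit' ignores the nan values.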
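
A minimal sketch of both parameters follows. The array shape, the seed, and the position of the injected nan are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)    # fixed seed, chosen only for reproducibility
data = rng.normal(size=(100, 3))  # 100 observations of 3 variables

# axis=0 (default): one test per column, so the results have shape (3,)
stat_cols, p_cols = stats.normaltest(data)

# axis=None: the array is flattened and a single test is run
stat_all, p_all = stats.normaltest(data, axis=None)

# Inject one nan to compare the three nan policies
with_nan = data[:, 0].copy()
with_nan[5] = np.nan

stat_prop, p_prop = stats.normaltest(with_nan)                     # 'propagate'
stat_omit, p_omit = stats.normaltest(with_nan, nan_policy='omit')  # nan dropped

try:
    stats.normaltest(with_nan, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print("per column p-values:", p_cols)
print("whole array p-value:", p_all)
print("propagate:", p_prop, "| omit:", p_omit, "| raise raised an error:", raised)
```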


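4. scipy.stats.anderson: the Anderson-Darling test, a refinement of the K-S test (scipy.stats.kstest). It tests whether a sample comes from a particular distribution (normal, exponential, logistic, or Gumbel).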


def anderson(x, dist='norm'):
  Anderson-Darling test for data coming from a particular distribution

  The Anderson-Darling tests the null hypothesis that a sample is
  drawn from a population that follows a particular distribution.
  For the Anderson-Darling test, the critical values depend on
  which distribution is being tested against. This function works
  for normal, exponential, logistic, or Gumbel (Extreme Value
  Type I) distributions.

  Parameters
  ----------
  x : array_like
    array of sample data
  dist : {'norm','expon','logistic','gumbel','gumbel_l','gumbel_r',
    'extreme1'}, optional
    the type of distribution to test against. The default is 'norm'
    and 'extreme1', 'gumbel_l' and 'gumbel' are synonyms.

  Returns
  -------
  statistic : float
    The Anderson-Darling test statistic
  critical_values : list
    The critical values for this distribution
  significance_level : list
    The significance levels for the corresponding critical values
    in percents. The function returns critical values for a
    differing set of significance levels depending on the
    distribution that is being tested against.





Critical values provided are for the following significance levels:

    normal/exponential: 15%, 10%, 5%, 2.5%, 1%
    logistic:           25%, 10%, 5%, 2.5%, 1%, 0.5%
    Gumbel:             25%, 10%, 5%, 2.5%, 1%


If the returned statistic is larger than these critical values then for the corresponding significance level, the null hypothesis that the data come from the chosen distribution can be rejected.
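
The decision rule above can be sketched as a loop over the returned critical values. This assumes the return value described in the docstring (statistic, critical_values, significance_level). The exponential sample and the seed are assumptions chosen so that the rejection is obvious.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # fixed seed, chosen only for reproducibility
sample = rng.exponential(scale=1.0, size=200)  # clearly non-normal data

result = stats.anderson(sample, dist='norm')
print(f"A-D statistic: {result.statistic:.3f}")

# Compare the statistic with the critical value at each significance level
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject H0" if result.statistic > crit else "fail to reject H0"
    print(f"{sig}% level: critical value {crit:.3f} -> {decision}")
```

Unlike the other tests here, this form of the result carries no p-value, so the conclusion is read off per significance level.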

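5. skewtest and kurtosistest: these test whether the sample's skewness and kurtosis match those of a normal distribution, which has skewness 0 and kurtosis 3 (Pearson's definition, equivalent to an excess kurtosis of 0).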
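
A minimal sketch of these two tests follows, together with the sample moments they are based on. The seed and sample size are assumptions. Note that stats.kurtosis reports excess kurtosis by default (normal gives 0), and fisher=False switches to Pearson's definition (normal gives 3).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)  # fixed seed, chosen only for reproducibility
sample = rng.normal(size=1000)

# Sample moments of a normal sample: skewness near 0, Pearson kurtosis near 3
skewness = stats.skew(sample)
kurt_pearson = stats.kurtosis(sample, fisher=False)  # Pearson definition
kurt_fisher = stats.kurtosis(sample)                 # Fisher definition (default)

# Each test returns a z-score and a two-sided p-value
z_skew, p_skew = stats.skewtest(sample)
z_kurt, p_kurt = stats.kurtosistest(sample)

# normaltest combines the two z-scores: statistic = z_skew**2 + z_kurt**2
k2, p_k2 = stats.normaltest(sample)

print(f"skewness={skewness:.3f}, kurtosis (Pearson)={kurt_pearson:.3f}")
print(f"skewtest: z={z_skew:.3f}, p={p_skew:.3f}")
print(f"kurtosistest: z={z_kurt:.3f}, p={p_kurt:.3f}")
print(f"normaltest: {k2:.3f} vs z_skew^2 + z_kurt^2 = {z_skew**2 + z_kurt**2:.3f}")
```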



6. Example code:

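import numpy as np
from scipy import stats

a = np.random.normal(0, 2, 50)   # sample drawn from a normal distribution (mean 0, std 2)
b = np.linspace(0, 10, 100)      # evenly spaced values, not normally distributed

# Shapiro-Wilk test
S, p = stats.shapiro(a)
print('the shapiro test result is:', S, ',', p)

# kstest (K-S test): pass loc and scale via args, otherwise 'norm' means N(0, 1)
K, p = stats.kstest(a, 'norm', args=(0, 2))
print('the kstest result is:', K, ',', p)

# normaltest
N, p = stats.normaltest(b)
print('the normaltest result is:', N, ',', p)

# Anderson-Darling test: returns critical values and significance levels, not a p-value
A, C, sig = stats.anderson(b, dist='norm')
print('the anderson test result is:', A, ',', C, ',', sig)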




