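I have a question about the rolling_std function in pandas.stats.moments. Strangely, it gives different results from numpy.std applied over a rolling window of the same array.
Here is code that reproduces the discrepancy: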
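# import the modules
import numpy as np
import pandas as pd

# define timeseries and sliding window size
timeseries = np.arange(10)
periods = 4

# output of different results
# note: pd.stats.moments.rolling_std is the old pandas API and has been removed in later
# pandas versions; the modern equivalent is pd.Series(timeseries).rolling(periods).std()
pd.stats.moments.rolling_std(timeseries, periods)
[np.std(timeseries[max(i-periods+1,0):i+1]) for i in np.arange(10)]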
Output:
#pandas
array([        nan,         nan,         nan,  1.29099445,  1.29099445,
        1.29099445,  1.29099445,  1.29099445,  1.29099445,  1.29099445])
#numpy
[0.0, 0.5, 0.81649658092772603, 1.1180339887498949, 1.1180339887498949,
 1.1180339887498949, 1.1180339887498949, 1.1180339887498949, 1.1180339887498949,
 1.1180339887498949]
If I calculate it by hand, this result seems to be correct. Has anyone run into this, or does anyone have an explanation?
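Working the hand calculation through for one full window, $x = (0, 1, 2, 3)$, shows that both numbers are "correct": they differ only in the divisor applied to the sum of squared deviations, written here as $N - \text{ddof}$ (the same ddof parameter that numpy and pandas expose).

```latex
\bar{x} = \frac{0 + 1 + 2 + 3}{4} = 1.5, \qquad
\sum_{i=1}^{4} (x_i - \bar{x})^2 = 2.25 + 0.25 + 0.25 + 2.25 = 5

s = \sqrt{\frac{1}{N - \text{ddof}} \sum_{i=1}^{N} (x_i - \bar{x})^2}

\text{ddof} = 0:\; s = \sqrt{5/4} \approx 1.1180, \qquad
\text{ddof} = 1:\; s = \sqrt{5/3} \approx 1.2910
```

Every later window is four consecutive integers, so it has the same spread and produces the same two values.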
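Pandas' rolling_std uses a default delta degrees of freedom (ddof) of 1, i.e. it divides the sum of squared deviations by N - 1 (the sample standard deviation), which makes it behave more like R. numpy's std defaults to ddof=0 and divides by N (the population standard deviation). If you specify ddof=1, you get the same result from np.std: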
>>> [np.std(timeseries[max(i-periods+1,0):i+1], ddof=1) for i in np.arange(10)]
[nan, 0.70710678118654757, 1.0, 1.2909944487358056, 1.2909944487358056,
 1.2909944487358056, 1.2909944487358056, 1.2909944487358056, 1.2909944487358056,
 1.2909944487358056]
Or specify ddof=0 for rolling_std:
>>> pd.stats.moments.rolling_std(timeseries, periods, ddof=0)
array([        nan,         nan,         nan,  1.11803399,  1.11803399,
        1.11803399,  1.11803399,  1.11803399,  1.11803399,  1.11803399])
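Since pd.stats.moments.rolling_std no longer exists in current pandas, here is a minimal sketch of the same comparison using the Series.rolling(...).std(ddof=...) API, assuming a recent pandas. It compares only the full windows, because pandas returns NaN until the window is filled (min_periods defaults to the window size), while the numpy comprehension in the question also evaluates the shorter leading windows.

```python
import numpy as np
import pandas as pd

timeseries = np.arange(10)
periods = 4

s = pd.Series(timeseries, dtype=float)

# Rolling.std defaults to ddof=1 (sample standard deviation), like the old rolling_std
pandas_sample = s.rolling(periods).std()
# ddof=0 gives the population standard deviation, matching np.std's default
pandas_population = s.rolling(periods).std(ddof=0)

# numpy over full windows only (indices periods-1 .. len-1), so each value lines up with pandas
full_windows = [timeseries[i - periods + 1:i + 1] for i in range(periods - 1, len(timeseries))]
numpy_population = np.array([np.std(w) for w in full_windows])           # divides by N
numpy_sample = np.array([np.std(w, ddof=1) for w in full_windows])       # divides by N - 1

# The first periods-1 pandas entries are NaN; the rest agree with numpy for the same ddof
print(pandas_sample.to_numpy())
print(numpy_sample)
print(pandas_population.to_numpy())
print(numpy_population)
```

Passing min_periods=1 to rolling would also compute the shorter leading windows, which should reproduce the question's numpy list when combined with ddof=0; with ddof=1 a one-element window still yields NaN, as in the answer's np.std output.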