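Is there a way to approximate the periodicity of a pandas time series? In R, xts objects have a method called periodicity that serves exactly this purpose. Is there an equivalent in pandas?
For example, can we infer the frequency of a time series that does not specify one?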
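
    # Note: in newer pandas releases this module lives in the separate
    # pandas_datareader package.
    import pandas.io.data as web
    aapl = web.get_data_yahoo("AAPL")

The index of the result reports no frequency:

    [2010-01-04 00:00:00, ..., 2013-12-19 00:00:00]
    Length: 999, Freq: None, Timezone: None
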
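The frequency of this series can reasonably be approximated as daily.
Update:
I thought it might help to show the source code of R's periodicity implementation.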

    function (x, ...)
    {
        if (timeBased(x) || !is.xts(x))
            x <- try.xts(x, error = "'x' needs to be timeBased or xtsible")
        p <- median(diff(.index(x)))
        if (is.na(p))
            stop("can not calculate periodicity of 1 observation")
        units <- "days"
        scale <- "yearly"
        label <- "year"
        if (p < 60) {
            units <- "secs"
            scale <- "seconds"
            label <- "second"
        }
        else if (p < 3600) {
            units <- "mins"
            scale <- "minute"
            label <- "minute"
            p <- p/60L
        }
        else if (p < 86400) {
            units <- "hours"
            scale <- "hourly"
            label <- "hour"
        }
        else if (p == 86400) {
            scale <- "daily"
            label <- "day"
        }
        else if (p <= 604800) {
            scale <- "weekly"
            label <- "week"
        }
        else if (p <= 2678400) {
            scale <- "monthly"
            label <- "month"
        }
        else if (p <= 7948800) {
            scale <- "quarterly"
            label <- "quarter"
        }
        structure(list(difftime = structure(p, units = units, class = "difftime"),
            frequency = p, start = start(x), end = end(x), units = units,
            scale = scale, label = label), class = "periodicity")
    }

I think this line is the key, and I don't quite understand it:

    p <- median(diff(.index(x)))

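For readers more at home in pandas, that line can be read as "take the gaps between consecutive timestamps, in seconds, and keep the median". The following is a rough sketch of the same idea, not a port of xts: the helper name `median_spacing_seconds` is made up for illustration, and a synthetic business-day index stands in for the AAPL data.

```python
import pandas as pd


def median_spacing_seconds(index):
    """Median gap between consecutive timestamps, in seconds.

    Hypothetical helper mirroring R's median(diff(.index(x))).
    """
    gaps = pd.Series(index).diff()          # first entry is NaT
    return gaps.dt.total_seconds().median()  # NaN entries are skipped


# Synthetic stand-in for a stock series: Monday-Friday only.
idx = pd.bdate_range("2010-01-04", periods=20)
p = median_spacing_seconds(idx)
print(p)  # most gaps are one day, so the weekend gaps do not move the median
```

Because the median ignores the minority of three-day weekend gaps, a series that skips weekends still comes out with a one-day spacing, which is why R labels such data as daily.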
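This time series skips weekends (and holidays), so it doesn't really have a daily frequency to begin with. You could, however, use asfreq to upsample it to a time series with daily frequency:

    aapl = aapl.asfreq('D', method='ffill')

Doing so propagates the last observed value forward to the dates with missing values.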
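To see what the forward fill does, here is a small sketch on a made-up three-point series (the values and dates are invented for illustration; Thursday, Friday, then the following Monday):

```python
import pandas as pd

# Thursday, Friday, Monday: the weekend is missing, as in stock data.
idx = pd.to_datetime(["2010-01-07", "2010-01-08", "2010-01-11"])
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# Upsample to calendar days; Saturday and Sunday inherit Friday's value.
daily = s.asfreq("D", method="ffill")
print(daily)
print(daily.index.freqstr)
```

The result has five rows, and the index now carries an explicit daily frequency. The filled weekend values are copies of Friday's observation, not new data.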
Note that pandas also has a business-day frequency, so it is also possible to upsample to business days using:

    aapl = aapl.asfreq('B', method='ffill')

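It may also be worth knowing that pandas ships `pd.infer_freq`, which tries to recover a frequency string from an index. As far as I can tell it only succeeds when the index is perfectly regular, so a single missing holiday is enough to make it return None. A small sketch on synthetic data (the dropped date is an arbitrary stand-in for a holiday):

```python
import pandas as pd

# A perfectly regular business-day index.
regular = pd.bdate_range("2010-01-04", periods=20)
print(pd.infer_freq(regular))

# Drop one Monday to mimic a market holiday.
with_gap = regular.delete(5)
print(pd.infer_freq(with_gap))
```

This is why a median-based approximation, like the one in R, is more forgiving for real market data than strict inference.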
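If you wish to automate the process of inferring the median frequency in days, then you could do this:

    import pandas as pd
    import numpy as np
    import pandas.io.data as web

    aapl = web.get_data_yahoo("AAPL")
    # Median gap between consecutive timestamps
    f = np.median(np.diff(aapl.index.values))
    # Convert the gap to a whole number of days
    days = f.astype('timedelta64[D]').item().days
    aapl = aapl.asfreq('{}D'.format(days), method='ffill')
    print(aapl)
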
This code needs testing, but perhaps it comes close to the R code you posted:

    import pandas as pd
    import numpy as np
    import pandas.io.data as web

    def infer_freq(ts):
        # Median gap between consecutive timestamps, in whole seconds
        med = np.median(np.diff(ts.index.values))
        seconds = int(med.astype('timedelta64[s]').item().total_seconds())
        # Note: recent pandas versions spell some of these aliases
        # differently ('min', 'h', 'ME', 'QE').
        if seconds < 60:
            freq = '{}s'.format(seconds)
        elif seconds < 3600:
            freq = '{}T'.format(seconds//60)
        elif seconds < 86400:
            freq = '{}H'.format(seconds//3600)
        elif seconds < 604800:
            freq = '{}D'.format(seconds//86400)
        elif seconds < 2678400:
            freq = '{}W'.format(seconds//604800)
        elif seconds < 7948800:
            freq = '{}M'.format(seconds//2678400)
        else:
            freq = '{}Q'.format(seconds//7948800)
        return ts.asfreq(freq, method='ffill')

    aapl = web.get_data_yahoo("AAPL")
    print(infer_freq(aapl))

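If you only want the label that R reports ("daily", "weekly", and so on) without resampling anything, something along these lines might do. It is a sketch that copies the thresholds from the R source in the question; the function name `periodicity_label` is my own invention, and the checks run on synthetic indexes rather than downloaded data.

```python
import pandas as pd


def periodicity_label(index):
    """Classify the median timestamp spacing with the R thresholds.

    Hypothetical helper; returns only the 'scale' string.
    """
    p = pd.Series(index).diff().dt.total_seconds().median()
    if pd.isna(p):
        raise ValueError("cannot calculate periodicity of 1 observation")
    if p < 60:
        return "seconds"
    if p < 3600:
        return "minute"
    if p < 86400:
        return "hourly"
    if p == 86400:
        return "daily"
    if p <= 604800:
        return "weekly"
    if p <= 2678400:
        return "monthly"
    if p <= 7948800:
        return "quarterly"
    return "yearly"


business_days = pd.bdate_range("2010-01-04", periods=20)
weekly = pd.date_range("2010-01-03", periods=10, freq="W")
monthly = pd.date_range("2010-01-01", periods=12, freq="MS")

for name, idx in [("business", business_days), ("weekly", weekly), ("monthly", monthly)]:
    print(name, periodicity_label(idx))
```

Keeping classification separate from `asfreq` means you can inspect the guess before deciding whether filling the gaps is appropriate for your data.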