Cross-correlation (time-lag correlation) with pandas?


I have various time series that I want to correlate, or rather cross-correlate, with each other, to find out at which time lag the correlation coefficient is greatest. I found <a href="https://stackoverflow.com/questions/25830840/python-cross-correlation">various</a> <a href="https://stackoverflow.com/questions/6991471/computing-cross-correlation-function">questions</a> and answers/links discussing how to do it with numpy, but those would mean that I have to convert my dataframes into numpy arrays. Since my time series often cover different periods, I am afraid I will run into chaos.

Edit

The issue I have with all the numpy/scipy methods is that they seem to lack awareness of the time-series nature of my data. When I correlate a time series that starts in, say, 1940 with one that starts in 1970, pandas <code>corr</code> knows this, whereas <code>np.correlate</code> just produces an array of 1020 entries (the length of the longer series) full of nan.

The various questions on this subject indicate that there should be a way to solve the different-length issue, but so far I have seen no indication of how to use it for specific time periods. I just need to shift by up to 12 months in increments of 1, to see the time of maximum correlation within one year.

Edit2

Some minimal sample data:

<pre><code>import pandas as pd
import numpy as np

dfdates1 = pd.date_range('01/01/1980', '01/01/2000', freq = 'MS')
# My real data is from measurements, but random between -3 and 3 is fitting.
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 31.
dfdata1 = (np.random.randint(-30, 31, len(dfdates1)) / 10.0)
df1 = pd.DataFrame(dfdata1, index = dfdates1)

dfdates2 = pd.date_range('03/01/1990', '02/01/2013', freq = 'MS')
dfdata2 = (np.random.randint(-30, 31, len(dfdates2)) / 10.0)
df2 = pd.DataFrame(dfdata2, index = dfdates2)
</code></pre>

Due to various processing steps, those dfs end up as dfs indexed from 1940 to 2015. This should reproduce it:

<pre><code>bigdates = pd.date_range('01/01/1940', '01/01/2015', freq = 'MS')
big1 = pd.DataFrame(index = bigdates)
big2 = pd.DataFrame(index = bigdates)
big1 = pd.concat([big1, df1], axis = 1)
big2 = pd.concat([big2, df2], axis = 1)
</code></pre>

This is what I get when I correlate with pandas and shift one dataset:

<pre><code>In [451]: corr_coeff_0 = big1[0].corr(big2[0])

In [452]: corr_coeff_0
Out[452]: 0.030543266378853299

In [453]: big2_shift = big2.shift(1)

In [454]: corr_coeff_1 = big1[0].corr(big2_shift[0])

In [455]: corr_coeff_1
Out[455]: 0.020788314779320523
</code></pre>

And trying scipy:

<pre><code>In [456]: scicorr = scipy.signal.correlate(big1,big2,mode="full")

In [457]: scicorr
Out[457]:
array([[ nan],
       [ nan],
       [ nan],
       ...,
       [ nan],
       [ nan],
       [ nan]])
</code></pre>

which, according to <code>whos</code>, is

<pre><code>scicorr ndarray 1801x1: 1801 elems, type `float64`, 14408 bytes
</code></pre>

But I would just like to have 12 entries.

/Edit2

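My idea is to implement a time-lag correlation myself, like so:

<pre><code>corr_coeff_0 = df1['Data'].corr(df2['Data'])

df1_1month = df1.shift(1)
corr_coeff_1 = df1_1month['Data'].corr(df2['Data'])

df1_6month = df1.shift(6)
corr_coeff_6 = df1_6month['Data'].corr(df2['Data'])
# ...and so on
</code></pre>

But this is probably slow, and I am probably trying to reinvent the wheel here.

Edit

The above approach seems to work, and I have put it into a loop that goes through all 12 months of a year, but I would still prefer a built-in method.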
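
For reference, the shift-and-correlate loop described above could be sketched roughly as follows. This is only an illustration, not a built-in pandas facility: the helper name `lagged_corr`, its `max_lag` parameter, and the synthetic series are assumptions made for the example. It relies on `Series.shift` and `Series.corr`, the latter of which aligns both series on their index and ignores missing values, so series covering different periods can be passed in directly.

```python
import numpy as np
import pandas as pd


def lagged_corr(x, y, max_lag=12):
    """Correlate x with y shifted forward by 0..max_lag periods.

    Hypothetical helper: Series.corr aligns on the index and drops NaN
    pairs, so x and y may cover different periods.
    """
    return pd.Series(
        {lag: x.corr(y.shift(lag)) for lag in range(max_lag + 1)},
        name="corr",
    )


# Synthetic monthly data (assumed for illustration): y is the same signal
# as x, but leading it by 3 months, over a different period.
rng = np.random.default_rng(0)
bigdates = pd.date_range("1940-01-01", "2015-01-01", freq="MS")
base = pd.Series(rng.normal(size=len(bigdates)), index=bigdates)

x = base.loc["1980-01-01":"2000-01-01"]
y = base.shift(-3).loc["1990-03-01":"2013-02-01"]

corrs = lagged_corr(x, y, max_lag=12)
best_lag = corrs.idxmax()  # lag with the highest correlation coefficient
print(corrs)
print("best lag:", best_lag)
```

With this sign convention, a positive lag compares `x` at time t with `y` at time t minus the lag; swapping the arguments, or extending the range to negative lags, covers the opposite direction.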
