<p>这个答案附带了一个免责声明:我已经15年没有做过这种类型的时间序列分析了,因此我非常生锈。我非常欢迎任何更正,因为我是凭记忆工作的,可能忘记了一些事情。希望它能给你一些想法,如果你的数据允许的话,我认为你会想做这样的事情,你可以改进方法。(如果是,请回复或编辑答案!)</p>
<p>正如我在上面的评论中提到的,如果您可以将数据集分成多个部分,那么您就可以开始获得数据集中不同频率的能量的置信限。当然,这意味着您需要很长的数据集,这些数据集可能可用,也可能不可用,这取决于您所做的工作。但就像任何统计技术一样,如果你没有很多数据,你很难对你的发现有信心。我将继续描述我将如何确定有一个信号,你可以反转它,找到哪里有<em>而不是</em>一个信号</p>
<p>我要做的第一件事是将数据集分解为多个部分。你能把它分成多少部分取决于你的数据集相对于你试图找到的信号的周期性(或者在你的例子中没有找到)有多长。要获得更多的分段,可以做的一件事是重叠它们,但为了节省能量,需要使用适当的窗口进行过滤。在下面的代码中,我假设有50%的重叠,并应用<a href="https://en.wikipedia.org/wiki/Hann_function" rel="nofollow noreferrer">Hanning window</a>。我真的忘记了不同窗口和重叠的细节,所以我很抱歉,我不能提供更多关于为什么我选择这个窗口重叠的信息</p>
<p>希望您能从数据集中得出您的背景状态的分布或期望值。也许它只是一个白噪声地板,或者更复杂的红移或蓝移。功率谱的背景电平应提供噪声电平的信息,因为功率谱能量与噪声有关,如<code>2 * noise**2 / (fs * L)</code>,其中<code>fs</code>是采样频率,<code>L</code>是窗口长度。如果我没记错的话,就像我说的,我很生疏</p>
首先,让我们来看看叠加在一起的噪声和噪声的功率谱。<p>
<pre><code>import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
def detrend(y):
# This function removes the linear trend from the time series
# by fitting y = mx + b to x and then subtracting it
x = np.arange(len(y))
A = np.vstack([x, np.ones(len(x))]).T
m, b = np.linalg.lstsq(A, y, rcond=None)[0]
return(y - m * x - b)
def subset(f, L, overlap, fs):
# Break the function f into sections of length L.
# Normally you do some overlapping of sections,
# and you want to use an appropriate window
# to conserve the energy in the time series signal.
# Here I use a Hanning window.
N = len(f)
x = []
for i in range(0, int(N-L+1), int(L * overlap)):
window = np.hanning(L)
y = detrend(f[i:i+L]) * window
fft = np.fft.fft(y)
fft = fft[:len(fft)//2+1]
x.append(np.real(fft * np.conj(fft)) / (fs * L))
return(x)
def noise(N):
# Create a white noise field
return(np.random.random(N) - 0.5)
N = 10**4 # length of time series
dt = 1 # sampling period
fs = 2 * np.pi / dt # sampling frequency
nyquist = fs / 2 # nyquist frequency
t = np.linspace(dt, N * dt, N) # times of data collection
L = N // 100 # window size used for subsampling
overlap = 0.5 # overlap in windows
# frequencies that will come out of the fft
# you can also use fp.fft.fftfreq and adjust
freqs = np.linspace(0, nyquist, L//2+1)
# Create background white noise
noise_amplitude = 3
energy_scale = 2 * noise_amplitude**2 / (fs * L)
WhiteNoise = np.array(subset(noise_amplitude * noise(N), L, overlap, fs) ) / energy_scale
# Create another timeseries with a signal and noise
omega = 2 * np.pi / 10 # frequency in dataset
signal = (noise_amplitude / 4) * np.cos(t * omega) # signal in dataset
data = signal + noise_amplitude * noise(N) # dataset
DataSet = np.array(subset(data, L, overlap, fs)) / energy_scale
# Percentiles of White Noise Power Spectra, for plotting
Wprct = np.percentile(WhiteNoise, [2.5, 25, 50, 75, 97.5], axis = 0)
# Percentiles of Data Power Spectra, for plotting
Dprct = np.percentile(DataSet, [2.5, 25, 50, 75, 97.5], axis = 0)
fig = plt.figure()
# Plot the spectra from all the subsets of the noise
for w in WhiteNoise:
plt.semilogy(freqs[1:], w[1:], 'k', alpha = 0.05)
# Plot the median and confidence intervales
lines, = plt.semilogy(freqs[1:], Wprct[2,1:], label = 'Median of noise floor')
plt.fill_between(freqs[1:], Wprct[1,1:], Wprct[3,1:], alpha = 0.8, color = lines.get_color(), label = '50% CI of noise floor')
plt.fill_between(freqs[1:], Wprct[0,1:], Wprct[1,1:], alpha = 0.4, color = lines.get_color(), label = '95% CI of noise floor')
plt.fill_between(freqs[1:], Wprct[3,1:], Wprct[4,1:], alpha = 0.4, color = lines.get_color())
lines, = plt.plot(freqs[1:], Dprct[2,1:], label = 'Median signal-to-noise estimate')
plt.plot([omega, omega], [10**-3, 100], 'k:', label = 'Signal in dataset')
plt.plot([freqs[1], freqs[-1]], [1, 1], 'r:', label = 'Noise floor')
plt.legend()
plt.xlabel('Angular frequency')
plt.ylabel('Energy above noise floor [db]')
</code></pre>
<p><a href="https://i.stack.imgur.com/sZSzq.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/sZSzq.png" alt="psd of signals"/></a></p>
<p>很明显,信号穿过了噪音,但我认为我们可以想出比“用眼睛盯着它”更好的办法。在下面的代码中,我只看我的“数据集”的每个频率上高于噪声分布和相应比例的点数。从噪波中提取值有50%的几率使值大于噪波,因此
你希望你的信号大大高于50%。(另一方面,如果该值小于50%,则表明信号在噪声地板下方<em>,这将很有趣。)有<a href="http://www.talkstats.com/threads/probability-of-one-random-variable-being-greater-than-another.3987/" rel="nofollow noreferrer">more advanced approaches</a>,我没有在这里讨论,但我认为会给出相同的结果</p>
<pre><code>plt.figure()
for i in range(1, len(freqs)):
x = 0
for j in range(len(DataSet[:,i])):
x += np.sum(DataSet[j,i] > WhiteNoise[:,i])
x /= len(DataSet[:,i])**2
plt.plot(freqs[i], x, 'k.')
plt.plot([omega, omega], [0.4, 1], 'k:', label = 'Signal in dataset')
plt.xlabel('Angular frequency')
plt.ylabel('Probability signal is greater than noise')
plt.legend()
</code></pre>
<p><a href="https://i.stack.imgur.com/0IiTg.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/0IiTg.png" alt="probability signal > noise"/></a></p>
<p>我认为有人可以用这些方法来说明在不同频率下哪里有振荡,哪里没有振荡,这应该可以解决你们关于数据集季节性的问题</p>