Understanding the STFT using librosa

Posted 2024-09-30 14:20:22


I have an audio file of about 14 seconds, sampled at 8 kHz. I am using librosa to extract some features from this file.

import numpy as np
import librosa

n_fft = 2048  # librosa default
y, sr = librosa.load(file_name)  # sr defaults to 22050 Hz, so the 8 kHz file is resampled
stft = np.abs(librosa.stft(y, n_fft=n_fft))

# file_length = 14.650022675736961  # seconds
# defaults:
# n_fft = 2048
# hop_length = 512  # win_length / 4 = n_fft / 4 (win_length defaults to n_fft)

# windowsTime = n_fft * Ts  # Ts = 1 / sr

stft.shape
# (1025, 631)

Spectrogram display (specshow):

[code block not preserved in the source]

[![stft sr=22050][1]][1]

Now, I can understand the shape of the STFT:

631 time bins ≈ 4 * (file_length / windowsTime)  # windows overlap by 75%, since hop_length = n_fft / 4
1025 frequency bins = 1 + n_fft / 2, spaced sr / n_fft apart,
so there are 1025 frequencies from 0 to sr/2 (Nyquist)
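The shape can also be checked by hand. The sketch below assumes librosa's default centered framing (`center=True`), under which the frame count is `1 + n_samples // hop_length`. The helper name `expected_stft_shape` is made up for illustration and is not part of librosa.

```python
def expected_stft_shape(n_samples, n_fft=2048, hop_length=None):
    """Expected (frequency bins, frames) of an STFT, assuming centered framing."""
    if hop_length is None:
        hop_length = n_fft // 4  # default hop: a quarter of the window
    n_bins = 1 + n_fft // 2  # frequencies from 0 up to Nyquist
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames


# Number of samples after loading at 22050 Hz, from the reported duration
n_samples = round(14.650022675736961 * 22050)
print(expected_stft_shape(n_samples))
```

The integer division is why the frame count is only approximately `4 * file_length / windowsTime`.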

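What I don't understand is why the plots differ between two sample rates when the window keeps the same ratio to the sample rate: (1) 22050 Hz, the librosa default, and (2) 8 kHz, the file's native sample rate.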

y2, sr2 = librosa.load(file_name, sr=None)  # sr=None keeps the file's native 8 kHz rate

n_fft2 = 743  # 2048 * 8000 / 22050: same window duration, for a comparable plot
hop_length = n_fft2 // 4  # 185: the default hop is win_length // 4, not 186

stft2 = np.abs(librosa.stft(y2, n_fft=n_fft2))

So the shape of the STFT is different:

stft2.shape
# (372, 634)


[![stft sr=743][2]][2]

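1. But why are the absolute frequencies not the same? It is the same signal, just not oversampled, so every sample is unique. What am I missing? Is the y-axis static?

2. I don't understand the values of the time bins. I expected them to be in frame units: the first bin at the hop length, the second at windowTime, and so on until the end of the file. But the units look weird?

I would like to be able to extract the magnitude of a specific frequency bin at a specific time (frame), or alternatively to sum some of them to get the magnitude over a time range.

So if I take stft[index of a frequency bin], which is one row out of the 1025 frequency bins, and look at its contents, stft[0] contains 630 points. These are exactly the 630 time points for that frequency, so each of the 1025 frequency bins has the same time points.

So if I take one row that applies equally to all other frequency bins (same time points), namely stft[0], I can pick a time frame and a frequency bin and get the specific magnitude: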

freqs = librosa.fft_frequencies(sr=sr2, n_fft=n_fft2)
times = librosa.core.frames_to_time(stft2[0], sr=sr2, n_fft=n_fft2, hop_length=hop_length)

fft_bin = 6
time_idx = 10

print('freq (Hz)', freqs[fft_bin])
print('time (s)', times[time_idx])
print('amplitude', stft2[fft_bin, time_idx])

array([0.047375, 0.047625, 0.04825, ..., 0.047, 0.047125, 0.050875])


  [1]: https://i.imgur.com/OeKzvrb.png
  [2]: https://i.imgur.com/ALtba5F.png

1 answer
User
#1 · Posted 2024-09-30 14:20:22

Question 1:

You need to specify the sample rate when using specshow:

librosa.display.specshow(stft, x_axis='time', y_axis='log', sr=sr)

Otherwise the default (22050 Hz) is used (see the docs).
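To see why the y-axis changes, here is a minimal sketch of how bin indices map to frequencies, assuming the usual relation `k * sr / n_fft` (librosa exposes the same mapping as `librosa.fft_frequencies`). The helper `bin_frequencies` is a hypothetical name used only for this illustration.

```python
def bin_frequencies(sr, n_fft):
    """Frequency in Hz of each STFT bin: k * sr / n_fft for k = 0 .. n_fft // 2."""
    return [k * sr / n_fft for k in range(1 + n_fft // 2)]


n_fft = 743
true_top = bin_frequencies(8000, n_fft)[-1]      # the file's real sample rate
assumed_top = bin_frequencies(22050, n_fft)[-1]  # the default used when sr is omitted
print(f"top bin at sr=8000:  {true_top:.1f} Hz")
print(f"top bin at sr=22050: {assumed_top:.1f} Hz")
```

The data in the matrix is the same in both cases. Only the axis labels change, which is why the 8 kHz spectrogram appears to reach frequencies it cannot contain.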

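Question 2:

librosa.core.frames_to_time does not take stft[0] as its argument; that would be the magnitudes of the first frequency bin across all frames. Instead, it takes frame indices as its first argument.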

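Suppose you have an audio signal with sr=10000 Hz, and you run an STFT on it with n_fft=2000 and hop_length=1000. You then get one frame per hop, and each hop is 0.1 s long, because 10000 samples correspond to 1 s, so 1000 samples (one hop) correspond to 0.1 s.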
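The arithmetic in this example can be written as a one-line conversion. This is a sketch with a hypothetical helper name, `frame_to_seconds`. As far as I know, `librosa.frames_to_time` performs the same conversion when `n_fft` is not passed, and adds an offset of `n_fft // 2` samples when it is.

```python
def frame_to_seconds(frame, sr, hop_length):
    """Time in seconds that a frame index maps to: frame * hop_length / sr."""
    return frame * hop_length / sr


sr = 10000
hop_length = 1000
for frame in range(4):
    # Each step forward by one frame advances the time by one hop (0.1 s here)
    print(frame, frame_to_seconds(frame, sr, hop_length))
```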

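stft[0] is not a frame number. Instead, stft has the shape (1 + n_fft/2, t) (see here). This means the first dimension is the frequency bin and the second dimension is the frame number (t).

So the total number of frames in stft is stft.shape[1]. To get the length of the source audio, you can do the following: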
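Given that layout, picking out one magnitude means converting a frequency to a row index and a time to a column index. The sketch below uses a tiny hand-written matrix and toy parameters rather than real audio. The helper names `hz_to_bin` and `seconds_to_frame` are hypothetical; librosa offers `fft_frequencies` and `time_to_frames` for the real thing.

```python
def hz_to_bin(freq_hz, sr, n_fft):
    """Index of the STFT row closest to freq_hz (bins are spaced sr / n_fft apart)."""
    return round(freq_hz * n_fft / sr)


def seconds_to_frame(t, sr, hop_length):
    """Index of the STFT column for time t, counting whole hops from the start."""
    return int(t * sr // hop_length)


# Stand-in for a magnitude matrix: 3 frequency bins (rows) x 4 frames (columns)
stft = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]
sr, n_fft, hop_length = 8, 4, 2  # toy values: bins at 0, 2, 4 Hz; one hop is 0.25 s

row = hz_to_bin(2.0, sr, n_fft)
col = seconds_to_frame(0.5, sr, hop_length)
magnitude = stft[row][col]

# Sum of one frequency bin over a range of frames (here frames 1 and 2)
band_sum = sum(stft[row][1:3])
print(row, col, magnitude, band_sum)
```

With a NumPy array the same lookups would be `stft[row, col]` and `stft[row, 1:3].sum()`.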

[code block not preserved in the source]
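The answerer's snippet is not available here, so the following is not their code. It is a minimal sketch of one way to estimate the duration from the frame count, assuming the hop length and sample rate used for the STFT are known. `approx_duration_seconds` is a hypothetical helper.

```python
def approx_duration_seconds(n_frames, sr, hop_length):
    """Approximate signal length: number of frames times the duration of one hop."""
    return n_frames * hop_length / sr


# Values from the first STFT in the question: shape (1025, 631), sr=22050, hop_length=512
estimate = approx_duration_seconds(631, 22050, 512)
print(estimate)
```

Because frames are counted in whole hops, the estimate is only accurate to within about one hop (512 / 22050 s here) of the true file length.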
