I want to compute a similarity index between two 2D arrays that represent the voices of two singers.
I read my mp3 files with pydub.
AudioFunctions.py
import numpy as np
from pydub import AudioSegment
from scipy.fft import fft


class SongData:
    def __init__(self, path):
        # Load the file and mix it down to mono
        self.audio = AudioSegment.from_file(path).set_channels(1)
        self.rate = self.audio.frame_rate
        self.data = np.array(self.audio.get_array_of_samples())
        self.length = len(self.data)
        self.duration = self.audio.duration_seconds
        self.time = np.linspace(0, self.duration, self.length)
        # Frequency of each positive FFT bin: k * rate / length
        self.freq = np.linspace(0, self.rate / 2, self.length // 2, endpoint=False)
        self.fftArray = fft(self.data)
        self.fftArrayPositive = self.fftArray[:self.length // 2]
        self.fftArrayNegative = np.flip(self.fftArray[self.length // 2:])
        self.fftArrayAbs = np.abs(self.fftArray)
        self.fftPlotting = self.fftArrayAbs[:self.length // 2]


def song2data(path):
    return SongData(path)


def getFirstData(songArr, time, rate=44100):
    # Keep the first `time` seconds; `rate` defaults to the 44100 Hz assumed originally
    return songArr[:int(time * rate)]
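As a sanity check on the frequency axis that goes with the positive half of the FFT, here is a minimal sketch using a synthetic tone instead of an mp3. It uses NumPy's `rfft` and `rfftfreq` as stand-ins for the FFT call in the class, and the 440 Hz tone, the one-second length, and the 44100 Hz rate are all assumptions chosen for illustration.

```python
import numpy as np

rate = 44100                      # assumed sample rate
n = rate                          # one second of samples
t = np.arange(n) / rate
x = np.sin(2 * np.pi * 440 * t)   # synthetic 440 Hz tone standing in for audio data

# Magnitude of the positive-frequency half of the spectrum
magnitude = np.abs(np.fft.rfft(x))
# Frequency of each bin: bin k sits at k * rate / n
freqs = np.fft.rfftfreq(n, d=1 / rate)

# The strongest bin should sit close to the tone's frequency
peak_hz = freqs[np.argmax(magnitude)]
print(peak_hz)
```

If the printed peak is far from 440 Hz, the frequency axis and the FFT bins are probably misaligned.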
This is my code for getting the data of the two songs and their spectrograms.
main.py
from AudioFunctions import *
from scipy import signal
import matplotlib.pyplot as plt
import librosa
import sklearn
from scipy import spatial
from sklearn.metrics.pairwise import cosine_similarity
songClass1 = song2data("sia1.mp3")
songClass2 = song2data("sia2.mp3")
# print(songClass.data)
# print(songClass.rate)
# print(songClass.duration)
# print(songClass.length)
songArray = getFirstData(songClass1.data, 120)
songArray2 = getFirstData(songClass2.data, 120)
# Use each file's own sample rate rather than a hard-coded 44100
frequencies, times, spectrogram = signal.spectrogram(songArray, songClass1.rate)
frequencies2, times2, spectrogram2 = signal.spectrogram(songArray2, songClass2.rate)
# print(frequencies)
spec = spectrogram.flatten()
spec2 = spectrogram2.flatten()
result = 1 - spatial.distance.cosine(spec, spec2)
print(result)
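The result is meant to be the similarity index between the two voices. However, when I compare two songs by the same singer (Sia), I get a low value (0.133).
Song 1: https://drive.google.com/file/d/1svV0Ry_lNaEA9Z8c61t3S25XCSgIv6sW/view
Song 2: https://drive.google.com/file/d/1ToKQo2MERBbxZezqDcEtEE2dhgmH-wus/view
Is there something wrong with my logic? Or can this result make sense in some cases? Thanks in advance.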
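One thing that may be worth checking is how sensitive this comparison is to timing. Flattening a spectrogram keeps the time position of every frequency bin, so two recordings with the same content at different moments can end up nearly orthogonal. The sketch below illustrates this with two synthetic signals (the tone frequencies, durations, and the `tone` helper are assumptions made up for the example, not taken from the songs), and compares the flattened spectrograms against time-averaged spectra.

```python
import numpy as np
from scipy import signal
from scipy.spatial import distance

RATE = 44100  # assumed sample rate, matching the value used in main.py


def tone(freq_hz, seconds, rate=RATE):
    """Hypothetical helper: a pure sine tone of the given frequency and length."""
    t = np.arange(int(seconds * rate)) / rate
    return np.sin(2 * np.pi * freq_hz * t)


# Two synthetic "songs" containing the same two notes in opposite order
song_a = np.concatenate([tone(440, 1.0), tone(3520, 1.0)])
song_b = np.concatenate([tone(3520, 1.0), tone(440, 1.0)])

_, _, spec_a = signal.spectrogram(song_a, RATE)
_, _, spec_b = signal.spectrogram(song_b, RATE)

# Same comparison as in main.py: time position of each bin matters
flat_similarity = 1 - distance.cosine(spec_a.flatten(), spec_b.flatten())

# Average over the time axis first, so only the overall spectrum is compared
mean_similarity = 1 - distance.cosine(spec_a.mean(axis=1), spec_b.mean(axis=1))

print(flat_similarity, mean_similarity)
```

In this toy case the flattened comparison should come out much lower than the time-averaged one, even though both signals contain exactly the same notes. Averaging over time discards a lot of information, so this is only meant to show the alignment effect, not to serve as a voice-similarity measure.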