Voice chat with Python

Published 2024-10-01 09:30:50


I created a very simple voice chat with pyaudio, but the sound is kind of meh: you usually hear noise like in an old movie. This is probably caused by chunks of voice getting lost, since I send them over UDP. Is it possible to reduce the noise? Also, I want to play a sound when the user moves over a button, but for some reason it is impossible to merge the two tracks (the sound effect and the voice).
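
For context, merging two tracks normally comes down to summing the samples, with some scaling so the result does not clip. Here is a minimal sketch of that idea, assuming two 16-bit PCM chunks of the same length (the helper name and the volume values are only illustrative, not part of my code):

import numpy as np

def mix_chunks(chunk_a, chunk_b, vol_a=1.0, vol_b=0.5):
    # Mix two equally long 16-bit PCM byte strings into one chunk.
    a = np.frombuffer(chunk_a, dtype=np.int16).astype(np.float64)
    b = np.frombuffer(chunk_b, dtype=np.int16).astype(np.float64)
    mixed = a * vol_a + b * vol_b              # weighted sum of the samples
    mixed = np.clip(mixed, -32768, 32767)      # avoid int16 overflow / wrap-around
    return mixed.astype(np.int16).tobytes()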

Here is the most important class, Sound. It runs in a thread, so it can loop forever.

import threading   # the class below runs in its own thread (see run()); threading.Thread is assumed as the base class
import time        # used for the timing measurements in myCallback()
import wave        # the button effects are played from wav files

import numpy as np
import pyaudio

class Sound(threading.Thread):   # the original likely subclassed a Qt thread class; threading.Thread is assumed here
    WIDTH = 2                    # 2 bytes per sample, i.e. 16-bit audio
    CHANNELS = 2
    RATE = 44100

    def __init__(self, parent=None):
        super().__init__(daemon=True)   # the original passed `parent` to its base class; it is unused with threading.Thread
        self.voiceStreams = []          # queued incoming voice chunks (presumably numpy arrays produced by the UDP receiver)
        self.effectStreams = []         # open wave readers for the button sound effects
        self.vVolume = 1                # voice volume
        self.eVolume = 0.5              # effect volume
        self.voip = None

        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(format = self.p.get_format_from_width(Sound.WIDTH),
                        channels = Sound.CHANNELS,
                        rate = Sound.RATE,
                        input = True,
                        output = True,
                        #stream_callback = self.callback
                        )


        self.nextSample = ""
        self.lastSample = ""
        self.stream.start_stream()

    def run(self):
        while True:
            self.myCallback()


    def myCallback(self):
        _time = time.perf_counter()

        if self.nextSample:
            self.stream.write(self.nextSample)
            self.lastSample = self.nextSample

        elif self.lastSample:   # crazy idea: when no data arrived (because UDP did not deliver it), replay the last chunk once so nobody hears the short gap of silence
            self.stream.write(self.lastSample)
            self.lastSample = b""

        _time = time.perf_counter()
        #print("{0:d}  ---- {1:d} --- timeWrite: {2:.1f}".format(len(self.voiceStreams), self.stream.get_read_available(), (time.perf_counter() - _time) * 1000), end="   ")

        if self.stream.get_read_available() > 1023:
            mic = self.stream.read(1024)   # grab 1024 frames from the microphone
        else:
            mic = b""

        #print ("timeRead: {0:.1f}".format(  (time.clock() - _time)* 1000)  , end = "   ")

        if mic and self.voip: self.voip.sendDatagram(mic)             #This sends the CHUNK of sound to my UDP client
        _time = time.clock()


        data = np.zeros(2048, np.float64)   # mixing buffer; accumulate as float and convert back to int16 at the end

        length = len(self.voiceStreams)     # read the queued voice data
        l1 = length

        for i in range(length):
            s = self.voiceStreams.pop(0)
            data += s / length * self.vVolume * 0.4   # mix the voices with numpy, scaling each one down by how many voices there are
        length = len(self.effectStreams)
        toPop = []                          # indexes of effects that have finished playing

        for i in range(length):
            s = self.effectStreams[i].readframes(1024)
            if not s:                       # no frames left to play (readframes returns bytes, so check for emptiness)
                toPop.append(i - len(toPop))
            else:
                d = np.frombuffer(s, np.int16)
                # Sadly, both numpy arrays must have the same length, so a final chunk of, say, 1500 samples
                # has to be thrown away, because numpy will not let it be added to an array of length 2048.
                if len(d) > 2047:           # again, merge the sound in with numpy and reduce its volume
                    data += (d / length * length) * self.eVolume * 0.3   # note that "/ length * length" cancels out
        for i in toPop:                     # delete effects that reached the end of their track
            del self.effectStreams[i]


        if np.any(data):        # if anything was mixed in
            self.nextSample = data.astype(np.int16).tobytes()   # prepare the next CHUNK (1024 stereo frames; see the timing note after the class)
        else:
            self.nextSample = b""
        #print("timeRest: {0:.1f}".format((time.perf_counter() - _time) * 1000), end="    ||  ")
        print("HOW MANY CHUNKS OF VOICE I GOT: ", l1)
        # Oddly, printing the read/write timings usually gives something like: 20 ms, 20 ms, 30 ms, 20 ms, 20 ms, 30 ms, 20 ms, ...


    def close(self):
        # self.timer.stop()    # no self.timer is defined anywhere in this class
        self.stream.stop_stream()
        self.stream.close()
        self.p.terminate()
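
A note on the chunk size: each mixed chunk holds 2048 int16 samples, i.e. 1024 stereo frames, which at 44100 Hz is about 23 ms of audio per chunk:

frames_per_chunk = 2048 // Sound.CHANNELS          # 1024 stereo frames per chunk
chunk_ms = frames_per_chunk / Sound.RATE * 1000    # 1024 / 44100 * 1000 ~= 23.2 ms

That is consistent with the 20-30 ms write/read timings mentioned in the comments.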

The UDP server and client are very simple (they work fine, so I am not posting them here). The client just sends all of its data to the server, and the server, whenever it receives anything, sends it on to all clients. I do not tell anyone who sent the data, which means that if data arrives too late I play two chunks from a single client at the same time (because I assume they came from multiple clients).
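
Purely for illustration, a relay of the kind just described (forward every datagram to every client seen so far, without marking who sent it) could look roughly like the sketch below; the port number and all names are made up, this is not my actual server:

import socket

def run_relay(host="0.0.0.0", port=50007):
    # Toy UDP relay: every datagram received is forwarded to every known client.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    clients = set()
    while True:
        data, addr = sock.recvfrom(8192)   # one voice chunk per datagram
        clients.add(addr)                  # remember every sender as a client
        for client in clients:
            sock.sendto(data, client)      # forwarded to everyone, including the sender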

Here are the wav files: Dropbox repository. I did not create them; I downloaded them from http://www.freesound.org/people/ERH/sounds/31135/ and they are licensed under Attribution.

I have also added an output.txt to the Dropbox folder; it shows what Python printed when two people ran this example (I was only receiving voice data from one user).

Thanks for any advice.

