Using WebSockets to connect to the Watson Speech to Text API for real-time transcription

Posted 2024-06-26 04:04:33


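I'm trying to write a script that calls the Watson Speech to Text (STT) API to continuously transcribe speech from a microphone, word by word. I've read that this should be possible with the WebSockets version of the API.

I have a Python script that should be able to do this on Linux (assuming the dependencies are installed), but it doesn't work on Mac OS X: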

from ws4py.client.threadedclient import WebSocketClient
import base64, json, ssl, subprocess, threading, time

class SpeechToTextClient(WebSocketClient):
    def __init__(self):
        ws_url = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"

        username = "your username"
        password = "your password"
        auth_string = "%s:%s" % (username, password)
        # b64encode works on bytes and, unlike encodestring, adds no newlines
        base64string = base64.b64encode(auth_string.encode("utf-8")).decode("ascii")

        self.listening = False
        self.stream_audio_thread = None

        try:
            WebSocketClient.__init__(self, ws_url,
                headers=[("Authorization", "Basic %s" % base64string)])
            self.connect()
        except Exception as e:
            print("Failed to open WebSocket: %s" % e)

    def opened(self):
        # Announce the audio format, then stream from a background thread
        self.send('{"action": "start", "content-type": "audio/l16;rate=16000"}')
        self.stream_audio_thread = threading.Thread(target=self.stream_audio)
        self.stream_audio_thread.start()

    def received_message(self, message):
        message = json.loads(str(message))
        if "state" in message:
            if message["state"] == "listening":
                self.listening = True
        print("Message received: " + str(message))

    def stream_audio(self):
        # Wait until the service reports that it is listening
        while not self.listening:
            time.sleep(0.1)

        # arecord is part of ALSA, so this command is Linux-only
        reccmd = ["arecord", "-f", "S16_LE", "-r", "16000", "-t", "raw"]
        p = subprocess.Popen(reccmd, stdout=subprocess.PIPE)

        while self.listening:
            data = p.stdout.read(1024)
            if not data:
                break  # the recorder exited

            try:
                self.send(bytearray(data), binary=True)
            except ssl.SSLError:
                pass

        p.kill()

    def close(self, code=1000, reason=""):
        # Stop the audio thread before closing the socket
        self.listening = False
        if self.stream_audio_thread is not None:
            self.stream_audio_thread.join()
        WebSocketClient.close(self, code, reason)

stt_client = SpeechToTextClient()
try:
    input()  # press Enter to stop
finally:
    stt_client.close()

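Ideally I wouldn't even be doing this in Python: R is my native language, and I'll have to pass the results back to R for processing anyway.

Can anyone offer a solution that gives me a streaming transcription?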


2 answers

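For some good examples of how to do this in R, check out these great blog posts by Ryan Anderson.

Ryan has done a lot of work with R and the Watson APIs, and he shares much of what he has learned on his blog.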

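I'm not sure this answer is exactly what you're looking for, but it sounds like an issue with the continuous parameter.

As you can see, there is a Python SDK in the Watson Developer Cloud.

You can install it with: pip install watson-developer-cloud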

import json
from os.path import join, dirname
from watson_developer_cloud import SpeechToTextV1

speech_to_text = SpeechToTextV1(
    username='YOUR SERVICE USERNAME',
    password='YOUR SERVICE PASSWORD',
    x_watson_learning_opt_out=False
)

print(json.dumps(speech_to_text.models(), indent=2))

print(json.dumps(speech_to_text.get_model('en-US_BroadbandModel'), indent=2))

with open(join(dirname(__file__), '../resources/speech.wav'),
          'rb') as audio_file:
    # continuous=True asks the service to transcribe the whole stream
    data = json.dumps(
        speech_to_text.recognize(
            audio_file, content_type='audio/wav', timestamps=False,
            word_confidence=False, continuous=True),
        indent=2)
    print(data)

Note: the service returns an array of results, one per utterance.
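A minimal sketch of reading that array, assuming the response has the shape the service documents (a top-level results list whose items carry an alternatives list and a final flag). The helper name extract_transcripts and the sample response are invented for illustration and are not part of the SDK.

```python
def extract_transcripts(response, final_only=True):
    """Return the top transcript of each result in a recognize response.

    Assumed shape: {"results": [{"alternatives": [{"transcript": ...}], "final": bool}]}
    """
    transcripts = []
    for result in response.get("results", []):
        # Interim hypotheses have final == False; skip them unless requested
        if final_only and not result.get("final", False):
            continue
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is the service's best guess
            transcripts.append(alternatives[0].get("transcript", "").strip())
    return transcripts


# Hypothetical response, hand-written to match the assumed shape
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world ", "confidence": 0.9}], "final": True},
        {"alternatives": [{"transcript": "how are "}], "final": False},
        {"alternatives": [{"transcript": "how are you "}], "final": True},
    ],
    "result_index": 0,
}

print(extract_transcripts(sample))  # ['hello world', 'how are you']
```

Passing final_only=False also returns the interim hypotheses, which is closer to the word-by-word output the question asks for.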

At line #L44 you can see the params you can use; for continuous transcription, use the continuous parameter and set it to true, as in the example above.
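The same parameters can be sent over the WebSocket interface in the opening start message. A sketch under assumptions: continuous and interim_results are taken from the service documentation of the time (newer API versions may ignore or reject continuous), and build_start_message is a hypothetical helper, not an SDK function.

```python
import json


def build_start_message(rate=16000, continuous=True, interim_results=True):
    """Build the JSON text frame that opens a recognition request.

    The parameter names are assumptions based on the Watson STT WebSocket
    documentation; check them against the API version you are using.
    """
    message = {
        "action": "start",
        "content-type": "audio/l16;rate=%d" % rate,
        "continuous": continuous,            # keep transcribing across pauses
        "interim_results": interim_results,  # emit hypotheses as words arrive
    }
    return json.dumps(message)


# Sent after the last audio chunk to tell the service the stream has ended
STOP_MESSAGE = json.dumps({"action": "stop"})

print(build_start_message())
```

In the question's client, opened() could call self.send(build_start_message()) instead of sending the hard-coded string.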

  • See the Official Documentation's discussion of WebSockets for how to keep the connection alive. (Maybe that is what you need.)
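One way to keep an idle connection open, sketched under the assumption that the service accepts a no-op action as a keep-alive text message (this is how I recall the WebSocket documentation describing it; verify against the current docs). The keep_alive helper and its send argument are hypothetical; send stands for any callable that writes a text frame, such as the client's self.send.

```python
import json
import threading

# Assumed keep-alive frame; confirm the action name in the service docs
NO_OP_MESSAGE = json.dumps({"action": "no-op"})


def keep_alive(send, stop_event, interval=10.0):
    """Call send(NO_OP_MESSAGE) every `interval` seconds until stop_event is set."""
    # Event.wait returns False on timeout and True once the event is set
    while not stop_event.wait(interval):
        send(NO_OP_MESSAGE)


def start_keep_alive(send, interval=10.0):
    """Run keep_alive in a daemon thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(target=keep_alive, args=(send, stop_event, interval))
    thread.daemon = True
    thread.start()
    return thread, stop_event
```

Setting the returned stop_event and joining the thread before closing the socket avoids sending on a closed connection.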
