May 22, 2013

Using Google's Speech Recognition Web Service with Python

Google operates a largely undocumented web service for speech recognition. The service accepts audio data and returns a transcription. Here is a way to talk to it from Python via an HTTPS POST request.

The reverse engineering has already been done in this tutorial; I got the hint from a friendly fellow student. Here is a Python script that does the same job. Note that the web service (see the demo page) accepts audio in the FLAC format. Use the flac program to convert WAV files to FLAC.
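
The conversion step can also be scripted. The following is a minimal sketch, not part of the original stt.py: it is written for Python 3 (the script below targets Python 2), it assumes the flac program is installed and on the PATH, and the function names are made up for illustration. It first checks that the WAV file uses the 16000 Hz rate that the script declares in its Content-type header, then calls flac with -f (overwrite an existing output file) and -o (output file name).

```python
import subprocess
import wave

# Matches "rate=16000" in the Content-type header sent by stt.py.
EXPECTED_RATE = 16000


def wav_sample_rate(wav_path):
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getframerate()


def flac_command(wav_path, flac_path):
    """Build the flac command line: -f overwrites, -o names the output file."""
    return ["flac", "-f", "-o", flac_path, wav_path]


def wav_to_flac(wav_path, flac_path):
    """Check the sample rate, then convert the file with the flac program."""
    rate = wav_sample_rate(wav_path)
    if rate != EXPECTED_RATE:
        raise ValueError("expected %d Hz audio, got %d Hz" % (EXPECTED_RATE, rate))
    # check=True raises CalledProcessError if flac exits with an error.
    subprocess.run(flac_command(wav_path, flac_path), check=True)
    return flac_path
```

Calling wav_to_flac("speech.wav", "speech.flac") would then produce a file that can be passed to the script. Note that flac does not resample, so a recording at a different rate has to be resampled with another tool first, or the rate in the header has to be changed to match.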

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

import httplib
import json
import sys

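def speech_to_text(audio):
    """Send FLAC audio to the web service and return the transcription."""
    host = "www.google.com"
    # The query string already carries xjerr, client and lang.
    path = "/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en"
    # The declared rate should match the sample rate of the FLAC data.
    headers = {"Content-type": "audio/x-flac; rate=16000"}
    conn = httplib.HTTPSConnection(host)
    conn.request("POST", path, audio, headers)
    response = conn.getresponse()
    data = response.read()
    conn.close()
    jsdata = json.loads(data)
    # Take the first hypothesis as the transcription.
    return jsdata["hypotheses"][0]["utterance"]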

if __name__ == "__main__":
    if len(sys.argv) != 2 or "--help" in sys.argv:
        print "Usage: stt.py <flac-audio-file>"
        sys.exit(1)
    # FLAC is binary data, so read the file in binary mode.
    with open(sys.argv[1], "rb") as f:
        speech = f.read()
    text = speech_to_text(speech)
    print text
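
The script indexes straight into jsdata["hypotheses"][0], which raises an exception if the list is empty or the body is not JSON. The post does not describe what the service returns when it recognizes nothing, so the following Python 3 sketch only assumes that such cases can occur. The helper name best_utterance and the sample response bodies are hypothetical; they use only the two fields the script reads.

```python
import json


def best_utterance(raw_response):
    """Return the first hypothesis' utterance, or None if there is none."""
    try:
        jsdata = json.loads(raw_response)
    except ValueError:
        # The body was empty or not valid JSON.
        return None
    if not isinstance(jsdata, dict):
        return None
    hypotheses = jsdata.get("hypotheses") or []
    if not hypotheses:
        return None
    return hypotheses[0].get("utterance")


# Hypothetical response bodies, modeled only on the fields the script reads.
print(best_utterance('{"hypotheses": [{"utterance": "hello world"}]}'))  # prints: hello world
print(best_utterance('{"hypotheses": []}'))  # prints: None
```

Returning None instead of raising lets the caller decide whether to retry, print a message, or exit with an error code.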

Download the file: stt.py [836.00 B]