This is Part 2 of the main article: Stop taking notes! Audio record what you need and receive a transcript via email.
In this article, we will discuss how to set up and use the Speech-To-Text API. To get started, let’s follow the instructions provided in the link below.
Let’s begin with pip3 install --upgrade google-cloud-speech.
1. Create a project on the Google Cloud Platform and set up a service account:
2. Click Create. A .json file containing the credentials will be generated.
3. Copy the example code below, found under the Python tab:
import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'resources',
    'audio.raw')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
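A note on library versions: pip3 install --upgrade pulls the latest release, and from google-cloud-speech 2.0.0 onward the enums and types submodules no longer exist and the client methods expect keyword arguments. The sketch below shows how the same sample might look on a 2.x release. The function names transcribe_short and collect_transcripts are illustrative and not part of the library, and the import is placed inside the function only so that the helper can be loaded without the package installed.

```python
import io


def collect_transcripts(response):
    """Return the top transcript of every result in a recognize response."""
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]


def transcribe_short(file_name, language_code="en-US", sample_rate_hertz=16000):
    """Transcribe a short local LINEAR16 file (assumes google-cloud-speech 2.x)."""
    # Imported lazily; requires `pip3 install --upgrade google-cloud-speech`.
    from google.cloud import speech

    client = speech.SpeechClient()

    # Load the audio into memory.
    with io.open(file_name, "rb") as audio_file:
        content = audio_file.read()

    # In 2.x the message and enum types live directly on the `speech` module.
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )

    # 2.x methods take keyword arguments instead of positional ones.
    response = client.recognize(config=config, audio=audio)
    return collect_transcripts(response)
```

If you stay on a 1.x release, the original sample with enums and types applies unchanged.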
When I first tried to run the code, I got this message: google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
We need to adjust the “Instantiates a client” section. More precisely, we need to use the service account credentials created in Step 2 to communicate with the Speech-To-Text API.
Import service_account from google.oauth2 and initialize the credentials using that service account:
from google.oauth2 import service_account
# Instantiates a client
my_credentials = service_account.Credentials.from_service_account_file(path_to_your_json_credential_file)
client = speech.SpeechClient(credentials=my_credentials)
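Before handing the .json file to the client, it can help to check that the path is right and that the file really is a service account key. The helper below is a minimal sketch using only the standard library. Its name, load_service_account_info, is hypothetical. It falls back to the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is the variable named in the error message above.

```python
import json
import os


def load_service_account_info(path=None):
    """Return the parsed service account key file, or raise a clear error.

    If `path` is omitted, the GOOGLE_APPLICATION_CREDENTIALS environment
    variable is used instead.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError(
            "No credentials path given and GOOGLE_APPLICATION_CREDENTIALS is not set"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError("Credentials file not found: {}".format(path))

    with open(path, "r", encoding="utf-8") as key_file:
        info = json.load(key_file)

    # Key files downloaded for a service account carry this marker.
    if info.get("type") != "service_account":
        raise ValueError("{} is not a service account key file".format(path))
    return info
```

The returned dictionary can be passed to service_account.Credentials.from_service_account_info(info). Alternatively, exporting GOOGLE_APPLICATION_CREDENTIALS with the path to the .json file before running the script should let speech.SpeechClient() find the credentials without any code change.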
4. Let’s take a look at the “The name of the audio file to transcribe” section:
# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'resources',
    'audio.raw')
For this code to work, we have to create the resources folder inside our project and add an audio file named audio.raw to it. Before we do that, let’s explore the audio formats supported by the Speech-To-Text API using the link below: https://cloud.google.com/speech-to-text/docs/encoding.
The .raw extension is not listed among the supported formats (a .raw file is usually headerless PCM audio, which the sample declares as LINEAR16), so for my example I will use the .flac format. Change 'audio.raw' to 'audio.flac'.
5. In the configuration that follows the “Loads the audio into memory” section, change the audio encoding from LINEAR16 to FLAC, set the sample rate to match your file (44100 Hz instead of 16000 in my case), and set the language code to the language spoken in your audio file. In my case it is fr-CA.
You can find the entire list of supported languages here: https://cloud.google.com/speech-to-text/docs/languages
# Loads the audio into memory
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=44100,
    language_code='fr-CA')
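The sample_rate_hertz value has to match the actual audio, so it is worth reading it from the file rather than guessing. A FLAC file stores its sample rate in the STREAMINFO block at the start of the file. The sketch below reads it with the standard library only. The function name read_flac_stream_info is hypothetical, and the code assumes the file starts directly with the fLaC marker (no ID3 tag prepended).

```python
def read_flac_stream_info(path):
    """Read sample rate, channel count and bit depth from a FLAC header."""
    with open(path, "rb") as flac_file:
        header = flac_file.read(22)

    # 4-byte marker, then a 4-byte metadata block header, then STREAMINFO.
    if len(header) < 22 or header[:4] != b"fLaC":
        raise ValueError("{} does not start with a FLAC header".format(path))
    # The low 7 bits of byte 4 hold the block type; 0 means STREAMINFO.
    if header[4] & 0x7F != 0:
        raise ValueError("First metadata block is not STREAMINFO")

    info = header[8:]
    # Bytes 0-9 of STREAMINFO hold block and frame sizes; the next 20 bits
    # are the sample rate, then 3 bits of (channels - 1) and 5 bits of
    # (bits per sample - 1).
    sample_rate = (info[10] << 12) | (info[11] << 4) | (info[12] >> 4)
    channels = ((info[12] >> 1) & 0x07) + 1
    bits_per_sample = (((info[12] & 0x01) << 4) | (info[13] >> 4)) + 1
    return {
        "sample_rate_hertz": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
    }
```

The sample_rate_hertz entry of the returned dictionary is the value to put in the recognition config. If channels is greater than 1, the recording is not mono, which may require additional configuration or a conversion to mono first.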
6. Run your project.
You will probably receive this message because the Google Speech-To-Text API is not activated by default:
google.api_core.exceptions.PermissionDenied: 403 Cloud Speech-to-Text API has not been used in project 946086898056 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/speech.googleapis.com/overview?project=946086898056 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
7. Activate the Speech-To-Text API for the project you created in Step 1.
8. Run your project again.
If your file is smaller than 10485760 bytes (10 MB), you are done: the text returned by the API will appear in your console. Yay!
In my case, the audio file was bigger than that size and the following message appeared: google.api_core.exceptions.InvalidArgument: 400 Request payload size exceeds the limit: 10485760 bytes.
To fix that problem, we need to create a bucket on Google Cloud Storage, upload our audio file and read it from our Python code. Let’s do that in the next steps.
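Rather than waiting for the 400 error, the file size can be checked up front to decide which path to take. This is a minimal sketch: the constant and function names are illustrative, and the limit is simply the number quoted in the error message. Google's documentation also restricts synchronous requests to roughly one minute of audio, so size may not be the only criterion.

```python
import os

# Limit quoted in the InvalidArgument error message (10 * 1024 * 1024).
SYNC_PAYLOAD_LIMIT_BYTES = 10485760


def fits_in_sync_request(path, limit=SYNC_PAYLOAD_LIMIT_BYTES):
    """Return True if the file is small enough to send inline.

    The comparison is strict because the request payload contains more
    than the raw audio bytes, so a file exactly at the limit would not fit.
    """
    return os.path.getsize(path) < limit
```

When fits_in_sync_request returns False, the audio needs to go through Cloud Storage as described in the following steps.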
“Read and Convert to Text” for an audio file over 10485760 bytes
1. Go to the Google Cloud Platform and create a new bucket.
2. Once your bucket is created, upload your audio file.
3. Select your file and copy its URI.
4. Here is what the code to “Read and Convert to Text” a long audio file looks like:
credentials_path = self.credentials_path
my_credentials = service_account.Credentials.from_service_account_file(credentials_path)
client = speech.SpeechClient(credentials=my_credentials)

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=44100,
    language_code='fr-CA')

storage_uri = "URI_content_from_step_3"
audio = {"uri": storage_uri}
operation = client.long_running_recognize(config, audio)
response = operation.result()

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
That’s it! Now we can generate text from an audio file that is bigger than 10485760 bytes.
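The upload and the URI copy can also be done from Python instead of the console. The sketch below assumes the google-cloud-storage package is installed (pip3 install google-cloud-storage), that the bucket already exists, and that the service account is allowed to write to it. It also assumes a 2.x release of google-cloud-speech, where calls take keyword arguments. The function names gcs_uri, upload_audio and transcribe_gcs are illustrative, not part of either library.

```python
def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that the Speech-To-Text API expects."""
    if not bucket_name or "/" in bucket_name:
        raise ValueError("Invalid bucket name: {!r}".format(bucket_name))
    return "gs://{}/{}".format(bucket_name, blob_name.lstrip("/"))


def upload_audio(local_path, bucket_name, blob_name, credentials_path):
    """Upload a local audio file to an existing bucket and return its URI."""
    # Imported lazily; requires the google-cloud-storage package.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(credentials_path)
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)


def transcribe_gcs(storage_uri, credentials_path, language_code="fr-CA",
                   sample_rate_hertz=44100, timeout=600):
    """Transcribe a FLAC file stored in Cloud Storage (google-cloud-speech 2.x)."""
    from google.cloud import speech
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        credentials_path
    )
    client = speech.SpeechClient(credentials=credentials)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(uri=storage_uri)

    # Starts the job and blocks until it finishes or `timeout` seconds pass.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]
```

A typical call chain would be uri = upload_audio('resources/audio.flac', 'my-bucket', 'audio.flac', credentials_path) followed by transcribe_gcs(uri, credentials_path), where 'my-bucket' is a placeholder for your own bucket name. The timeout of 600 seconds is an arbitrary default and may need to be raised for long recordings.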
In the next article, we will discuss how to create and use a scheduler in Python.