Sunil D Shashidhara

Hot word detection using PocketSphinx

Updated: Feb 22, 2019

What if you want your voice assistant to start listening when you say "Sussi, play 'We Are the Champions'"? Or imagine you need to analyze an audio file and count how many times Trump says “billions”.




You need hot word detection! In the above examples, "Sussi" and “billions” are hot words.

Okay, great! How do you build something like this? Full-blown speech recognition? Nah, that’s too compute-intensive. Welcome, PocketSphinx!


PocketSphinx

PocketSphinx is a lightweight speech recognition engine developed by CMU. It offers a wide range of functionality; here we concentrate on hot word detection.


PocketSphinx can detect multiple hot words. You specify the hot words you are interested in, along with the phonemes and a sensitivity threshold for each one. It can also detect hot words that are not part of the English dictionary, as long as you provide a phonetic description of the word.

# hot word /threshold/
Alex /1e-40/
Sussi /1e-30/
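
A keyword list like the one above is plain text: one phrase per line, with its threshold between slashes. Here is a minimal sketch of writing such a list and reading it back. The helper names `format_keyword_list` and `parse_keyword_list` are made up for this post and are not part of PocketSphinx, which only needs the resulting text file (usually supplied through its `-kws` option instead of `-keyphrase`).

```python
import re

# Hypothetical helpers for illustration only; not a PocketSphinx API.
_LINE = re.compile(r"^(?P<phrase>.+?)\s*/(?P<threshold>[^/]+)/\s*$")

def format_keyword_list(thresholds):
    """Render {phrase: threshold} as 'phrase /threshold/' lines."""
    lines = ["%s /%g/" % (phrase, thr) for phrase, thr in thresholds.items()]
    return "\n".join(lines) + "\n"

def parse_keyword_list(text):
    """Parse keyword-list text into {phrase: threshold}.

    Blank lines and lines starting with '#' (like the header line
    shown above) are skipped by this helper.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError("Malformed keyword line: %r" % line)
        result[match.group("phrase")] = float(match.group("threshold"))
    return result

keywords = {"Alex": 1e-40, "Sussi": 1e-30}
text = format_keyword_list(keywords)
print(text, end="")
```

Keeping the thresholds in one place like this makes it easier to adjust them per hot word while tuning.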


Installation

Detailed instructions for installation (https://github.com/cmusphinx/pocketsphinx-python)


PocketSphinx includes Python support. However, the build is based on Automake and is not well supported on Windows.

pip install pocketsphinx


If you are using a Raspberry Pi or another ARM-based board, you may also need to install swig.

sudo apt install swig


The example below uses PyAudio to open the microphone and read the audio stream.

pip install PyAudio


Code Snippets

Create a hotword dictionary file - hotwords.dict

shop SH AA P
assist AH S IH S T


You can choose any key phrase you like, but make sure every word in it has a correct phoneme sequence in the dictionary. Consider generating the phonemes with a grapheme-to-phoneme (G2P) tool such as G2PModel.
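
A typo in a phoneme is an easy mistake to make, so it can help to sanity-check the dictionary before handing it to the decoder. The sketch below assumes the acoustic model uses the standard 39-phoneme ARPAbet set of the CMU Pronouncing Dictionary, without stress digits; `check_dictionary` is a hypothetical helper, not something PocketSphinx provides.

```python
# The 39 ARPAbet phonemes used by the CMU Pronouncing Dictionary
# (stress digits removed). Assumption: your acoustic model uses this set.
CMU_PHONEMES = frozenset("""
AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG
OW OY P R S SH T TH UH UW V W Y Z ZH
""".split())

def check_dictionary(text):
    """Return (word, bad_phonemes) for entries with unknown or missing phonemes."""
    problems = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, phones = parts[0], parts[1:]
        bad = [p for p in phones if p not in CMU_PHONEMES]
        if not phones or bad:
            problems.append((word, bad))
    return problems

hotwords = "shop SH AA P\nassist AH S IH S T\n"
print(check_dictionary(hotwords))           # → []
print(check_dictionary("sussi S OO S IY"))  # → [('sussi', ['OO'])]
```

This only checks that each symbol is a known phoneme; it cannot tell whether the pronunciation itself is right.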


Imports

import os
from datetime import datetime

import pyaudio
from pocketsphinx import Decoder


Start a pyaudio stream

# Open a 16 kHz, 16-bit, mono input stream
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=20480)
stream.start_stream()


Configuration

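# pocketsphinx_dir is not defined in this snippet; set it to the
# directory that contains your PocketSphinx 'model' folder.
model_dir = os.path.join(pocketsphinx_dir, 'model')
ps_config = Decoder.default_config()
ps_config.set_string('-hmm', os.path.join(model_dir, 'en-us/en-us'))  # acoustic model
ps_config.set_string('-dict', os.path.join(model_dir, 'en-us/hotwords.dict'))  # hot word dictionary
ps_config.set_string('-keyphrase', 'shop assist')  # key phrase to detect
ps_config.set_float('-kws_threshold', 1e-30)  # detection threshold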


The decoder needs to be configured with an acoustic model, a dictionary containing the phonemes for each key word, and the key phrase along with its threshold.


Setting the threshold plays an important role in detection. Thresholds range from 1e-1 to 1e-50. For shorter key phrases you can use larger thresholds such as 1e-1; for longer key phrases the threshold must be smaller, down to 1e-50. The threshold must be tuned to balance false alarms (false positives) against missed detections (false negatives): a threshold that is too high leads to false negatives, and one that is too low leads to false positives.
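
To tune a threshold you need some way to count false alarms and misses for a given setting. One simple approach, sketched below, is to label the times at which the hot word is really spoken in a test recording and compare them with the times the decoder reports. `score_detections`, the 0.5-second tolerance, and all the timestamps are assumptions made up for illustration; they are not measurements.

```python
def score_detections(detected, expected, tolerance=0.5):
    """Match detection times (seconds) to labelled times within `tolerance`.

    Returns a tuple (hits, false_alarms, misses).
    """
    remaining = sorted(expected)
    hits = 0
    false_alarms = 0
    for t in sorted(detected):
        match = next((e for e in remaining if abs(e - t) <= tolerance), None)
        if match is None:
            false_alarms += 1
        else:
            remaining.remove(match)  # each labelled occurrence matches once
            hits += 1
    return hits, false_alarms, len(remaining)

# Made-up timestamps, purely for illustration.
expected = [2.0, 10.0, 17.5]
runs = {
    1e-40: [2.1, 5.3, 10.2, 13.0, 17.4],  # lower threshold: two false alarms
    1e-20: [2.1, 10.2],                   # higher threshold: one miss
}
for threshold, detected in sorted(runs.items()):
    print(threshold, score_detections(detected, expected))
# prints:
# 1e-40 (3, 2, 0)
# 1e-20 (2, 0, 1)
```

Running the same recording at several thresholds and comparing the counts gives a rough picture of where the balance lies for your key phrase.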


Start decoder

decoder = Decoder(ps_config)
decoder.start_utt()


In an endless loop, read the input stream using PyAudio and process the raw data in chunks of 1024 frames. Check for the hot word after each chunk; if it is detected, proceed to perform the cool tasks :)

while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break

    if decoder.hyp() is not None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected %s at %s" % (decoder.hyp().hypstr, str(datetime.now().time())))
        decoder.end_utt()
        # PERFORM COOL TASK
        # Start a new utterance so the decoder keeps listening
        decoder.start_utt()
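
The same loop can be fed from an audio file instead of a microphone, which is what the "count how many times a word is said" use case needs. The sketch below covers only the reading side, using Python's standard `wave` module; `wav_chunks` is a hypothetical helper, and it assumes a 16-bit mono WAV file recorded at the sample rate the acoustic model expects (16 kHz in this post).

```python
import os
import tempfile
import wave

def wav_chunks(path, frames_per_chunk=1024):
    """Yield raw PCM byte chunks from a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        while True:
            buf = wav.readframes(frames_per_chunk)
            if not buf:
                break  # end of file
            yield buf

# Demo: write 2500 frames of silence, then read them back in chunks.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)      # 2 bytes per sample = 16-bit
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 2500)

print([len(chunk) for chunk in wav_chunks(demo_path)])  # → [2048, 2048, 904]

# With a configured decoder, each chunk would be passed on like this:
#     for buf in wav_chunks("speech.wav"):
#         decoder.process_raw(buf, False, False)
```

Each 1024-frame chunk is 2048 bytes because every frame holds one 16-bit sample; at 16 kHz that is 64 ms of audio per chunk.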


Adjust threshold value and phonemes if you observe false positives or false negatives.




Upcoming Blog

Detection of words in audio files by auto generating phonemes and setting the threshold by fine tuning


Thanks to our Cranky Coder Balasubramanyam S for the contribution :)
