Tuesday, March 8, 2022

STT - VOSK [ok] and Kaldi [fail]

nerd-dictation :: kaldi :: arch pulse examples :: nerd-dictation user

Nerd-dictation can currently be massaged into excellent functionality using config tweaks and a script, but it remains unpolished (esp. startup/stop). Installation pulls in a Python VOSK API and a keyboard input simulator, xdotool. Dependencies include VOSK language models and PulseAudio. A full PulseAudio install (not just libpulse) is required due to nerd-dictation's pa_context_connect() calls.

1. nerd-dictation (usage)

quick list

  1. plug in the mic and verify it with $ arecord -l
  2. $ pavucontrol, set the mic levels
  3. $ pactl list sources to get the device name nerd-dictation needs; it follows "Name:", eg, alsa_input.hw_1_0
  4. run script or $ nerd-dictation [options]
  5. speak as long as desired. Assign an exit hotkey, or set a timeout so it closes after that many seconds of silence.
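
Step 3 produces a lot of text, so a small filter helps. This is only a sketch: source_names is a made-up helper name, and it assumes the usual pactl layout where each source has a line starting with "Name:". It reads from stdin, so it can be tried on saved output too.

```shell
#!/bin/bash
# Hypothetical helper (not part of nerd-dictation or PulseAudio):
# print only the source names found in `pactl list sources` output.
source_names() {
    # Lines look like "<tab>Name: alsa_input.hw_1_0"; keep the value after "Name:".
    awk '$1 == "Name:" { print $2 }'
}

# Typical use (needs a running PulseAudio server):
#   pactl list sources | source_names
```

Monitor sources of output devices are listed as well, so pick the entry that matches the mic.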

quirks

The application has to exit natively (the timeout, or $ nerd-dictation end from a hotkey or another terminal), not CTRL-C, or the output file won't be written.

scripts, conf file

We'll want a shell script to avoid long commands. Here's a simple example that exits 10 seconds after a person stops talking.

#!/bin/bash

# created as 644, chmod 744 so only user can execute
# run with "sh startdoc.sh" or "bash startdoc.sh"
# todo: sequential text naming

nerd-dictation begin \
--pulse-device-name=alsa_input.hw_1_0 \
--timeout=10 \
--output=STDOUT \
1> doctate.txt
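
The "sequential text naming" todo could be handled with a small helper along these lines. It's a sketch, not tested against long sessions: next_doc_name and the doctate_NNN.txt pattern are my own invention, and it simply returns the first number that isn't taken yet.

```shell
#!/bin/bash
# Hypothetical helper: print the first unused file name of the form
# doctate_NNN.txt inside the given directory (default: current directory).
next_doc_name() {
    ndn_dir="${1:-.}"
    ndn_n=1
    while :; do
        ndn_name="$(printf '%s/doctate_%03d.txt' "$ndn_dir" "$ndn_n")"
        # Stop at the first name that does not exist yet.
        [ -e "$ndn_name" ] || break
        ndn_n=$((ndn_n + 1))
    done
    printf '%s\n' "$ndn_name"
}

# Possible use in startdoc.sh, with the same flags as the script above:
#   nerd-dictation begin \
#   --pulse-device-name=alsa_input.hw_1_0 \
#   --timeout=10 \
#   --output=STDOUT \
#   1> "$(next_doc_name .)"
```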

In addition to a script, a conf file (~/.config/nerd-dictation/nerd-dictation.py) is where to handle capitalization, substitutions, and so on. It's slightly delicate: startup pipe errors will arise if the config file is blank, or if it contains incorrect Python syntax.
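
Since a blank or broken conf file only shows up as pipe errors at startup, it may be worth checking the file before launching. A minimal sketch, assuming python3 is on the PATH; check_conf is a hypothetical name, and it only checks that the file is non-empty and parses as Python (it doesn't run it or verify what nerd-dictation expects inside).

```shell
#!/bin/bash
# Hypothetical pre-flight check for the conf file.
check_conf() {
    cc_file="$1"
    # -s is true only for an existing, non-empty file.
    [ -s "$cc_file" ] || { echo "missing or empty: $cc_file" >&2; return 1; }
    # ast.parse only parses the source; nothing in the file is executed.
    python3 -c 'import ast, sys; ast.parse(open(sys.argv[1]).read())' "$cc_file" 2>/dev/null \
        || { echo "Python syntax error in: $cc_file" >&2; return 1; }
}

# Possible use before starting dictation:
#   check_conf ~/.config/nerd-dictation/nerd-dictation.py && sh startdoc.sh
```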

2. nerd-dictation (install) 1 hr +/-

  1. pulseaudio
  2. binary install (AUR or Git)
  3. vosk language model
  4. possible scripts and conf file (see sect 1 above)

pulseaudio (full install)

Unfortunately, nerd-dictation needs a full PulseAudio install to handle its pa_context_connect() calls. Here's what the failure looks like if PA is not installed...

$ nerd-dictation begin
Connection failure: Connection refused
pa_context_connect() failed: Connection refused
The lightest complete install I've found is about 20 MB (18 of that is the optional pavucontrol), and it also verifies proper ALSA interaction:
# pacman -S pulseaudio-alsa pavucontrol
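
After installing, a quick way to see whether a PulseAudio server is actually reachable is to ask pactl for its info and look at the exit status. A sketch only; pulse_ready is a made-up name, and it assumes pactl came along with the install above.

```shell
#!/bin/bash
# Hypothetical check: can a client connect to a PulseAudio server?
pulse_ready() {
    # `pactl info` exits non-zero when no server can be contacted.
    if pactl info >/dev/null 2>&1; then
        echo "PulseAudio server reachable"
    else
        echo "no PulseAudio server; expect pa_context_connect() failures" >&2
        return 1
    fi
}

# Possible use:
#   pulse_ready && nerd-dictation begin
```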

how to install the app/binary?

$ yay -S nerd-dictation-git
:: Checking for conflicts...
:: Checking for inner conflicts...
[Repo:1] xdotool-3
[Aur:2] python-vosk-bin nerd-dictation-git

:: (1/2) Downloaded PKGBUILD: python-vosk-bin
:: (2/2) Downloaded PKGBUILD: nerd-dictation-git
2 python-vosk-bin
1 nerd-dictation-git

where is the binary installed?

$ which nerd-dictation
/usr/bin/nerd-dictation

where is the VOSK model, and where is it installed?

$ strace nerd-dictation begin 2>&1 | tee wheevosk.txt
[snip]
newfstatat(AT_FDCWD, "/home/foo/.config/nerd-dictation/model", 0x7ffe388937c0, 0) = -1 ENOENT (No such file or directory)
write(2, "Please download the model from h"...,
128Please download the model from https://alphacephei.com/vosk/models
and unpack it to '/home/foo/.config/nerd-dictation/model'.
...so that...
$ cd .cache
$ mkdir nerd-dictation
$ mkdir model
$ wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
... unzip and transfer folders and files to ~/.config/nerd-dictation/model.
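
The same steps can be rolled into a script. This is a sketch under assumptions: model_present and install_model are hypothetical names, it needs wget and unzip, and it assumes the zip unpacks to a directory named after the archive (vosk-model-small-en-us-0.15).

```shell
#!/bin/bash
# Sketch only: the unpacked directory name below is an assumption.
MODEL_DIR="${MODEL_DIR:-$HOME/.config/nerd-dictation/model}"
MODEL_ZIP_URL="https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip"

# Succeeds only if the given directory exists and is not empty.
model_present() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Download, unpack, and move the model into place, working in a temp directory.
install_model() {
    im_work="$(mktemp -d)" || return 1
    (
        cd "$im_work" || exit 1
        wget "$MODEL_ZIP_URL" || exit 1
        unzip -q vosk-model-small-en-us-0.15.zip || exit 1
        mkdir -p "$(dirname "$MODEL_DIR")" || exit 1
        # Assumes the archive unpacked to a directory with the archive's name.
        mv vosk-model-small-en-us-0.15 "$MODEL_DIR"
    )
}

# Possible use (only installs when the model directory is missing or empty):
#   model_present "$MODEL_DIR" || install_model
```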

3. failure mode (kaldi)

Installing kaldi is apparently two packages off the AUR.
$ yay -S kaldi
:: Checking for conflicts...
:: Checking for inner conflicts...
2 kaldi-openfst
1 kaldi
But it appears there's a version issue. The build installs OpenFst 1.7.2, but the packaging step looks for 1.6.7 at some points.
extras/check_dependencies.sh
extras/check_dependencies.sh: Intel MKL does not seem to be installed.
... Run extras/install_mkl.sh to install it. Some distros (e.g., Ubuntu 20.04) provide
... a version of MKL via the package manager, but verify that it is up-to-date.
... You can also use other matrix algebra libraries. For information, see:
... http://kaldi-asr.org/doc/matrixwrap.html
rm -f openfst
ln -s openfst-1.7.2 openfst
==> Entering fakeroot environment...
==> Starting package()...
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/bin': No such file or directory
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/include': No such file or directory
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/lib': No such file or directory
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/Makefile': No such file or directory
==> ERROR: A failure occurred in package().
Aborting...
-> error making: kaldi-openfst
Let's see what's in the directory -- can we even iron-out the discrepancy?
$ cd /home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/
$ ls -l

total 1280
drwxr-xr-x 3 foo foo 4096 Mar 7 04:43 ATLAS_headers
drwxr-xr-x 2 foo foo 4096 Mar 7 04:43 CLAPACK
-rw-r--r-- 1 foo foo 1206 Mar 7 04:43 INSTALL
-rw-r--r-- 1 foo foo 6817 Mar 7 04:43 Makefile
drwxr-xr-x 2 foo foo 4096 Mar 7 04:43 config
drwxr-xr-x 2 foo foo 4096 Mar 7 04:43 extras
lrwxrwxrwx 1 foo foo 29 Mar 7 04:43 install_pfile_utils.sh -> extras/install_pfile_utils.sh
lrwxrwxrwx 1 foo foo 27 Mar 7 04:43 install_portaudio.sh -> extras/install_portaudio.sh
lrwxrwxrwx 1 foo foo 23 Mar 7 04:43 install_speex.sh -> extras/install_speex.sh
lrwxrwxrwx 1 foo foo 23 Mar 7 04:43 install_srilm.sh -> extras/install_srilm.sh
lrwxrwxrwx 1 foo foo 13 Mar 7 19:23 openfst -> openfst-1.7.2
drwxr-xr-x 7 foo foo 4096 Mar 7 19:19 openfst-1.7.2
-rw-r--r-- 1 foo foo 1269292 Jul 17 2019 openfst-1.7.2.tar.gz
drwxr-xr-x 2 foo foo 4096 Mar 7 18:54 python
No, we cannot iron out the problem from here. The "openfst" softlink correctly points at the latest version (1.7.2), but there's no way to get around the version check, which will always yield an error when the program starts. So there's no installing manually using
# pacman -U [package.tar.gz]
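
For anyone retrying later, the mismatch can be confirmed without reading the whole listing by comparing where the openfst softlink points with the version the cp errors asked for. A sketch; check_openfst is a hypothetical name, and 1.6.7 is simply the version taken from the errors above.

```shell
#!/bin/bash
# Hypothetical check: does tools/openfst point at the version package() wants?
check_openfst() {
    co_tools="$1"    # the kaldi-master/tools directory
    co_wanted="$2"   # version the packaging step expects, e.g. 1.6.7
    co_actual="$(readlink "$co_tools/openfst")"
    if [ "$co_actual" = "openfst-$co_wanted" ]; then
        echo "match: $co_actual"
    else
        echo "mismatch: link points to ${co_actual:-nothing}, package() wants openfst-$co_wanted"
        return 1
    fi
}

# Possible use:
#   check_openfst /home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools 1.6.7
```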
