"X"...in a box: March 2022

Saturday, March 26, 2022

video -- text

Side note: to turn off annoying inactivity blanking, say when using a non-VLC player...

$ xset -dpms

And then to restore it (if desired)...

$ xset +dpms

Some people find they also have to add the "$ xset s" entries "noblank" and/or "off", that is, to use all three. In vanilla Arch, for example, all three are required.

This post is a trail of crumbs for incorporating graphic, moving text into video -- a long-term, continually evolving project. There are also some screenwriting notes at the bottom, since scripts are sometimes scrolled or portions used for graphics, etc. This top portion reviews subtitles, both user-selected and forced, and then a previous post (1/2021) in which I addressed the basics of static titles or labels. So there's a lot to cover in this post: subtitles, basic labeling, graphic text.

subtitles

Subtitles can be added to videos in an optional or forced capacity. Here's a page.

There are many subtitle formats; whichever one we pick, our player has to be able to read it. I see these two most often, and they have somewhat different applications.

SRT from the SubRip days. This is the most common but appears to only have bold, italic and underline.
ASS the most expressive, according to the last post in this thread. Fonts, colors, etc. I've never used it.

extract subtitles

Get the subtitle (.SRT) file from the video and look it over, update it, re-embed it, etc.

$ ffmpeg -i foo.mp4 somename.srt

Extracting them from YouTube videos, say for a foreign film a person has seen there, can require more reading. Typically the subtitles are just in English and come in a VTT file, retrieved with the following:

$ youtube-dl --all-subs --skip-download [URL]

For playback viewing of the SRT or VTT file, the best bets are 1) VLC and 2) the subtitle file in the same folder as the video. When playing the video in VLC, find "Subtitle" in the VLC menu bar, and simply select the subtitle file.

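For burning the SRT or VTT text into the video frames themselves (forced subtitles), rendering is obviously necessary. Adding the file as a selectable subtitle stream only requires remuxing.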
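As a rough sketch of the two embedding routes, the snippet below only builds the ffmpeg argument lists and does not run them. It assumes an MP4 output, for which ffmpeg's mov_text subtitle codec is the usual choice, and an ffmpeg build that includes the subtitles filter. The function names and file names are made up for illustration.

```python
def soft_sub_cmd(video, subs, out):
    """Mux the subtitle file in as a selectable stream.

    Audio and video are stream-copied, so nothing is re-encoded.
    """
    return ["ffmpeg", "-i", video, "-i", subs,
            "-c", "copy", "-c:s", "mov_text", out]


def burn_in_cmd(video, subs, out):
    """Draw the subtitles onto the frames (forced); the video is re-encoded."""
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={subs}", out]
```

Either list can be handed to subprocess.run(cmd, check=True), or joined with spaces to paste into a terminal.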

For extracting only the audio from a YT URL, which is a faster/smaller download, it's better to use yt-dlp. A description is here. Note that for --audio-quality, "0" is the best quality on yt-dlp's 0-10 scale (the default is 5). Knowing this, the resulting file can be made smaller yet by choosing a higher number, or a specific bitrate such as 128K.

$ yt-dlp -f bestaudio -x --audio-format mp3 --audio-quality 0 "URL"

Sometimes video downloads indicate they will be huge -- 5 or 6 GB or more. This happens when the video is 2K or 4K resolution. I'm typically satisfied with 720p, however. When I encounter these immense downloads, I specify the lower resolution as described here. The result is a much smaller file and a faster download.

$ yt-dlp -S res:720 "URL"

from prior post*

*The 1/2021 post. NB: Embedding text is a CPU-intensive render, so it's useful to verify that system cooling is unobstructed.

To render one (or more) lines of text, use the "drawtext" ffmpeg filter. Suppose the video date and time, in Cantarell font, is to be displayed in the upper left-hand corner for 6 seconds (from t=2 to t=8). We can use ffmpeg's simple filtergraph (noted by "vf"). A font size of 50 should be sufficient for 1920x1080 video.

$ ffmpeg -i video.mp4 -vf "[in]drawtext=fontfile=/usr/share/fonts/cantarell/Cantarell-Regular.otf:fontsize=50:fontcolor=white:x=100:y=100:enable='between(t,2,8)':text='Monday\, January 17\, 2021 -- 2\:16 PM PST'[out]" videotest.mp4

Notice that a backslash must be added to escape special characters: colons, semicolons, commas, left and right parens, and of course apostrophes and quotation marks. For this simple filter, we can also omit the [in] and [out] labels. Here is a screenshot of how it looks during play.
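The escaping rule of thumb above can be sketched as a small helper. This is only an illustration of that rule: the function name and the default character set are my assumptions, and apostrophes and quotation marks are deliberately left out because they also interact with shell and filtergraph quoting.

```python
def escape_drawtext(text, specials=":,;()"):
    """Put a backslash in front of each character listed in `specials`.

    Existing backslashes are doubled so they survive as literal characters.
    """
    out = []
    for ch in text:
        if ch == "\\" or ch in specials:
            out.append("\\")
        out.append(ch)
    return "".join(out)
```

For example, escape_drawtext("2:16 PM PST") gives the same 2\:16 PM PST used in the command above.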

Next, suppose we want to organize the text into two lines. We'll need one filter for each line. Since we're still only using one input file to get one output file, we can still use "vf", the simple filtergraph. With the first line at y=150 and a font size of 50, 10 pixels seems enough to separate the lines, so I'm placing the second line down at y=210.

$ ffmpeg -i video.mp4 -vf "drawtext=fontfile=/usr/share/fonts/cantarell/Cantarell-Regular.otf:fontsize=50:fontcolor=white:x=100:y=150:enable='between(t,2,8)':text='Monday\, January 18\, 2021'","drawtext=fontfile=/usr/share/fonts/cantarell/Cantarell-Regular.otf:fontsize=50:fontcolor=white:x=100:y=210:enable='between(t,2,8)':text='2\:16 PM PST'" videotest2.mp4
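The line spacing arithmetic (150 + 50 + 10 = 210) extends to any number of lines. Below is a minimal sketch that builds the comma-joined filter string for the "-vf" argument; the function name and its defaults are assumptions taken from the two-line command above, and only commas and colons are escaped.

```python
def drawtext_chain(lines, x=100, y0=150, fontsize=50, gap=10, start=2, end=8,
                   fontfile="/usr/share/fonts/cantarell/Cantarell-Regular.otf"):
    """Return one drawtext filter per line of text, joined by commas."""
    filters = []
    for i, line in enumerate(lines):
        # Each line sits one font height plus the gap below the previous one.
        y = y0 + i * (fontsize + gap)
        text = line.replace(",", "\\,").replace(":", "\\:")
        filters.append(
            f"drawtext=fontfile={fontfile}:fontsize={fontsize}:fontcolor=white"
            f":x={x}:y={y}:enable='between(t,{start},{end})':text='{text}'"
        )
    return ",".join(filters)
```

The returned string goes in as the single argument after "-vf", for instance in a list passed to subprocess.run.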

We can continue to add additional lines of text in a similar manner. For more complex effects using 2 or more inputs, this 2016 video is the best I've seen.

Ffmpeg advanced techniques pt 2 (19:29) 0612 TV w/NERDfirst, 2016. This discusses multiple input labeling for multiple filters.

PNG incorporation

If I wanted to do several lines of information, an easier solution than making additional drawtexts is to create a template the same size as the video, in this case 1920x1080. Using, say, GIMP, we could create a picture with an alpha (transparent) background and several lines that we might use repeatedly, and save it in Drive. There is then an ffmpeg command to superimpose a PNG over the MP4.
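One way to superimpose the PNG is ffmpeg's overlay filter. The sketch below only assembles the argument list; the function name and file names are hypothetical, and it assumes the PNG is the same size as the video so that it is placed at 0:0.

```python
def overlay_cmd(video, png, out, x=0, y=0, start=None, end=None):
    """Build an ffmpeg command that lays a transparent PNG over a video.

    If start and end (in seconds) are given, the overlay is only shown
    during that interval.
    """
    flt = f"overlay={x}:{y}"
    if start is not None and end is not None:
        flt += f":enable='between(t,{start},{end})'"
    return ["ffmpeg", "-i", video, "-i", png, "-filter_complex", flt, out]
```

Because there are now two inputs, this uses "-filter_complex" rather than the simple "-vf" filtergraph.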

additional options (scripts, text files, captions, proprietary)

We of course have other options for skinning the cat: adding calls to text files, creating a bash script, or writing python code to call and do these things.

The simplest use of a text file is to call it from the filter, in place of writing the text out in each filter.
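The drawtext filter has a "textfile" option for this. A minimal sketch of the filter string follows; the function name, the file name label.txt, and the position defaults are assumptions carried over from the earlier examples.

```python
def drawtext_from_file(textfile, x=100, y=100, fontsize=50,
                       fontfile="/usr/share/fonts/cantarell/Cantarell-Regular.otf"):
    """Return a drawtext filter that reads its text from a file."""
    return (f"drawtext=fontfile={fontfile}:textfile={textfile}"
            f":fontsize={fontsize}:fontcolor=white:x={x}:y={y}")
```

Since the text no longer sits inside the filter string, the commas and colons in it do not need the filter-level backslashes.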

viddyoze: online video graphics option. They render it on the site, and the result is not a transparent overlay but a 720p MP4.

viddyoze review (14:30) Jenn Jager, 2020. Unsponsored review. Explains most of the 250 templates. Renders to QuickTime (if alpha), or MP4 if not. ~12 minute renders.

screenwriting

We of course need a LaTeX format, but then...

Answer these 6 Questions (14:56) Film Courage, 2021. About, want, get it, do about it, does/doesn't work, end.
PBX - on-site or cloud (35:26) Lois Rossman, 2016. Cited mostly for source 17:45 breaks down schematically.
PBX - true overhead costs (11:49) Rich Technology Center, 2020. Average vid, but tells hard facts. Asterisk server ($180) discussed.

Monday, March 14, 2022

paperhater - classification issues (minus database)

We want to organize electronic media as much as possible without a database, for at least two reasons: 1) if we can reach file organization and "findability" goals without the database, then we've saved the expense, 2) if our situation ultimately requires a database, the need for one becomes more clearly defined. For this reason, the first steps are the same, database or no ultimate database.

overview

We begin with an entirely non-homogenous mess of files, a deck of scattered cards but without names or suits. Our pile includes everything from receipts, code, diplomas, research articles, manuals, correspondence, old emails, and on. Some of these files are active, some are reference, some are family history, and on.

first cut

Our first cleavage separates out all the articles, texts, and manuals which might be used as a reference, or cited in a thesis, etc. Call this our library. The second pile (receipts, forms, photos) is... everything else. We'll overlay a universal file and folder naming convention on both categories. However, library items require an additional handling step due to standardized citation requirements.

There's help for both categories. 1) For library items, ISBN/ISSNs, Dewey Decimal (since 1876), and/or Library of Congress classifications already exist. We can organize these using BibTeX. 2) For everything else, think back to the paper era. Records management had proven ways to organize these. This process has become electronic records management (ISO 15489), and is just as helpful. See the first video below.

records basics (25:31) US National Archives Records Management, 2009. Doubling info every two years. 11:00 paper's been lost -- we used to know how to do it. Sample basic structure. 17:00 She recommends a file plan.
file management (1:23:57) Nicholas Andre 2013. Windows-based lecture, but clear thinking fellow gives good context for what we're after. Backgrounder. Corning (NY) Community College course.

naming - files

3 part naming convention. Subject, date, code. These vary in order depending on what's most important to the user of that file.

file naming conventions (10:00) Simpletivity, 2018. Ad first 1:36. Uses 3 part naming, succinctly described. Probably best at .75 speed. Comments excellent.

naming - folders(directories)

Mostly the naming is the same as the files, but arranged vertically. 3 layer naming convention. Function, subfunction, action. Thinking of the deck of cards, we can arrange by suit, by number, by color. What's the fastest way to find a card if they're in folders? Might depend on my style of play. NARA notes that granularity of folders depends on number of docs for that folder.

folders - website (8:19) John Morris, 2018. Standard website folder organization.

plan - file & folder

This step combines the decisions on naming of files and folders. The government's suggestions are in the General Records Schedule on the NARA website.

records basics (25:31) US National Archives Records Management, 2014. Donna Read. Information doubling every two years. 11:00 paper's been lost -- we used to know how to do it. Sample basic structure. 17:00 She recommends a file plan.
file plan basics (47:44) US National Archives Records Management, 2013. Jeff Benson. staff consensus, record retention,
holding to center (10:46) Luke Smith, 2020. Part of sticking to a file plan is understanding that what works for a person can be useful.
federal social media (43:37) US National Archives Records Management, 2013. Bethany Cron. Federal records, con

A. non-research example

This is the thing for say, billings or other saved items.

B. research/library example

We rarely seem to get New Yorker articles directly related to Bay Area unless it's a controversial topic: presumably editors don't want to research on their home NYC turf and alienate locals. Let's say I want to keep one such article to cite later. The information panel.

Step 1 - gather info

Very difficult without a physical copy. There's no online index for magazine volume and numbers cross-referenced with date, at least that I've found. With a physical copy of the New Yorker, I can get the info -- ISSN 0028-792X, Volume 98, No 4, Mar 14, 2022. NYC, NY. The article of interest in this example, "The Access Trap", is by Nathan Heller, pgs 34-45. I've scanned these pages into a PDF, as yet unnamed. If online, we may also find other control numbers we want to include. E.g., if we had a dissertation to cite, we know that, "There is no single source for a comprehensive dissertation search." We might encounter a different control number at different sites and want to include them.

Step 2 - PDF name - first cut

Revisit the PDF name and evaluate the 3 part name. It should include a date, name, and subject code, in a way that at least hints at the file contents. In this case, we might use, e.g.,

20220314_NY_LowellEquity.pdf
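The 3 part name can be generated rather than typed, which keeps the date format consistent. A minimal sketch, assuming the date-first order used here; the function and parameter names are made up for illustration.

```python
from datetime import date


def build_name(day, source, subject, ext="pdf"):
    """Join date (YYYYMMDD), source code, and subject with underscores."""
    return f"{day:%Y%m%d}_{source}_{subject}.{ext}"
```

With the article above, build_name(date(2022, 3, 14), "NY", "LowellEquity") returns 20220314_NY_LowellEquity.pdf. Putting the date first in YYYYMMDD form also makes a plain alphabetical listing sort chronologically.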

Step 3 - BibTeX it

Information for storage, retrieval, and citation (Chicago, MLA, APA) could come from a BibTeX BIB file. BibTeX entries can be massaged in the document into any citation format. The question we'll want to ask later is how many BIB files? Since we can't have just one immense BIB file for every article we have -- and of course we need to create a custom one for any document we create -- we're going to run into meta-problems. In this case we'd use the "@article" template, minimally...

@article{uniquecitekey,
author = "Heller, Nathan",
title = "The Access Trap",
journal = "The New Yorker",
year = 2022,
volume = "98",
number = "4",
month = "Mar",
pages = "34--45",
issn = "0028-792X",
doi = "20220314_NY_LowellEquity.pdf",
note = "Lowell HS equity clause"
}

We probably don't need 11 fields in a database, but BibTeX has these readymade. Review the DOI and see that it corresponds with the filename -- make any adjustments.
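The filename check can be automated. The sketch below is not a real BibTeX parser: it assumes one field per line, written as in the entry above, and the function name is hypothetical.

```python
import re

ENTRY = '''@article{uniquecitekey,
author = "Heller, Nathan",
title = "The Access Trap",
journal = "The New Yorker",
year = 2022,
volume = "98",
number = "4",
month = "Mar",
pages = "34--45",
issn = "0028-792X",
doi = "20220314_NY_LowellEquity.pdf",
note = "Lowell HS equity clause"
}'''


def bib_fields(entry):
    """Return {field: value} for lines of the form  name = "value"  or  name = 123."""
    pattern = r'^\s*(\w+)\s*=\s*(?:"([^"]*)"|(\d+))\s*,?\s*$'
    pairs = re.findall(pattern, entry, flags=re.M)
    return {name: (quoted or number) for name, quoted, number in pairs}


fields = bib_fields(ENTRY)
matches_file = fields["doi"] == "20220314_NY_LowellEquity.pdf"
```

Here matches_file is True; renaming the PDF without updating the entry would make it False.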

apa style bibtex (2:53) Charles Clayton, 2016. Also does IEEE. Important about making certain filenames match. Probably best at .75 speed. Comments excellent.
latex review (59:42) Derek Banas, 2019. Typical Banas killer review. He uses "TeXShop", which appears to have autocomplete.
bibtex citation (7:38) Center of Math, 2015. Have to run it twice, like a table of contents.

Step 4 - filename and folder again

Review the filename, folder, and BibTeX information. Or, if a random receipt, review retrieval issues again, such as the folder depth (no more than 3) and the filename (gives some info).
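The depth rule is easy to check mechanically. A small sketch, assuming POSIX-style paths and a single top-level documents folder; the paths and function names are invented for illustration.

```python
from pathlib import PurePosixPath


def folder_depth(path, root):
    """Count the folders between `root` and the file itself."""
    rel = PurePosixPath(path).relative_to(root)
    return len(rel.parts) - 1  # the last part is the filename


def too_deep(path, root, max_depth=3):
    """True if the file sits more than `max_depth` folders below root."""
    return folder_depth(path, root) > max_depth
```

For a file at library/magazines/2022/ under the root, folder_depth returns 3, which is still within the limit.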

C. database or not?

ISSN number searchable here, topic. Sometimes we have subtopics.

Tuesday, March 8, 2022

STT - Kaldi [fail], VOSK, and DeepSpeech

nerd-dictation :: kaldi :: arch pulse examples :: nerd-dictation user

The usual issue is not enough CPU to handle off-line processing. The app has to be smart enough to slow down to that system, or it will crash. Crashes are common. The newest addition (2025) is DeepSpeech, Mozilla's speech-to-text engine built on TensorFlow, whose CLI seems to adapt to various system resources. A model is required, same as with VOSK-based nerd-dictation; however, DeepSpeech seems less involved with PulseAudio, so...

Nerd-dictation can be massaged into functionality using config tweaks and a script, but remains unpolished (esp. startup/stop). Installation pulls in a Python VOSK API and a keyboard input simulator, xdotool. Dependencies include VOSK language models and PulseAudio. A full PulseAudio install (not just libpulse) is required due to nerd-dictation's pa_context_connect() calls.

1. deepspeech (install)

quick list

yay -S deepspeech-bin.
run script or $ nerd-dictation [options]
speak as long as desired. Assign an exit hotkey, or assign so many seconds of silence to timeout close.

quirks

The application has to exit natively, not CTRL-C, or the output file won't be written.

scripts, conf file

We'll want an SH script to avoid long commands. A simple example would exit 10 seconds after a person stops talking.

2. nerd-dictation (usage)

quick list

plug-in mic and $ arecord -l for mic verification
$ pavucontrol, set the mic levels
$ pactl list sources get the device name needed for nerd-dictation after "Name", eg, alsa_input.hw_1_0
run script or $ nerd-dictation [options]
speak as long as desired. Assign an exit hotkey, or assign so many seconds of silence to timeout close.

quirks

The application has to exit natively, not CTRL-C, or the output file won't be written.

scripts, conf file

We'll want an SH script to avoid long commands. Here is a simple example that would exit 10 seconds after a person stops talking.

#!/bin/bash

# created as 644, chmod 744 so only user can execute
# run with "sh startdoc.sh" or "bash startdoc.sh"
# todo: sequential text naming

nerd-dictation begin \
--pulse-device-name=alsa_input.hw_1_0 \
--timeout=10 \
--output=STDOUT \
1> doctate.txt

In addition to a script, a conf file (~/.config/nerd-dictation/nerd-dictation.py) is where to handle capitalization, substitutions, and so on. It's slightly delicate: startup pipe errors will arise if the config file is blank, or if it contains incorrect Python syntax.
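As a minimal sketch of such a conf file: as I understand the project README, nerd-dictation looks for a function named nerd_dictation_process(text) in this file and passes the recognized text through it. The substitution table below is hypothetical, and the capitalization rule is deliberately crude.

```python
import re

# Hypothetical substitutions: spoken phrase -> text to write.
SUBSTITUTIONS = {
    "full stop": ".",
    "arch linux": "Arch Linux",
}


def nerd_dictation_process(text):
    """Apply substitutions, tidy punctuation spacing, capitalize sentences."""
    for spoken, written in SUBSTITUTIONS.items():
        text = text.replace(spoken, written)
    # Remove the space left in front of a substituted period.
    text = re.sub(r"\s+\.", ".", text)
    # Capitalize the first letter of the text and of each new sentence.
    return re.sub(r"(^|\.\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
```

Since the file is plain Python, the function can be tried out in an interpreter before nerd-dictation is started, which sidesteps the startup errors from bad syntax.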

3. nerd-dictation (install) 1 hr +/-

pulseaudio
binary install (AUR or Git)
vosk language model
possible scripts and conf file (see sect 2 above)

pulseaudio (full install)

Unfortunately, nerd-dictation needs a full PulseAudio install to handle its pa_context_connect() calls. Here's what the fail looks like if PA is not installed...

$ nerd-dictation begin
Connection failure: Connection refused
pa_context_connect() failed: Connection refused

The lightest complete install method I've found is about 20 MB, 18 of that being the optional pavucontrol, and it also verifies proper ALSA interaction:

# pacman -S pulseaudio-alsa pavucontrol

how to install the app/binary?

$ yay -S nerd-dictation-git
:: Checking for conflicts...
:: Checking for inner conflicts...
[Repo:1] xdotool-3
[Aur:2] python-vosk-bin nerd-dictation-git

:: (1/2) Downloaded PKGBUILD: python-vosk-bin
:: (2/2) Downloaded PKGBUILD: nerd-dictation-git
2 python-vosk-bin
1 nerd-dictation-git

where is the binary installed?

$ which nerd-dictation
/usr/bin/nerd-dictation

where is VOSK model and where installed?

$ strace nerd-dictation begin 2>&1 | tee wheevosk.txt
[snip]
newfstatat(AT_FDCWD, "/home/foo/.config/nerd-dictation/model", 0x7ffe388937c0, 0) = -1 ENOENT (No such file or directory)
write(2, "Please download the model from h"...,
128Please download the model from https://alphacephei.com/vosk/models
and unpack it to '/home/foo/.config/nerd-dictation/model'.

...so that...

$ cd .cache
$ mkdir nerd-dictation
$ mkdir model
$ wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip

... unzip and transfer folders and files to ~/.config/nerd-dictation/model.
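The unzip-and-transfer step can be sketched in Python as well. This assumes the archive unpacks to a single top-level folder (as the small English model appears to) and moves that folder's contents into the model directory; the function name is made up.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_model(zip_path, model_dir):
    """Unpack a VOSK model zip so its files end up directly in model_dir."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        # If the archive holds one top-level folder, use its contents.
        one_folder = len(entries) == 1 and entries[0].is_dir()
        src = entries[0] if one_folder else Path(tmp)
        for item in list(src.iterdir()):
            shutil.move(str(item), str(model_dir / item.name))
    return model_dir
```

A call might look like install_model("vosk-model-small-en-us-0.15.zip", Path.home() / ".config/nerd-dictation/model").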

4. failure mode (kaldi)

Installing kaldi is apparently two packages off the AUR.

$ yay -S kaldi
:: Checking for conflicts...
:: Checking for inner conflicts...
2 kaldi-openfst
1 kaldi

But it appears there's a version issue. We're installing 1.7.2 but it's looking for 1.6.7 at some points.

extras/check_dependencies.sh
extras/check_dependencies.sh: Intel MKL does not seem to be installed.
... Run extras/install_mkl.sh to install it. Some distros (e.g., Ubuntu 20.04) provide
... a version of MKL via the package manager, but verify that it is up-to-date.
... You can also use other matrix algebra libraries. For information, see:
... http://kaldi-asr.org/doc/matrixwrap.html
rm -f openfst
ln -s openfst-1.7.2 openfst
==> Entering fakeroot environment...
==> Starting package()...
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/bin': No such file or directory
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/include': No such file or directory
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/lib': No such file or directory
cp: cannot stat '/home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/openfst-1.6.7/Makefile': No such file or directory
==> ERROR: A failure occurred in package().
Aborting...
-> error making: kaldi-openfst

Let's see what's in the directory -- can we even iron-out the discrepancy?

$ cd /home/foo/.cache/yay/kaldi-openfst/src/kaldi-master/tools/
$ ls -l
total 1280
drwxr-xr-x 3 foo foo 4096 Mar 7 04:43 ATLAS_headers
drwxr-xr-x 2 foo foo 4096 Mar 7 04:43 CLAPACK
-rw-r--r-- 1 foo foo 1206 Mar 7 04:43 INSTALL
-rw-r--r-- 1 foo foo 6817 Mar 7 04:43 Makefile
drwxr-xr-x 2 foo foo 4096 Mar 7 04:43 config
drwxr-xr-x 2 foo foo 4096 Mar 7 04:43 extras
lrwxrwxrwx 1 foo foo 29 Mar 7 04:43 install_pfile_utils.sh -> extras/install_pfile_utils.sh
lrwxrwxrwx 1 foo foo 27 Mar 7 04:43 install_portaudio.sh -> extras/install_portaudio.sh
lrwxrwxrwx 1 foo foo 23 Mar 7 04:43 install_speex.sh -> extras/install_speex.sh
lrwxrwxrwx 1 foo foo 23 Mar 7 04:43 install_srilm.sh -> extras/install_srilm.sh
lrwxrwxrwx 1 foo foo 13 Mar 7 19:23 openfst -> openfst-1.7.2
drwxr-xr-x 7 foo foo 4096 Mar 7 19:19 openfst-1.7.2
-rw-r--r-- 1 foo foo 1269292 Jul 17 2019 openfst-1.7.2.tar.gz
drwxr-xr-x 2 foo foo 4096 Mar 7 18:54 python

No, we see that we cannot iron-out the problem. The softlink from "openfst" is working correctly to install the latest version, however there's no way to get around the version check which will always yield an error when the program starts. So there's no installing manually using

# pacman -U [package.tar.gz]

"X"...in a box

Saturday, March 26, 2022

video -- text

subtitles

extract subtitles

from prior post*

PNG incorporation

additional options (scripts, text files, captions, proprietary)

screenwriting

Monday, March 14, 2022

paperhater - classification issues (minus database)

overview

first cut

naming - files

naming - folders(directories)

plan - file & folder

A. non-research example

B. research/library example

Step 1 - gather info

Step 2 - PDF name - first cut

Step 3 - BibTeX it

Step 4 - filename and folder again

C. database or not?

Tuesday, March 8, 2022

STT - Kaldi [fail], VOSK, and DeepSpeech

1. deepspeech (install)

quick list

quirks

scripts, conf file

2. nerd-dictation (usage)

quick list

quirks

scripts, conf file

3. nerd-dictation (install) 1 hr +/-

pulseaudio (full install)

how install the app/binary?

where is the binary installed?

where is VOSK model and where installed?

3. failure mode (kaldi)

table of contents

tags

18 other sites