Saturday, February 29, 2020

toolbox :: voice-over

contents
1. voice: a) detection* b) levels and record [alsamixer, audacity] c) cut and polish [goldwave]
2. tts
3. sync: solution is to slow the video (to about 0.4x) for the clap, then work in pieces
4. notes

* typically requires ~/.asoundrc authorship


This post concerns two voiceover methods -- one human, the other using Text-To-Speech (TTS). Both, of course, have to be synced to video. Hopefully, the post also helps those doing audio without video.

1. Voice

input

I currently use a USB-connected, non-XLR microphone, the Fifine K053 ($25, USB ID 0c76:161f).

Cleaner options are available at higher price points, eg phantom power supplies to prevent hum, or XLR pre-amping. Either way, what we'd prefer is to see the mic's input levels as we're speaking, and to be able to adjust those levels in real time. Of course, the first step is detection.

detection (and .asoundrc)

Links: mic detection primer :: asoundrc primer :: Python usage


If using ALSA, we need to know how the system detects and categorizes a mic within the OS. Then we'll put that information in ~/.asoundrc, where any sound software can use it. However, if you have the dreaded two-sound-card situation, which always occurs with HDMI, you've got to straighten that out first and reboot. The HDMI situation works best when I unplug the mic before booting and plug it in after login; the mic then typically appears as Device 2, after Device 0 and HDMI 1. Everything else with Audacity and alsamixer below is the same.

$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: SB [HDA ATI SB], device 0: ALC268 Analog [ALC268 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: Device [USB PnP Audio Device], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
...indicating we have plughw:1,0 for the input. To double-check...
$ cat /proc/asound/cards
0 [SB ]: HDA-Intel - HDA ATI SB
HDA ATI SB at 0xf6400000 irq 16
1 [Device ]: USB-Audio - USB PnP Audio Device
USB PnP Audio Device at usb-0000:00:12.0-3, full speed
...appears good. If we'd also like to know which kernel module was detected...
$ lsusb -t
[snip]
|__ Port 1: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 3: Dev 3, If 0, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 3: Dev 3, If 1, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 3: Dev 3, If 2, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 3: Dev 3, If 3, Class=Human Interface Device, Driver=usbhid, 12M
[snip]
...so we know the mic is using snd-usb-audio in the kernel.
Test the hardware configuration for 15 seconds...
$ arecord -f cd -D plughw:1,0 -d 15 test.wav
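
With the card numbers above, a minimal ~/.asoundrc sketch making the USB mic the default capture device (adjust the numbers if your card ordering differs):
pcm.!default {
    type asym
    playback.pcm "plughw:0,0"    # onboard card for playback
    capture.pcm "plughw:1,0"     # the USB mic for capture
}
ctl.!default {
    type hw
    card 0
}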

real-time monitoring

When recording, we need to be able to see the waveform and adjust levels; otherwise we waste a half hour with arecord test WAVs and alsamixer adjustments, just to set levels. Audacity is the only application I currently know of that shows the waveform in real time during recording. I open Audacity in the GUI, then overlay alsamixer in a terminal, from which I can set mic levels while recording a test WAV.
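
To land alsamixer directly on the mic's card (card 1, per arecord -l above), then hit F4 for the capture controls:
$ alsamixer -c 1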

There are typically problems during playback. The sample rate may need to be set to 48000, depending on the chipset. Audacity uses PortAudio, not PulseAudio or ALSA directly, and I've found it's critical to select accurately from perhaps 12 playback options in Audacity's device selection -- "default" almost never works. Often the best playback device will have some oddball name like "vdownmix", but it doesn't matter, just use it.

2. TTS

tts - colab account

Create a notebook in Colab, then go to Google Cloud, authorize a text-to-speech API, and get a corresponding JSON credential. When the TTS Python code runs in Colab, the script calls that API using the JSON credential. Of course, it's the API which does the heavy work of translating the text into an MP3 (mono, 32 kbps, 22000 Hz).
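
A minimal sketch of such a Colab cell, assuming the google-cloud-texttospeech client library and a credential file named creds.json (both names illustrative):
import os
from google.cloud import texttospeech

# point the client at the JSON credential from the Cloud console
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "creds.json"

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Testing the voiceover."),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3),
)
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)  # the MP3 described above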

Voiceover in Colab (1:39) DScience4you, 2020. Download and watch at 40% speed. Excellent.
The Json API process (8:14) Anders Jensen, 2020.
Google TTS API (page) Google, continuously updated. Explains the features of the API, for example the settings for language and pitch.

tts - local system

The TTS workflow has four major steps, some pieces of which are reusable: 1) producing the text to be read, 2) writing the code that translates the text to speech, 3) selecting, in the code, the translation settings for that project and, 4) editing the output WAVs or MP3s. All four must be tweaked to match formats, described in another post:
  • script :: any text editor will do.
  • code :: typically Python, as described here (11:25), and using gTTS (see the sketch after this list).
    • # pacman -S python-pip python-pipreq pipenv to get pip. Pipenv is a virtual-environment sandbox, necessary so that pip doesn't break pacman's index or otherwise disturb the system. Also, pip then doesn't require sudo. Eg: $ pip install --user gtts.
  • name one's script anything besides "gtts.py", since that shadows the library of the same name and spawns import errors. It is the gtts library which interfaces with Google's TTS API.
  • The "=" is used for variable assignment. Saving the output file is a method call, not an assignment, so just put the parens after the method: output.save("output.mp3")
  • TTS engine :: varies from eSpeak to the Google API gtts, from which we import gTTS.
  • audio :: levels, pauses, file splitting, and so on. WAV is the easiest format, eg with SoX, GoldWave, Jokosher, etc.
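
Pulling the list together, a minimal gTTS sketch, assuming a script file named script.txt (file names illustrative; note the file is not named gtts.py):
$ nano tts.py
#!/usr/bin/env python
# tts.py -- minimal gTTS sketch
from gtts import gTTS

# 1) the script text, from any editor
with open("script.txt") as f:
    text = f.read()

# 2-3) translate with the gTTS engine, default settings
output = gTTS(text=text, lang="en", slow=False)

# 4) save for audio editing; a method call, not an "=" assignment
output.save("output.mp3")

Run it inside the pipenv sandbox from the list above, eg $ pipenv install gtts && pipenv run python tts.py.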


hardware

Some current and planned audio and video hardware

video

  • GoPro Hero
  • various old cell phones; if they charge up and have room for an SD card, I can use them.

public domain films - archive.org
Video editing. The first problem is the inputs. Video clips from any of, say, 10 devices meant that the organization of video, audio, software, and hardware seemed impossible to me. For years. Appropriating UML toward this chaos became the first relief, then other pieces eventually fell into place.

A second problem with TTS is that it relies on Python for something with as much speech as a script. This means doing a separate, non-Arch Python install with pip to get packages, or sticking with Arch and avoiding pip in favor of yay. Blending pip updates into Arch means disaster the next time you attempt a simple pacman -Syu. The problem with yay/AUR is it may not have the modules needed to run TTS.

I do separate installs of anything that has its own updaters: Python, LaTeX, Git. It's a slight pain in the ass making sure all paths are updated, but I can keep them current without harming my Arch install and updates.

UML

Unified Modeling Language ("UML") is for software, but we know its various diagrams work well for, eg, workflow, files, and concepts. Derek, per usual, has a concise conceptual overview (12:46, 2012).

We can create UML in UMLet or Dia, though we will eventually want to do them in LaTeX. Lists, esp with links and so on, are best created in HTML (Geany/GTK). It's good to keep backups of these overarching UML and HTML documents on Drive.
  • Dia: # pacman -S dia/$ dia :: select the UML set of graphic icons, switch to letter-size from A4. Files are natively saved with DIA extensions, but can be exported to PNG.
  • UMLet: $ yay -S umlet/$ umlet :: UMLet is the software in Derek's video above. Sadly, it allows Oracle into one's system by requiring Java headers and some other files. The UMLet extension is UXF (their XML schema), but files can be exported in graphical formats.

audio

(UML on-file) Like video, audio arrives in any number of formats and must be standardized prior to remixing with video. Some say to sample at 24-bit, some say 16 is just fine. Capturing old CDs, maybe 24/192k, but it's also fine to do 16/192k. For video or backup DVD, 16/192k at 44100 seems to be a good output for remixing.
  • GoldWave: (proprietary) :: easy wine bottle. Does well with WAV.
  • Removed audio: an audio source; audio separated from raw video for further processing (see the ffmpeg sketch after this list).
  • TTS: an audio source; creating a script and using some method to turn it into audio for the video.
  • Mixxx: combine music, TTS, and anything else at proper levels.
  • Qjackctl: # pacman -S qjackctl/$ qjackctl :: JACK manager to work alongside ALSA.
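
For the "removed audio" source, ffmpeg (not otherwise covered in this post, but a common choice) can split the track out of raw video; file names are illustrative:
$ ffmpeg -i raw_clip.mp4 -vn -acodec pcm_s16le -ar 44100 removed_audio.wav
-vn drops the video stream; the rest writes a 16-bit, 44100 Hz WAV, matching the remix target above.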

pulseaudio notes

Links: PulseAudio configuration :: PulseAudio examples

Go to /etc/pulse/ for the two main configuration files: default.pa and client.conf. Any sink you add in ALSA will be detected in PulseAudio, eg # modprobe snd_aloop will turn on at least one loopback device.
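
PulseAudio also has its own loopback module, worth knowing alongside snd_aloop; the latency value here is illustrative:
$ pactl load-module module-loopback latency_msec=60
To make it permanent, add the load-module line to /etc/pulse/default.pa instead.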

Multiple Audio Collection (11:29) Kris Occhipinti, 2017. How to do multiple audio sources.

Monday, February 17, 2020

wine notes

Running Windows Programs on Linux (20:10), ExplainingComputers, 2017.
Wine Installation (19:58), Chris Titus Tech, 2019.


Simplistically, when one wishes to install or run a Windows app in Linux, one can open a terminal and...
$ wine fooapp.exe
This command creates a hidden ~/.wine/ folder (or "bottle" in Wine-speak). In the folder is a faux C: drive and an imitation Windows file structure. With such a structure in place, Wine runs fooapp.exe consistent with the app's Windows file and library calls.

However, suppose one has more than one Windows app? Perhaps an older Windows app that ran best under WindowsXP, as well as newer Windows apps which run best on Windows10. The single ~/.wine bottle folder can be adjusted to match various Windows applications, but not without slowed performance and/or occasional errors. I wanted a way to reliably run any number of Windows applications with reasonably good performance.

application-level bottles

After reading and viewing several vids1, my solution was to create a Wine bottle for each Windows application I use. A per-application bottle approach is not unwieldy if a person takes the time to keep each bottle lean, and it nearly ensures each Windows app will run predictably. Further, the subfolders for the Wine bottles can be placed into an unhidden directory, eg. ~/wine, making the entire configuration transparent and comprehensible.

configuring

For Wine installation, there were only three packages...
# pacman -S wine winetricks zenity
Zenity is needed for the winetricks GUI, and wine is of course the compatibility layer itself. Think of winetricks as the Wine installation aid. I tried PlayOnLinux, but it was always coming up short on DLLs and sending confusing errors. Some people will probably do a Mono installation as well, but I just install whatever version of .NET is needed by the Windows app itself, in that bottle.

launching

Suppose I have some Windows app, "fooapp.exe". As noted above, I would configure a bottle for it, eg ~/wine/fooapp. To launch, I want to see any spawned errors and possibly avoid the generic 64-bit Wine version noted at the top of this post. A terminal launch can solve both...
$ WINEARCH=win32 WINEPREFIX=~/wine/fooapp wine fooapp.exe
The two environment variables keep Wine locked into the application's bottle and at 32 bits. From the terminal I can watch for errors and relaunch winetricks to add additional DLLs. Rinse and repeat.
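
Relaunching winetricks against the same bottle might look like this (the verbs are illustrative -- pick whatever the app actually needs):
$ WINEARCH=win32 WINEPREFIX=~/wine/fooapp winetricks corefonts vcrun2010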

Once I have troubleshot the application's DLLs and its launch parameters, I'm done with the terminal; I'll want to create a menu item for easy launching. For me, this was a problem. My window manager is IceWM, a great WM, but one which only allows a single command per menu item, and I needed to set environment variables as well. The only way to do this was with a script, as there is no lightweight application for multiple-command menu items.
$ nano fooapp.sh
#!/bin/bash
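# lock the bottle and 32-bit arch, then launch from inside the bottle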
WINEARCH=win32 WINEPREFIX=~/wine/fooapp wine ~/wine/fooapp/drive_c/fooapp.exe
exit 0

$ chmod +x fooapp.sh
$ ./fooapp.sh
...and then in my menu, I have...
$ nano /home/foo/.icewm/windows
prog foo foo sh "/home/foo/wine/fooapp.sh"

shutdown

If Wine crashes, I always run $ wineserver -k, just to be sure I've killed all Wine windows and memory usage.

NET

Wine installation will prompt to install Mono, so .NET operations will work.

safety

Windows stuff is susceptible to viruses, and Wine is not a bona fide sandbox.
  1. there's no reason to run Wine as root, and I make a special effort to avoid mistakenly doing so.
  2. when exiting a Windows app, I can use htop to be sure all processes have been killed.