
Monday, April 20, 2020

PiP screencast pt 1

contents
capture :: organizing the screen, opening PiP, capture commands
speed changes :: slow motion, speed ramps
cuts
scripting
precision cuts
audio and sync
other effects :: fade, text, saturation
subtitles/captions

NB: Try to make all cuts at I-frames (keyframes), if possible.


Links: 1) PiP Pt II 2) capture commands 3) settings

Editing video in Linux becomes a mental health issue after a decade or more of teeth grinding with Linux GUI video editors. There are basically two backends: ffmpeg and MLT. After a lost 10 years, some users like me resign themselves to command-line editing with ffmpeg and melt (the MLT CLI editor).

This post deconstructs a simple PiP screencast, perhaps 6 minutes long. A small project like this exposes nearly all the Linux editing problems which appear in a production-length film. This is the additional irony of Linux video editing -- having to become practically an expert just to do the simplest things; all or nothing.

At least five steps are involved, even for a 3.5 minute video.

  1. get the content together and laid out, an impromptu storyboard. What order do I want to provide information?
  2. verify the video inputs work
  3. present and screencapture - ffplay, ffmpeg CLI
  4. cut clips w/out render - ffmpeg CLI
  5. assemble clips with transitions - ffmpeg CLI

capturing the raw video

The command-line PiP video setup requires 3 terminals to be open: 1) for the PiP, 2) for the document cam, and 3) for the screen capture. Each terminal runs one command: 1) ffplay, 2) ffplay, 3) ffmpeg.

1. ffplay :: PiP (always on top)

The inset window of the host narrating is a PiP that should always be on top. Open a terminal and get this running first. The source is typically the built in webcam, trained on one's face.
$ ffplay -i /dev/video0 -alwaysontop -video_size 320x240

The window always seems to open at 640x480, but it can then be resized down to 160x120 and moved anywhere on the desktop. And then, to dress it up with more brightness, some color saturation, and a mirror flip...

$ ffplay -i /dev/video0 -vf eq=brightness=0.09:saturation=1.3,hflip -alwaysontop -video_size 320x240

2. ffplay :: document cam

I start this second, and make it nearly full sized, so I can use it interchangeably with any footage of the web browser.
$ ffplay -i /dev/video2 -video_size 640x480

3. ffmpeg :: screen and sound capture

Get your screen size with xrandr, eg 1366x768, then subtract the bottom 30 pixels (20 on some systems) to omit the toolbar. Because the toolbar then isn't in the capture, it can still be used during recording to switch windows. Syntax: put the 3 flags in this order:

-video_size 1366x738 -f x11grab -i :0
...else you'll probably get only a small corner of the picture, or errors. Then come all your typical bitrate and framerate flags
$ ffmpeg -video_size 1366x738 -f x11grab -i :0 -r 30 output.mp4

This will encode a cleanly discernible screen at a cost of about 5M every 10 minutes. The native encoding is h264. If a person wanted instead to be "old-skool" with MPEG2 (-c:v mpeg2video), the price for the same quality is about 36 times larger: about 180M for the same 10 minutes. For MPEG2, we set a bitrate around 3M per second (b:v 3M) to capture similarly to h264 at 90K.
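For reference, a hedged sketch of that MPEG2 variant -- the same screen grab with the codec and bitrate made explicit (adjust the size to your own screen):
$ ffmpeg -video_size 1366x738 -f x11grab -i :0 -r 30 -c:v mpeg2video -b:v 3M output.mpg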

Stopping the screen capture is CTRL-C. However: A) be certain CTRL-C is entered only once. The hard part is that ffmpeg doesn't indicate any change for over a minute, so a person is tempted to CTRL-C a second time. Don't do that (else untrunc). Click the mouse on the blinking terminal cursor to be sure the terminal is focused, and then CTRL-C one time. It could be a minute or two, and the file size will continue to increase, but wait. B) Before closing the terminal, be certain ffmpeg has exited.

If you CTRL-C twice, or you close the terminal before ffmpeg exits, you're gonna get the dreaded "missing moov atom" error. The fix: 1) install untrunc, 2) make another recording about as long as the first but which exits normally, and 3) run untrunc against it.
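A sketch of the recovery, assuming the healthy reference recording is named good.mp4 and the interrupted one broken.mp4 (untrunc writes a repaired copy alongside the broken file):
$ untrunc good.mp4 broken.mp4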

Explicitly setting the screencast bitrate (eg, b:v 1M b:a 192k) typically spawns fatal errors, so I only set the frame rate.

Adding sound...well you're stuck with PulseAudio if you installed Zoom, so just add -f pulse -ac 2 -i default...I've never been able to capture sound in a Zoom meeting however.

$ ffmpeg -video_size 1366x738 -f x11grab -i :0 -r 30 -f pulse -ac 2 -i default output.mp4

manage sound sources

If a person has a Zoom going and attempts to record it locally, without benefit of the Zoom app, they typically only hear sound from their own microphone. Users must switch to the sound source of Zoom itself to capture the conversation. This is the same with any VOIP, of course. This can create problems -- a person needs to make a choice.
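One hedged way to make that choice: list PulseAudio's capture sources and record from the "monitor" of whatever output the VOIP audio plays through. The monitor name below is a placeholder; substitute whatever pactl reports on your system.
$ pactl list short sources
$ ffmpeg -video_size 1366x738 -f x11grab -i :0 -r 30 -f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor output.mp4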

Other people will say that old-school audio will be 200mV (0.002) peak-to-peak. Unless all these signals are changed to digital, gain needs to be set differently. One first needs to know the names of the devices. Note that this strange video tells more about computer mic input than I've seen anywhere.

basic edits, separation, and render

Link: Cuts on keyframes :: immense amounts of information on cut and keyframe syntax


Ffmpeg can make non-destructive, non-re-rendered cuts, but they may not occur on an I-frame (keyframe) unless seek syntax and additional flags are used. I first run $ ffprobe foo.mp4 or $ ffmpeg -i foo.mp4 on the source file to get bitrate, frame rate, audio sampling rate, etc. Typical source video might be 310Kb h264 (high), with 128 kb/s, stereo, 48000 Hz AAC audio. Time permitting, one might also want to obtain the video's I-frame (keyframe) timestamps and send them to a text file to reference during editing...

$ ffprobe -loglevel error -skip_frame nokey -select_streams v:0 -show_entries frame=pkt_pts_time -of csv=print_section=0 foo.mp4 >fooframesinfo.txt 2>&1
  • no recoding, save tail, delete leading 20 seconds. this method places seeking before the input and it will go to the closest keyframe to 20 seconds.
    $ ffmpeg -ss 0:20 -i foo.mp4 -c copy output.mp4
  • no recoding, save beginning, delete trailing 20 seconds. In this case, the limit comes after the input as a duration. Suppose the example video is 4 minutes long, but I want it to be 3:40.
    $ ffmpeg -i foo.mp4 -t 3:40 -c copy output.mp4
    Do not forget "-c copy" or it will render. Of course, some circumstances require more precision than keyframe cuts, and then a person has little choice but to render:
    $ ffmpeg -i foo.mp4 -t 3:40 -strict 2 output.mp4
    This gives cleaner transitions.
  • save an interior 25 second clip, beginning 3:00 minutes into a source video
    $ ffmpeg -ss 3:00 -i foo.mp4 -t 25 -c copy output.mp4
...split-out audio and video
$ ffmpeg -i foo.mp4 -vn -ar 44100 -ac 2 sound.wav
$ ffmpeg -i foo.mp4 -c copy -an video.mp4
...recombine (requires render) with mp3 for sound, raised slightly above unity (-vol 256 is neutral) to compensate for transcoding loss
$ ffmpeg -i video.mp4 -i sound.wav -acodec libmp3lame -ar 44100 -ab 192k -ac 2 -vol 330 -vcodec copy recombined.mp4

precision cuts (+1 render)

Ffmpeg doesn't allow cutting by frame number. If you set a time without recoding, it will rough-cut to some seconds-and-a-decimal position near a keyframe. This works poorly for transitions. So what you'll have to do is recode it and enforce strict time limits, computing the time from the frame number and the frame rate. You can always bring the clip into Blender to see the exact number of frames. Even though Blender is backended with Python and ffmpeg, it somehow counts frames a la MLT.
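A hedged sketch of such a precision cut: convert the frame number to seconds using the frame rate (say frame 2510 at 30fps, about 83.667 seconds), seek after the input for accuracy, and re-encode. The codec choices here are assumptions.
$ ffmpeg -i foo.mp4 -ss 83.667 -t 10.5 -c:v libx264 -c:a aac precise.mp4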

other effects (+1 render)

Try to keep the number of renders as low as possible, since each is lossy.

fade in/out

...2 second fade-in. It's covered directly here; however, it requires the "fade" and "afade" filters, which don't come standardly compiled in Arch, AND it must re-render the video.
$ ffmpeg -i foo.mp4 -vf "fade=type=in:duration=2" -c:a copy output.mp4

For the fade-out, the start time must be given in seconds; most recommend using ffprobe to get the duration, then setting the fade to start 2 seconds before the end. This video was 7:07.95, or 427.95 seconds. Here it is embedded with some other filters I was using for color balancing and de-interlacing.

$ ffmpeg -i foo.mp4 -max_muxing_queue_size 999 -vf "fade=type=out:st=426:d=2,bwdif=1,colorbalance=rs=-0.1,colorbalance=bm=-0.1" -an foofinal.mp4
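If the clip keeps its audio, a hedged variant fades the sound along with the picture via "afade", using the same start time and duration:
$ ffmpeg -i foo.mp4 -vf "fade=type=out:st=426:d=2" -af "afade=type=out:st=426:d=2" foofinal.mp4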

text labeling +1 render

A thorough video 2017,(18:35) exists on the process. Essentially a filter and a text file, but font files must be specified. If you install a font manager like gnome-tweaks, the virus called PulseAudio must be installed, so it's better to get a list of fonts from the command line
$ fc-list
...and from this pick the font you want in your video. The filter flag will include it.
-vf "[in]drawtext=fontfile=/usr/share/fonts/cantarell/Cantarell-Regular.otf:fontsize=40:fontcolor=white:x=100:y=100:enable='between(t,10,35)':text='this is cantarell'[out]"
... which you will want to drop into the regular command
$ ffmpeg -i foo.mp4 -vf "[stuff from above]" -c:v copy -c:a copy output.mp4

...however this cannot be done because streamcopying cannot be accomplished after a filter has been added -- the video must be re-encoded. Accordingly, you'll need to drop it into something like...

$ ffmpeg -i foo.mp4 -vf "[stuff from above]" output.mp4

Ffmpeg will copy most of the settings, but I do often specify the bit rate, since ffmpeg occasionally doubles it unnecessarily. This would just be "-q:v" (variable) or "-b:v" (constant). It's possible to also run multiple filters; put a comma between each filter statement.

$ ffmpeg -i foo.mp4 -vf "filter1","filter2" -c:a copy output.mp4
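Putting those together, a hedged sketch with two chained filters and an explicit constant bitrate (the text, the font path from above, and the values are placeholders):
$ ffmpeg -i foo.mp4 -vf "drawtext=fontfile=/usr/share/fonts/cantarell/Cantarell-Regular.otf:fontsize=40:fontcolor=white:x=100:y=100:text='this is cantarell',eq=saturation=1.3" -b:v 2M -c:a copy output.mp4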

saturation

This great video (1:08), 2020, describes color saturation.

$ ffmpeg -i foo.mp4 -vf "eq=saturation=1.5" -c:a copy output.mp4

speed changes

1. slow entire, or either end of clip (+1 render)

The same video shows slow motion.

$ ffmpeg -i foo.mp4 -filter:v "setpts=2.0*PTS" -c:a copy output.mp4
OR
$ ffmpeg -i foo.mp4 -vf "setpts=2.0*PTS" output.mp4

Sometimes the bitrate is too low on recode. Eg, ffmpeg is likely to choose around 2,000Kb if the user doesn't specify a bitrate. Yet if there's water in the video, it will likely appear jerky below a 5,000Kb bitrate...

$ ffmpeg -i foo.mp4 -vf "setpts=2.0*PTS" -b:v 5M output.mp4

2. slowing a portion inside a clip (+2 render)

Complicated. If we want to slow a 2-second portion of a 3-minute normal-speed clip, but those two seconds are not at either end of the clip, then ffmpeg must slice out the portion, slow the portion (+1 render), then concatenate the pieces again (+1 render). Also, since the single clip temporarily becomes more than one clip, a filter statement with a labeling scheme is required. It's covered here. It can be done in a single command, but it's a big one.

Suppose we slow-mo a section from 10 through 12 seconds in this clip. The slow down adds a few seconds to the output video.

$ ffmpeg -i foo.mp4 -filter_complex "[0:v]trim=0:10,setpts=PTS-STARTPTS[v1];[0:v]trim=10:12,setpts=2*(PTS-STARTPTS)[v2];[0:v]trim=12,setpts=PTS-STARTPTS[v3];[v1][v2][v3] concat=n=3:v=1" output.mp4

supporting documents

Because of the large number of command flags and commands necessary for even a short edit, we can benefit from making a text file holding all the commands for the edit, or all the text we are going to add to the screen, or the script for TTS we are going to add, and a list of sounds, etc. With these three documents we end up sort of storyboarding our text. Finally, we might want to automate the edit with a Python file that runs through all of our commands and calls to TTS and labels.

basic concatenation txt

Without filters, file lists (~17 minutes into the linked video) are the way to do this with jump cuts, as sketched below.
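A minimal sketch of the file-list (concat demuxer) approach with hypothetical clip names; -c copy keeps it render-free, provided all the clips share the same codecs and parameters:
$ cat > list.txt << 'EOF'
file 'clip1.mp4'
file 'clip2.mp4'
file 'clip3.mp4'
EOF
$ ffmpeg -f concat -safe 0 -i list.txt -c copy joined.mp4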

python automation

Python ffmpeg scripts are a large topic requiring a separate post; just a few notes here. A relatively basic video 2015,(2:48) describes Python basics inside text editors. The IDE discussion can be lengthy also, and one might want to watch this 2020,(14:06), although if you want to avoid running a server (typically Anaconda), you might want to run a simpler IDE (Eric, IDLE), PyCharm, or even avoid IDEs 2019,(6:50). Automating ffmpeg commands with Python doesn't require Jupyter, since the operations just occur on one's desktop OS, not inside a browser.

considerations

We want to have a small screen of us talking over a larger document or some such, and not just during recording:
  • we want the small screen PiP to always be on top :: use -alwaysontop flag
  • we'd like to be able to move it
  • we'd like to make it smaller than 320x240
link: ffplay :: more settings

small screen

$ ffplay -f video4linux2 -i /dev/video0 -video_size 320x240
...or, to keep it always on top:
$ ffplay -i /dev/video0 -alwaysontop -video_size 320x240

commands

The CLI commands run long. This is because ffmpeg defaults run high. Without limitations inside the commands, ffmpeg pulls 60fps, h264(high), at something like 127K bitrate. Insanely huge files. For a screencast, we're just fine with
  • 30fps
  • h264(medium)
  • 1K bitrate
flag :: note
  • b:v 4Kb :: if movement in the PiP is too much, up this
  • f x11grab :: must be followed immediately with a second option "-i", and eg, "desktop"; this will also bring the h264 codec
  • framerate 30 :: some would drop it to 25, but I keep with YouTube customs even when making these things. Production level would be 60fps
  • b:v 1M :: if movement in the PiP is too much, up this
  • Skype :: 1-1, MicroSoft data collection for the US Govt

video4linux2

This is indispensable for playing one's webcam on the desktop, but it tends to default to the highest possible bitrate (14,000Kbs) and to a 640x480 window size, though the latter is resizeable. The thing is, it's unclear whether this is due to the video4linux2 codec settings or to ffplay, which uses it. So is there a solid configuration file to reset these? This site does show a file to do this.

scripting

You might want to run a series of commands. The key issue is figuring out the chaining. Do you want to start 3 programs at once; one after the other; one after the other as each one finishes; or one after the other with the output of the prior program as the input for the next?

Bash Scripting (59:11) Derek Banas, 2016. Full tutorial on Bash scripting.
Linking commands in a script (Website) Ways to link commands.

$ nano pauseandtalk.sh (the .sh extension isn't needed, btw)
#!/bin/bash

There are several types of scripts. You might want a file that sequentially runs a series of ffmpeg commands, or you might just want a list of files for ffmpeg to look at to do a concatenation, etc. For example:
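A hedged sketch of the first kind, continuing the pauseandtalk.sh idea above. The clip names are placeholders, and both cuts come from the same capture so the final stream-copy concatenation stays valid.
#!/bin/bash
set -e                                                    # stop the chain if any step fails
ffmpeg -y -ss 0:20 -i raw.mp4 -t 1:00 -c copy clip1.mp4   # keyframe-rough cut, no render
ffmpeg -y -ss 2:00 -i raw.mp4 -t 0:45 -c copy clip2.mp4   # second piece of the same capture
printf "file 'clip1.mp4'\nfile 'clip2.mp4'\n" > list.txt  # concat demuxer list
ffmpeg -y -f concat -safe 0 -i list.txt -c copy final.mp4 # jump-cut assembly, still no render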

Sample Video Editing Workflow using FFmpeg (19:33) Rick Makes, 2019. Covers de-interlacing to get rid of lines, cropping, and so on.
Video Editing Comparison: Final Cut Pro vs. FFmpeg (4:44) Rick Makes, 2019. Compares editing on the two interfaces, using scripts for FFmpeg

audio and narration/voiceover

Text-to-speech has been covered in another post; however, there are commonly times when a person wants to talk over some silent video ($ yay -S audio-recorder). The question is how to pause the video and speak at a point, and still be able to concatenate.

inputs

If you've got a desktop with HDMI output, a 3.5mm hands-free mic won't go into the video card; use the RED 3.5mm mic input, then filter out the 60Hz hum. There are ideal mics with phantom power supplies, but even a decent USB mic is $50.

For syncing, you're going to want your audio editor and Xplayer running on the same desktop. This is because it's easier to edit the audio than the video; there's no rendering needed to edit audio.

Using only Free Software (12:42) Chris Titus Tech, 2020. Plenty of good audio information, including Auphonic (starting at 4:20), mics (don't use the Yeti - 10:42), and how to sync (9:40) at .4 speed.
Best for less than $50 (9:52) GearedInc, 2019. FifinePNP, Blue Snowball. Points out that once we get to $60, it's an "XLR" situation with preamps and so forth to mitigate background noise.
Top 5 Mics under $50 (7:41) Obey Jc, 2020. Neewer NW-7000.

find the microphone - 3.5mm

Suppose we know we're using card 0

$ amixer -c0
$ aplay -l
These give us plenty of information. However, it's still likely in an HDMI setup to hit the following problem
$ arecord -Ddefault test-mic.wav
ALSA lib pcm_dsnoop.c:641:(snd_pcm_dsnoop_open) unable to open slave
arecord: main:830: audio open error: No such file or directory

This means there is no "default" configured in ~/.asoundrc. There would be other errors too if the device weren't specified. The minimum command specifies the card, the encoding, the number of channels, and the rate.

$ arecord -D hw:0,0 -f S16_LE -c 2 -r 44100 test-mic.wav

subtitles/captions

Saturday, February 29, 2020

toolbox :: voice-over

contents
1. voice: a) detection* b) levels and record [alsamixer, audacity] c) cut and polish [goldwave]
2. tts
3. sync: solution is to slow the video down (to about .4) for the clap, then work in pieces
4. notes

* typically requires ~/.asoundrc authorship


This post concerns two voiceover methods -- one human, the other using Text-To-Speech (TTS). Both of course have to be synced to video. Hopefully, the post also helps those doing audio w/o video.

1. Voice

input

I currently use a USB connected, non XLR, microphone, the Fifine K053 ($25, USB ID 0c76:161f).

Cleaner options are available at higher price points, eg phantom power supplies and so on to prevent hum, or XLR pre-amping. Either way, what we'd prefer is to see our mic's input levels as we're speaking, and to be able to adjust those levels in real time. Of course, the first step is its detection.

detection (and .asoundrc)

Links: mic detection primer :: asoundrc primer :: Python usage


If using ALSA, we need to know how the system detects and categorizes a mic within an OS. Then we'll put the information in ~/.asoundrc, so it can be used by any sound software. However, if you have the dreaded two sound card situation, which always occurs with HDMI, you've got to straighten that out first, and reboot. The HDMI situation works best when I unplug the mic before booting, and then plug it in after login. Typically, the mic is Device 2, after Device 0 and HDMI 1. Everything else with Audacity and alsamixer below is the same.

$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: SB [HDA ATI SB], device 0: ALC268 Analog [ALC268 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: Device [USB PnP Audio Device], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
...indicating we have plughw:1,0 for the input; double-check with...
$ cat /proc/asound/cards
0 [SB ]: HDA-Intel - HDA ATI SB
HDA ATI SB at 0xf6400000 irq 16
1 [Device ]: USB-Audio - USB PnP Audio Device
USB PnP Audio Device at usb-0000:00:12.0-3, full speed
...appears good. If we'd also like to know the kernel detected module...
$ lsusb -t
[snip]
|__ Port 1: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 3: Dev 3, If 0, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 3: Dev 3, If 1, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 3: Dev 3, If 2, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 3: Dev 3, If 3, Class=Human Interface Device, Driver=usbhid, 12M
[snip]
... so we know the kernel is driving it with snd-usb-audio.
Test the hardware config for 15 seconds:
$ arecord -f cd -D plughw:1,0 -d 15 test.wav
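A minimal ~/.asoundrc sketch, assuming the USB mic really is card 1 as detected above (card numbers can shift between boots, especially with HDMI in the mix):
$ cat > ~/.asoundrc << 'EOF'
# default device: play through the onboard card, capture from the USB mic
pcm.!default {
    type asym
    playback.pcm "plughw:0,0"
    capture.pcm "plughw:1,0"
}
ctl.!default {
    type hw
    card 0
}
EOF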

real-time monitoring

When recording, we need to be able to see the waveform and adjust levels; otherwise we waste a half hour with arecord test WAVs and alsamixer adjustments just to set levels. Audacity is the only application I currently know of that shows a real-time waveform during recording. I open AUDACITY in the GUI, and then overlay ALSAMIXER in a terminal, from which I can set mic levels while recording a test WAV.

There are typically problems during playback. Sample rate may need to be set to 48000 depending on chipset. Audacity uses PortAudio, not PulseAudio or ALSA. I've found it's critical to select accurately from perhaps 12 playback options -- "default" almost never works. I go to this feature...

... often the best playback device will have some oddball name like "vdownmix", but it doesn't matter, just use it.

TTS

tts - colab account

Create a notebook in Colab, then go to Cloud and authorize a text-to-speech API, and get a corresponding JSON credential. When running the TTS python code in Colab, the script will call to that API, using the JSON credential. Of course, it's the API which does the heavy work of translating the text into an MP3 (mono, 32Kbs, 22000Hz).

Voiceover in Colab (1:39) DScience4you, 2020. Download and watch at 40% speed. Excellent.
The Json API process (8:14) Anders Jensen, 2020.
Google TTS API (page) Google, continuously updated. Explains the features of API, for example the settings for language and pitch.

tts - local system

TTS workflow has four major steps, and some pieces are reusable: 1) producing the text to be read, 2) writing the code translating the text to speech, 3) in the code, selecting translation settings for that project, and 4) editing the output WAVs or MP3s. All four must be tweaked to match formats, described in another post (a command-line sketch follows the list):
  • script :: any text editor will do.
  • code :: typically Python, as described here (11:25), and using gTTS.
    • # pacman -S python-pip python-pipreq pipenv to get pip. Pipenv is a VE sandbox, necessary so that pip doesn't break pacman's index or otherwise disturb the system. Also, pip doesn't require sudo. Eg: $ pip install --user gtts.
  • name one's script anything besides "gtts.py" since that will spawn errors -- it's one of the libraries. It is the gtts library which interfaces with the Cloud Google TTS API.
  • The "=" is used for variables. The output file is not a variable, so just put the parens near the method: output.save("output.mp3")
  • TTS engine :: varies from eSpeak to the Google API gtts, from which we import gTTS.
  • audio :: levels, pauses, file splitting, and so on. WAV is the easiest format, eg with SoX, GoldWave, Jokosher, etc.
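For a quick test without writing any Python at all, the gTTS package also installs a small CLI. A hedged sketch (the sentence and filenames are placeholders), with ffmpeg converting the MP3 to a WAV for mixing:
$ gtts-cli "Welcome to part one of the screencast." --output narration.mp3
$ ffmpeg -i narration.mp3 -ar 44100 -ac 2 narration.wav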


hardware

Some current and planned audio and video hardware

video

  • GoPro Hero
  • various old cell phones; if they charge up and have room for an SD card, I can use them.

public domain films - archive.org
Video editing. The first problem is the inputs. Video clips from any of, say, 10 devices have meant that the organization of video, audio, software, and hardware has seemed impossible to me. For years. Appropriating UML toward this chaos became the first relief; then other pieces eventually fell into place.

A second problem, with TTS, is that it relies on Python for something with as much speech as a script. And this means doing a separate, non-Arch Python install and pip to get packages -- or -- sticking with Arch and avoiding pip in favor of yay. Blending pip updates with Arch means disaster the next time you attempt a simple pacman -Syu. The problem with yay/AUR is it may not have the modules needed to run TTS.

I do separate installs of anything that has its own updaters: Python, LaTeX, Git. It's a slight pain in the ass making sure all paths are updated, but I can keep them current without harming my Arch install and updates.

UML

Unified Modeling Language ("UML") is for software, but we know its various diagrams work well for, eg, workflow, files, and concepts. Derek, per usual, has a concise conceptual overview (12:46, 2012).

We can create UML in UMLet or Dia, though we will eventually want to do them in LaTeX. Lists, esp with links and so on, are best created in HTML (Geany/GTK). It's good to keep backups of these overarching UML and HTML documents on Drive.
  • Dia: # pacman -S dia/$ dia :: select the UML set of graphic icons, switch to letter-size from A4. Files are natively saved with DIA extensions, but can be exported to PNG.
  • UMLet: $ yay -S umlet/$ umlet :: UMLet is the software in Derek's video above. Sadly, allows Oracle into one's system by requiring Java headers and some other files. UMLET extension is UXF (their XML schema), but can be exported in graphical formats.

audio

(UML on-file) Like video, audio arrives in any number of formats and must be standardized prior to remixing with video. Some say to sample 24 bit, some say 16 is just fine. Capturing old CD's, maybe 24/192kbv, but it's also fine to do 16/192kbv. For video or backup DVD, 16/192kb at 44100 seems to be a good output for remixing.
  • GoldWave: (proprietary) :: easy wine bottle. Does well with WAV.
  • Removed audio: audio source. audio separated from raw video for further processing
  • TTS: audio source. creating a script and using some method to change to audio for the video.
  • Mixxx: combine music, TTS, and anything else at proper levels.
  • Qjackctl: # pacman -S qjackctl/$ qjackctl :: JACK manager to work alongside ALSA.

pulseaudio notes

Links: PulseAudio configuration :: PulseAudio examples

Go to /etc/pulse/ for the two main configuration files: default.pa, and client.conf. Any sink you add in ALSA will be detected in PulseAudio, eg. # modprobe snd_aloop will turn on at least one loopback device.

Multiple Audio Collection (11:29) Kris Occhipinti, 2017. How to do multiple audio sources.

Saturday, March 9, 2013

[solved] incorrect duration converting wav's to mp3's

problem

User creates a 3:15 screencast "smite.mp4". They extract the WAV soundtrack to optimize or edit it. After this, suppose they want to convert the WAV to a space-saving MP3 before recombining with the video. In this case, imagine a simple 192k constant bitrate is OK, and that no Variable Bit Rate (VBR) audio is required. Also, they may want to keep the separate audio file and slow it down, speed it up, or change the quality. These typically can be done with lame and/or sox.

We do our conversion, say with:

$ ffmpeg -i smite.wav -ab 192k -af "volume=1.1" smite.mp3

Note: The volume was tweaked because it's sometimes decreased in conversion. Great resource.

but the duration is incorrect

Now we run a test of the MP3 file in some player, perhaps Audacious. On occasion, we'll find that the duration stamp is corrupted, perhaps appearing as 28:15. Typically, the slider can't move in a corrupted timeline either.

Incorrect duration stamps in media files are an occasional problem. The original media file might have one. More often it happens when a person accelerates or slows down the WAV, which we can do between half speed (0.5) and double speed (2.0) with a filter.

$ ffmpeg -i smite.mp4 -af "atempo=0.85","volume=1.1" -vn -ar 44100 -ac 2 smiteslower.wav

the reason

Ffmpeg and avconv use the bitrate setting as part of their duration calculations. Both ffmpeg and avconv will calculate the duration correctly if we don't specify a bitrate. Unfortunately, if we don't specify the bitrate, ffmpeg and avconv will use their native bitrates, which both happen to be the low quality of 128Kb. So how do we achieve the 192K bitrate we desire in the example above and still obtain a correct duration stamp on the resulting MP3?

solution

Install lame. For example to achieve the 192Kb, with a correct time stamp, and with the conversion volume setting just a bit above 100% (scale), we could use:

$ lame --scale 1.2 -b 192 smite.wav

I can change the bitrate to whatever I want, even into a VBR, and the resulting duration stamp is accurate. With respect to the volume, if I wanted to double it, the scale I would select would be "2", and so on. Finally, the output file name, in this case smite.mp3, will be created automatically using the input WAV file's name. Alternatively, one can force an output name. Now, when we re-render our audio back to our video, they will be properly synced, since the timestamps are correct.

solution going mp3 to wav

$ lame --decode file.mp3 output.wav

memory issues

Sometimes there's not enough /tmp space to handle processing a large media file. You'd imagine the solution is to increase the size of /tmp beyond the GBs of installed RAM, so that the system overflows into swap. This will not work by default, because /tmp is now mounted as tmpfs, and tmpfs is what is actually being used. Its default size is half of RAM. Systems put tmpfs in RAM to save on resources -- it makes the system more efficient. However, when dealing with a large media file, I modify /etc/fstab as below, and then reboot.

# nano /etc/fstab
tmpfs /tmp tmpfs rw,nodev,nosuid,size=10G 0 0

When the media work is complete, I comment out that fstab line and reboot again.

slowing tempo

links: sox cheat sheet

Can be done with ffmpeg, lame, or sox. Lame only takes WAV and MP3 files as inputs. Sox can read and manipulate almost any filetype, though some formats need to be specified with a type flag, and the output format follows the output file's extension.

In this example, I take an input OPUS file, slow it to 80% of original, and boost the volume 10%, while converting to MP3 at a bitrate 320K. The syntax is counter-intuitive, IMO. Eg, there's no hyphen before "speed".

$ sox -v 1.1 foo.opus -C 320 foo.mp3 speed 0.8

If the result clips in a few places due to the increased bass of slower speed, the output can be equalized to decrease the bass.
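A hedged sketch of that cleanup using sox's bass effect in the same chain (the -3 dB cut is a guess; adjust by ear):
$ sox -v 1.1 foo.opus -C 320 foo.mp3 speed 0.8 bass -3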

video note

When converting the video of a screencast, the only way I've found to get the proper duration is to be sure to use the switch:
-target ntsc-dvd
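For example, a sketch with assumed filenames:
$ ffmpeg -i screencast.mpeg -target ntsc-dvd screencast-dvd.mpg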

Monday, October 13, 2008

screencasting - slackware

links: ffmpeg commands
Lee Lefever's videos are not screencasts, but they reveal the value of a simple idea. His guiding philosophical considerations are well described in his blog. Educational screencasts do well to follow similar lines. Screencasts from teacher Joe Wood have a similar flavor to Lee Lefever's videos, and Wood clarifies his ideas well.

What about screencasts in Linux? My previous work making a required video for an education class left me feeling underwhelmed. It was initially raw AVI from the camera, but I muxed it to MPEG2. There appeared to be resolution issues - the picture wasn't as clean as I hoped. This time, I'm starting with screencasts. I'll eventually work back to recording and cutting video when I can easily manage screencasts. Another side of it is the computing power required for rendering -- I want to increase the efficiency there.

istanbul


First attempt was with Istanbul, which was pre-installed with Zenwalk 5.1. Something was wrong with the framerates, and all I was seeing was flashing screens, which appeared to indicate a large number of frame drops. Then I looked at this guy's post, and it appears that there might be screen-size as well as framerate issues. Istanbul supplies no sound; it must subsequently be muxed in. What could possibly be more annoying?

recordmydesktop


Includes sound and screen, and outputs Ogg Theora as .ogv files. Seems the most useful, but the ability to record sound varies with systems. When it works, it works well. The .ogv file can be shifted to .flv format for YouTube uploads or a website. I used a script for doing so from here, though I'm sure it's also around at other sites. I had to modify the script slightly for CLI use, and libmp3lame.so.0 must be installed for the mencoder inside it to follow the script properly. I renamed the script ogv2flv.sh and it runs on the command line once libmp3lame.so.0 is installed:
$ ogv2flv.sh input.ogv
There is no config file for the CLI version of recordmydesktop, which means hideously long command-line entries. Further, typing $ man recordmydesktop only produced "No manual entry for recordmydesktop". Nice going. It appears the best thing is the sourceforge version, until that URL breaks. Strike two. However, a nicely sized screen which works well from the command line for capturing the browser without the status bar, but including the URL bar, is
$ recordmydesktop -x 14 -y 55 -width 988 -height 674 -fps 12 -o wobbly.ogv
This seems to default constantly to 44100 and 2 channels in spite of my entering 1 channel and 22050 frequency and the device hw:0,0, so I eventually deleted these parameters from the line.

Recordmydesktop also has a Python-based GTK front-end available to those who are interested. This program does have a config file, .gtk-recordmydesktop, which appears to be an advantage over remembering the complex CLI commands necessary to avoid the inevitable "broken pipe" errors if you forget one parameter. But each time I edited the config file by hand, the program overwrote my screen-size settings when it opened. Strike three.

xvidcap


Solid, but a few quirks. 1) .xvidcap.scf is supposed to be available as a config file, but it apparently doesn't work well or is not read. Accordingly, right-clicking on the GUI controls provides preferences. Not too bad, and they can be saved there too. 2) Have to adjust the box size each time. 3) On at least one occasion, it might have muted my microphone during start-up. 4) Sound is garbled unless using OSS emulation (aoss). So, I might start xvidcap to do 10 frames per second like this:
$ aoss xvidcap --fps 10
Once screencasting is complete, ffmpeg can shift the mpeg into a YouTube flv in a single command
$ ffmpeg -i test2.mpeg -ar 44100 test2.flv

sound levels


Microphone settings become significant in screencasting. Here are a couple of cards.

Realtek ALC660-D
I set "Mic" as the capture source, and vary the relationship between Mic Boost, Digital, and the Capture bar. The settings which avoid clipping and feedback have been Mic Boost=33, Digital~65-70, and Capture~77-82. Digital seems to be the most important for hiss and I play trade-off between Digital and Capture until the hiss disappears while attempting to avoid clipping distortion if Capture is set too high.