Monday, December 19, 2022

statistics -- some foundation

Statistics overlaps with Math, of course. We can record samples in a CSV or spreadsheet, draw histograms, and run (Algebra 1) regressions. Yet CSV data are arrays, and arrays are matrices: they can be treated as vectors and evaluated with Calc 3 tools, and they are Python and R friendly. IMO, because it spans both simple and complex Math, and CS, Stats is a fun side project at any point in one's Math progression.

The basics are Algebraic if one uses tables for areas under curves. Without tables, a person could find the areas with simple Calc 1 integration. A TI-84 -- if available -- is also great for checking that kind of calculation (eliminating tables and Calc) and for offloading the drudgery of long lists.
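As a rough illustration of replacing the table lookup, here is a minimal Python sketch for areas under the standard normal curve. The function names normal_cdf and area_between are my own, and the standard normal is assumed; it is one way to check a table value, not the only one.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(lo, hi):
    """Area under the standard normal curve between two z-scores."""
    return normal_cdf(hi) - normal_cdf(lo)

# Within one standard deviation of the mean (the "68" of 68-95-99.7).
print(round(area_between(-1.0, 1.0), 4))  # 0.6827

# Left-tail area at z = 1.96, the usual 95% two-sided cutoff.
print(round(normal_cdf(1.96), 4))  # 0.975
```

This is the same number a z-table or a TI-84 normalcdf call would give, so it works as a cross-check.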

HS AP Stats is essentially an Algebra-only college Stats 1 class, and slightly more user friendly. The Table of Contents of the Princeton Review's AP Stats review book lists roughly 35 subjects, split over 4 categories. I've used it below, with some modifications, to describe what a person might want to cover. Beyond basic or AP, I've added a "next level" section at the bottom, since matrices connect Math to Statistics and arrays/data structures connect Stats to Data Science.

Introductory data
  • data collection
  • tabular methods
  • graphical methods - qualitative
  • graphical methods - quantitative
  • numerical methods - continuous
  • boxplots
  • add/mult a constant
  • comparing mult. groups
  • bivariate data: covar, regression
  • categorical frequency

Sampling and experiments
  • plan a study
  • data collection
  • plan a survey
  • bias in surveys
  • plan an experiment

Anticipating outcomes
  • probability
  • random variables
  • probability dist. - discrete random var's
  • probability dist. - continuous random var's
  • normal distribution
  • combining independent random var's
  • sampling distributions

Statistical inference
  • confirming models
  • parameters
  • point estimation
  • interval estimation
  • confidence interval
  • inference: significance tests
  • hypothesis: testing and accepting
  • estimation and inference: population proportion
  • estimation and inference: population mean
  • estimation and inference: 2 population proportions
  • estimation and inference: 2 population means
  • inference: categorical data

For a comprehensive course, there's Khan Academy. On YouTube, I prefer the video compilation by Brandon Foltz, M.Ed., esp. at 0.5x speed. The videos were made 2010-ish, but were ahead of their YouTube time [1]. Below are some from Foltz and from other great teachers, plus vids which include nuggets in some way. Going along in review is also a chance to relearn the annoying Greek symbols for parameters: the same Greeks as in finance, but with different usage.

[1] Another brilliant info-sharing hero with early YT chops was Derek Banas. One of his best might be his investing video, which crosses data science with financials and Python. And of course Barry Brown for programming any of it in C.

distributions of data obtained

degrees of freedom, clt

To Z or t (38:16) Brandon Foltz, 2012. The notion behind this choice, without calculation. Almost all stats come from samples, not entire populations, e.g., "the prevalence of depression in those over 65": how would we give everyone the questionnaire? We can't. The smaller the sample, the more chance of error. Take the avg temp in NYC: what if we just used one day, or 5 days? The larger the sample, the more confidence we have that we've captured reality. 16:00, 33:00 decision. 19:00, 30:00 degrees of freedom.
Buying land (32:06) Brantley Blended, 2018. listing, PLAT recency, survey recency (due diligence period), boundaries.
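To put numbers on the "larger sample, more confidence" idea, here is a small Python sketch. The temperatures are made up (not real NYC data), the function names are mine, and the z-or-t rule shown is the usual textbook one (z when the population standard deviation is known, t otherwise), which I'm assuming matches the video's decision.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def degrees_of_freedom(sample):
    """Degrees of freedom for a one-sample t statistic: n - 1."""
    return len(sample) - 1

def z_or_t(sigma_known):
    """Textbook rule of thumb: z if the population sigma is known, else t."""
    return "z" if sigma_known else "t"

# Hypothetical daily temperatures (deg F), for illustration only.
five_days = [41.0, 38.5, 44.0, 36.0, 40.5]
thirty_days = five_days * 6  # same spread, six times the data

for sample in (five_days, thirty_days):
    print(len(sample), degrees_of_freedom(sample),
          round(standard_error(sample), 3))
```

The standard error printed for the 30-day sample is smaller than for the 5-day sample, which is the whole point: more data, tighter estimate.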

bivariate data: linear regressions, covar

covariance (5:55) Ben Lambert, 2013. intuition behind covariance. Positive, negative, and none. Some formulas, but conceptual.
Variance vs. covariance (Webpage) Investopedia.
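A minimal Python sketch of the sample covariance, with made-up data and my own function name. It uses the n - 1 denominator, which is an assumption on my part; a population covariance would divide by n instead.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]   # moves with xs: positive covariance
falling = [8.0, 6.0, 4.0, 2.0]  # moves against xs: negative covariance
flat = [5.0, 5.0, 5.0, 5.0]     # does not move: zero covariance

print(covariance(xs, rising), covariance(xs, falling), covariance(xs, flat))
```

The sign is the takeaway: positive, negative, and none, as in the Lambert video.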

combinatorics, permutations, probability

nCr and nPr. These supply the denominators of probability, since they count the universe of possible outcomes against which we determine our odds. If "statistics" are the collected data, nested inside we have probability (based on prior events, i.e., statistics), and inside that are the "combinatorics and permutations" which create the denominators of probability fractions.
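As a quick sketch of nCr as a denominator, using Python's math.comb and math.perm (available in Python 3.8 and later). The two-aces question is my own example, not from any of the videos.

```python
import math
from fractions import Fraction

# nPr: ordered selections of 3 items from 5.
print(math.perm(5, 3))  # 60

# nCr: unordered selections of 3 items from 5.
print(math.comb(5, 3))  # 10

def prob_two_aces():
    """Chance that a 2-card hand from a 52-card deck is two aces."""
    favorable = math.comb(4, 2)   # ways to pick 2 of the 4 aces
    universe = math.comb(52, 2)   # all possible 2-card hands: the denominator
    return Fraction(favorable, universe)

print(prob_two_aces())  # 1/221
```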

Permutations and Combinations (17:40) Organic Chemistry Tutor, 2017. Simple review of when to use nCr, nPr, or just the factorial by itself. 14:00 How many ways can we arrange the letters in the word "Alabama".
problem - probability (replace/don't replace) (10:11) Amy Krusemark, 2020. A slight political note at the start, then an AP question. An example of non-replacement probability, and of understanding how permutations/combinatorics affect the denominator. 2:40 unless stated otherwise, an alpha of 0.05 is the threshold for a reason to doubt.
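The two ideas in these videos can be sketched in a few lines of Python. The function names and the bag of 3 red and 2 blue marbles are hypothetical, chosen only to show how the denominator shrinks when we don't replace.

```python
import math
from collections import Counter
from fractions import Fraction

def arrangements(word):
    """Distinct orderings of a word's letters: n! / (product of repeats!)."""
    total = math.factorial(len(word))
    for repeats in Counter(word.lower()).values():
        total //= math.factorial(repeats)
    return total

def p_two_reds(red, blue, replace):
    """Chance of drawing two reds in a row from a bag of red and blue."""
    total = red + blue
    first = Fraction(red, total)
    if replace:
        second = Fraction(red, total)          # bag is unchanged
    else:
        second = Fraction(red - 1, total - 1)  # one red gone, one item gone
    return first * second

print(arrangements("Alabama"))                 # 7! / 4! = 210
print(p_two_reds(3, 2, replace=True))          # 9/25
print(p_two_reds(3, 2, replace=False))         # 3/10
```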


next level

KL Divergence

KL Divergence (18:13) ritvikmath, 2023. Non-negative comparison of two distributions.
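A minimal Python sketch of KL divergence for two discrete distributions given as lists of probabilities. The distributions are made up, the function name is mine, and the sketch assumes q is nonzero wherever p is nonzero (otherwise the divergence is infinite and this code would fail).

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same outcomes.

    Assumes q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]
loaded = [0.9, 0.1]

print(kl_divergence(fair, fair))     # identical distributions give 0.0
print(kl_divergence(fair, loaded))   # positive
print(kl_divergence(loaded, fair))   # positive, but a different number
```

Note the last two lines: the result is non-negative either way, but it is not symmetric, so it is a comparison rather than a true distance.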

vectors and matrices in statistics

As my friend Bart texted...

Matrices are really just notation for a list [of] equations. Not so profound, not for a long time. Like if I have ten data points then i have that equation ten times. Ten y values. Ten x1 values, ten x2 values, etc. Thirty diff x values, so x could be written as a 10x3 grid. Ok enough [of] that
Beta is 1x3 and x is 3x10. Then when multiplied out gives a 1x10 vector. To say that vector equals the 1x10 y vector is to say each component of y equals the corresponding for RHS. ie ten equations
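Bart's shapes can be checked with a few lines of plain Python. The beta and x values below are made up, and matmul is a hand-rolled helper of mine (a library like NumPy would normally do this).

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

# Hypothetical coefficients: a 1x3 beta.
beta = [[2.0, -1.0, 0.5]]

# Hypothetical data: 3 rows (x1, x2, x3) by 10 columns (data points).
x = [
    [float(i) for i in range(1, 11)],
    [float(i % 3) for i in range(10)],
    [1.0] * 10,
]

y = matmul(beta, x)   # 1x3 times 3x10 gives 1x10
print(len(y), len(y[0]))
print(y[0])
```

Each of the ten entries in y[0] is one of the "ten equations": beta1*x1 + beta2*x2 + beta3*x3 for that data point.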

And of course wherever we have a matrix, we have a potential vector.

Comp Sci v. Data Sci Matrix (10:34) ritvikmath, 2019. Reveals some clear differences between computer matrix use and math/datasci use and how they overlap (eg. CompSci efficiency can work on any math app).

data structure

Necessary for data science. Data Science will evaluate these further, sometimes using Calculus, but we at least need to know what they are, IMO. Not on the AP test. The terminology translates as follows: Statistics and Math use the term "matrix", while Data Science/Computer Science use the term "array". Depending on what we need to do with the matrix, computing will perform some function on a data array. Usage: imagine an R2 scatter plot, but with a third dimension holding error information attached to each data point.
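The scatter-plot-with-error idea might look like this as a plain Python array of rows. The numbers and the column helper are made up for illustration.

```python
# Each row is one data point: (x, y, error). Values are hypothetical.
points = [
    (1.0, 2.1, 0.3),
    (2.0, 3.9, 0.2),
    (3.0, 6.2, 0.5),
]

def column(rows, index):
    """Pull one column out of a list of rows."""
    return [row[index] for row in rows]

xs = column(points, 0)
ys = column(points, 1)
errs = column(points, 2)

# The point with the largest attached error.
noisiest = max(points, key=lambda row: row[2])

print(xs, ys, errs)
print(noisiest)
```

A mathematician would call points a 3x3 matrix; the programmer calls it an array and indexes it by row and column.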

Linear Algebra: Transformational Matrices Part I (15:43) Computer Science, 2021. transformations on R2 matrices, using geometric examples. There's an entire playlist that's valuable.
Linear Algebra: Transformational Matrices Part II (9:23) Computer Science, 2021. transformations on R3 matrices, as we might do with data structures.
Trig Functions (9:15) PatrickJMT, 2011. this hero scores yet again. Just in case you need them again for the stuff above. So rotten.
1, 2, and 3d structures (8:32) GridoWit, 2017. basic terminology and location tracking within different types of arrays and data structures. The intuition is that arrays solve a storage problem: we don't want to have a new variable for each piece of data we have collected. C syntax is also provided.
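To tie the transformation videos and the trig review together, here is a small Python sketch of a rotation in R2. The helper names are mine, and the counterclockwise rotation matrix is the standard one; the videos may present it differently.

```python
import math

def rotation_matrix(theta):
    """2x2 matrix for a counterclockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def transform(matrix, point):
    """Apply a 2x2 matrix to an (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y,
            matrix[1][0] * x + matrix[1][1] * y)

quarter_turn = rotation_matrix(math.pi / 2)

# (1, 0) rotated a quarter turn lands on (0, 1), up to float rounding.
print(transform(quarter_turn, (1.0, 0.0)))
```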

aur - git encryption (arch)

So far, the only package that appears affected is standardnotes-desktop.

I don't know if this will happen again, but this could be helpful. Apparently, GitHub recently stopped accepting the unauthenticated git protocol and now requires an encrypted (SSL) or authenticated connection. AUR actions which attempt an unencrypted git download fail with the following.

The unauthenticated git protocol on port 9418 is no longer supported.
Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
==> ERROR: Failure while downloading desktop git repo

One of the reported fixes is to simply change the repo URL from the unauthenticated git:// scheme to https://. However, this is not possible within AUR actions performed by yay -- yay makes the call to git without giving the user any option to change the URL.
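For anyone building by hand from a downloaded PKGBUILD rather than through yay, the URL change amounts to something like the Python sketch below. This is an assumption-heavy illustration: use_https is my own helper, the source line is hypothetical (example.org, not the real package), and it assumes the host serves the same repo over https and that makepkg wants the git+https:// form for git sources.

```python
import re

def use_https(pkgbuild_text):
    """Rewrite git:// (or git+git://) source URLs to git+https://."""
    return re.sub(r"(?:git\+)?git://", "git+https://", pkgbuild_text)

# Hypothetical PKGBUILD source line, for illustration only.
before = 'source=("repo::git://example.org/some/repo.git")'
after = use_https(before)

print(before)
print(after)
```

URLs that already use git+https:// are left alone, so running it twice does no harm.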

solution

As time goes on, AUR package builders will write their git calls with "https" URLs. For the time being, it's a difficult problem.

Friday, December 16, 2022

colord - freedesktop.org severe device conflict (printer)

solution

NB: This takes the TN-630 toner cartridge. The TN-660 will also work. If the entire drum is replaced, it's the DR-630.

Before describing the problem, here's a process for installing this (04f9:0092) Brother printer. This method is faster than the 3 days I spent understanding how colord had undermined the old reliable install method from a previous post. It's still an hour or two but compare that with salvaging configurations from older setups and losing days.

  1. physically connect the printer and run lsusb to verify detection.
  2. use pacman to install colord and cups. At the Brother site, download the RPM and use xarchiver to extract the PPD (and filter, if you want to go old skool).
  3. if it won't extract the RPM, be sure to install the software it relies on: # pacman -S cpio.
  4. new: $ yay -S brother-hll2315dw. This saves having to harvest, install and do permissions in several directories on executable scripts and data files. This is a 2 day time-saving driver.
  5. optionally rename the PPD to anything, "BrotherA.ppd" for example. Open it up and change the default size from 'A4' to 'letter', and/or make any other such changes. Save, and put it into /usr/share/cups/model/. Next, do either 6 or 7.
  6. (CLI, more fun, less reliable)
    # lpadmin -p BrotherA -E -v [insert the USB URI from lpinfo -v] -m BrotherA.ppd
    The USB URI gained from lpinfo -v may be lengthy and will include a serial number.
  7. (GUI, more reliable) Go into CUPS, http://localhost:631/, and "add printer"
  8. inside the admin page, select "have PPD" and browse to it in /usr/share/cups/model to install the PPD from step 5 above.
  9. still in the admin page, "maintenance" has the menu to set the device as the default printer.
  10. # systemctl restart cups.service
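Since the USB URI is long and easy to mistype, step 6 can be helped along with a small Python sketch that picks the URI out of saved lpinfo -v output and assembles the lpadmin arguments. The helper names are mine, the sample output is hypothetical (the serial is a placeholder), and the sketch only builds the command; it does not run it.

```python
def usb_uris(lpinfo_output):
    """Return the usb:// device URIs found in `lpinfo -v` output."""
    uris = []
    for line in lpinfo_output.splitlines():
        parts = line.split(None, 1)  # "<class> <uri>"
        if len(parts) == 2 and parts[1].startswith("usb://"):
            uris.append(parts[1].strip())
    return uris

def lpadmin_args(name, uri, ppd):
    """Argument list matching the lpadmin call in step 6."""
    return ["lpadmin", "-p", name, "-E", "-v", uri, "-m", ppd]

# Hypothetical `lpinfo -v` output; a real one carries the actual serial.
sample = """network ipp
direct usb://Brother/HL-L2315DW%20series?serial=XXXXXXXX
network lpd"""

uri = usb_uris(sample)[0]
print(" ".join(lpadmin_args("BrotherA", uri, "BrotherA.ppd")))
```

The printed line can be pasted after a root prompt, or the list handed to subprocess.run by someone who prefers to script the whole thing.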

the reason this conflict is a 3 day kludge

The solution is above, but here is the days-long struggle. We used to just install the printer with the CLI in about 10 minutes.

freedesktop: systemd and now colord

Freedesktop.org developed systemd of course. It's fairly comprehensive and complex, compared with the old init.d flexibility of simple config files. Recently freedesktop began taking control of session color options too, and by device. Questionable. But in a horrible design decision, their colord daemon was given control not just of colors, but of the DBus, apparently to make device queries/directives for color options. In practice, this means simple software conflicts about colors can lead to DBus failures. Enter CUPS, attempting to set printer color profiles. Disaster.

CUPS circularity

It's a vicious circle. CUPS needs to write a PERL printer profile which contains (among many other settings) color settings. Meanwhile colord retains absolute control of color by controlling the DBus connection. When CUPS detects that colord has already created a device profile, it aborts its entire profile creation attempt, the one with all the settings for the printer. The result of no CUPS profile/filter is a half-installed printer. The printer doesn't print, and the error message "filter error" appears in the CUPS dashboard.

These errors can also be found in the error_log. If we attempt to install a printer, when colord is operating, we'll see the following:

$ cat /var/log/cups/error_log
W [16/Dec/2022:01:50:02 -0800] CreateProfile failed: org.freedesktop.ColorManager.AlreadyExists:profile id \'BrotherA-Gray..\' already exists

...and we can observe the details of the conflicting profile which colord had already installed...

$ colormgr get-devices
Object Path: /org/freedesktop/ColorManager/devices/cups_BrotherA
Owner: root
...
Colorspace: gray
Device ID: cups-BrotherA

colord's phony paths

It's hard to work with colord because it creates phony paths. The path above, /org/freedesktop/ColorManager/[etc], does not exist. No file is there. This is a "path" only in the sense of it being a line in an XML file or an ICC file. Colord knows how to evaluate the line, but there's no file.

As for the XML files which contain the phony paths, these are apparently in...

/usr/share/dbus-1/interfaces

... and two homes for the ICC files defining colors, for colord and ghostscript...

/usr/share/color/icc/colord
/usr/share/ghostscript/iccprofiles

order of operations circularity

When we install the printer, colord moves more quickly than CUPS. Colord detects the printer on the dbus and creates a device color profile faster than CUPS can write a CUPS filter. So, CUPS detects an already-made ICC colord color file, logs this color conflict as an error, and exits its entire filter creation. Accordingly we cannot fix the necessary filter later from some other CUPS process: it was never created and thus cannot be modified.

It seems we must either 1) remove colord, 2) configure colord not to create a profile for the printer, 3) modify CUPS not to create its own color profile or to accept colord's profile, 4) modify the PPD (if possible) not to seek to create a color profile, or possibly to accept colord's profile, or 5) place a filter in /opt from a duplicate working setup. It might also require some combination. This is a kludge.

As a short preliminary to the 4th option, the simple act of commenting out a PPD's color settings doesn't stop the conflict.

1. remove colord

So let's pull colord, install the printer, then reinstall colord after CUPS has written a filter. It seems that we can do this: CUPS claims to only use colord ICC profiles "optionally".

# pacman -Rsn colord
checking dependencies...
:: cups optionally requires colord: for ICC color profile support

Packages (2) libgusb-0.4.2-1 colord-1.4.6-1

Total Removed Size: 8.49 MiB

:: Do you want to remove these packages? [Y/n]

...meaning CUPS should be able to install the printer without colord installed. Let's delete the error log, reinstall the printer and check for new errors...

# rm /var/log/cups/error_log
# lpadmin -p BrotherA -E -v usb:/dev/bus/usb/lp0 -m brother_HLL2315.ppd
lpadmin: Printer drivers are deprecated and will stop working in a future version of CUPS.
$ cat /var/log/cups/error_log
cat: /var/log/cups/error_log: No such file or directory

No errors -- seems we're home free. Nope.

colord/dbus integration overreach

The printer installed without errors, but attempts to print failed.

FindDeviceById failed: org.freedesktop.DBus.Error.ServiceUnknown:The name org.freedesktop.ColorManager was not provided by any .service files

Yes, CUPS apparently relied on color management from colord as "optional"; however, reliance on DBus is in no way optional for any application. And, since colord manages both colors and DBus connections to devices, if we take out color management, our printer loses DBus access. The printer then cannot communicate, e.g., print. A person with a similar problem seems to confirm this. Colord must apparently remain.

2. configure colord

I could find no colord configuration file. Using colormgr, we can delete a colord device profile. But... that's not the problem. The problem is when the color profile (ICC) is created -- which is conflicting with the printer installation. Again, the only time CUPS creates a print filter in /opt/ is its initial read of the PPD file.

3. configure CUPS

The /etc/cups folder has several configuration files.

  • /etc/cups/printers.conf: don't edit when CUPS is running.
  • /etc/cups/cupsd.conf: I know this file is read during systemctl restart cups.service. I've changed log levels from Warn to Debug before and seen the effects inside /var/log/cups/error_log. It also holds access permissions. It does not look effective for device install conflicts.

4. modify the PPD

I could find no way to modify the PPD that solved the conflict and created a device. It *is* worthwhile to go into the PPD and switch the defaults to "Letter" from "A4" if a person is in the US. It saves having to worry about later updates to the device file which may disrupt other settings.

5. duplicate a filter

During install, the PPD wants to write a PERL filter to the directory specified in the wrapper shebang. This is the filter that is not written and which causes the print failure.

$ ls -lan /opt/brother/Printers/HLL2315DW/cupswrapper
total 76
drwxr-xr-x 2 0 0 4096 May 24 2020 .
drwxr-xr-x 5 0 0 4096 May 24 2020 ..
-rw-r--r-- 1 0 0 18351 Apr 18 2020 Copying
-rw-r--r-- 1 0 0 15010 Apr 18 2020 brother-HLL2315DW-cups-en.ppd
-rwxr-xr-x 1 0 0 24436 Apr 18 2020 brother_lpdwrapper_HLL2315DW
-rwxr--r-- 1 0 0 7650 Apr 18 2020 paperconfigml1

We can see that the PPD copied itself over and that an executable (755) PERL script (brother_lpdwrapper_HLL2315DW) was written. "Copying" is a license but should also be copied. I put the entire HLL2315DW directory - as is - on a USB key and copied it to the non-printing system. I saved to my installation directory.

HL-L2315DW old-skool steps

Somewhere between my old post and the newest way at the top.

  • install colord and cups, then do the full download and xarchiver extraction of the PPD and filter from the Brother site, as described in this post and repeated here in brief
  • can rename (or not rename) the PPD to anything "BrotherA.ppd" for example, but the filter (must) keep its full given name. Its only content will be the shebang and the /opt location.
  • Attempt to install the printer normally.
    # lpadmin -p BrotherA -E -v usb:/dev/bus/usb/lp0 -m brother2315.ppd
    But now the problem is the URI. This has become extremely finicky with colord running the dbus. So instead of usb:/dev/[etc] above, use lpinfo -v to get a better USB descriptor. Copy and paste it into the command.
  • install and unpack the entire HLL2315DW directory described above and set 755 permissions on appropriate files in the subdirectories lpd, inf, and cupswrapper, as well as chown them to "0" (root). This is time consuming, so for Arch users, the AUR package will handle this step
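For the by-hand route, the permissions step could be scripted along these lines. This is a sketch under assumptions: set_mode_755 is my own helper, the caller decides which files are "appropriate" (the wrapper script from the listing above is one), and the chown to root only works when run as root, so it is off by default.

```python
import os

def set_mode_755(base_dir, relative_paths, chown_root=False):
    """Set rwxr-xr-x on the named files under base_dir.

    relative_paths is the caller's own list of files to make executable.
    chown_root=True also sets owner and group to 0 and requires root.
    """
    changed = []
    for rel in relative_paths:
        path = os.path.join(base_dir, rel)
        os.chmod(path, 0o755)
        if chown_root:
            os.chown(path, 0, 0)
        changed.append(path)
    return changed

# Example call (not run here); path and file name are from the listing above:
# set_mode_755("/opt/brother/Printers/HLL2315DW",
#              ["cupswrapper/brother_lpdwrapper_HLL2315DW"],
#              chown_root=True)
```

Listing the files explicitly is deliberate: the PPD and the license file in the same directory were not executable in the working setup, so a blanket chmod over the whole tree would not match it.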

Wednesday, December 14, 2022

ldd - libraries for deficient applications

As noted, there are times when an app is missing a necessary library, which is a PITA.

The process is to run ldd on the binary to find the missing lib, and then pacman -F to find which package has the lib. In this case, the poster used:

$ ldd /usr/lib/cups/filter/rastertortlabel

...

libcrypt.so.1 => not found

...
$ pacman -F libcrypt.so.1 # to find which package to install
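When ldd prints a long list, the "not found" entries can be pulled out with a few lines of Python. The function name is mine, and the sample text is a trimmed, partly made-up ldd listing built around the one missing library shown above.

```python
def missing_libraries(ldd_output):
    """Return library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing

# Trimmed sample; the first line and its address are illustrative only.
sample_ldd = """\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007f0000000000)
\tlibcrypt.so.1 => not found"""

for lib in missing_libraries(sample_ldd):
    # Each name can then be handed to: pacman -F <name>
    print(lib)
```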