Tuesday, February 8, 2022

statistics -- simplistic error catching in studies

Note: I've also included some statistics-relevant information from my 2020 Maths post within a "resources" section at the bottom of this post.

This list so far only includes traditional "p-value" errors, not Bayesian-style errors. But when we look at published papers in Nature or at the NCBI, what are some rule-of-thumb errors we can look for?

short list

  • correct test for the hypothesis
    Look for a clearly stated hypothesis and for the null to have been rejected. That said, a pre-determined hypothesis and the requirement to reject the null is an outdated method to some: a more modern approach is to crunch an immense amount of data through arrays and let the data itself surface anomalies from which to draw conclusions. However, the traditional method is still required in many studies, e.g. by the FDA, and in many scientific papers (see the t-test sketch after this list).

    Types of tests (19:53) Amour Learning, 2019. Generally t tests make fewer assumptions than Z tests and are preferable. Covers 12 main types of tests and which hypothesis each is for.
    Selecting the correct test (12:31) Erich Goldstein, 2015. Starts from what type of data you have (categorical or quantitative).

  • a priori standards
    Is the alpha value determined prior to the study and clearly stated? A typical alpha of 0.05 accepts a 1 in 20 chance that the results happened by chance; an alpha of 0.10, a 1 in 10 chance. NCBI:nih.gov
    Counter: the most common caveat is a third variable. I'd like to propose another: a study designed around an a priori alpha presupposes that the a priori hypothesis is the reliable one. However, data gathered in the study may reveal other information, and a secondary finding can be statistically significant even when it is not clinically significant.
  • publication bias
    My friend Bart notes
    If ten people examine a question, none of them do any data mining, and only the people who happen to get a positive result submit or publish, then the stats seem ten times as strong as they are (see the simulation sketch after this list).
  • randomization problems
    Front-end randomization is a thing, but there are other error possibilities later in the study as well, well described here.
  • control group
    Perhaps too obvious to include, but included anyway.
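
As a sketch of the first two bullets above (my numbers are invented, and scipy is assumed installed): state the hypothesis and alpha before touching the data, match the test to the data (one quantitative sample, population variance unknown, so a t test rather than a Z test), and only then reject or retain the null.

    # A priori: H0 is that mean systolic blood pressure = 120, alpha = 0.05.
    # The sample values are made up for illustration.
    from scipy import stats

    ALPHA = 0.05  # accepts a 1 in 20 chance of rejecting a true null

    sample = [118, 125, 131, 122, 119, 128, 133, 121, 127, 124]

    # t test, because the population standard deviation is unknown
    # (a Z test would require knowing it).
    t_stat, p_value = stats.ttest_1samp(sample, popmean=120)

    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    if p_value < ALPHA:
        print("Reject the null at the pre-set alpha.")
    else:
        print("Fail to reject; the data do not rule out a mean of 120.")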
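And a sketch of Bart's publication-bias point: simulate many labs all testing a true null, let only the significant results get "published", and the literature looks like evidence for an effect that does not exist. Again, the numbers are arbitrary.

    # Every lab tests a TRUE null (both groups from the same distribution),
    # but only p < 0.05 results reach the journals.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    ALPHA = 0.05
    labs, published = 1000, 0

    for _ in range(labs):
        a = rng.normal(loc=0, scale=1, size=30)
        b = rng.normal(loc=0, scale=1, size=30)
        _, p = stats.ttest_ind(a, b)
        if p < ALPHA:
            published += 1  # the file drawer stays shut on the rest

    print(f"{published} of {labs} labs 'found' an effect that is not there.")
    # Expect roughly 5%. Read in isolation, those ~50 papers look like
    # independent confirmations rather than the tail of a null distribution.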

resources

p value/alpha value

Only Bayesian statistics can support theories; p-values can only speak to the data, ruling hypotheses out. It's a lower standard than Bayesian, but useful, of course.

What's a P value (20:30) Daniel Lakens, 2019. Some accent -- good at 0.75x. Good backgrounder, no math. Notes (8:45) that Bayesian methods are necessary for theories; otherwise all we can do is rule something out (using p).
Bayes' theorem (15:10) 3Blue1Brown, 2019. How to update on new information, and how to use it in research. Graphically illustrated.
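
A worked example of the Bayesian update from that second video, with invented diagnostic-test numbers:

    # Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E). Numbers are made up.
    prior = 0.01        # P(disease): 1% base rate
    sensitivity = 0.90  # P(positive | disease)
    false_pos = 0.05    # P(positive | no disease)

    # Total probability of a positive result, across both possibilities.
    p_positive = sensitivity * prior + false_pos * (1 - prior)

    posterior = sensitivity * prior / p_positive
    print(f"P(disease | positive) = {posterior:.3f}")  # ~0.154, not 0.90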

12 main types of hypothesis tests

As noted above, they must be matched to the hypothesis, but what are they?
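
A partial cheat-sheet of my own (shorthand, not a reproduction of the video's twelve):

    # Partial, illustrative pairings of question to test -- check each
    # test's assumptions (normality, sample size, etc.) before using it.
    COMMON_TESTS = {
        "one mean, population sigma known": "Z test",
        "one mean, sigma unknown": "one-sample t test",
        "two independent means": "two-sample t test",
        "paired means (before/after)": "paired t test",
        "three or more means": "one-way ANOVA",
        "two categorical variables": "chi-squared test of independence",
        "one proportion": "one-proportion Z test",
        "two quantitative variables, linear relation": "Pearson correlation",
    }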

concept instruction

The clearest statistics instruction videos, for my money, are from Brandon Foltz. "Only" an M.Ed (probably a PhD by now), he found ways to create clean graphics years before clean graphics were common. Another excellent early adopter, though one who only overlaps with statistics, is Derek Banas. One of his best might be his more recent comprehensive portfolio using Python. I believe Derek had a job with Apple when he was 16.

spreadsheets, Python, R, SPSS

Most people cannot afford SPSS, so PSPP is a good option. If using Linux, most repos have it; in Arch, it's in the AUR. There are plenty of PSPP videos on YT, as well as SPSS videos that a person can adapt.

This guy's series is *incalculably* helpful, har har. He is Microsoft all the way -- Visual Studio, Excel, Jupyter Notebook -- but it's just as easy to do all of his stuff in Google Colab with Google Sheets as-is, or to download the sheets as CSVs first and then merge them.

Pandas Merge (Excel) (9:08) Alan Hettinger, 2022. Similar to a SQL join. When we want to put things together from various tables into a combined output.
Pandas Merge (Sheets) (10:48) Tobias Willman, 2020. Adapt the Jupyter above using this guy's vid.
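
A minimal merge along the lines of those two videos, runnable as-is in Colab or local Jupyter (the tables and column names are invented):

    # SQL-join-style merge in pandas. The two DataFrames stand in for
    # two sheets or CSVs.
    import pandas as pd

    orders = pd.DataFrame({
        "customer_id": [1, 2, 2, 3],
        "amount": [9.99, 14.50, 3.25, 21.00],
    })
    customers = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "name": ["Ana", "Bart", "Chen"],
    })

    # Inner join on the shared key column.
    combined = orders.merge(customers, on="customer_id", how="inner")
    print(combined)

    # CSVs downloaded from Sheets work the same way:
    # orders = pd.read_csv("orders.csv")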
