Statistics overlaps with Math, of course. We can record samples in a CSV or spreadsheet and make histograms and (Algebra 1) regressions. Yet CSV data are arrays. Arrays are matrices, so they can be evaluated as vectors (Calc 3), and they are also Python and R friendly. IMO, because Stats spans both simple and complex Math, plus CS, it is a fun side project at any point during one's Math progression.
The basics are algebraic if one uses tables for the area under a curve. Without tables, a person could find the same areas using simple Calc 1 integration. A TI-84, if available, is also great for checking that kind of calculation (eliminating tables and Calc) and for offloading the drudgery of long lists.
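As a rough sketch of the two routes above (table lookup versus Calc 1), the area under the standard normal curve between -1 and 1 can be computed both ways. The helper names `normal_pdf` and `area_trapezoid` are made up for illustration; `NormalDist` is from Python's standard library and plays the role of the table or calculator.

```python
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x."""
    return NormalDist(mu, sigma).pdf(x)

def area_trapezoid(f, a, b, steps=10_000):
    """Calc 1 style: approximate the area under f from a to b with trapezoids."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width

# Table / calculator style: difference of two cumulative areas.
std_normal = NormalDist()  # mean 0, standard deviation 1
area_from_cdf = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# Calc style: integrate the curve directly.
area_from_integral = area_trapezoid(normal_pdf, -1.0, 1.0)

print(round(area_from_cdf, 4))       # about 0.6827
print(round(area_from_integral, 4))  # agrees to 4 decimal places
```

Both numbers are the familiar "about 68% within one standard deviation".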
HS AP Stats is essentially an Algebra-only college Stats 1 class, and slightly more user friendly. The Table of Contents of the Princeton Review's AP Stats review book is a list of roughly 35 subjects, split over 4 categories. I've used it below, with some modifications, to describe what a person might want to cover. Beyond basic or AP, I've added a "next level" section at the bottom, since matrices connect Math to Statistics and arrays/data structures connect Stats to Data Science.
Introductory data
  data collection
  tabular methods
  graphical methods - qualitative
  graphical methods - quantitative
  numerical methods - continuous
  boxplots
  add/mult a constant
  comparing mult. groups
  bivariate data: covar, regression
  categorical frequency

Sampling and experiments
  plan a study
  data collection
  plan a survey
  bias in surveys
  plan an experiment

Anticipating outcomes
  probability
  random variables
  probability dist. - discrete random var's
  probability dist. - continuous random var's
  normal distribution
  combining independent random var's
  sampling distributions

Statistical inference
  confirming models
  parameters
  point estimation
  interval estimation
  confidence interval
  inference: significance tests
  hypothesis: testing and accepting
  estimation and inference: population proportion
  estimation and inference: population mean
  estimation and inference: 2 population proportions
  estimation and inference: 2 population means
  inference: categorical data
For a comprehensive course, there's Khan Academy. On YouTube, I prefer the video compilation by Brandon Foltz, M.Ed., esp. at 0.5x speed. The videos were made 2010-ish, but were ahead of their time for YouTube. Below are some from Foltz, plus others from many great teachers, or vids which include nuggets in some way. Going along in review is also a chance to relearn the annoying Greek symbols for parameters: the same Greeks as in finance, but with different usage.
distributions of data obtained
degrees of freedom, CLT
To Z or t (38:16) Brandon Foltz, 2012. The notion behind this choice, without calculation. Almost all stats come from samples, not the entire population, e.g., "the prevalence of depression in those over 65": how would we give everyone the questionnaire? We can't. The smaller the sample, the more chance of error. Take the avg temp in NYC: what if we just used one day, or 5 days? The larger the sample, the more confidence we have that we have captured reality. 16:00, 33:00 decision. 19:00, 30:00 degrees of freedom.
Buying land (32:06) Brantley Blended, 2018. listing, PLAT recency, survey recency (due diligence period), boundaries,
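Returning to the sample-size point in the To Z or t notes above: a minimal sketch of how the standard error of a sample mean and the degrees of freedom (n - 1) change as more days are included. The temperatures are invented for illustration, and `summarize` is a hypothetical helper, not something from the video.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical daily high temperatures (deg F), made up for illustration.
temps = [71, 64, 80, 75, 68, 77, 59, 83, 70, 73, 66, 79]

def summarize(sample):
    """Return (n, degrees of freedom, mean, standard error of the mean)."""
    n = len(sample)
    df = n - 1                      # degrees of freedom for a one-sample t
    se = stdev(sample) / sqrt(n)    # sample standard deviation / sqrt(n)
    return n, df, mean(sample), se

for days in (2, 5, 12):
    n, df, avg, se = summarize(temps[:days])
    print(f"n={n:2d}  df={df:2d}  mean={avg:5.1f}  std. error={se:4.2f}")
```

With this particular data the standard error shrinks as n grows. That is the usual pattern, though any single small sample can buck it.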
bivariate data: linear regressions, covar
covariance (5:55) Ben Lambert, 2013. intuition behind covariance. Positive, negative, and none. Some formulas, but conceptual.
Variance vs. covariance (Webpage) Investopedia.
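To make the positive/negative/none intuition concrete, here is a small sketch with made-up data. It computes the sample covariance by hand (n - 1 in the denominator) and uses the standard least-squares identity slope = cov(x, y) / var(x) to tie covariance back to linear regression. The function names are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)

def sample_covariance(xs, ys):
    """Sum of products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def regression_line(xs, ys):
    """Least-squares slope and intercept: slope = cov(x, y) / var(x)."""
    slope = sample_covariance(xs, ys) / sample_covariance(xs, xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # moves with x
falling = [10, 8, 6, 4, 2]    # moves against x
flat = [3, 1, 5, 1, 3]        # no linear trend

print(sample_covariance(x, rising))   # positive
print(sample_covariance(x, falling))  # negative
print(sample_covariance(x, flat))     # about zero
print(regression_line(x, rising))     # slope 2, intercept 0
```

Note that the variance of x is just the covariance of x with itself, which is why one function covers both.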
combinatorics, permutations, probability
nCr and nPr. These are the denominators of probability, since they give the universe of outcomes against which we determine our odds. If "statistics" are the collected data, nested inside we have probability (based on prior events, i.e. statistics), and inside this are the "combinatorics and permutations" which create the denominators of probability fractions.
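A short sketch of that idea, using `math.comb` (nCr) and `math.perm` (nPr) from Python's standard library. The card and race scenarios are my own examples, chosen only to show a count turning into a denominator.

```python
from fractions import Fraction
from math import comb, perm

# nCr, order does not matter: ways to choose 5 cards from a 52-card deck.
hands = comb(52, 5)             # 2598960

# nPr, order matters: ways to award gold, silver, bronze among 8 runners.
podiums = perm(8, 3)            # 336

# The count becomes the denominator of a probability fraction.
# There are 4 royal flushes (one per suit) among all 5-card hands.
p_royal_flush = Fraction(4, hands)

print(hands, podiums, p_royal_flush)   # 2598960 336 1/649740
```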
Permutations and Combinations (17:40) Organic Chemistry Tutor, 2017. Simple review of when to use nCr, nPr, or just the factorial by itself. 14:00 How many ways can we arrange the letters in the word "Alabama".
problem - probability (replace/don't replace) (10:11) Amy Krusemark, 2020. A slight political note at start; AP question. Example of non-replacement probability, and of understanding how permutations/combinatorics affect the denominator. 2:40 absent other notice, an alpha of 0.05 is the threshold for a reason to doubt.
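Two of the ideas from the videos above, sketched under my own assumptions: the "Alabama" count treats "A" and "a" as the same letter (so 7 letters with one letter repeated 4 times), and the marble bag is a made-up stand-in for a replace/don't-replace problem, not the AP question from the video.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def distinct_arrangements(word):
    """n! divided by k! for each letter repeated k times (case-insensitive)."""
    counts = Counter(word.lower())
    total = factorial(len(word))
    for k in counts.values():
        total //= factorial(k)
    return total

print(distinct_arrangements("Alabama"))  # 7! / 4! = 210

# Hypothetical bag: 3 red and 7 blue marbles; draw two, want both red.
red, marbles = 3, 10
with_replacement = Fraction(red, marbles) * Fraction(red, marbles)
without_replacement = Fraction(red, marbles) * Fraction(red - 1, marbles - 1)

print(with_replacement)     # 9/100
print(without_replacement)  # 1/15
```

Not replacing the first marble changes both the numerator and the denominator of the second draw, which is the whole trick.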
next level
KL Divergence
KL Divergence (18:13) ritvikmath, 2023. Non-negative comparison of two distributions.
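A minimal sketch of the discrete KL divergence, D(P || Q) = sum of p_i * log(p_i / q_i), assuming both distributions are lists of probabilities over the same outcomes and that q_i > 0 wherever p_i > 0. The coin distributions are invented for illustration.

```python
from math import log

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions.

    Assumes p and q list probabilities for the same outcomes, in the same
    order, and that q_i > 0 wherever p_i > 0.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

fair = [0.5, 0.5]          # fair coin
loaded = [0.9, 0.1]        # hypothetical loaded coin

print(kl_divergence(fair, fair))      # 0.0: identical distributions
print(kl_divergence(fair, loaded))    # positive
print(kl_divergence(loaded, fair))    # positive, but a different value
```

The last two lines show that the result is never negative but also not symmetric, so it is a comparison rather than a true distance.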
vectors and matrices in statistics
As my friend Bart texted...
Matrices are really just notation for a list [of] equations. Not so profound, not for a long time. Like if I have ten data points then i have that equation ten times. Ten y values. Ten x1 values, ten x2 values, etc. Thirty diff x values, so x could be written as a 10x3 grid. Ok enough [of] that
Beta is 1x3 and x is 3x10. Then when multiplied out gives a 1x10 vector. To say that vector equals the 1x10 y vector is to say each component of y equals the corresponding for RHS. ie ten equations
And of course wherever we have a matrix, we have a potential vector.
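Bart's point can be sketched in plain Python: a 1x3 beta times a 3x10 x gives a 1x10 y, and each entry of y is one of the ten equations. The coefficient and x values below are made up, and `matmul` is a small hand-rolled helper rather than a library call.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    return [[sum(row[k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for row in a]

# Hypothetical coefficients for x1, x2, x3.
beta = [[2.0, 0.5, -1.0]]                       # 1 x 3

# 3 x 10: one row per variable, one column per data point.
x = [
    [float(i) for i in range(10)],              # x1 = 0, 1, ..., 9
    [float(i * i) for i in range(10)],          # x2 = 0, 1, 4, ..., 81
    [float(10 - i) for i in range(10)],         # x3 = 10, 9, ..., 1
]

y = matmul(beta, x)                             # 1 x 10

# Each component of y is one ordinary equation, written out the long way.
by_hand = [beta[0][0] * x[0][j] + beta[0][1] * x[1][j] + beta[0][2] * x[2][j]
           for j in range(10)]

print(len(y), len(y[0]))    # 1 10
print(y[0] == by_hand)      # True
```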
Comp Sci v. Data Sci Matrix (10:34) ritvikmath, 2019. Reveals some clear differences between computer matrix use and math/datasci use and how they overlap (eg. CompSci efficiency can work on any math app).
data structure
Necessary for data science. Data Science will evaluate these further, sometimes using Calculus, but we at least need to know what they are, IMO. Not on the AP test. The terminology shift is that Statistics and Math use the term "matrix", but Data Science/Computer Science use the term "array". Depending on what we need to do with the matrix, computing will perform some function on a data array. Usage: imagine an R2 scatter plot, but where we have a third dimension with error information attached to each data point.
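A sketch of that usage, assuming the simplest possible layout: one array where each row is a scatter-plot point plus its error value. The numbers and the 0.3 cutoff are invented for illustration.

```python
# Each row is one observation: [x, y, measurement error in y].
# Values are made up for illustration.
points = [
    [1.0, 2.1, 0.3],
    [2.0, 3.9, 0.2],
    [3.0, 6.2, 0.5],
    [4.0, 7.8, 0.1],
]

n_rows, n_cols = len(points), len(points[0])   # the "matrix" is 4 x 3

# Column access: pull one variable out of every row.
xs = [row[0] for row in points]
errors = [row[2] for row in points]

# "Perform some function on the array": keep only the low-error points.
reliable = [row for row in points if row[2] <= 0.3]

print(n_rows, n_cols)   # 4 3
print(xs)               # [1.0, 2.0, 3.0, 4.0]
print(len(reliable))    # 3
```

Mathematically this is a 4x3 matrix; to the computer it is an array of rows, and the same data serves both views.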
Linear Algebra: Transformational Matrices Part I (15:43) Computer Science, 2021. transformations on R2 matrices, using geometric examples. There's an entire playlist that's valuable.
Linear Algebra: Transformational Matrices Part II (9:23) Computer Science, 2021. transformations on R3 matrices, as we might do with data structures.
Trig Functions (9:15) PatrickJMT, 2011. this hero scores yet again. Just in case you need them again for the stuff above. So rotten.
1, 2, and 3d structures (8:32) GridoWit, 2017. basic terminology and location tracking within different types of arrays and data structures. The intuition that arrays solve a storage problem: we don't want to have a new variable for each piece of data we have collected. C syntax is also provided.
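The videos above meet in one small example: a geometric transformation on R2 (a rotation, which is where the trig functions come back) applied to points held in a single array instead of one variable per point. This is a sketch with my own helper names, not code from any of the videos.

```python
from math import cos, radians, sin

def rotation_matrix(degrees):
    """2 x 2 matrix that rotates R2 points counterclockwise about the origin."""
    t = radians(degrees)
    return [[cos(t), -sin(t)],
            [sin(t),  cos(t)]]

def transform(matrix, point):
    """Apply a 2 x 2 matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y,
            matrix[1][0] * x + matrix[1][1] * y)

# One array holds every corner of a unit square: no separate variable per point.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

quarter_turn = rotation_matrix(90)
rotated = [transform(quarter_turn, p) for p in square]

# Round away floating-point noise such as 6.1e-17.
print([(round(x, 6), round(y, 6)) for x, y in rotated])
```

After a quarter turn the corner (1, 0) lands on (0, 1) and (1, 1) lands on (-1, 1), up to rounding.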