Sunday, March 1, 2020

toolbox - data

Statistics knowledge comes first, of course. Looking at Python apart from Jupyter, we can make some data-related assumptions about modules. Google Colab is even easier than arranging Jupyter or virtual environments on my own system, so let's leave aside system setups and sandboxes, which I cover in another post on environments and their variables.

Data Science, which I think of as "dynamic statistics", is overtaking classical statistics. In classical stats, we had to accurately create hypotheses before null-testing them. In dynamic statistics, one must have accurate code to let the data bubble up its own conclusions. In software, Python is rapidly overtaking R, especially since Google made Colab and TensorFlow available via the browser, impossible as that once seemed. On a single system, it's more complex, as noted above. For learning Python in a Data Science-centric manner, practicals can use stock or derivatives market, climate, or epidemiological data. To try models against Wall Street quants, one can play with models at quantopian.com.
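To make the classical-versus-dynamic contrast concrete, here is a minimal sketch of a permutation test: the null hypothesis is still stated up front (the two groups have the same mean), but the p-value comes from reshuffling the data in code rather than from a lookup table. The function name `permutation_p_value` and the return figures are made up for illustration, and only the standard library is assumed.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Share of random relabelings whose mean gap is at least the observed gap."""
    rng = random.Random(seed)          # fixed seed so reruns give the same answer
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)            # pretend the group labels carry no information
        gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    return hits / n_shuffles


# Made-up daily returns (percent) for two hypothetical strategies.
strategy_a = [0.8, 1.1, 0.9, 1.4, 1.2, 1.0, 1.3, 0.7]
strategy_b = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.4, 0.1]

p = permutation_p_value(strategy_a, strategy_b)
print(f"p-value: {p:.4f}")
```

A small p-value says the gap between the groups would rarely show up if the labels were random. This is one resampling approach among many, not the only way to test a difference in means.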

Getting Started with Colab (7:17) ProgrammingKnowledge, 2020. Intro to what is essentially a cloud version of Jupyter Notebook, which Google now hosts.
TensorFlow in 10 minutes (9:00) edureka, 2019. Google recently began including TensorFlow in Colab, so we now have a complete machine learning environment.
How I Would Learn Data Science (8:35) Ken Jee, 2020. Several websites and specific methods, 5 years for this guy. He emphasizes practical projects.
Practical Scraping (31:56) Computer Science, 2020. Colab project. How to work the practical in Python. He gets to his function around 15:00; everything before that is visualizing.

type      note
Geany     text-based, for nearly any code via plugins
Jupyter   web-integrated Python, designed to display output in a browser
Eclipse   Java-based, takes plugins for, e.g., RStudio
RStudio   R-specific IDE
PSPP      GNU version of SPSS. Does most. $ yay -S pspp. GUI: psppire
gretl     econometrics. $ yay -S gretl. GUI: gretl
octave    GNU version of MATLAB. $ pacman -S octave. GUI: octave

Data and Statistics (code)

10 Python Tips (39:20) Corey Schafer, 2019
Python NumPy overview w/arrays (58:40) Keith Galli, 2019
Jupyter - Python Sales Analysis Example (1:26:07) Keith Galli, 2020
Pandas - Data Science Tutorial (1:00:27) Keith Galli, 2018. CSV reading, Beautiful Soup
Python Stock Prediction (49:48) Computer Science, 2020
Beautiful Soup stock prices (10:47) straight_code, 2019
Options analysis in Python (1:02:22) Engineers.SG, 2016 Black-Scholes (emotional volatility) in Pandas.
Derivative analytics in Python (1:29:27) O'Reilly, 2014. Data frames and Monte Carlo (Brownian motion).
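The last two videos cover Black-Scholes and Monte Carlo simulation, so here is a rough sketch of both for a European call option, using only the standard library. It is not taken from either video. The function names and the inputs (spot 100, strike 100, 5% rate, 20% volatility, one year) are my own illustrative assumptions, and the simulation assumes geometric Brownian motion with no dividends.

```python
import math
import random
from statistics import NormalDist


def black_scholes_call(spot, strike, rate, vol, years):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    cdf = NormalDist().cdf             # standard normal cumulative distribution
    return spot * cdf(d1) - strike * math.exp(-rate * years) * cdf(d2)


def monte_carlo_call(spot, strike, rate, vol, years, n_paths=100_000, seed=42):
    """Average discounted payoff over simulated terminal prices (geometric Brownian motion)."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * years
    diffusion = vol * math.sqrt(years)
    total = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)   # call payoff at expiry
    return math.exp(-rate * years) * total / n_paths


# Hypothetical inputs, chosen only for illustration.
analytic = black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0)
simulated = monte_carlo_call(100.0, 100.0, 0.05, 0.20, 1.0)
print(f"closed form: {analytic:.4f}")
print(f"Monte Carlo: {simulated:.4f}")
```

With these inputs the closed form comes out near 10.45, and the simulated price should land close to it, with the gap shrinking as `n_paths` grows. In Colab, NumPy would vectorize the loop; the plain loop is kept here so nothing needs installing.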

Data and Statistics (classic)

Combinations vs. Permutations (20:59) Brandon Foltz, 2012 For either finite math or stats
Linear Regression Playlist (multiple) Brandon Foltz, 2013
Covariance basics (26:22) Brandon Foltz, 2013 stock examples, vs correlation/linear regression
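As a quick check on two of the classic topics above, this sketch counts combinations versus permutations and computes covariance next to correlation. The price series are made up, and the helper names `covariance` and `correlation` are my own; `math.comb` and `math.perm` need Python 3.8 or later.

```python
import math

# Picking 3 stocks out of 10: order ignored (combinations) vs. order counted (permutations).
n, k = 10, 3
n_combinations = math.comb(n, k)   # n! / (k! * (n - k)!)
n_permutations = math.perm(n, k)   # n! / (n - k)!
print(n_combinations, n_permutations)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies between -1 and 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up prices for two hypothetical stocks that move in lockstep.
stock_x = [1.0, 2.0, 3.0, 4.0, 5.0]
stock_y = [2.0, 4.0, 6.0, 8.0, 10.0]

cov_xy = covariance(stock_x, stock_y)
corr_xy = correlation(stock_x, stock_y)
print(cov_xy, corr_xy)
```

Each combination of 3 items can be ordered 3! = 6 ways, which is why the permutation count is six times larger. Covariance depends on the units of the data, while correlation is unit-free, so doubling `stock_y` would double the covariance and leave the correlation unchanged.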
