
Thursday, October 21, 2021

database options

Right tool for the right job. An easy case might be a password database: essentially a flat-file database, an Excel 2-D sort of thing. A more difficult case is a video-editing project. You've got clips, which might expand in number; you've got recipes or methods for various transitions and combinations; and you've got an overall project view which pulls from these. Whatever we build, we'd like it to be ACID-compliant. Try UMLet or some such to do the UML.

7 Database Paradigms (9:52) Fireship, 2020. Seven different ways of looking at data, with suggested applications.
Database issues, Univ. of Utah (19:49) Mark Durham, 2018. Has a great chart.

relational algebra, calculus

Relational algebra is just a set of logical operators (most are shown below), easy to understand for anyone who has taken a logic course.

Linguistically, relational algebra imparted its first name, "relational", to the databases constructed around its principles: relational databases are simply databases built around relational algebra.

Relational algebra itself was created by Edgar Codd. His main book on this is The Relational Model for Database Management, and his Turing Award paper on relational algebra (1981) is available online (PDF). The most important skill level to reach is the capacity for fast database normalization. This takes practice, similar to the logic skill of taking statements, text or otherwise, and symbolizing them.

Relational Algebra Lecture 1 (50:21) Database Psu, 2015. Penn State. The first in their amazing free online course. Lecture begins about 19:12.
Relational Algebra Pt 3 lecture (51:18) Database Psu, 2015. The various operators and their uses.

Relational calculus is a way of writing queries: instead of spelling out relational-algebra operations step by step, it describes what we want our queries to return. It's less atomic than relational algebra.
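For a concrete contrast (my notation, not from the lectures above): the algebra expression σ_phone='555'(messages) names the selection operator to apply, while the equivalent tuple-calculus query { t | t ∈ messages ∧ t.phone = '555' } only describes the rows wanted, leaving the operator steps to the database.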

relational keys (primary, foreign, composite, blank)

Link: postgres constraints

Relational keys are a bit of a pain. See my later post on how to configure PostgreSQL for local access. Restrictions make searches easier. The main idea is that a unique column (e.g., Cust ID) in one table can be used to access a unique row in the next table. In the pic below, the primary key is green, the foreign key red. A column with a "references constraint" on it is a foreign key.

Primary and foreign keys (2:08) Udacity, 2015. Primary and foreign keys explained; a foreign key is a column with a references constraint. No lookup table is used, though one might be possible with some indexing.
Primary and foreign keys (5:24) Prescott Computer Guy, 2011. This has an intermediate lookup table. He wants every table to have an ID column, to be sure to have a unique value for that table.
Postgres version (5:42) Trent Jones, 2020. Unfortunately uses PgAdmin not DBeaver.
SQL from teaching perspective (10:12) Mike Dane, 2017. Exam configuration

  • primary key: Postgres/SQL constraint equivalent - NOT NULL + UNIQUE
  • foreign key: Postgres/SQL constraint equivalent - REFERENCES. Verifies that a value in the keyed column always exists in the referenced table (is linked).
constraints
PostgreSQL                      MySQL
PRIMARY KEY                     PRIMARY KEY
FOREIGN KEY                     FOREIGN KEY
UNIQUE (column)                 UNIQUE
NOT NULL/NULL (null allowed)    NOT NULL
CHECK                           CHECK
DEFAULT                         DEFAULT
CREATE INDEX                    CREATE INDEX
EXCLUDE                         (no direct MySQL equivalent)
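
To make the table concrete, here is a minimal sketch (table and column names are hypothetical): the primary key bundles NOT NULL and UNIQUE into one constraint, and REFERENCES is what makes a column a foreign key.

CREATE TABLE customer (
    cust_id  SERIAL PRIMARY KEY,   -- NOT NULL + UNIQUE in one constraint
    name     VARCHAR(30) NOT NULL
);

CREATE TABLE purchase (
    purchase_id  SERIAL PRIMARY KEY,
    cust_id      INTEGER REFERENCES customer (cust_id),  -- foreign key
    amount       NUMERIC(10,2) CHECK (amount >= 0),      -- CHECK constraint
    created      TIMESTAMP DEFAULT now()                 -- DEFAULT constraint
);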

start-up/shutdown (postgresql)

See my other post for how to configure as user. That done, start-up is pretty simple:

$ postgres -D '/home/foo/postgres/data'

With the server up and running, you can open another terminal and run psql commands. If you run as user postgres (see below), you'll have admin authority.

$ psql postgres

If a lot of data has been written, a good exit sequence to make sure everything is safely flushed to disk:

$ psql -c CHECKPOINT && pg_ctl stop -m fast

Saturday, September 25, 2021

csv, sms, postgres, dbeaver, python project?

I began this post 10 years ago thinking only about a LAPP or LAMP. Even that was too wide a scope. This post only covers importing a CSV into a stand-alone PostgreSQL instance. We can then take the data for additional processing or practice or whatever. Or for Business Intelligence (BI), since every database is an opportunity for BI meta-analysis. There are several pre-made tools for this, but they may not address our focus. They're still worth a look.

dash and plotly introduction  (29:20) Charming Data, 2020. A bit outdated, but the fundamental concepts still helpful.
PBX - on-site or cloud  (35:26) Lois Rossman, 2016. Cited mostly for the segment at 17:45, which breaks things down schematically.
PBX - true overhead costs  (11:49) Rich Technology Center, 2020. Average vid, but tells hard facts. Asterisk server ($180) discussed.

file/record plan issues

We can save psql and SQL scripts as *.psql and *.sql files respectively. If we use, e.g., DBeaver, it will generate SQL scripts, which we can subsequently transform, with some work, into psql scripts. We may wish to learn more math also, though this can be difficult on one's own.

Back to our project. At the lowest levels, we need a system which

  • provides document storage: folders receive a number relevant to their path, and we save documents in them without changing the file names. Of course this has to account for updates: new files added, or files retired/deleted.
  • queries either on meta information (dates) or on document content (word search)
  • reports on the database itself: its size, category structures, number of files pointed to, and so on. This is good BI app territory.
  • configurable back-up or cloud hosting
  • if office access, a possible GUI to delimit employees' queries. Browser-friendly desktop development is required for a GUI (PyGObject/GTK, or Qt), maybe even Colab.

Python has become advanced enough to establish client-side connections with a database, which used to require PHP.

Grafana apparently works best with time series, but could perhaps also let us display counts of minutes or numbers of texts tied to various numbers. Pandas and even R have some visualization options.
Dash is another API similar to Grafana. We can use one or the other, or GraphQL.
Fluentd can help us if we need to manage several relevant logs around this database project.
Logs are somewhat specific, but Prometheus monitors selected system traits - memory use, HDD access. It can also do this over HTTP in a distributed system, as described here. These kinds of time series integrate well with Grafana.


No particular logo, but SMS-related: being able to run some kind of programs against databases to find mean, mode, and standard deviation, even in a table. For example, a report that showed the average number of texts, word length, top 3 recipients and their tallies, etc. Easily run queries against a number and report to screen. Use a browser to access the DB.
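
As a sketch of the kind of query meant here (assuming the SMS rows land in a table like the try01 one created below), the "top 3 recipients and their tallies" report reduces to a GROUP BY:

SELECT name, COUNT(*) AS texts, AVG(LENGTH(content)) AS avg_chars
FROM try01
GROUP BY name
ORDER BY texts DESC
LIMIT 3;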

SMS

A possible project for document management may be to establish SMS message storage, query, and retrieval. The data are row entries from a CSV, rather than disparate files with file names. A reasonable test case.

  • pacman -S postgresql dbeaver python. By using DBeaver, we don't have to install PHP and PgAdmin. Saves an extra server.
  • activate the postgresql server
  • import the CSV into Gnumeric and note column names and data types. This can even be codified with UML
  • populate the database with correct column names and formats. Also check the data file with a text editor to verify comma counts are accurate, etc
  • Python script or manual DBeaver import of the CSV file(s).
  • rclone backup of data files to the cloud (a sketch follows this list)
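
For the last step, a hedged sketch (the remote name "mycloud" is hypothetical; it must first be set up with "rclone config"):

$ rclone copy /home/foo/postgres/data mycloud:postgres-backup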

PostgreSQL - install and import CSV  (11:19) Railsware Product Academy, 2020. Also PgAdmin install and walkthrough.

1 & 2. postgresql install/activate (1.5 hrs)

Link: Derek Banas' 3.75 hr PostgreSQL leviathan

See my earlier post on the steps for this. I rated the process at 2 hrs at the time. However, with the steps open in a separate page, 1.5 hrs seems reasonable.

At this point, we should have a PostgreSQL server (cluster) operating on a stand-alone system out of our home directory, with our own username, eg "foo". We can CREATE or DROP databases as needed as we attempt to hit upon a workable configuration.

$ postgres -D /home/foo/postgres/data
$ psql -U foo postgres
postgres=# create database example;
postgres=# drop database example;
OR
postgres=# drop database if exists example;
postgres=# exit

It would be nice to have a separate cluster for each import attempt, so I don't have to carefully name databases, but I don't want to run "initdb" a bunch of times and create a different data directory for each cluster, which is what that requires. So I'm leaving everything under the "postgres" cluster, and when I get the final set of databases I like, I'll do a script (e.g., *.psql) or some UML so I can re-install the solution after deleting the test project.

3. csv import

Most things on a computer can be done 100 different ways, so I started with the simplest -- psql -- and moved on to other ways. This is all covered below. But my first step was to pick a CSV of backed-up SMS's for the process, and clean it.

3a. csv cleaning

I first selected "backup.csv" with 7 columns and 456 rows. I simply opened it with Gnumeric and had a look. I noted advertising in the first and last rows and stripped these rows in Geany. This left me a 7x454, with the first row the column titles. What's interesting here is that some of the texts had hard returns in their message contents, so that there were 581 lines. I therefore made a second version, 7x10, with no returns in the message contents, "backup02.csv", for simplest processing.
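
As an alternative to hand-editing (a sketch, assuming the advertising occupies exactly the first and last lines of the file), sed can strip both in one pass:

$ sed '1d;$d' backup.csv > backup01.csv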

3b. data types

The efforts below taught me that we need to understand several basic data types; there were a lot of failures until I did. I found this video and this post helpful. Matt was only making a table for a two-column price list, but it gives us a start.

And here is the more complex scenario Raghav skillfully addresses with multiple tables.

3c. monolithic import attempt

Without considering keys or any normalization, let's try to just bring in an entire CSV, along the lines of Matt's import above. Of note, the times were in the format "2021-03-07 21:40:25", apparently in the timezone where the call occurred. (Note too that a column named "between" must be double-quoted, since BETWEEN is a reserved word.)

$ psql -U foo postgres
postgres=# create database test01;
postgres=# \c test01
test01=# create table try01 ("between" VARCHAR, name VARCHAR(30), phone VARCHAR(20), content VARCHAR, date TIMESTAMP, mms_subject VARCHAR(10), mms_link VARCHAR(20));
test01=# select * from try01;
test01=# COPY try01 FROM '/home/foo/backup01.csv' DELIMITER ',' CSV HEADER;

This accurately brought in and formatted the data; however, it also brought in a first row that was all dashes, and I don't really need the 1st, 6th, and 7th columns. I looked at the PostgreSQL documentation for COPY. The columns looked easiest to fix, so I created a smaller table without them.

test01=# drop table try01;
test01=# create table try01 (name VARCHAR(30), phone VARCHAR(20), content VARCHAR, date TIMESTAMP);
test01=# COPY try01 FROM '/home/foo/backup01.csv' DELIMITER ',' CSV HEADER;
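
One caveat I should flag (my reading of the COPY docs, not something tested in the original post): COPY expects the file's column count to match the target table, so the 4-column table above may reject the 7-column CSV. A workaround sketch is to re-import into the full 7-column table and then drop the extras:

test01=# alter table try01 drop column "between", drop column mms_subject, drop column mms_link;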
4. dbeaver tweaks (1 hr)

The most important thing in DBeaver, after we've connected to our server, is to remember to R-click on the server, go into settings, and select "Show all Databases". If you forget this step, you will go insane. I didn't know about that step and... just do it. The other thing is the helpful "Test Connection" button down in the left corner.

We find that some columns have dates, times, and numbers; and do we want to use the telephone number as the primary key? Once we have a working concept, we'll want to UML it.

dbeaver - some basics  (16:23) QAFox, 2020. Begins about 10:00. Windoze install but still useful.

python

As noted above, however, we are now interested in letting Python do some of the scripting, so that we don't need two languages. To do this, we install PL/Python on the PostgreSQL side. Other options are available for other languages too -- for example, if we want to run statistical "R" programs against a PostgreSQL database, we'd install PL/R. Or we can write out PL/pgSQL commands and put them into Python or R scripts if we wish.

On the Python side, we obtain modules from PyPI, such as Psycopg2 (also available in the pacman repos in Arch). With PL/Python in the database, and Psycopg2 in our Python programs, our connections and commands to PostgreSQL become much simpler (a minimal sketch follows the list below). And of course, one can still incorporate as much or as little PHP as one wishes, and we'd still use judicious amounts of JavaScript in our Apache-served HTML pages. To summarize, in a LAMP we might want:

server-side

  1. PostgreSQL - open source database.
  2. Python - general purpose language, but with modules such as Psycopg2, we can talk to the database and also serve dynamic webpages in HTML, replacing PHP, or even create some GUI application which does this from our laptop.
  3. Apache - still necessary, since this is what interacts with the client. Python or PHP feeds the dynamic content into the HTML, but Apache feeds these pages to the client (browser).
  4. PHP - still available, if we want to use it.
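
Here is a minimal psycopg2 connection sketch, assuming the stand-alone server from the posts above (user "foo", database "test01", and table try01 are from this blog; localhost and port 5432 are the defaults):

import psycopg2  # from PyPI, or pacman -S python-psycopg2 on Arch

# connect to the cluster started with: postgres -D /home/foo/postgres/data
conn = psycopg2.connect(dbname="test01", user="foo", host="localhost", port=5432)
cur = conn.cursor()

# any SQL we would type at the psql prompt can be issued here
cur.execute("SELECT name, phone FROM try01 LIMIT 5;")
for name, phone in cur.fetchall():
    print(name, phone)

cur.close()
conn.close()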

Monday, March 29, 2010

Python - PHP - PostgreSQL - Android - C++

Links: Python GUI thread   wxPython tutorial

A lot to learn, but probably the only way to properly get to the point where I can catalog files. PHP first (web-based), Python next (desktop), PostgreSQL (desktop, but growing on the web), and then a step up from Python to C++. C++ is apparently more efficient than Python, but takes deeper knowledge. I'm not fond of Java until its Runtime Environment becomes more refined and stable.

Python flavors
The issue with Python is apparently that, once one gets to a GUI level with it, there are two main flavors. One, Tkinter, is essentially Python using the Tk libraries. wxPython has its own library set. It seems that wxPython is more on a growth path than the older Tkinter; I read one description that said so. So far, I've been happy with wxPython.

Python/Postgresql
This does not seem problematic. Python imports a module called psycopg2 which has hooks for the DB. At this site, there is a simple tutorial about how to make the connection.

For programming, using arrays is the real challenge.

Python/MySQL
Don't care, but have to learn it.

PHP
The difficult part is using arrays in the best possible ways.

Sunday, November 23, 2008

postgresql - user level install

We want to set up a relational database and have browser access through localhost on our local machine. If successful, we can next learn cloud installs or connecting clusters. DBeaver is installed as our GUI admin tool. PgAdmin is good when running, but often spawns lib errors. Regardless, the install portions of administration are CLI.

installation steps (2 hrs)

1. server (1 hour)

A fully initialized PostgreSQL cluster has two parts, a database and a server. The server acts like Apache and manages connections. We start with that portion first; the databases need to be created from within the server. The one exception is when we run "initdb": a default database "postgres" is created, so that the server can communicate with admin tools.

Pacman installs the PostgreSQL server into a "bin" directory. This is fine. But we want all configuration and data written into our home directory, and we don't want to have to become another user or admin to run PostgreSQL. The idea is simple backup, portability, and user-level permissions; nothing root, no weird PAM errors.

1a. install server

# pacman -S postgresql dbeaver

The PostgreSQL install is about 51MB, not too bad. The service runs on port 5432 (change this in postgresql.conf), and is not run here as a systemctl service. At the user level, we can start it and stop it without becoming root.

1b. initialize default cluster

Like any other application, I want to access all PostgreSQL files and tasks at the user level: never want to become root or change to another user. Yet when pacman installs PostgreSQL, it creates a "postgres" group in /etc/group and does not append $USER (e.g., "foo") to this group.

When initdb initializes PostgreSQL, it creates a default cluster and DB named "postgres" and puts them in /var/lib/postgres/data. No! I want the files in the home "foo" directory, like any other app.

Solution: 1) Append ourselves ("foo") to the "postgres" group in /etc/group. 2) run $ initdb and specify our home directory as the data repository.

# nano /etc/group [add self to postgres group]
$ mkdir postgres
$ initdb -D /home/foo/postgres/data

1c. lockfile hack

There's a third permissions issue. When PostgreSQL runs at user level, it is thwarted attempting to write a session lockfile into the root-controlled /run/postgresql folder. Properly solving this probably requires manipulating groups, but I simply changed the ownership of /run/postgresql to $USER. The problem disappeared.

# mkdir /run/postgresql/
# chown -R foo:foo /run/postgresql

1d. start server

With all the above accomplished, we can start the server as a user.

$ postgres -D '/home/foo/postgres/data'

1e. make permanent

We'd prefer not to have to delineate the data directory each time we start the postgres server, so we want to set the path correctly. This initdb wiki notes that the "PGDATA" variable, added to the /home/foo/.bashrc file, will make this permanent.

$ nano .bashrc
export PGDATA="/home/foo/postgres/data"

Log out and back in, and then verify using "$ printenv PGDATA". Now we can start our cluster (server) with $ postgres

1f. logs, tables, configuration

Unless modified in postgresql.conf, logs will appear in our home directory. Tables are in /home/foo/postgres/data. We can also configure the server, apart from the database and cluster, in /home/foo/postgres/data/postgresql.conf. This arranges which port we listen on, and other features. The security-related conf is pg_hba.conf. There are others. All of these control the server, not the databases themselves.
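
For example, the server lines in postgresql.conf look like this (the values shown are the stock defaults; adjust to taste):

port = 5432                      # the port the server listens on
listen_addresses = 'localhost'   # which interfaces accept connections
log_destination = 'stderr'       # where log output goes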

2. connect to defaults (10 minutes)

With a lot of work, our server environment has been completed. The default "postgres" cluster and database were created by the initdb process. Let's leave them so we don't disrupt any administrative dependencies, but let's also move on to creating our own cluster and databases.

From the CLI, there is plenty we can arrange without DBeaver. DBeaver kicks butt once our databases are filled with data. For now though, to check our connection and create databases, we want to use the psql command. Our default username is "foo" when we connect, because that's the user that ran initdb. The default database "postgres", however, must also be named. So here is our connection command.

$ psql -U foo postgres

Again, there are three entities: the cluster and the default DB are named "postgres", and the username is our own home-directory name.

3. create clusters and databases (whatever no. of hours)

Our server environment is complete and our plans for data begin. We might have created a UML schema or it might be trial and error against a CSV or something else.

We're connected to the database through psql, so now we want to make a cluster and a database.

BTW, there are many pages about dropping clusters and databases, and we can delete them from the data folder as well, so dropping is not that difficult. OK, but let's add (a minimal sketch below):
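
A minimal sketch (the database name "smstest" is hypothetical):

$ psql -U foo postgres
postgres=# create database smstest;
postgres=# \c smstest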

4. dbeaver

Now that we're at step 3, we can add DBeaver to visualize our setup. DBeaver is written in Java and so needs JDBC-compliant databases; Postgres has a Type 4 (pure Java) JDBC driver. DBeaver connects as some user at startup, "postgres" by default, so change it to your $USER, e.g., "foo".

concepts

Roles and users

Somewhat confusing -- an imperfect analogy is that roles are governments and users are those operating within those governments. None of these will be created until we create a database. Just turning on the server (the "postgres" command) does not create a database nor its associated roles.

postgres - users and roles  (22:52) E-Multi-Skills Database Tutorials, 2020. Users can log in, roles cannot. A role is a function, a user is a client. DBeaver seeks to connect to a role; it operates as an agent, so it needs a role with significant privileges.
postgres - roles, schema, security  (32:31) databasetorque, 2021.