Let's move on from our recipe database that we crafted. While before I was showing you how to set up schemas and how to write to a database, the next sections are going to be more focused on reading from a large dataset.
The database I've prepared for us to work on is the Open Media Database. This is a wonderful project to catalog movies into a useful database. As part of this they offer a database dump that is explictly created for folks to learn PostgreSQL with.
I went ahead wrapped the above into a public container and put it on Docker Hub. It's a pretty simple container, see here to see the Dockerfile.
Let's run it. Run the following in your CLI.
docker run -e POSTGRES_PASSWORD=lol --name=sql -d -p 5432:5432 --rm btholt/complete-intro-to-sql
You need to stop the recipe container for the above command to work since both will try to bind to port 5432. If you want both to run, you can do
5433:5432
instead of5432:5432
to expose port 5433 on your local machine so there won't be a conflict.
Okay, now to connect to that container in psql, do:
docker exec -u postgres -it sql psql omdb
The last omdb connects you directly to the omdb database instead of having to run
\c omdb
when you first connect
Okay, let's explore a bit.
- Run
\l
to see all database. - Run
\d
to see all tables.
We can see there are a ton of tables which means there are a lot of relations going on.
Let's look at one of them. Run \d movies
to see everything about movies. You can see all the columns and indexes (we're about to see more about indexes here in a bit.)
Before we start learning some more concepts, let's have some fun doing some queries:
Which movie made the most money?
- Some movies have null revenues. Use
COALESCE(column_name, 0)
to make nulls into 0s. - Won't need any joins here.
ORDER BY
is your friend, as isDESC
How much revenue did the movies Keanu Reeves act in make?
COALESCE
will be useful againcasts
holds the links betweenmovies
andpeople
- Multiple joins necessary
Which 5 people were in the movies that had the most revenue?
- Multiple joins here too
Which 10 movies have the most keywords?
movie_keywords
contains the connectionscategories
contains the names of the keywords- Multiple joins necessary
- This database contains multiple languages of keywords. Don't worry about it.
Which category is associated with the most movies
- Similar to the above query, just starting from a different place.