SQL is an abbreviation for structured query language and is a critical skill for any analyst.
In this post I'm going to introduce the key concepts of SQL and help you on your way to learning this powerful language.
I learnt SQL on my own from free online resources and today I use it to provide value to my clients, and to help grow my business. I learnt the fundamentals of SQL in a few hours of study and so can you.
SQL is a language used to talk to databases. If you need to fetch, modify or delete data from a database you'll need to write commands in SQL. We call these commands queries. An analyst may write thousands of queries in their career.
Analysts work with data and therefore need to know how to talk to databases. SQL is one of the most important skills for analysts but thankfully learning SQL is quite easy.
SQL is the most common querying language but it isn't the only language used to interact with data. SQL is typically used to query relational databases like MySQL, Postgres and BigQuery. An example of a non-relational database is MongoDB.
Imagine you need to pull a specific set of data from your company's database. Let's say you want a list of users and their respective ages and genders, and you'd like to organize these users by their respective organizations.
In your database you may have the following:
Table #1 - Users
The first table holds the users of the company's product. As you'd expect, we have a row per user with "id" as the primary key. Then we have a number of columns which tell us about the users. Notice the last column, organization_id. This is a foreign key which allows us to join users to their respective organizations. This column can be used to join the users table to the organizations table.
Table #2 - Organizations
The second table holds the organizational information of our users.
So now that we have identified where the data we need resides we can structure an SQL query to fetch the relevant data.
Below is an example of an SQL query with a medium-level of complexity. This query will help us pull a list of users from the example database discussed earlier. Don't worry, I'm going to walk you through it step-by-step.
users.id as user_id,
users.gender as user_gender,
users.age as user_age,
organizations.id as organization_id,
organizations.name as organization_name
organizations on organizations.user_id = users.id
users.is_deleted = 0
The first part of the query is used to tell the database that we are either fetching, updating or deleting data. I've colored this piece of the query in purple. In our example we have a "SELECT".
The most common query that you will write will be "SELECT" queries. These queries are used to fetch data from the database.
The next part of the query holds a list of columns that we are requesting from the database. Remember, we are talking about standard, relational databases that have tables made up of columns and rows. In order to tell the database what to show us, we need to indicate which columns we want.
In the example the list of columns are colored in orange. We see 5 columns, id, gender and age belonging to the users table, and id and name belonging to the organizations table. The "as" statements that follow each column are called aliases. We're able to rename the columns that we are returning to anything we want. It is a good habit to set aliases since you'll often want to standardize the naming conventions of your columns.
The 3rd part of the query is to tell the database from which table to fetch the data. In our example it is the users table. But hold on, we also requested columns from the organizations table so how does this work?
The reason we were able to bring columns from 2 tables was because we did a join. Notice the section in black. This part of the query tells the database to join two columns so that data from the second table can be included.
Joining tables is very common and one of the more complex aspects of SQL.
The last part of the query is our "WHERE" clause. The "WHERE" clause is used to filter out data we don't want. In our example we are telling the database to return users which are not deleted (is_deleted = 0).
Note how I wrote the example query. Do you see how I capitalized the main commands and spaced out the query in an organized way? As an analyst you should start writing SQL in a clean and organized manner from day 1. This will help you get into the habit. Try and QA a 40 line query which is written in an unorganized manner and you'll understand why I've made this suggestion.
To fetch all data from a table in SQL you simply need to use the asterisk operator (*). An example of such a query can be seen below.
SQL is a powerful, popular language with a lot of advanced functionality. The example above simply pulls data as is from two tables but SQL can be used to manipulate your data before the output is returned.
Lets say that instead of returning a list of users, I wanted to count the number of users belonging to each organization? In this case I'd add the group by and count functions to my query.
Below is a list of the most frequently used SQL functions:
There are a number of ways you can learn SQL online. Below are a list of my favorite resources.
Below are some of the best YouTube resources I found on SQL.
Thanks for reading.