# 1 - Introduction to SQL

### Learning Objectives <a href="#learning-objectives" id="learning-objectives"></a>

By the end of this lesson students should be able to

* Explain why databases are needed in Web Development
* Define what SQL is and what it is used for
* Describe what `table`, `rows` and `columns` are used for in a relational database
* Install and create a database using PostgreSQL
* Create a table in a database using PostgreSQL
* Insert data into a table using PostgreSQL
* Retrieve data from a table using PostgreSQL
* Retrieve data from a table using conditionals in PostgreSQL
* List the different kinds of data that can be held in a PostgreSQL database

### Introduction to databases <a href="#introduction-to-databases" id="introduction-to-databases"></a>

A database is a structured set of data held in a computer. It provides ways to store, retrieve and organize information.

#### Why do we need them? <a href="#why-do-we-need-them" id="why-do-we-need-them"></a>

In the past few weeks, you stored and retrieved data using files. This is fine for simple data but it can quickly become an issue as your application becomes more complex and needs to store and manipulate more complicated data. For example, imagine you want to develop the next biggest hotel booking application. You will need to store the list of hotels available for booking somewhere, and as you add more features, you will need to save users information, the reviews they post for each hotel, but also the bookings each user makes. You can see that the data you need to handle can become very complicated, especially when you need to consider that data are not static, as they can be updated or deleted. To work more effectively with data, we can then use a database, which present the following benefits:

* A database defines a structure for your data and the relationships between entities
* A database provides convenient and performant ways to safely store and retrieve data
* A database provides a mechanism to check the validity of your data

#### Different types of database <a href="#different-types-of-database" id="different-types-of-database"></a>

There are many different kinds of database and different implementations. Sometimes, a database type is a better fit to certain use case or certain problems. The most well-known database types include relational database, key/value database, graph database and document database (also known as NoSQL). For this class, we will focus specifically on relational database as they are the most widely used and supported. You can consult [DB-Engines](https://db-engines.com/en/ranking) to see a ranking of the most used database, as you can see, there is a lot of them!

### Introduction to PostgreSQL <a href="#introduction-to-postgresql" id="introduction-to-postgresql"></a>

*"PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads. The origins of PostgreSQL date back to 1986 as part of the POSTGRES project at the University of California at Berkeley and has more than 30 years of active development on the core platform."* (source: [postgresql.org](https://www.postgresql.org/about/))

#### What is SQL? <a href="#what-is-sql" id="what-is-sql"></a>

* Pronounced S-Q-L or sequel
* Stands for Structured Query Language
* SQL is the standard language used to communicate with relational database
* SQL statements are used to query, create, update, delete records in a database
* SQL statements are executed by a RDBMS.

#### What is a RDBMS? <a href="#what-is-a-rdbms" id="what-is-a-rdbms"></a>

* Stands for Relational Database Management System
* It is a program that processes SQL statements to manage a relational database
* PostgreSQL is a RDBMS.

#### What characterizes a relational database? <a href="#what-characterizes-a-relational-database" id="what-characterizes-a-relational-database"></a>

As mentioned previously, a relational database is a specific type of database. Data is stored in *tables* of *rows* and *columns* as per the example below:

![](https://4042228299-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MRebciU3NcuLgsX3ijf%2F-MhiFXjNay4AnWB3ya32%2F-MhiFz5yKBfisWo0_p0v%2Fimage.png?alt=media\&token=c9c3bbf9-1ca1-46ba-a197-5eaec06829e5)

**How about storing everything in one big table as shown below? Why isn't it a good idea?**

A customer could have several bookings. If the customer changes their telephone number, you would have to update every single rows for this customer with their new number, which is more prone to errors. As a general rule, try to avoid duplication of data, and instead design your system in a way that you have a single source of truth for each piece of data. The example below is **NOT** a good solution.

![](https://4042228299-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MRebciU3NcuLgsX3ijf%2F-MhiFXjNay4AnWB3ya32%2F-MhiG152-Ti509EsZ8uW%2Fimage.png?alt=media\&token=1aa216c0-0bcf-4086-ad42-5f41621c48a8)

#### Database modeling exercise <a href="#database-modeling-exercise" id="database-modeling-exercise"></a>

**Scenario:** You've been hired to create a database for a new company which wants to revolutionize the hotel booking market. The first task you've been given is to model how the company would store its data in a database. Here are your requirements:

* The company wants to store in the database all the hotels available on their website
* For each hotel, the company wants to record the name and the number of rooms. Also each hotel can have several room types and each room type has a specific price.
* The company also needs to store the information of customers who registered on their website with a name, an email and an address.
* Customers need to be able to record their bank details which consist of an account number and a sort code. Each customer can register several bank accounts if they want.
* Finally, as customers can book a room in an hotel starting on a specific date for a specific number of nights, the company wants to store the bookings.

With mentors help, model the database for this company. In particular, show the different entities, fields and relationships between each entity.

#### Check your PostgreSQL installation <a href="#check-your-postgresql-installation" id="check-your-postgresql-installation"></a>

Open a terminal in your laptop and verify the command `psql -V` returns the version of PostgreSQL. In psql, you can type use the command `help` to show the help menu. Within the command prompt, you can enter SQL statements and run them against PostgreSQL. To quit psql, enter the command `\q`.

### Communicating with the database using SQL <a href="#communicating-with-the-database-using-sql" id="communicating-with-the-database-using-sql"></a>

All commands in the following need to be entered in a psql command prompt. However, sometimes it's easier to write the code in a file and then load the file with psql. For example, if you write your SQL code in a file called `test.sql`, you can then execute it with `psql -d DATABASE_NAME -f test.sql`.

#### Creating a new database <a href="#creating-a-new-database" id="creating-a-new-database"></a>

In a terminal, create a new database named `cyf_hotels` with the following command:

```
createdb cyf_hotels
```

Then connect to your database with:

```
psql cyf_hotels
```

#### Creating a table <a href="#creating-a-table" id="creating-a-table"></a>

Data are stored in tables. Let's first create a `customers` table to hold the details of customers.

```
CREATE TABLE customers (
  id        SERIAL PRIMARY KEY,
  name      VARCHAR(30) NOT NULL,
  email     VARCHAR(120) NOT NULL,
  address   VARCHAR(120),
  city      VARCHAR(30),
  postcode  VARCHAR(12),
  country   VARCHAR(20)
);
```

Few things to mention from the SQL statement above:

* `SERIAL PRIMARY KEY` defines the column `id` as a unique identifier for each row. Moreover, this identifier will automatically incremented every time data is inserted. `id` is called the primary key of the table `customers`.
* `VARCHAR(20)` defines the column to hold text data with a maximum length of 20 characters
* `NOT NULL` defines the column as not nullable, which means that you must set a value.
* Other useful types include `INT`, `TEXT`, `BOOLEAN` and `DATE`.
* The database will reject any values which don't match the type.

#### DBeaver, a SQL client software <a href="#dbeaver-a-sql-client-software" id="dbeaver-a-sql-client-software"></a>

[DBeaver](https://dbeaver.com/) is a SQL client software application and a database administration tool. If you think create tables in the command line is a bit confusing to you, this is your solution.

* [Download Community Version](https://dbeaver.io/download/)
* [Overview](https://dbeaver.com/docs/wiki/Application-Window-Overview/)
* [Database Navigator](https://dbeaver.com/docs/wiki/Database-Navigator/)
* [SQL Editor](https://dbeaver.com/docs/wiki/SQL-Editor/)

**Exercise 1**

* Create the `customers` table in the `cyf_hotels` database.
* Verify that the table `customers` is created with the psql command `\dt` which lists the existing tables, or update the list of tables using DBeaver.
* Display the table `customers` definition with the command `\d customers` and verify that it matches what you expect, or use DBeaver
* Create a new table `hotels` in the `cyf_hotels` database with the following columns: an `id`, a `name`, the number of `rooms` and the hotel `postcode`. Use the commands above to verify that the table is correctly created.

Now that we have a table to store `customers` and a table to store `hotels`, we can create a table to hold the bookings of customers for an hotel with the checkin date and the number of nights they intend to stay:

```
CREATE TABLE bookings (
  id               SERIAL PRIMARY KEY,
  customer_id      INT REFERENCES customers(id),
  hotel_id         INT REFERENCES hotels(id),
  checkin_date     DATE NOT NULL,
  nights           INT NOT NULL
);
```

In the above, `customer_id` and `hotel_id` are called **foreign keys** as they reference an id from a different table. This set a very strong constraint as you will not be able to create a booking for a customer id which does not exist in the customers table!

**Exercise 2**

* Create the table `bookings` in your `cyf_hotels` database and verify that it is correctly created.

#### Inserting data <a href="#inserting-data" id="inserting-data"></a>

Once your `customers`, `hotels` and `bookings` table are created, you can insert data with the following SQL statements:

```
INSERT INTO customers (name, email, address, city, postcode, country) VALUES ('John Smith','j.smith@johnsmith.org','11 New Road','Liverpool','L10 2AB','UK');
INSERT INTO hotels (name, rooms, postcode) VALUES ('Triple Point Hotel', 10, 'CM194JS');
INSERT INTO bookings (customer_id, hotel_id, checkin_date, nights) VALUES (1, 1, '2019-10-01', 2);
```

The data you insert should be of the same type with your table definition. For example, the following insert statement will **fail**:

```
INSERT INTO bookings (customer_id, hotel_id, checkin_date, nights) VALUES (1, 1, '2019-14-01', 2);
```

**Exercise 3**

* Run the 3 SQL statements above.
* Insert yourself in the `customers` table.
* Insert the following 3 hotels in the `hotels` table:
  * The `Triple Point Hotel` has 10 rooms, its postcode is `CM194JS`
  * The `Royal Cosmos Hotel` has 5 rooms, its postcode is `TR209AX`
  * The `Pacific Petal Motel` has 15 rooms, its postcode is `BN180TG`
* Try to insert a booking for a customer id which does not exist in the `customers` table (for example ID `100`). What is happening and why?

#### Retrieving data <a href="#retrieving-data" id="retrieving-data"></a>

Previously, you have inserted data in your tables. How do you make sure these data have been inserted correctly? The following SQL statement is used to request data from a specific table:

```
SELECT * FROM customers;
```

**Exercise 4**

* Use the above SQL statement to display all the data inserted in the `customers` table.
* Use the above SQL statement to display all the data inserted in the `hotels` table.
* Use the above SQL statement to display all the data inserted in the `bookings` table.

#### Retrieving data with conditions <a href="#retrieving-data-with-conditions" id="retrieving-data-with-conditions"></a>

Actually, the `SELECT` statement is very powerful and you will see you can request a lot of different things with it. Have you seen the `*` character in the SQL statement above? It means that you want to see the data for all the columns of the table. What if you want to only return specific columns? For example, to retrieve all customers `name` and `address` from the table `customers`:

```
SELECT name,address FROM customers;
```

Sometimes, you want to retrieve only data which verify a specific condition. In this case, you can use a `WHERE` clause. For example, to retrieve all hotels having more than 7 rooms:

```
SELECT * FROM hotels WHERE rooms > 7;
```

To retrieve the customer name and address with id 1:

```
SELECT name,address FROM customers WHERE id = 1;
```

To retrieve all the bookings starting after 2019/10/01:

```
SELECT * FROM bookings WHERE checkin_date > '2019/10/01';
```

To retrieve all the bookings starting after 2019/10/01 for a minimum of 2 nights:

```
SELECT * FROM bookings WHERE checkin_date > '2019/10/01' AND nights >= 2;
```

To retrieve all the hotels with the postcode `CM194JS` or `TR209AX`:

```
SELECT * FROM hotels WHERE postcode = 'CM194JS' OR postcode = 'TR209AX';
```

**Exercise 5**

{% file src="<https://4042228299-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MRebciU3NcuLgsX3ijf%2F-MctcbHxY10DZkAIDQgV%2F-MctcmUQUsQojl8HShEo%2Fcyf_hotels_exercise5.sql?alt=media&token=a5ed1123-4d94-41a5-be33-869f9c8ab89d>" %}
cyf\_hotels\_exercise5.sql
{% endfile %}

* Execute the file [`cyf_hotels_exercise5.sql`](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-MRebciU3NcuLgsX3ijf%2F-MctcbHxY10DZkAIDQgV%2F-MctcmUQUsQojl8HShEo%2Fcyf_hotels_exercise5.sql?alt=media\&token=a5ed1123-4d94-41a5-be33-869f9c8ab89d) which will reset your existing tables and insert more data in the `customers`, `hotels` and `bookings` tables. (hint: in the terminal, use `psql -d cyf_hotels -f cyf_hotels_exercise5.sql`).
* Retrieve all information for the customer Laurence Lebihan.
* Retrieve all customers name living in UK.
* Retrieve the address, city and postcode of Melinda Marsh.
* Retrieve all hotels located in the postcode DGQ127.
* Retrieve all hotels with more than 11 rooms.
* Retrieve all hotels with more than 6 rooms but less than 15 rooms.
* Retrieve all hotels with exactly 10 rooms or 20 rooms.
* Retrieve all bookings for customer id 1.
* Retrieve all bookings for more than 4 nights.
* Retrieve all bookings starting in 2020.
* Retrieve all bookings before 2020 for less than 4 nights.

### Reference material <a href="#homework" id="homework"></a>

Learn more about PostgreSQL using these [tutorials](https://www.postgresqltutorial.com).
