Which query is used to combine records from multiple tables?

To combine data from two tables we use an SQL

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7 clause, which comes after the
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
8 clause.

Database tables are used to organize and group data by common characteristics or principles.
Often, we need to combine elements from separate tables into a single tables or queries for analysis and visualization. A JOIN is a means for combining columns from multiple tables by using values common to each.

The JOIN keyword combined with ON is used to combine fields from separate tables.

A

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7 clause on its own will result in a cross product, where each row in the first table is paired with each row in the second table. Usually this is not what is desired when combining two tables with data that is related in some way.

For that, we need to tell the computer which columns provide the link between the two tables using the word

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
0. What we want is to join the data with the same species id.

SELECT *
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
0 is like
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
2. It filters things out according to a test condition. We use the
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
3 format to tell the manager what column in which table we are referring to.

The output from using the

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7 clause will have columns from the first table plus the columns from the second table. For the above statement, the output will be a table that has the following column names:

record_idmonthdayyearplot_idspecies_idsexhindfoot_lengthweightspecies_idgenusspeciestaxa …             96820199712DMM3641DMDipodomysmerriamiRodent …             

Alternatively, we can use the word

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
5, as a short-hand.
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
5 only works on columns which share the same name. In this case we are telling the manager that we want to combine
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
7 with
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
8 and that the common column is
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
9.

SELECT *
FROM surveys
JOIN species
USING (species_id);

The output will only have one species_id column

record_idmonthdayyearplot_idspecies_idsexhindfoot_lengthweightgenusspeciestaxa …            96820199712DMM3641DipodomysmerriamiRodent …            

We often won’t want all of the fields from both tables, so anywhere we would have used a field name in a non-join query, we can use

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
3.

For example, what if we wanted information on when individuals of each species were captured, but instead of their species ID we wanted their actual species names.

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;

yearmonthdaygenusspecies …     1977716Neotomaalbigula 1977716Dipodomysmerriami …     

Many databases, including SQLite, also support a join through the

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
2 clause of a query.
For example, you may see the query above written without an explicit JOIN.

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;

For the remainder of this lesson, we’ll stick with the explicit use of the

SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7 keyword for joining tables in SQL.

Challenge:

  • Write a query that returns the genus, the species name, and the weight of every individual captured at the site

Solution

SELECT species.genus, species.species, surveys.weight
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;

Different join types

We can count the number of records returned by our original join query.

SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);

Notice that this number is smaller than the number of records present in the survey data.

SELECT COUNT(*) FROM surveys;

This is because, by default, SQL only returns records where the joining value is present in the joined columns of both tables (i.e. it takes the intersection of the two join columns). This joining behaviour is known as an

SELECT species.genus, species.species, surveys.weight
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
3. In fact the
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7 keyword is simply shorthand for
SELECT species.genus, species.species, surveys.weight
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
3 and the two terms can be used interchangably as they will produce the same result.

We can also tell the computer that we wish to keep all the records in the first table by using a

SELECT species.genus, species.species, surveys.weight
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
6 clause, or
SELECT species.genus, species.species, surveys.weight
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7 for short.

Challenge:

  • Re-write the original query to keep all the entries present in the
    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys, species
    WHERE surveys.species_id = species.species_id;
    
    7 table. How many records are returned by this query?

Solution

SELECT * FROM surveys
LEFT JOIN species
USING (species_id);

Challenge:

  • Count the number of records in the
    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys, species
    WHERE surveys.species_id = species.species_id;
    
    7 table that have a
    SELECT COUNT(*)
    FROM surveys
    JOIN species
    USING (species_id);
    
    0 value in the
    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys, species
    WHERE surveys.species_id = species.species_id;
    
    9 column.

Solution

SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;

Remember: In SQL a

SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0 value in one table can never be joined to a
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0 value in a second table because
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0 is not equal to anything, not even itself.

Combining joins with sorting and aggregation

Joins can be combined with sorting, filtering, and aggregation. So, if we wanted average mass of the individuals on each different type of treatment, we could do something like

SELECT plots.plot_type, AVG(surveys.weight)
FROM surveys
JOIN plots
ON surveys.plot_id = plots.plot_id
GROUP BY plots.plot_type;

Challenge:

  • Write a query that returns the number of animals caught of each genus in each plot. Order the results by plot number (ascending) and by descending number of individuals in each plot.

Solution

SELECT *
FROM surveys
JOIN species
USING (species_id);
0

Challenge:

  • Write a query that finds the average weight of each rodent species (i.e., only include species with Rodent in the taxa field).

Solution

SELECT *
FROM surveys
JOIN species
USING (species_id);
1

Functions SELECT COUNT(*) FROM surveys JOIN species USING (species_id); 5 and SELECT COUNT(*) FROM surveys JOIN species USING (species_id); 6 and more

SQL includes numerous functions for manipulating data. You’ve already seen some of these being used for aggregation (

SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
7 and
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
8) but there are functions that operate on individual values as well. Probably the most important of these are
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
5 and
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
6.
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
5 allows us to specify a value to use in place of
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0.

We can represent unknown sexes with

SELECT COUNT(*) FROM surveys;
3 instead of
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0:

SELECT *
FROM surveys
JOIN species
USING (species_id);
2

The lone “sex” column is only included in the query above to illustrate where

SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
5 has changed values; this isn’t a usage requirement.

Challenge:

  • Write a query that returns 30 instead of
    SELECT COUNT(*)
    FROM surveys
    JOIN species
    USING (species_id);
    
    0 for values in the
    SELECT COUNT(*) FROM surveys;
    
    7 column.

Solution

SELECT *
FROM surveys
JOIN species
USING (species_id);
3

Challenge:

  • Write a query that calculates the average hind-foot length of each species, assuming that unknown lengths are 30 (as above).

Solution

SELECT *
FROM surveys
JOIN species
USING (species_id);
4

SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
5 can be particularly useful in
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys
JOIN species
ON surveys.species_id = species.species_id;
7. When joining the
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
8 and
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
7 tables earlier, some results were excluded because the
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
FROM surveys, species
WHERE surveys.species_id = species.species_id;
9 was
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0 in the surveys table. We can use
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
5 to include them again, re-writing the
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0 to a valid joining value:

SELECT *
FROM surveys
JOIN species
USING (species_id);
5

Challenge:

  • Write a query that returns the number of animals caught of each genus in each plot, assuming that unknown species are all of the genus “Rodent”.

Solution

SELECT *
FROM surveys
JOIN species
USING (species_id);
6

The inverse of

SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
5 is
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
6. This returns
SELECT COUNT(*)
FROM surveys
JOIN species
USING (species_id);
0 if the first argument is equal to the second argument. If the two are not equal, the first argument is returned. This is useful for “nulling out” specific values.

We can “null out” plot 7:

SELECT *
FROM surveys
JOIN species
USING (species_id);
7

Some more functions which are common to SQL databases are listed in the table below:

FunctionDescription
SELECT * FROM surveys
LEFT JOIN species
USING (species_id);
9Returns the absolute (positive) value of the numeric expression n
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
0Returns the first of its parameters that is not NULL
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
1Returns the length of the string expression s
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
2Returns the string expression s converted to lowercase
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
3Returns NULL if x is equal to y, otherwise returns x
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
4 or
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
5Returns the numeric expression n rounded to x digits after the decimal point (0 by default)
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
6Returns the string expression s without leading and trailing whitespace characters
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
7Returns the string expression s converted to uppercase

Finally, some useful functions which are particular to SQLite are listed in the table below:

FunctionDescription
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
8Returns a random integer between -9223372036854775808 and +9223372036854775807.
SELECT COUNT(*)
FROM surveys
WHERE species_id IS NULL;
9Returns the string expression s in which every occurrence of f has been replaced with r
SELECT plots.plot_type, AVG(surveys.weight)
FROM surveys
JOIN plots
ON surveys.plot_id = plots.plot_id
GROUP BY plots.plot_type;
0 or
SELECT plots.plot_type, AVG(surveys.weight)
FROM surveys
JOIN plots
ON surveys.plot_id = plots.plot_id
GROUP BY plots.plot_type;
1Returns the portion of the string expression s starting at the character position x (leftmost position is 1), y characters long (or to the end of s if y is omitted)

Challenge:

Write a query that returns genus names (no repeats), sorted from longest genus name down to shortest.

Solution

SELECT *
FROM surveys
JOIN species
USING (species_id);
8

As we saw before, aliases make things clearer, and are especially useful when joining tables.

SELECT *
FROM surveys
JOIN species
USING (species_id);
9

To practice we have some optional challenges for you.

Challenge (optional):

SQL queries help us ask specific questions which we want to answer about our data. The real skill with SQL is to know how to translate our scientific questions into a sensible SQL query (and subsequently visualize and interpret our results).

Have a look at the following questions; these questions are written in plain English. Can you translate them to SQL queries and give a suitable answer?

  1. How many plots from each type are there?

  2. How many specimens are of each sex are there for each year, including those whose sex is unknown?

  3. How many specimens of each species were captured in each type of plot, excluding specimens of unknown species?

  4. What is the average weight of each taxa?

  5. What are the minimum, maximum and average weight for each species of Rodent?

  6. What is the average hindfoot length for male and female rodent of each species? Is there a Male / Female difference?

  7. What is the average weight of each rodent species over the course of the years? Is there any noticeable trend for any of the species?

Proposed solutions:

  1. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    0

  2. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    1

  3. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    2

  4. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    3

  5. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    4

  6. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    5

  7. Solution:

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    6

Key Points

  • Use a

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    7 clause to combine data from two tables—the
    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys, species
    WHERE surveys.species_id = species.species_id;
    
    0 or
    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys, species
    WHERE surveys.species_id = species.species_id;
    
    5 keywords specify which columns link the tables.

  • Regular

    SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    7 returns only matching rows. Other join clauses provide different behavior, e.g.,
    SELECT species.genus, species.species, surveys.weight
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id;
    
    7 retains all rows of the table on the left side of the clause.

    Which of the following are used to combine the records from multiple tables?

    A JOIN clause is used to combine rows from two or more tables, based on a related column between them.

    Which of the following can be used to combine multiple tables in a single query?

    Joins are used to combine the rows from multiple tables using mutual columns.

    Which query is used to retrieve data from multiple tables?

    In SQL, to fetch data from multiple tables, the join operator is used.

    Which of the following query combines data records from two or more tables?

    Answer: C. A Cartesian join between two tables returns every possible combination of rows from the tables. A Cartesian join can be produced by not including a join operation in the query or by using a CROSS JOIN.