Image of Navigational Map linked to Home / Contents / Search SELECTive Information

by Ross Mack - GUI Computing
Image of Line Break

'SELECT * FROM Sky' - Gary Wisniewski

Talking to a number of other developers who are grappling with learning VB, SQL Server, OOP and half a dozen other things I have often heard the complaint that not everything you need to know is in the VB manuals. This is a fair comment, as when you are programming in VB you also tend to be using [approximately] eighteen other technologies at the same time. Obviously, the specifcs of those technologies are documented elsewhere in various SDKs, online manuals or help files.

The confusion I have found that arises most often is with VB programmers trying to grapple with SQL. It really is its own language and you need to know something about it to write efficient database applications (no matter what your back end is). This prompted me to write this quick introduction to the fundamental SQL statement: SELECT. Once you have mastered SELECT you have the background to master the other SQL statements and it also allows you to get some data from the back end, which is usually the first thing you need to achieve in building database apps.

If you are comfortable with SQL and have been using it for a while, ignore the rest of this article. It will likely bore you. If you are trying to get a database app out the door and you are struggling with SQL, read the rest of this article. It may just help.

People looking for even more introductory information on SQL would be well advised to have a look at Mark Warhol's article Intro to SQL: Jaws II.


The point of the SELECT statement is to retrieve data from the database. In it's simplest form the SELECT statement has the following structure:

SELECT [fieldlist] FROM [recordsource]

the [fieldlist] portion of the select statement indicates which columns (fields) you wish to retrieve. This can either be a simple list of columns like 'Name, Address, Phone' or it can be an asterisk (*) indicating that all available columns should be retrieved. The [recordsource] parameter indicates where the records should come from. Typically this will be a table within the database, but it can be a view also (or a query if you are using an Access database). The [recordsource] may also be a number of tables, but that circumstance is detailed below.

Here are are a couple of examples (all examples are from the pubs database as installed with SQL Server 6.5):

SELECT * FROM publishers

SELECT title, price FROM titles

Sometime, however, you don't need every possible record in the table from which you are retrieving records. Just as you may not need every field. The way you restrict columns is to tell the database to apply some criteria. Essentially, the way this is done is to supply an expression that can be applied to each possible row that might be returned and see if it is true. If it is true the row will end up being returned by the SELECT statement. The expression argument is placed in the SELECT statement as follows:

SELECT [fieldlist] FROM [recordsource] WHERE [expression]

The expression clause can contain any sort of boolean expresison that your database knows how to evaluate. Let's say that you want only the records from the titles table where the price of the book is more than $10. This would be expressed in an SQL statement as:

SELECT title, price FROM titles WHERE price > 10

As the database engine moves through the table determining which records to return from the SELECT statement it basically evaluates that expression (price > 10) for each row. Where the result of the expression is true those rows are returned. You can have multiple conditions that are evaluated together by joining them together with the AND or OR keywords. Consider the following:

SELECT * FROM titles WHERE price > 5 AND price < 10

In that case the only rows that would be returned would have prices between 5 and 10 dollars.

When querying data from a database it is often useful to retrieve data from two related tables in one recordset. The way this is done is to define a join between the two tables within the SELECT statement.

Consider the following SQL Stetements:

SELECT * FROM sales

SELECT * FROM titles

Notice how both lots of data returned have a title_id column ? In the case of the data from the title table this represents the key field, a way to uniquely identify each record in the title table. In the case of the slaes table it is essentially a cross reference to the title table to indicate which title was sold. That's pretty straight forward, right ? OK.

However, when we look at a list of sales on screen or in a report or something we probably want to actually see which title is being sold in some meaningful way. By this I mean that we want to see the name of the book and not the ID. I have no idea what title 8247 is and I quite frankly don't want to know.

So, how do we do that ? We use a SELECT statement that joins the two tables together into one recordset. At this point it is probably worth mentioning that the syntax to define joins in SQL Server has recently changed from a proprietory syntax to an ANSI standard. The ANSI standard syntax is identical to that used in Access so that makes it very convenient for users who use both. Anyway, I will include examples of both syntax types here for your amusement and edification, also because both still work.

To start off with we simply want to tell SQL Server to include both tables in the SELECT statement and that the way to relate them to each other is by comparing title_id from sales to the title_id column from Titles. The resultant SQL Statement looks like this:

[New Syntax]
SELECT sales.*, titles.title
FROM sales INNER JOIN titles ON sales.title_id = titles.title_id

[Old Syntax]
SELECT sales.*, titles.title 
FROM sales, titles
WHERE sales.title_id = titles.title_id

The INNER JOIN keyword tells SQL Server to take the table on the left and join it to the table on the right using the expression after the ON keyword. As you can see the old syntax sort of says the same thing, although it is less declarative. The relationship is more implied by the WHERE clause than it is explicitly stated. One of the benefits of the new syntax (although it is a little more wordy) is that it is more explicit and separates join information for restriction information. You will also notice that I explicitly indicate which columns I am talking about by prefixing them with the table names. For example if the section that read 'sales.title_id = title.title_id' instead read 'title_id = title_id' it would be virtually meaningless.

In fact, due to the ambiguity of the two column names being the same SQL Server would throw an error saying that it was too ambiguous. However, the same principle applies even when the columns have differing names. You will notice that I also use this technique in the field list section of the statement to indicate that I want all columns from the sales table (sales.*) but only the title column from the titles table (titles.title).

You may notice in this case that not all of the titles that are in the titles table actually are represented in the resultant recordset. This is due to the fact that there is no sales data relating to those titles (no entries in the sales table where title_id matches). Depending on your data and what you are trying to do with it this may or may not be a problem. Let's say for example that you needed to see all the titles whether or not they had any sales recorded.

The SQL to do that is as follows:

[New Syntax]
SELECT sales.*, titles.title
FROM sales RIGHT JOIN titles ON sales.title_id = titles.title_id

[Old Syntax]
SELECT sales.*, titles.title 
FROM sales, titles
WHERE sales.title_id =* titles.title_id

As you can see in this case we replace the INNER JOIN keywords with RIGHT JOIN. This simply indicates to SQL Server that we want every entry from the table on the right hand side of the JOIN expression and all the entries from the left hand side where a match can be found.

In the old syntax the join becomes a little more cryptic (I can never remember which side the asterisk is supposed to go, I always have to try it to check). If you run these statements in an ISQL tool you will notice that where data was not available (from the sales table) nulls are returned.

Consider why the following statements are equivalent:

SELECT sales.*, titles.title
FROM sales RIGHT JOIN titles ON sales.title_id = titles.title_id
WHERE NOT sales.stor_id IS NULL

SELECT sales.*, titles.title
FROM sales INNER JOIN titles ON sales.title_id = titles.title_id

Of course it is also possible to SELECT from the two tables where sales records exist and title records do not (although this situation does not occur in the pubs database).

[New Syntax]
SELECT sales.*, titles.title
FROM sales LEFT JOIN titles ON sales.title_id = titles.title_id

[Old Syntax]
SELECT sales.*, titles.title 
FROM sales, titles
WHERE sales.title_id *= titles.title_id

RIGHT JOINs and LEFT JOINs are collective referred to as Outer Joins. Whether they are LEFT or RIGHT [often] really depends on which way the SQL Statement is constructed, so the term Outer Join would be used to refer to the join in a broader context.

Consider why the following statements are equivalent:

SELECT sales.*, titles.title
FROM titles LEFT JOIN sales ON sales.title_id = titles.title_id

SELECT sales.*, titles.title
FROM sales RIGHT JOIN titles ON sales.title_id = titles.title_id

In both cases we can be said to have 'An outer join from titles to sales' regardless of how it is actually constructed in the SELECT statement. I said "from titles to sales" to indicate that the titles table is the one from which all rows are returned and that the sales table is the one from which only matching rows are returned.

We can encapsulate joins like this that we commonly want to do very easily into a reusable database object by compiling them as views. For example, the following view could be created to provide commonly needed data:

CREATE VIEW vwSales AS
   SELECT sales.*, titles.title
   FROM sales INNER JOIN titles ON sales.title_id = titles.title_id

In this way, whenever your application needs the sales data with the book titles filled in it can just say "SELECT * FROM vwSales" or simply "vwSales" depending on how you are connecting to the database.

We can also use joins usefully to return aggregate data. Consider the following SQL Statement:

SELECT titles.title, Sum(sales.qty)
FROM titles LEFT JOIN sales ON titles.title_id = sales.title_id
GROUP BY titles.title

What does it do ? Well firstly it tells SQL Server that the columns required are the title from the titles table and a sum of the qty column from the sales table. It then performs the join (in this case a LEFT JOIN as we want data for all existing titles, but acknowledge that sales data may not exist). We then use the GROUP BY clause to tell SQL Server how to summarise the data. This serves the purpose of indicating that a row should be returned for each title entry in the titles table and that the sum of the qty column from the sales table should be performed for each title.

It should be noted that each entry in the field list must either be some form of aggregate function or must be referenced in the GROUP BY clause.

You will also notice that this query does not return a column name for the Sum(sales.qty) column (or that a column name will be generated automatically by your data access layer). If you want to explicitly set a column name for this column to return this can be achieved by aliasing that column. In SQL Server this can be done in one of two ways, which are equivalent:

SELECT titles.title, Sum(sales.qty) TotalQty
FROM titles LEFT JOIN sales ON titles.title_id = sales.title_id
GROUP BY titles.title

SELECT titles.title, Sum(sales.qty) AS TotalQty
FROM titles LEFT JOIN sales ON titles.title_id = sales.title_id
GROUP BY titles.title

In the first statement the AS keyword is implied. In both cases the result is the same. Of course any column can be aliased, it does not have to be a summary column. Consider the following:

SELECT titles.title AS Book, Sum(sales.qty) AS TotalQty
FROM titles LEFT JOIN sales ON titles.title_id = sales.title_id
GROUP BY titles.title

Of course you can also alias table names in your SQL Statements to make them less long winded (if your table names are long). Here is an example:

SELECT t.title, Sum(s.qty) AS TotalQty
FROM titles AS t LEFT JOIN sales AS s ON t.title_id = s.title_id
GROUP BY t.title



Written by: Ross Mack
June '98

Image of Arrow linked to Next Article
Image of Line Break
[HOME] [TABLE OF CONTENTS] [SEARCH]