Query Ninja: A sub-selecting technique for selecting records from one table that are not in another
|OK. That is a pretty boring title but this information could be very useful to you. |
This article expands on the concepts of sub-selecting in MapInfo Pro. This concept is introduced a separate article. If you did not read that article you might find it to be helpful background. It is called "Query Ninja: Introducing Sub-selecting". To see the whole list of Query Ninja articles click here:
To take the concept of using sub-selects further, we will introduce a technique to select records (or more technically speaking, records with particular values) from one table that are not in a second table. This will require the use of the In() keyword which was also introduced in a previous article called Query Ninja: An alternative to "OR"!.
Example one: Determining all of the regions without a customer in them.
Let's say we have a table of customers and a table of regions. In this example we will use counties for our regions. Our goal is to determine which counties do not have any customers in them. Of course this could be done outside of the SQL capabilities (visually might be one way) and this could be done with other techniques as well. In many cases these might require multiple steps. Using a sub-select we can get a list of the counties without customers in one query.
The map above may help to better explain this example. Which counties have no customers?
Here is the SQL Select dialog box filled in to run the query.
In somewhat "plain language" the above query is doing this; "select all of the counties from the UK_county table where the name of the county never appears in the county field from the list of customers". OK, I did say "somewhat plain language" - I hope that is better than SQL.
And if you are curious about the results, see below! There are twelve counties without any customers in them.
The map above helps to visualise what we are trying to achieve, but ...
...note that this result was achieved by comparing values in the underlying data. It would be possible to have customers map objects appear in a county but the data column is for a different county (maybe as a result of a data or a geocoding error). This example used the County field in the customer table. In future articles we will explore how map objects can be used in sub-selecting.
The following are more examples of this sub-selecting technique. These are written out as they would appear in the MapBasic window.
Comparing two tables of customers/citizens - selecting customers from one table that do not appear in the second
Let's say you have two sets of customer (or citizen) data. Perhaps one set comes in from e-mail based inquiries. We will call this table Email_cust. The second set of data comes from web based inquiries. We will call this table Web_cust. Let's also say that your customers or citizens have a unique identifying number so you can tell when the same customer has used more than one service. This field is called CUST_ID in both tables.
Our goal is to find out all of the customers who have used the Email service who do not use the web based service. As such we want to select all of the customers from the table of e-mail queries that are not also found in the table of customers using the web based service.
Where CUST_ID not in
From Web_cust )
In the above query Not_Using_Web is the temporary table name.
Another useful technique - (sometimes you can) invert a selection
It is possible to use this sub-selecting technique to "select what is not selected". This is the SQL equivalent of the Invert Selection button (from the Main toolbar). You may wonder why one would want to do this in an SQL instead of just using the button - the idea here is to have something automated and easily repeatable in a workspace.
Here is an example query where the selection is inverted.
Where CUST_ID not in
The reason that the title of this example is "(Sometimes you can) invert a selection" is because the column that is used must contain unique values for this technique to be reliable. If the column being used does not contain unique values then it is possible for some records to be excluded that otherwise should not be. If the a value appears more than once in a column then it is possible for it to be in the set of selected records and in the set of unselected records. Inverting the selection would exclude the matching record from the set of unselected records.
An example of how this technique might be used is where a selection of customers within a circle is made. If it is useful to analyse the customers outside of the circle then this query technique can be used. In a previous article we covered how to select within a circle: Query Ninja: Using a Sub-select to select all the objects within a circle.
Making this efficient
I would like to wind up this article with a contribution from the community. Fawaz Bathice, a very experienced MapInfo Pro and MapBasic user wrote a short post in LinkedIn. This was in response to the article on using the In() operator that appeared in the February issue.
When we are using this technique it can be inefficient as the sub-select could be pulling many duplicate values. The first example in this article is a case in point, you can have many customers in a county. Fawaz also pointed out that if you have too many values when are using the In() keyword you may not be able to fit them all in.
In those situations where you have a set list of values, it may be worth the investment to create a special table of these to use in your sub-selects.
Here are Fawaz's comments on this technique.
"If I may add, using the IN or = ANY is ok if there aren't too many values in the list as there is a limit on how many characters the SQL window can hold.
In case of too many values, then I would create a temp table(ListOfValues) which contains ONE column (LOV) and populated with the desired values and save. You can then use one the following queries to get the required records from the main table:"
Where MyColum = ANY (select LOV from ListOfValues)
Where MyColum IN (select LOV from ListOfValues)
From MainTable , ListOfValues
Where MainTable.MyColum = ListOfValues.LOV
Thanks to Fawaz for sharing this. This does introduce the =Any() keyword which we've not covered yet. For the most part this is similar to the In() keyword.
Want to learn more?
There are more articles on sub-selecting as well as many other articles for on using the SQL Select capabilities in MapInfo Pro. Click here for a list.
Article by Tom Probert, Editor of "The MapInfo Pro" journal
When not writing articles for "The MapInfo Pro", Tom enjoys talking to MapInfo Pro users at conferences and events. When not working he likes to see movies with car chases, explosions and kung-fu fighting.