Sponsored by PRISE Ltd.
www.prisetools.com

How to eliminate product joins

What is product join?

Product join is one of the implementation methods of an SQL JOIN operation.
Do not mix up with cross join (Cartesian product), which is one type of SQL joins.

SQL join types, eg.: inner join, left outer join, full outer join, cross (Cartesian) join
Join implementation types, eg.: nested join, merge join, hash join, product join.

Product join (of tables A and B ) is the most simple method of join implementation:

Produce each of <A;B> record combinations, say take each records from A singly, and match it with each records of B one-by-one.
Test the join condition on each produced <A;B> record pairs, and eliminate those combinations where the condition fails.

The two steps are often combined, and the "testing phase" is executed right after a record combination is generated, and the non valid combinations right after dropped. This saves a lot of temp space.

Why don't we like it?

Well, it has a really bad reputation. It is slow, stuffs CPU, etc.
Yes, it usually is, does. It is the brute force method for executing a join, with costs in order of N*M (where N, M are the record numbers of the joinable tables)

Indeed there are situations when it is the best choice, or the only feasible way.

When is it good/necessary?

Please note that product join is the method what is always applicable, independently of all circumstances.

Good

Product join is typically simple, dumb and slow algorithm, this is why we do not like it, but has a very important advantage: requires no pre-processing.* This is why we LIKE IT:)
If we have to join a really large table to a very small table (couple of records) product join is far the most effective method, since the sort of a very large table ( order of N*logN ) can cost a lot, while joining to 1-2 records is really not a big deal.

Necessary

There are join situations when the only way to go is the product join. Why? Because of the join condition. The "clever joins" (merge, hash) require some information and/or condition that somehow enables to cheat the A x B comparisons: reduce them to the ones that really necessary, and be done in a more effective manner.

* OK, in Teradata this means: only requires that the matchable records from both tables must be on the same AMP. This implies the "small" table to be duplicated to all AMPs.

Merge join example

from A

join  B on A.customer_id = B.customer_id

         and A.trx_dt between B.eff_dt and B.exp_dt

Customer_id clause is in AND condition with the others
Customer_id is selective enough that hash(customer_id) can reduce the comparisons reasonably
Note that A and B must be sorted (re-sorted) by the hash of customer_id

Product join example

from A

join   B on substr(A.telephone_no,1,B.prefix_length) = B.telephone_no_prefix

There is no comparison reducing partial-condition
Note that neither of the tables required to be sorted in a specific order.

Unavoidable product joins

Non-eqality condition
Function used (eg. substr())
Dependent expression is used (eg. A.x+B.y = A.z)
Cross join: intentional Cartesian product

Avoidable product joins

Data type mismatch

The merge join example above works only if customer_no in A and B tables have the same "style" data types, since their hash value will match only in this case. Say hash(13674) <> hash('13674'), however integer is compatible with decimal, and char is compatible with varchar.

Pay attention on data type consistence during physical data modeling.

Use domains to eliminate the possibility of mismatch
Align to used data types when defining temp tables, or use "create table as ..." statements
If you cannot avoid mismatch, relocate the necessary data to temp tables with proper data types during processing.

OR condition

Let's assume the following join condition:

select ...

from A
join  B on A.col1 = B.Col1
        OR

A.Col2 = B.Col2

This is equivalent, w/o compulsory product join :


select ...

from A
join  B on A.col1 = B.Col1

UNION

select ...

from A

join  B on A.Col2 = B.Col2

Missing/stale statistics

As I mentioned before product join is the most effective join between a very large and a really small (couple of records) table. If the optimizer thinks that a table is pretty small, but it is not indeed, it may choose a product join in all good faith, misleaded by a stale or missing statistics.

Define and keep fresh those statistics by the optimizer can determine the size of the joinable record sets properly.

How to find avoidable product joins

It is not trivial to list the avoidable product joins. Practically all product joins are required to be examined one-by-one and judged to be avoidable or not. And if avoidable, what to do for.

I strongly recommend to use PRISE Tuning Assistant for both finding the product joins and analyzing the possibility and necessity of elimination:

List top consuming queries with product join(s)
Check the PROD JOIN steps: which tables are processed that way
Check those join conditions for cases described above

What to do if cannot be avoided?

In this case I recommend to try the decomposition, described here.
It can help reducing the number of comparisons, saving CPU and runtime.

Have a successful optimization and happy new year!