Consider the following query that contains a ‘Folder’ predicate:
select r_object_id
from dm_document
where folder('/Temp/Folder1',descend)
The query selects all documents that are in /Temp/Folder1 or any sub-folders underneath. This sort of query generally performs quite well. However if you want to select documents from 2 or more sets of folder it is best to avoid ORing the folder predicates. E.g. this query may scale poorly as the number of dm_document objects in the system increases:
select r_object_id
from dm_document
where folder('/Temp/Folder1',descend)
OR folder('/Temp/Folder2',descend)
Why is this? The SQL generated by the first query looks like this:
select all dm_document.r_object_id from dm_document_sp dm_document, dm_folder_r dm_folder_r1, dm_sysobject_r dm_sysobject_r2 where ( dm_document.r_object_id = dm_sysobject_r2.r_object_id and dm_sysobject_r2.i_folder_id = dm_folder_r1.r_object_id and dm_folder_r1.i_ancestor_id = '0b0000038000e990' ) and ( dm_document.i_has_folder = 1 and dm_document.i_is_deleted = 0 )
The folder predicate is implemented by adding the dm_folder_r table into the join and filtering on the i_ancestor_id field. The database optimiser can evaluate different join orders and work out the best table join order. If there are a relatively small number of objects under the folder /Temp/Folder1 then probably the best table to join first is dm_folder_r; subsequent joins to the dm_sysobject_r and dm_sysobject_r table will involve only a small number of rows.
On the other hand the SQL for the ORed predicate DQL looks like this:
select all dm_document.r_object_id from dm_document_sp dm_document where ( dm_document.r_object_id in ( select dm_sysobject_r2.r_object_id from dm_sysobject_r dm_sysobject_r2, dm_folder_r dm_folder_r1 where dm_sysobject_r2.i_folder_id = dm_folder_r1.r_object_id and dm_folder_r1.i_ancestor_id = '0b0000038000e990') or dm_document.r_object_id in ( select dm_sysobject_r2.r_object_id from dm_sysobject_r dm_sysobject_r2, dm_folder_r dm_folder_r1 where dm_sysobject_r2.i_folder_id = dm_folder_r1.r_object_id and dm_folder_r1.i_ancestor_id = '0b0000038000e995') ) and ( dm_document.i_has_folder = 1 and dm_document.i_is_deleted = 0 )
The SQL is completely different. In this case the folder predicates are implemented using a correlated sub-query for each folder predicate. The database is unable to find a query plan that allows the optimal join order – the one where dm_folder_r is joined first. The only way the database can execute the query is to select every current dm_document object and attempt to join it to the dm_folder_r. In a large database the work to select every dm_document object could be huge.
So what are the alternatives? I can think of a few but a UNION ALL would be top of my list in this particular case:
select r_object_id
from dm_document
where folder('/Temp/Folder1',descend)
UNION ALL
select r_object_id
from dm_document
This article is from http://robineast.wordpress.com
where folder('/Temp/Folder2',descend)
By the way this performance tip applies whether the folder contains the ‘descend’ clause or not.