Thursday, July 29, 2010 22:16

Avoid Folder predicates

Tagged with:
Posted by ukoom on Saturday, March 28, 2009, 21:10
This news item was posted in Documentum category and has 0 Comments so far.

Consider the following query that contains a ‘Folder’ predicate:

select 	r_object_id
from 	dm_document
where 	folder('/Temp/Folder1',descend)

The query selects all documents that are in /Temp/Folder1 or any sub-folders underneath. This sort of query generally performs quite well. However if you want to select documents from 2 or more sets of folder it is best to avoid ORing the folder predicates. E.g. this query may scale poorly as the number of dm_document objects in the system increases:

select 	r_object_id
from 	dm_document
where 	folder('/Temp/Folder1',descend)
OR 	folder('/Temp/Folder2',descend)

Why is this? The SQL generated by the first query looks like this:

select 	all
	dm_document.r_object_id
from 	dm_document_sp dm_document,
	dm_folder_r dm_folder_r1,
	dm_sysobject_r dm_sysobject_r2
where 	(    dm_document.r_object_id = dm_sysobject_r2.r_object_id
	 and dm_sysobject_r2.i_folder_id = dm_folder_r1.r_object_id
	 and dm_folder_r1.i_ancestor_id = '0b0000038000e990'
	)
and 	(    dm_document.i_has_folder = 1
	 and dm_document.i_is_deleted = 0
	)

The folder predicate is implemented by adding the dm_folder_r table into the join and filtering on the i_ancestor_id field. The database optimiser can evaluate different join orders and work out the best table join order. If there are a relatively small number of objects under the folder /Temp/Folder1 then probably the best table to join first is dm_folder_r; subsequent joins to the dm_sysobject_r and dm_sysobject_r table will involve only a small number of rows.

On the other hand the SQL for the ORed predicate DQL looks like this:

select 	all
	dm_document.r_object_id
from 	dm_document_sp dm_document
where 	(    dm_document.r_object_id in (
		select 	dm_sysobject_r2.r_object_id
		from 	dm_sysobject_r dm_sysobject_r2,
			dm_folder_r dm_folder_r1
		where 	dm_sysobject_r2.i_folder_id
				= dm_folder_r1.r_object_id
		and 	dm_folder_r1.i_ancestor_id
				= '0b0000038000e990')
	 or  dm_document.r_object_id in (
		select 	dm_sysobject_r2.r_object_id
		from 	dm_sysobject_r dm_sysobject_r2,
			dm_folder_r dm_folder_r1
			where 	dm_sysobject_r2.i_folder_id
					= dm_folder_r1.r_object_id
			and 	dm_folder_r1.i_ancestor_id
					= '0b0000038000e995')
	)
and 	(    dm_document.i_has_folder = 1
	 and dm_document.i_is_deleted = 0
	)

The SQL is completely different. In this case the folder predicates are implemented using a correlated sub-query for each folder predicate. The database is unable to find a query plan that allows the optimal join order – the one where dm_folder_r is joined first. The only way the database can execute the query is to select every current dm_document object and attempt to join it to the dm_folder_r. In a large database the work to select every dm_document object could be huge.

So what are the alternatives? I can think of a few but a UNION ALL would be top of my list in this particular case:

select 	r_object_id
from 	dm_document
where 	folder('/Temp/Folder1',descend)
UNION ALL
select 	r_object_id
from 	dm_document

This article is from http://robineast.wordpress.com
where 	folder('/Temp/Folder2',descend)

By the way this performance tip applies whether the folder contains the ‘descend’ clause or not.

Related Posts

Leave a Reply

You can leave a response, or trackback from your own site.