Imbalanced Join Conditions in Database Queries

RDBMS or Relational Database Management Systems use queries to fetch data from the database. Queries can be constructed on more than one table using equi or non equi joins. Queries can also specify where conditions, where data will be retrieved as per some conditions such as Select student_name from students where place_of_residence = “Chennai”. Join conditions are specified as Select a.customer_name, b.order_id from customer a, order b where a.customer_id = b.customer_id.

In the above case a join is being made between two tables customer and order. Prior to a query being executed the SQL engine computes an execution plan. The query execution plan is a sequence of steps determined by the query optimizer. These steps would yield a sequence of operations which would result in the lowest cost of executing the query. In complex situations when queries are constructed between many tables specifying many query conditions one or two missed join conditions can lead to prolonged query run times if the database has high volumes of data.

A join of two tables without a join condition is just a cross product of two sets. If Table A has 10,000 rows and Table B has 5000 rows a cross product of two tables will result in 5,00,00,000 records. Whereas if a join condition is introduced the result set will contain 10,000 rows or 5,000 rows in the resultant search space. If a query for example contains 10 tables and cross products of two or three tables taken at a time, and if the query expression contains 20 – 30 joins 3 or 4 join conditions can be missed out due to negligence or error. This can be the case in case of SQL queries running in large manufacturing or production outlets doing heavy volume data processing. The size of each table is very large and so are the number of tables.

A designer may compose a query to record an Actualization process in a production house in which actual sales data against projections are uploaded into the database. Due to the complexity of the database a single query may use 20 tables holding more than 100000 records per table on the average. The query may use a lot of joins and if due an accident some join conditions get skipped then the query would search in a space of 10 power 20 records instead of just 10 power 5 records. This would lead to prolonged query execution time sometimes a single program containing any such erroneous query can run for even 20 hours without completing the updations. But on identifying and adding missed join conditions such as table3.column3 = table4.column7 or table7.column2 = table1.column9 etc., into the query the query can be made to run in acceptable time.

Article Source:

Leave a Reply

Your email address will not be published. Required fields are marked *