Search

Join this Blog for direct reference of any post into your inbox. To join just click on "Join this Site" under "Follower" then Login with your Email.*** DECLARATION: I Maintain this Blog for Helping Myself While at Work and Welcome any body Needing Help!!!.*** CAUTION: Using any of the script from this Blog may contain at Own Risk. These scripts May or May not have been Tested.***

Monday, 12 November 2012

How to Write Efficient SQL Statements

1. Deleting Duplicate Rows:
A common problem that many DBAs and programmers face is trying to purge duplicate rows from a single table. This usually happens after data conversion/migration processes have run, and either business rules or missing data integrity constraints allowed the duplicates from the source system into the target system. Sometimes duplicates are introduced inadvertently by the DBA during an import or other rare mistake. Either any case the duplicates need to be removed to achieve clean data again.
To find and delete the duplicate payslip
Select * FROM pay_payment_master e
WHERE  e.rowid > ( SELECT MIN(x.rowid) FROM   pay_payment_master x
WHERE  x.payslip_number = e.payslip_number );

Delete FROM pay_payment_master e
WHERE  e.rowid > ( SELECT MIN(x.rowid) FROM   pay_payment_master x
WHERE  x.payslip_number = e.payslip_number );
2. Use EXISTS instead of DISTINCT in query
Incase joins query using EXISTS is a faster alternative because that when the subquery has been satisfied once, there is no need to proceed any further as in the case of Distinct the SQL will actually fetch all rows satisfying the table join and then sort and filter out duplicate values, thus performing unnecessary work.
SELECT DISTINCT d.department_id, d.department_name
FROM departments d, employees  e
 WHERE d.department_id = e.employee_id;
------------------------------------------------------------------------
| Operation                     | Name    | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------
| SELECT STATEMENT              |         |     9 |   144 |     7  (29)|
|  HASH UNIQUE                  |         |     9 |   144 |     7  (29)|
|   MERGE JOIN                  |         |    14 |   224 |     6  (17)|
|    TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    52 |     2   (0)|
|     INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)|
|    SORT JOIN                  |         |    14 |    42 |     4  (25)|
|     TABLE ACCESS FULL         | EMP     |    14 |    42 |     3   (0)|
------------------------------------------------------------------------
SELECT d.department_id, d.department_name
 FROM departments d
 WHERE EXISTS (SELECT 'X'  FROM employees e
        WHERE e.department_id = d.department_id);
-----------------------------------------------------------------------
| Operation                    | Name    | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------
| SELECT STATEMENT             |         |     3 |    48 |     6  (17)|
|  MERGE JOIN SEMI             |         |     3 |    48 |     6  (17)|
|   TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    52 |     2   (0)|
|    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)|
|   SORT UNIQUE                |         |    14 |    42 |     4  (25)|
|    TABLE ACCESS FULL         | EMP     |    14 |    42 |     3   (0)|
-----------------------------------------------------------------------
From the result of above two examples you can clearly see the Exist would consume less CPU resource and processes fewer bytes to get the same results. Ensure the column used in the query must supported by index.
3. Minimizing Sub-queries:
Executing sub-queries with the SQL statement must return one value for the field of each record. To improve performance, minimize the number of scalar and nested sub-queries.
For Example:
SELECT ename emp_name  FROM emp
 WHERE sal > (SELECT losal FROM salgrade
               WHERE grade = (SELECT MAX(grade) FROM salgrade))
   AND sal < (SELECT hisal  FROM salgrade
               WHERE grade = (SELECT MAX(grade) FROM salgrade));
From the above query you get the desired result but it does more work than necessary. It visits salgrade at least four times. Modifying the above query with the join operation that performs less work and getting the same result.
SELECT e.ename emp_name  FROM emp  e,  salgrade s
 WHERE s.grade = (SELECT MAX(grade) FROM salgrade)
   AND e.sal BETWEEN s.losal AND s.hisal;
By eliminating all sub-queries and visiting SALGRADE only once will minimize the cost and %CPU time.
SELECT e.ename emp_name  FROM emp e,
       (SELECT /*+ INDEX_DESC(salgrade) */   losal,  hisal  FROM salgrade
         WHERE ROWNUM <= 1) s
          WHERE e.sal BETWEEN s.losal AND s.hisal;
4. Reducing Database Accesses:
Every time a SQL statement is executed, Oracle needs to perform many internal processing steps. The Oracle database tries to reuse as much work as possible by caching information in the Shared Pool and accessing previously used blocks in the Buffer Cache. However, by reducing the number of trips to and requests of the database, we can help Oracle conserve resources.
The following examples show four methods of retrieving data about two distinct employees respectively less efficient method to more efficient method respectively.
Method1: Less efficient
  SELECT ename emp_name, sal salary
   FROM emp
    WHERE empno = 7839;
 Method2: More efficient than Method1
   SELECT a.ename emp_name, a.sal salary,  b.ename emp_name, b.sal salary
   FROM emp a, emp b
   WHERE a.empno = 7839 AND b.empno = 7844;
 Method3: More efficient than Method2, Method1
   SELECT ename emp_name, sal salary
   FROM emp
   WHERE empno IN (7839,7844);
Method 4: Most Efficient Method
   CREATE TYPE t_empno_arr AS TABLE OF NUMBER(4);
   CREATE OR REPLACE PACKAGE emp_info AS
   PROCEDURE get_employees (i_emp_arr IN t_empno_arr, o_emp_rc OUT SYS_REFCURSOR);
   END emp_info;
   CREATE OR REPLACE PACKAGE BODY emp_info AS
   PROCEDURE get_employees (i_emp_arr IN t_empno_arr, o_emp_rc OUT SYS_REFCURSOR)
   IS
   BEGIN
      OPEN o_emp_rc FOR
         SELECT e.ename emp_name, e.sal   salary FROM emp e
          WHERE e.empno IN (SELECT a.COLUMN_VALUE empno
           FROM TABLE(i_emp_arr) a);
       END get_employees;
    END emp_info;
Using one SQL statement instead of multiple statements can significantly improve database performance. If possible, have the application retrieve all the information that it needs in one operation rather than making multiple trips to the database. If you are using method 4, consider having the procedure take multiple inputs, process multiple statements, and return multiple ref cursors, thus reducing the number of database calls from N to 1.
5. Table Joins versus EXISTS:
In general, consider joining tables rather than specifying sub-queries when the percentage of successful rows returned from the driving table is high. For example, if we are selecting records from the EMP table and are required to filter those records from the SALES department, then a table join will be more efficient.
Consider the following example:
SELECT ename emp_name
  FROM emp e
  WHERE EXISTS (SELECT 'X'
   FROM dept d
   WHERE d.deptno = e.deptnoAND d.dname = 'SALES');
To improve performance, specify:
SELECT ename emp_name
  FROM emp  e, dept d
  WHERE e.deptno = d.deptnoAND d.dname = 'SALES';
Thus the cost is lower and the anticipated consumption of CPU resources is much lower.

In case of some table joins out to separate sub-queries when the percentage of successful rows returned from the driving table is small. When two tables are joined, all rows need to be matched from the driving table to the second table. If a large number of rows can be filtered from the driving table before having to perform the validation against the second table, the number of total physical reads can be dramatically reduced.
Consider the following example:
SELECT e.ename, d.dname
  FROM dept d, emp  e
  WHERE e.deptno = d.deptnoAND e.JOB = 'MANAGER'AND d.dname = 'SALES';
To test an improvement performance, specify:
SELECT e.ename,e.sal
  FROM emp e
  WHERE EXISTS (SELECT NULL
  FROM dept d
   WHERE d.deptno = e.deptno AND d.dname = 'SALES')AND e.JOB = 'MANAGER';
6. UNION ALL versus UNION:
The UNION clause forces all rows returned by each portion of the UNION to be sorted and merged and duplicate to be filtered before the first row is returned to the calling module. A UNION ALL simply returns all rows including duplicates and does not have to perform any sort, merge, or filter. With large base tables, the sort can be a particularly expensive step.
Most statements that do include a UNION clause can in fact replace it with a UNION ALL. These statements are written in a fashion whereby duplicates cannot be generated.
Consider the following example:
SELECT acct_num, balance_amt
FROM   debit_transactions
WHERE  tran_date = TO_DATE('31-DEC-95',’DD-MON-RR’)
UNION
SELECT acct_num, balance_amt
FROM   credit_transactions
WHERE  tran_date = TO_DATE('31-DEC-95',’DD-MON-RR’);
To improve performance, replace this code with:
SELECT acct_num, balance_amt
FROM   debit_transactions
WHERE  tran_date = TO_DATE('31-DEC-95',’DD-MON-RR’)
UNION ALL
SELECT acct_num, balance_amt
FROM   credit_transactions
WHERE  tran_date = TO_DATE('31-DEC-95',’DD-MON-RR’;
Note: In fact the UNION ALL is 4 times faster than UNION.

0 comments:

Post a Comment