What's the quickest way of deleting all (or a large fraction of) the data in a table ?

2011-01-01

363

The Oracle (tm) Users' Co-Operative FAQ

What's the quickest way of deleting all (or a large fraction of) the data in a table ?

Author's name: Mark D Powell Author's Email: Mark.Powell@eds.com	Date written: 4th Sept 2001 Oracle version(s): 8.1.7.0
What's the quickest way of deleting all (or a large fraction of) the data in a table ?

The answer depends on the circumstances, but if you need to delete all the rows in a table then the truncate command is the way to go as this command minimizes the writing of redo and undo (rollback) entries. The table and any indexes have their high water mark, stored in the header block, reset and except for the dictionary or bitmap updates for freed extents beyond the initial extent the job is done. If the table is to be reloaded and will need to reuse all or nearly all of the space it held then the reuse option can be used to eliminate even the need to update the dictionary space management tables or bitmap header for locally managed objects:

	truncate table owner.tablename [drop storage| reuse storage]复制

The drop storage clause is the default and does not need to be provided.

When all the data cannot be deleted then your options are limited by whether you can get exclusive access to the table to perform the task or the table has to be kept available at all times and how your applications use the table. The next set of approaches all involve the same idea: Instead of deleting 90% of the data, extract the 10% of the data that is good to a new table and discard the original.

In the case where the in use applications only need insert access to the target table then if access can be stopped for a very short window the following approach can be used:

When no FK to target and no insert or delete triggers exist then
  1- rename target to target_old
  2- Re-Create the target table
  3- recreate indexes, re-establish PK and UK constraints, and re-apply grants
  4- allow access
  5- extract good data from target_old and insert it into the recreated target
  6- truncate the target_old table
  7- drop the target_old table
复制

If there is an insert trigger but no delete trigger or FK referencing the table then the re-insertion of the data needs to happen before the users are allowed access to the table since it is not desirable to have the trigger fire for data it has already processed, but as long as the time to select and insert this data is satisfactory then a create tables as select can be used for the re-create table step. This eliminates a step from the list above, but the ordering of the list above is designed to minimize the time the table needs to be unavailable for use.

If the target table has FK references to other tables then since all existing rows either meet the requirement or already violate it because the novalidate option was used in creating the FK constraints then they can be disabled in those cases where the good data is copied prior to allowing access to the table. This can save a fair amount of work and the constraints should again be re-enabled using the novalidate option to save performing unnecessary work. The existence of FK's referencing the target table complicates matters in that either the necessary deletes in the referencing tables must be done in a separate process allowing the FK to be disabled, or the deletes must take place against the target, and the re-naming/re-creating method cannot be used.

If the table is partitioned then native Oracle parallel DML can be used so that each partition has a delete job dedicated to it. This can greatly reduce clock time at the expense of system resources. For a normal table the delete can be logically partitioned to manually duplicate the parallel delete process be running multiple delete statements, each of which has its where clause limited to a specified set of rowid values. If you take this approach then I recommend that you create the indexes for this table with delete job number of transaction work areas itl, such as initrans 4. This will help to ensure each delete job can obtain an existing itl entry so that they do not have to dynamically allocate any, or worse have to wait because the space to allocate an itl is unavailable. When multiple delete jobs are ran the first job should have an unbounded lower range and the last job an unbounded upper range to ensure no section of the target table is missed. It is probably advisable to generate the rowids rather than hardcode them. Sample code to find the boundary rowids follows at the end of the FAQ. It would be fairly easy to change the dbms_output statements to an insert of a single row into a delete job control table that the delete jobs reference in their where clauses to control the delete ranges.

Still another concept is to partition the table such that all rows to be dropped exist in the same partition. This method requires that the data lend itself to partitioning on the columns that are used to determine that the data can be deleted and this is often not practical. If the data requires global indexes or a few rows within the partition key range end up not being valid for deletion then the benefit of portioning the table to support deletes is lost. Still the method should not be overlooked.

set serveroutput on
declare
--             counters
v_ctr          pls_integer := 0 ;
v_ct_rows      pls_integer := 0 ;
v_rowid        rowid            ;
v_rows_per_job pls_integer := 0 ;
--
cursor  c_get_rowids is
 select rowid
 from   po_blanket_hist ;
--
--             pl/sql table to hold boundry rowids
--
type t_rowids is table of varchar2(18)
  index by binary_integer     ;
t_rows         t_rowids       ;
I              binary_integer ;
--
--             get total row count
--
begin
select count(*)
into   v_ct_rows
from   po_blanket_hist ;
--
--             In this example we are going to use 4 jobs
--
v_rows_per_job := floor(v_ct_rows / 4) ;
--
--             While data read, test, save rowid
--
open c_get_rowids ;
I := 1            ;
loop
  fetch c_get_rowids into v_rowid ;
  exit when c_get_rowids%notfound ; -- quit when done
  v_ctr := v_ctr + 1              ; -- count rows returned
  if v_ctr = v_rows_per_job
    then t_rows(I) := v_rowid     ; -- mod takes longer than using a counter
         v_ctr := 0               ;
         I := I + 1               ;
  end if                          ;
end loop                          ;
--
close c_get_rowids ;
--
--          This is where we could save the rowids to a job control row
--
dbms_output.put_line('Row Count is    '||v_ct_rows)      ;
dbms_output.put_line('Rows per job    '||v_rows_per_job) ;
dbms_output.put_line('First  Rowid is '||t_rows(1))      ;
dbms_output.put_line('Second Rowid is '||t_rows(2))      ;
dbms_output.put_line('Third  Rowid is '||t_rows(3))      ;
dbms_output.put_line('Forth  Rowid is '||t_rows(4))      ;
end ;
/
Row Count is    32695051
Rows per job    8173762
First  Rowid is AAAAb+AFQAAALHWAAO
Second Rowid is AAAAb+AFgAAAMiNAA5
Third  Rowid is AAAAb+AGQAAADZnAAL
Forth  Rowid is AAAAb+AGQAAAKCMABB
PL/SQL procedure successfully completed.复制

Further reading: If you have complex delete logic then the section on views for performance in Jonathan Lewis's book Practical Oracle 8i might be of interest. You can also see Metalink document id 19890.996 and 89799.996, which are forum threads. Oracle support did not offer anything not mentioned in this FAQ.

oracle

最后修改时间：2020-04-16 15:12:19

What's the quickest way of deleting all (or a large fraction of) the data in a table ?

The Oracle (tm) Users' Co-Operative FAQ

What's the quickest way of deleting all (or a large fraction of) the data in a table ?

评论