Removing duplicate rows for all databases

Day to day we have to delete duplicate data from the database.

An upgrade requires  Primary key where one didn’t exist before or a db let it guard down and didn’t enforce the PK’s (yeah looking at you postgres!)

Or if an upgrade can’t create a new pk then this is useful. The SQL Server commands to find this are there too.

In the example below we’re looking at the table VPX_TYPE_MAP. The primary key of this table has the following columns.      type_name, array_num, mo_ref

An upgrade of VC was trying to create a primary key with these columns but it failed as duplicate rows existed.

SQL server
WITH CTE AS(
SELECT  TYPE_NAME, ARRAY_NUM, MO_REF,
RN = ROW_NUMBER()OVER(PARTITION BY TYPE_NAME ORDER BY TYPE_NAME)
FROM dbo.VPX_TYPE_MAP
)
delete FROM CTE WHERE RN > 1

Postgres
DELETE FROM vpx_type_map
WHERE ctid IN (SELECT ctid
FROM
(SELECT ctid,
ROW_NUMBER() OVER (partition BY type_name, array_num, mo_ref ORDER BY type_name) AS rnum
FROM vpx_type_map) t
WHERE t.rnum > 1);

Oracle
delete from
vpx_type_map
where rowid in
(select rowid from
(select
rowid,
row_number()
over
(partition by type_name, array_num, mo_ref ORDER BY type_name) dup
from vpx_type_map)
where dup > 1);

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s