Oracle: how to find duplicates

What’s the simplest SQL statement that will return the duplicate values for a given column and the count of their occurrences in an Oracle database table?

For example: I have a JOBS table with the column JOB_NUMBER. How can I find out if I have any duplicate JOB_NUMBERs, and how many times they’re duplicated?

asked Sep 12, 2008 at 15:10

Aggregate the column by COUNT, then use a HAVING clause to find values that appear more than once.

SELECT column_name, COUNT(column_name)
FROM table_name
GROUP BY column_name
HAVING COUNT(column_name) > 1;

answered Sep 12, 2008 at 15:13

Bill the Lizard


Another way:

SELECT *
FROM table_name a
WHERE EXISTS (
  SELECT 1 FROM table_name b
  WHERE b.column_name = a.column_name
  AND b.rowid < a.rowid
);

This works well (quickly enough) when column_name is indexed. Because it returns every copy except the one with the lowest ROWID, it is also a convenient form for deleting or updating duplicate rows.

answered Sep 13, 2008 at 9:35

Grrey


Simplest I can think of:

select job_number, count(*)
from jobs
group by job_number
having count(*) > 1;

answered Sep 12, 2008 at 15:17

JosephStyons


You don’t even need to return the count if you don’t need to know the actual number of duplicates, e.g.:

SELECT column_name
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1

answered Sep 13, 2008 at 14:55

Evan


How about:

SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;

To answer the example above, it would look like:

SELECT job_number, count(*)
FROM jobs
GROUP BY job_number HAVING COUNT(*) > 1;

answered Sep 12, 2008 at 15:18

Andrew


When several columns together identify a unique row (e.g., in a relationship table), you can use ROWID.

For example, take emp_dept(empid, deptid, startdate, enddate) and suppose (empid, deptid) should uniquely identify a row. In that case:

select oed.empid, oed.deptid, count(*)
from emp_dept oed
where exists (select *
              from emp_dept ied
              where oed.rowid <> ied.rowid
                and ied.empid = oed.empid
                and ied.deptid = oed.deptid)
group by oed.empid, oed.deptid
having count(*) > 1
order by count(*);

If the table has a primary key, use it instead of ROWID. For example, if id is the primary key:

select oed.empid, oed.deptid, count(*)
from emp_dept oed
where exists (select *
              from emp_dept ied
              where oed.id <> ied.id
                and ied.empid = oed.empid
                and ied.deptid = oed.deptid)
group by oed.empid, oed.deptid
having count(*) > 1
order by count(*);

answered Sep 20, 2012 at 7:25

I usually use the Oracle analytic function ROW_NUMBER().

Say you want to check for duplicates with respect to a unique index or primary key built on columns (c1, c2, c3). You can then return the rows whose ROWID gets a ROW_NUMBER() greater than 1 within its (c1, c2, c3) group:

Select *
From Table_With_Duplicates
Where Rowid In (Select rid
                  From (Select Rowid rid,
                               ROW_NUMBER() Over (
                                 Partition By c1, c2, c3
                                 Order By c1, c2, c3
                               ) nbLines
                          From Table_With_Duplicates) t2
                 Where nbLines > 1)

MT0


answered Oct 24, 2017 at 8:21

J. Chomel


Doing a self-join:

select j1.job_number, j1.id as id1, j2.id as id2
from   jobs j1 join jobs j2 on (j1.job_number = j2.job_number)
where  j1.id != j2.id
order by j1.job_number

will give you the ids of the duplicated rows (each pair appears twice, once in each order).

answered Sep 12, 2008 at 15:24

agnul


SELECT   SocialSecurity_Number, Count(*) no_of_rows
FROM     SocialSecurity 
GROUP BY SocialSecurity_Number
HAVING   Count(*) > 1
Order by Count(*) desc

Simon Adcock


answered Apr 5, 2013 at 6:48

I know it’s an old thread, but this may help someone.

If you also need to see the other columns of the duplicated rows, use the following:

select * from table_name where column_name in
(select ing.column_name from table_name ing group by ing.column_name having count(*) > 1)
order by column_name desc;

You can also add more filters to the WHERE clause if needed.

answered Jul 23, 2018 at 7:57

Here is an SQL query to do that:

select column_name, count(1)
from table_name
group by column_name
having count(column_name) > 1;

typedef


answered Jan 12, 2018 at 11:02

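One solution: select every row except the one with the highest ROWID for each empno. The rows returned are the extra copies: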

select * from emp
    where rowid not in
    (select max(rowid) from emp group by empno);
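The same subquery can drive a DELETE. A minimal sketch, assuming you want to keep the row with the highest ROWID for each empno (ROWID reflects where a row is physically stored, so this is not necessarily the most recently inserted copy):

```sql
-- Remove every emp row except the one with the highest ROWID per empno.
-- Run the SELECT form first to check exactly which rows will go.
DELETE FROM emp
WHERE rowid NOT IN (
    SELECT MAX(rowid)
    FROM emp
    GROUP BY empno
);
```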

answered Feb 10, 2016 at 10:08

DoOrDie


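Also, you can try something like this to get the number of extra copies of a single value (here poid = 50) in a table, say poitem. If poid 50 occurs N times, the first branch counts N - 1 rows (ROWNUM < N), the second returns N, and MINUS leaves N - 1; if the value occurs only once, no row is returned: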

SELECT count(poid) 
FROM poitem 
WHERE poid = 50 
AND rownum < any (SELECT count(*)  FROM poitem WHERE poid = 50) 
GROUP BY poid 
MINUS
SELECT count(poid) 
FROM poitem 
WHERE poid in (50)
GROUP BY poid 
HAVING count(poid) > 1;
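For comparison, a plain GROUP BY query gives the same number more directly. A minimal sketch against the same poitem table:

```sql
-- Extra copies of poid = 50 (occurrences minus one); no row if it occurs only once.
SELECT poid, COUNT(*) - 1 AS extra_copies
FROM poitem
WHERE poid = 50
GROUP BY poid
HAVING COUNT(*) > 1;
```

Removing the WHERE clause returns the same figure for every duplicated poid.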

Yaron Idan


answered Jan 27, 2016 at 14:23

Stacker


Source

Summary: in this tutorial, you will learn how to find duplicate records in the Oracle Database.

Let’s start by setting up a sample table for the demonstration.

Setting up a sample table

First, the following statement creates a new table named fruits that consists of three columns: fruit id, fruit name, and color:

CREATE TABLE fruits (
        fruit_id   NUMBER generated BY DEFAULT AS IDENTITY,
        fruit_name VARCHAR2(100),
        color VARCHAR2(20)
);

Second, insert some rows into the fruits table:

INSERT INTO fruits(fruit_name,color) VALUES('Apple','Red');
INSERT INTO fruits(fruit_name,color) VALUES('Apple','Red');
INSERT INTO fruits(fruit_name,color) VALUES('Orange','Orange');
INSERT INTO fruits(fruit_name,color) VALUES('Orange','Orange');
INSERT INTO fruits(fruit_name,color) VALUES('Orange','Orange');
INSERT INTO fruits(fruit_name,color) VALUES('Banana','Yellow');
INSERT INTO fruits(fruit_name,color) VALUES('Banana','Green');

Third, query data from the fruits table:

SELECT * FROM fruits;   

As you can see from the output, the fruits table has duplicate records with the same values repeated in both the fruit_name and color columns: Apple/Red appears twice and Orange/Orange three times.

Finding duplicate rows using the aggregate function

To find duplicate rows in the fruits table, first list the fruit_name and color columns in both the SELECT and GROUP BY clauses. Then count how many times each combination appears with the COUNT(*) function, as shown below:

SELECT 
    fruit_name,
    color,
    COUNT(*)
FROM 
    fruits
GROUP BY 
    fruit_name,
    color;

The query returned a single row for each combination of fruit name and color. It also included the rows without duplicates.

To return just the duplicate rows whose COUNT(*) is greater than one, you add a HAVING clause as follows:

SELECT 
    fruit_name,
    color,
    COUNT(*)
FROM 
    fruits
GROUP BY 
    fruit_name,
    color
HAVING COUNT(*) > 1; 

Now only the duplicated combinations are returned: one row per combination (Apple/Red with a count of 2, Orange/Orange with a count of 3), not one row per copy.

If you want to return all the duplicate rows themselves, you need to query the table again, as shown below:

SELECT *
FROM fruits
WHERE (fruit_name, color) IN
    (SELECT fruit_name,
        color
    FROM fruits
    GROUP BY fruit_name,
        color
    HAVING COUNT(*) > 1
    )
ORDER BY fruit_name, color;

Now, we have all duplicate rows displayed in the result set.

Finding duplicate records using analytic function

See the following query:

SELECT f.*,
    COUNT(*) OVER (PARTITION BY fruit_name, color) c
FROM fruits f;

In this query, we added an OVER() clause after COUNT(*) and placed the list of columns to check for duplicate values after PARTITION BY. The PARTITION BY clause splits the rows into groups, and COUNT(*) is computed within each group.

Unlike the GROUP BY query above, the analytic function does not collapse the result set: every row in the table still appears exactly once, along with the count of its group.

Because you cannot use an analytic function in the WHERE or HAVING clause, you need to use a WITH clause:

WITH fruit_counts AS (
    SELECT f.*,
        COUNT(*) OVER (PARTITION BY fruit_name, color) c
    FROM fruits f
)
SELECT *
FROM fruit_counts
WHERE c > 1 ;

Or you can use an inline view:

SELECT 
    *
FROM
        (SELECT f.*,
            COUNT(*) OVER (PARTITION BY fruit_name, color) c
        FROM fruits f
        )
WHERE c > 1;

Now you should know how to find duplicate records in Oracle Database. It’s time to clean up your data by removing the duplicate records.
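As a starting point for that cleanup, here is a minimal sketch that keeps the row with the lowest fruit_id in each (fruit_name, color) group and deletes the rest. It assumes fruit_id values are unique, which holds for the sample data because they come from the identity column:

```sql
-- Number the rows inside each (fruit_name, color) group by fruit_id,
-- then delete every row numbered 2 or higher.
DELETE FROM fruits
WHERE fruit_id IN (
    SELECT fruit_id
    FROM (
        SELECT fruit_id,
               ROW_NUMBER() OVER (
                   PARTITION BY fruit_name, color
                   ORDER BY fruit_id
               ) AS rn
        FROM fruits
    )
    WHERE rn > 1
);
```

With the sample data this removes one Apple/Red row and two Orange/Orange rows; both Banana rows stay because their colors differ.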


Source

The following examples return duplicate rows from an Oracle Database table.

Sample Data

Suppose we have a table with the following data:

SELECT * FROM Pets;

Result:

PetId  PetName  PetType
-----  -------  -------
1      Wag      Dog    
1      Wag      Dog    
2      Scratch  Cat    
3      Tweet    Bird   
4      Bark     Dog    
4      Bark     Dog    
4      Bark     Dog

The first two rows are duplicates, as are the last three rows. In this case the duplicate rows contain duplicate values across all columns, including the ID column.

Option 1

We can use the following query to see how many times each row occurs:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY PetId;

Result:

PETID	PETNAME	PETTYPE	Count
1	Wag	Dog	2
2	Scratch	Cat	1
3	Tweet	Bird	1
4	Bark	Dog	3

We grouped the rows by all columns, and returned the row count of each group. Any row with a count greater than 1 is a duplicate.

We can order it by count in descending order, so that the rows with the most duplicates appear first:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
ORDER BY Count(*) DESC;

Result:

PETID	PETNAME	PETTYPE	Count
4	Bark	Dog	3
1	Wag	Dog	2
2	Scratch	Cat	1
3	Tweet	Bird	1

Option 2

If we only want the duplicate rows listed, we can use the HAVING clause to return only rows with a count greater than 1:

SELECT 
    PetId,
    PetName,
    PetType,
    COUNT(*) AS "Count"
FROM Pets
GROUP BY 
    PetId,
    PetName,
    PetType
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC;

Result:

PETID	PETNAME	PETTYPE	Count
4	Bark	Dog	3
1	Wag	Dog	2

Option 3

Another option is to use the ROW_NUMBER() window function:

SELECT 
    PetId,
    PetName,
    PetType,
    ROW_NUMBER() OVER ( 
        PARTITION BY PetId, PetName, PetType 
        ORDER BY PetId, PetName, PetType
        ) AS rn
FROM Pets;

Result:

PETID	PETNAME	PETTYPE	RN
1	Wag	Dog	1
1	Wag	Dog	2
2	Scratch	Cat	1
3	Tweet	Bird	1
4	Bark	Dog	1
4	Bark	Dog	2
4	Bark	Dog	3

The PARTITION BY clause divides the result set produced by the FROM clause into partitions to which the function is applied. When we specify partitions for the result set, each partition causes the numbering to start over again (i.e. the numbering will start at 1 for the first row in each partition).

Option 4

We can use the above query as a common table expression:

WITH cte AS 
    (
        SELECT 
            PetId,
            PetName,
            PetType,
            ROW_NUMBER() OVER ( 
                PARTITION BY PetId, PetName, PetType 
                ORDER BY PetId, PetName, PetType
                ) AS Row_Number
        FROM Pets
    )
SELECT * FROM cte WHERE Row_Number <> 1;

Result:

PETID	PETNAME	PETTYPE	ROW_NUMBER
1	Wag	Dog	2
4	Bark	Dog	2
4	Bark	Dog	3

This returns just the excess rows from the matching duplicates. So if there are two identical rows, it returns one of them. If there are three identical rows, it returns two, and so on.

Option 5

Given that our table doesn’t contain a primary key column, we can take advantage of Oracle’s rowid pseudocolumn:

SELECT * FROM Pets
WHERE EXISTS (
  SELECT 1 FROM Pets p2 
  WHERE Pets.PetName = p2.PetName
  AND Pets.PetType = p2.PetType
  AND Pets.rowid > p2.rowid
);

Result:

PETID	PETNAME	PETTYPE
1	Wag	Dog
4	Bark	Dog
4	Bark	Dog

The way this works is, each row in an Oracle database has a rowid pseudocolumn that returns the address of the row. The rowid is a unique identifier for rows in the table, and usually its value uniquely identifies a row in the database. However, it’s important to note that rows in different tables that are stored together in the same cluster can have the same rowid.

One benefit of the above example is that we can replace SELECT * with DELETE in order to de-dupe the table.
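Applied to the Pets table, the DELETE form of the query above could look like this. It is a sketch: as in the SELECT, only PetName and PetType are compared, and the row with the lowest rowid in each group is the one that survives.

```sql
-- Delete every row that has an identical (PetName, PetType) row
-- with a lower rowid, keeping one copy per group.
DELETE FROM Pets
WHERE EXISTS (
  SELECT 1 FROM Pets p2
  WHERE Pets.PetName = p2.PetName
  AND Pets.PetType = p2.PetType
  AND Pets.rowid > p2.rowid
);
```

With the sample data this deletes the three rows returned above, leaving one row each for Wag, Scratch, Tweet and Bark.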

Option 6

And finally, here’s another option that uses the rowid pseudocolumn:

SELECT * FROM Pets
WHERE rowid > (
  SELECT MIN(rowid) FROM Pets p2  
  WHERE Pets.PetName = p2.PetName
  AND Pets.PetType = p2.PetType
);

Result:

PETID	PETNAME	PETTYPE
1	Wag	Dog
4	Bark	Dog
4	Bark	Dog

Same result as the previous example.

As with the previous example, we can replace SELECT * with DELETE in order to remove duplicate rows from the table.

Source

If you are looking for rows with unique IDs where the same key value can occur multiple times in a column, a simple way to find them is to create two tables, as explained below:

Here: TICKETID is a primary key, TKTNUMBER can occur multiple times.

CREATE TABLE TEMP
(
   TICKETID    FLOAT,
   TKTNUMBER   FLOAT
);

CREATE TABLE TEMP2
(
   TKTNUMBER   FLOAT,
   COUNTER     INTEGER
);

Insert the TICKETID and TKTNUMBER of every row whose TKTNUMBER has COUNT(TKTNUMBER) > 1:

INSERT INTO TEMP
   SELECT 
       TICKETID, 
       TKTNUMBER
   FROM YOUR_TABLE
   WHERE TKTNUMBER IN (  
            SELECT TKTNUMBER
            FROM YOUR_TABLE
            GROUP BY TKTNUMBER
            HAVING COUNT (TKTNUMBER) > 1);

Next, to see the counter, insert each TKTNUMBER and its COUNT the same way as above:

INSERT INTO TEMP2
    SELECT 
        TKTNUMBER, 
        COUNT (TKTNUMBER) AS COUNTER
    FROM YOUR_TABLE
    GROUP BY TKTNUMBER
    HAVING COUNT (TKTNUMBER) > 1
    ORDER BY 2 DESC;

You can select as follows (by joining the two tables on tktnumber):

SELECT 
    T1.TICKETID,
    T1.TKTNUMBER,
    T2.COUNTER
FROM 
    TEMP T1 INNER JOIN 
    TEMP2 T2 ON 
        T2.TKTNUMBER = T1.TKTNUMBER
ORDER BY T2.COUNTER DESC
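The two helper tables can also be replaced by a single query that uses an analytic COUNT. A minimal sketch, reusing the YOUR_TABLE, TICKETID and TKTNUMBER names from the example:

```sql
-- One row per ticket whose TKTNUMBER occurs more than once, with the count.
-- COUNT(tktnumber) ignores NULLs, matching COUNT(TKTNUMBER) in the steps above.
SELECT ticketid, tktnumber, counter
FROM (
    SELECT ticketid,
           tktnumber,
           COUNT(tktnumber) OVER (PARTITION BY tktnumber) AS counter
    FROM your_table
)
WHERE counter > 1
ORDER BY counter DESC;
```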

Source

Updated: June 12, 2022
Initial: September 8, 2021

Aah…. duplicates! They are everywhere! Look around you – multiple charger cables, headphones, pictures in your smartphone! But we are not here to talk about those duplicates. No, Sir! We are here to address the duplicates in sql, how to find them and possibly resolve them in your SQL code.

In this SQL find duplicates post, let us look at 3 ways to identify duplicate rows/columns and then conclude by looking at 2 ways to mitigate them.

Using Count
MINUS Function
Analytic Functions

Let us start by looking at a very simple database table, USER_DIET. The table below shows the fruit consumption of Sam and John over three days.

Just by looking at the data can you tell if there are duplicates in the table, say for the column “NAME”?

NAME	FRUIT	DAY
John	Apple	Monday
Sam	Orange	Monday
John	Orange	Tuesday
Sam	Banana	Tuesday
John	Peach	Wednesday
Sam	Banana	Wednesday

The most obvious answer is YES! John occurs 3 times and so does Sam.

How about if we were to look at columns NAME and FRUIT? Once again, the answer would be YES, because the combination “Sam” + “Banana” occurs twice. Apparently, Sam loves bananas, while John prefers a different fruit every day.

Finally, let’s look at columns NAME, FRUIT and DAY. Do you see any duplicates now?

The answer is NO. There are no duplicates because both Sam and John had a different fruit on each day.

The point I would like to drive home is this! To truly understand if data is duplicate, you need to understand the context and the functionality behind it.

Note: All SQL examples below use Oracle SQL syntax. However, they should work across most relational databases with minimal changes.
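To try the examples below yourself, a setup script for USER_DIET could look like this. The table name, column names and rows come from the table above; the VARCHAR2 sizes are assumptions.

```sql
-- Sample table from this article; column sizes are assumed.
CREATE TABLE user_diet (
    name  VARCHAR2(20),
    fruit VARCHAR2(20),
    day   VARCHAR2(10)
);

INSERT INTO user_diet (name, fruit, day) VALUES ('John', 'Apple',  'Monday');
INSERT INTO user_diet (name, fruit, day) VALUES ('Sam',  'Orange', 'Monday');
INSERT INTO user_diet (name, fruit, day) VALUES ('John', 'Orange', 'Tuesday');
INSERT INTO user_diet (name, fruit, day) VALUES ('Sam',  'Banana', 'Tuesday');
INSERT INTO user_diet (name, fruit, day) VALUES ('John', 'Peach',  'Wednesday');
INSERT INTO user_diet (name, fruit, day) VALUES ('Sam',  'Banana', 'Wednesday');
COMMIT;
```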


1. SQL Find Duplicates using Count

The most common method to find duplicates in sql is using the count function in a select statement. There are two other clauses that are key to finding duplicates: GROUP BY and HAVING.

Let us continue using the database table (USER_DIET) from the previous example and see if we can find duplicates for the NAME column.

a. Duplicates in a single column

SELECT name,count(*)
FROM user_diet
GROUP BY name
HAVING count(*)>1;

Output from SQL statement: 
NAME COUNT(*)
John    3
Sam     3

In this second example, let us look at finding duplicates in multiple columns: NAME and FRUIT.

Let’s think this through and put things in context before diving into our select statement. Ask yourself: what am I trying to find here?

We are trying to find out whether any of the users, in this case Sam or John, had the same fruit twice. That’s it! This context is based on the two fields NAME and FRUIT.

b. Duplicates in multiple columns

SELECT name, fruit, count(*)
FROM user_diet
GROUP BY  name, fruit
HAVING count(*)>1;

Output from SQL statement: 
NAME   FRUIT    COUNT(*)
Sam    Banana   2

Key point to remember: every column in the select statement, other than count(*), must also appear in the group by clause.

Also note that using the count(*) function gives you a count of the number of occurrences of a value. In this case, “Sam” + “Banana” occurs twice in the table, but in actuality we only have one duplicate row.
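If you want the number of duplicate rows rather than the number of occurrences, subtract one from the count. A minimal sketch against the same table:

```sql
-- Extra copies per (name, fruit) pair: occurrences minus the one row you keep.
SELECT name, fruit, COUNT(*) - 1 AS extra_copies
FROM user_diet
GROUP BY name, fruit
HAVING COUNT(*) > 1;
```

For the sample data this returns a single row: Sam, Banana, 1.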

c. SQL to find duplicate rows

The SQL to find duplicate rows in a table is not the same as checking for duplicates in a column.

Ideally, if the database table has the right combination of key columns, you should not have duplicate rows. Regardless, if you are suspicious that your table has duplicate rows, perform the below steps.

1. Determine the key columns of your table.
2. If the table has no keys defined, determine which column(s) make a row unique. Oftentimes this depends on the functional use case of the data.
3. Add the fields from step 1 or step 2 to both the SELECT list and the GROUP BY clause of your COUNT(*) query.

Using the USER_DIET table above, let’s assume no keys were defined on the table. Our next option would be determining which column(s) make a row unique.

Note that the table has 3 columns. If Sam or John had the same fruit more than once on the same day, this would create a duplicate row.

Could Sam or John eating different fruits on the same day be considered a duplicate row?

The answer – Maybe! It depends on the functional use case of the data.

The SQL to find duplicate rows is shown below. With the sample data it returns no rows, since every (name, fruit, day) combination is unique.

SELECT name, fruit, day, count(*) from user_diet
GROUP BY name, fruit, day
HAVING count(*)>1;

2. SQL Find Duplicates using MINUS function

The MINUS operator works on two queries (or datasets) and returns the distinct rows from the first that do not appear in the second. MINUS is Oracle’s name for this set operator (standard SQL calls it EXCEPT), and the ROWID and ROWNUM pseudocolumns used below are Oracle-specific, so this option for finding duplicates is specific to Oracle. Use it for awareness and to validate your results from the count(*) method.

Find duplicates using MINUS function and rowid

SELECT name, rowid FROM user_diet
MINUS
SELECT name, MIN(rowid) FROM user_diet
GROUP BY  name;

Output from SQL statement: four rows, two for John and two for Sam, each with its ROWID (the ROWID values depend on where the rows are stored).

ROWID is a pseudo column in Oracle and contains a distinct ID for each row in a table.

The first select statement (before MINUS) returns 6 rows, each containing NAME and a distinct ROWID. The second select statement, on the other hand, returns 2 rows: one for Sam and one for John. Why do you think that is?

It’s because of GROUP BY name combined with MIN(rowid): each name keeps only its lowest ROWID.

The final output therefore contains only the extra copies (two rows for John and two for Sam), i.e. the “actual” duplicate rows, rather than the total number of occurrences that the count(*) function reports.
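To turn this into a per-name count like the count(*) method, you can wrap the MINUS query in an inline view and aggregate it. A minimal sketch:

```sql
-- Count the rows left per name after removing each name's lowest ROWID.
SELECT name, COUNT(*) AS extra_rows
FROM (
    SELECT name, rowid AS rid FROM user_diet
    MINUS
    SELECT name, MIN(rowid) FROM user_diet
    GROUP BY name
)
GROUP BY name;
```

With the sample data this returns 2 for John and 2 for Sam.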

Find duplicates using MINUS function and rownum

SELECT name, rownum FROM user_diet 
MINUS
SELECT name, rownum FROM 
            (SELECT DISTINCT name FROM user_diet);

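Output from SQL statement: the (NAME, ROWNUM) pairs from the first query that have no match in the second. ROWNUM depends on the order in which rows are returned, and neither query specifies an ORDER BY, so which pairs remain can vary between runs.

In this second example, we used ROWNUM, a pseudocolumn that numbers the rows returned by a query in the order they are retrieved.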

So, what’s the difference between ROWNUM and ROWID in our example?

They are both pseudo columns in Oracle.

ROWNUM is a number and is generated on the result of the SQL statement. ROWID on the other hand is associated with each row of a table.

3. Find Duplicates in SQL using Analytic functions

Analytic functions are used to perform calculations on a grouping of data, normally called a “window”. This technique can be a bit confusing if you are just starting off with SQL, but it’s definitely worth knowing.

SELECT name, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY name) AS rnum 
FROM user_diet;

Output from SQL statement: 
NAME   RNUM
John   1
John   2
John   3
Sam    1
Sam    2
Sam    3

What are we doing here?

We are attempting to find if any duplicates exist for the column NAME.

Let’s break down this SQL and make sense of it.

The function ROW_NUMBER() assigns a number, starting at 1, to each row within a partition.

In our case, since we partitioned our dataset on the NAME column, we have 2 partitions: one for Sam and one for John. ROW_NUMBER() assigns a unique number to each of the 3 rows for Sam, resets the counter, and then does the same for John.

The resulting output is shown above.

One of the reasons I love this technique is that I can turn the above SQL into a nested subquery and get a distinct set of records, as shown below.

SELECT name FROM (
SELECT name, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY name) AS rnum FROM user_diet)
WHERE rnum = 1;
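Flipping the filter gives the duplicates instead of the distinct set: rows numbered 2 and above are the extra copies. A minimal sketch using the same subquery:

```sql
-- Every row after the first for each name is an extra copy.
SELECT name, rnum
FROM (
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rnum
    FROM user_diet
)
WHERE rnum > 1;
```

For the sample data this returns four rows: John with rnum 2 and 3, and Sam with rnum 2 and 3.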

Conclusion

A final tidbit: SQL is not limited to transactional databases.

Apache Spark has a module called Spark SQL to handle structured data. AWS Athena even lets you write SQL against files!

The demand for SQL skills is endless. So play around with what you learned here. Try selecting multiple columns, switch the PARTITIONS, change the SORT order. Practice is the best way to master something!


Interested in our services ?

email us at : info@obstkel.com


Source


Copyright 2022 © OBSTKEL LLC. All rights Reserved