Sql как найти повторяющиеся строки

I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.

A statement like:

SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

Will return something like

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2

But I’d also like to grab the IDs of them. Is there any way to do this? Maybe like a

orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2

The reason being that there is also a separate table of users that link to these organizations, and I would like to unify them (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I would like part manually so I don’t screw anything up, but I would still need a statement returning the IDs of all the dupe orgs so I can go through the list of users.

asked Jan 21, 2010 at 20:29

select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

answered Jan 21, 2010 at 20:32

D’Arcy RittichD’Arcy Rittich

166k40 gold badges287 silver badges283 bronze badges

You can run the following query and find the duplicates with max(id) and delete those rows.

SELECT orgName, COUNT(*), Max(ID) AS dupes 
FROM organizations 
GROUP BY orgName 
HAVING (COUNT(*) > 1)

But you’ll have to run this query a few times.

Bellash

7,4426 gold badges52 silver badges85 bronze badges

answered Jan 21, 2010 at 20:32

You can do it like this:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return just the records that can be deleted (leaving one of each), you can use:

SELECT
    id, orgName
FROM (
     SELECT 
         orgName, id,
         ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
     FROM organizations
) AS d
WHERE intRow != 1

Edit: SQL Server 2000 doesn’t have the ROW_NUMBER() function. Instead, you can use:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id

answered Jan 21, 2010 at 20:37

PaulPaul

16.1k13 gold badges41 silver badges51 bronze badges

You can try this , it is best for you

 WITH CTE AS
    (
    SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations 
    )
    select * from CTE where RN>1
    go

Soner Gönül

96.7k102 gold badges205 silver badges360 bronze badges

answered Nov 7, 2013 at 8:01

code savecode save

1,0441 gold badge9 silver badges15 bronze badges

The solution marked as correct didn’t work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql

SELECT n1.* 
FROM myTable n1
INNER JOIN myTable n2 
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id

answered May 20, 2013 at 20:27

ecairolecairol

6,1731 gold badge27 silver badges25 bronze badges

If you want to delete duplicates:

WITH CTE AS(
   SELECT orgName,id,
       RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
   FROM organizations
)
DELETE FROM CTE WHERE RN > 1

answered Jun 17, 2016 at 9:51

akdakd

6,46816 gold badges68 silver badges111 bronze badges

select * from [Employees]

For finding duplicate Record
1)Using CTE

with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte

2)By Using GroupBy

select Name,EmailId,COUNT(name) as Duplicate from  [Employees] group by Name,EmailId

MD XF

7,8027 gold badges40 silver badges71 bronze badges

answered Nov 18, 2016 at 12:44

Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1

So the records with rowum> 1 will be the duplicate records in your table. ‘Partition by’ first group by the records and then serialize them by giving them serial nos.
So rownum> 1 will be the duplicate records which could be deleted as such.

answered Mar 10, 2015 at 5:58

Mike ClarkMike Clark

1,85014 silver badges21 bronze badges

select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;

Src : https://stackoverflow.com/a/59242/1465252

answered Mar 17, 2015 at 4:01

iCrazybestiCrazybest

2,9152 gold badges23 silver badges24 bronze badges

select a.orgName,b.duplicate, a.id
from organizations a
inner join (
    SELECT orgName, COUNT(*) AS duplicate
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) b on o.orgName = oc.orgName
group by a.orgName,a.id

Roman Marusyk

23.1k24 gold badges71 silver badges114 bronze badges

answered Sep 15, 2015 at 7:01

select orgname, count(*) as dupes, id 
from organizations
where orgname in (
    select orgname
    from organizations
    group by orgname
    having (count(*) > 1)
)
group by orgname, id

answered Jan 21, 2010 at 20:49

JordãoJordão

55.1k12 gold badges112 silver badges144 bronze badges

You have several way for Select duplicate rows.

for my solutions , first consider this table for example

CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution :

SELECT DISTINCT *
FROM   #Employee;

WITH #DeleteEmployee AS (
                     SELECT ROW_NUMBER()
                            OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
                            RNUM
                     FROM   #Employee
                 )

SELECT *
FROM   #DeleteEmployee
WHERE  RNUM > 1

SELECT DISTINCT *
FROM   #Employee

Secound solution : Use identity field

SELECT DISTINCT *
FROM   #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM   #Employee a2
    WHERE  #Employee.ID = a2.ID
           AND #Employee.FIRST_NAME = a2.FIRST_NAME
           AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT *
FROM   #Employee

and end of all solution use this command

DROP TABLE #Employee

answered Nov 7, 2013 at 7:06

Ardalan ShahgholiArdalan Shahgholi

11.8k20 gold badges107 silver badges144 bronze badges

i think i know what you need
i needed to mix between the answers and i think i got the solution he wanted:

select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
    SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

having the max id will give you the id of the dublicate and the one of the original which is what he asked for:

id org name , dublicate count (missing out in this case) 
id doublicate org name , doub count (missing out again because does not help in this case)

only sad thing you get it put out in this form

id , name , dubid , name

hope it still helps

answered Oct 1, 2014 at 8:13

Suppose we have table the table ‘Student’ with 2 columns:

student_id int

student_name varchar

Records:
+------------+---------------------+
| student_id | student_name        |
+------------+---------------------+
|        101 | usman               |
|        101 | usman               |
|        101 | usman               |
|        102 | usmanyaqoob         |
|        103 | muhammadusmanyaqoob |
|        103 | muhammadusmanyaqoob |
+------------+---------------------+

Now we want to see duplicate records
Use this query:

select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

Rob♦

27.3k16 gold badges82 silver badges97 bronze badges

answered Feb 9, 2018 at 7:28

I got a better option to get the duplicate records in a table

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

Result of the above query shows all the duplicate names with unique student ids and number of duplicate occurances

Click here to see the result of the sql

answered Jun 25, 2018 at 9:22

 /*To get duplicate data in table */

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1 
  GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

answered Apr 26, 2019 at 14:01

CodeCode

6595 silver badges9 bronze badges

I use two methods to find duplicate rows.
1st method is the most famous one using group by and having.
2nd method is using CTE — Common Table Expression.

As mentioned by @RedFilter this way is also right. Many times I find CTE method is also useful for me.

WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName) 
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations   e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1

In the example above we collected the result by finding repeat occurrence using ROW_NUMBER and PARTITION BY. Then we applied where clause to select only rows which are on repeat count more than 1. All the result is collected CTE table and joined with Organizations table.

Source : CodoBee

David Buck

3,71335 gold badges31 silver badges35 bronze badges

answered May 5, 2020 at 9:40

IshrarIshrar

1051 silver badge6 bronze badges

Try

SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;

answered Jan 22, 2014 at 6:29

Источник

To quickly see the duplicate rows you can run a single simple query

Here I am querying the table and listing all duplicate rows with same user_id, market_place and sku:

select user_id, market_place,sku, count(id)as totals from sku_analytics group by user_id, market_place,sku having count(id)>1;

To delete the duplicate row you have to decide which row you want to delete. Eg the one with lower id (usually older) or maybe some other date information. In my case I just want to delete the lower id since the newer id is latest information.

First double check if the right records will be deleted. Here I am selecting the record among duplicates which will be deleted (by unique id).

select a.user_id, a.market_place,a.sku from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;

Then I run the delete query to delete the dupes:

delete a from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;

Backup, Double check, verify, verify backup then execute.

Источник

Сборник запросов для поиска, изменения и удаления дублей в таблице MySQL по одному и нескольким полям. В примерах все запросы будут применятся к следующий таблице:

Поиск дубликатов

Подсчет дублей

Запрос подсчитает количество всех записей с одинаковыми значениями в поле `col1`.

SELECT
	`col1`,
	COUNT(`col1`) AS `count`
FROM
	`table`
GROUP BY
	`col1`
HAVING 
	`count` > 1

SQL

Подсчет дубликатов по нескольким полям:

SELECT
	`col1`,
	`col2`,
	COUNT(*) AS `count`
FROM
	`table`
GROUP BY
	`col1`,`col2`
HAVING 
	`count` > 1

SQL

Все записи с одинаковыми значениями

Запрос найдет все записи с одинаковыми значениями в `col1`.

SELECT 
	*
FROM 
	`table`
WHERE 
	`col1` IN (SELECT `col1` FROM `table` GROUP BY `col1` HAVING COUNT(*) > 1)
ORDER BY
	`col1`

SQL

Для одинаковых значений в `col1` и `col2`:

SELECT 
	*
FROM 
	`table`
WHERE 
	`col1` IN (SELECT `col1` FROM `table` GROUP BY `col1` HAVING COUNT(*) > 1)
	AND `col2` IN (SELECT `col2` FROM `table` GROUP BY `col2` HAVING COUNT(*) > 1)
ORDER BY
	`col1`

SQL

Получить только дубликаты

Запрос получит только дубликаты, в результат не попадают записи с самым ранним `id`.

SELECT 
	`table`.*
FROM
	`table`
LEFT OUTER JOIN 
	(SELECT MIN(`id`) AS `id`, `col1` FROM `table` GROUP BY `col1`) AS `tmp` 
ON 
	`table`.`id` = `tmp`.`id`  
WHERE
	`tmp`.`id` IS NULL

SQL

Для нескольких полей:

SELECT 
	`table`.*
FROM
	`table`
LEFT OUTER JOIN 
	(SELECT MIN(`id`) AS `id`, `col1`, `col2` FROM `table` GROUP BY `col1`, `col2`) AS `tmp` 
ON 
	`a`.`id` = `tmp`.`id`  
WHERE
	`tmp`.`id` IS NULL

SQL

Уникализация записей

Запрос сделает уникальные названия только у дублей, дописав `id` в конец `col1`.

UPDATE  
	`table`
LEFT OUTER JOIN 
	(SELECT MIN(`id`) AS `id`, `col1` FROM `table` GROUP BY `col1`) AS `tmp` 
ON 
	`table`.`id` = `tmp`.`id`
SET
	`table`.`col1` = CONCAT(`table`.`col1`, '-', `table`.`id`)    
WHERE
	`tmp`.`id` IS NULL

SQL

По нескольким полям:

UPDATE  
	`table`
LEFT OUTER JOIN 
	(SELECT MIN(`id`) AS `id`, `col1`, `col2` FROM `table` GROUP BY `col1`, `col2`) AS `tmp` 
ON 
	`table`.`id` = `tmp`.`id`
SET
	`table`.`col1` = CONCAT(`table`.`col1`, '-', `table`.`id`)    
WHERE
	`tmp`.`id` IS NULL

SQL

Удаление дубликатов

Удаление дублирующихся записей, останутся только уникальные.

DELETE
	`table`
FROM  
	`table`
LEFT OUTER JOIN 
	(SELECT MIN(`id`) AS `id`, `col1` FROM `table` GROUP BY `col1`) AS `tmp` 
ON 
	`table`.`id` = `tmp`.`id`  
WHERE
	`tmp`.`id` IS NULL

SQL

По нескольким полям:

DELETE
	`table`
FROM  
	`table`
LEFT OUTER JOIN 
	(SELECT MIN(`id`) AS `id`, `col1`, `col2` FROM `table` GROUP BY `col1`, `col2`) AS `tmp` 
ON 
	`table`.`id` = `tmp`.`id`  
WHERE
	`tmp`.`id` IS NULL

SQL

Источник

Updated: June 12, 2022
Initial: September 8, 2021

Aah…. duplicates! They are everywhere! Look around you – multiple charger cables, headphones, pictures in your smartphone! But we are not here to talk about those duplicates. No, Sir! We are here to address the duplicates in sql, how to find them and possibly resolve them in your SQL code.

In this SQL find duplicates post, let us look at 3 ways to identify duplicate rows/columns and then conclude by looking at 2 ways to mitigate them.

Using Count
MINUS Function
Analytic Functions

Let us start by looking at a very simple database table, USER_DIET. The below listed table shows the Fruit consumption of Sam and John over two days.

Just by looking at the data can you tell if there are duplicates in the table, say for the column “NAME”?

NAME	FRUIT	DAY
John	Apple	Monday
Sam	Orange	Monday
John	Orange	Tuesday
Sam	Banana	Tuesday
John	Peach	Wednesday
Sam	Banana	Wednesday

The most obvious answer is YES! John occurs 3 times and so does Sam.

How about if we were to look at columns NAME and FRUIT? Once again, the answer would be YES, because “Sam” and “Banana” occurs twice. Apparently, Sam loves bananas, while John prefers a different fruit every day.

Finally, let’s look at columns NAME, FRUIT and DAY. Do you see any duplicates now?

The answer is NO. There are no duplicates because both Sam and John had a different fruit on each day.

The point I would like to drive home is this! To truly understand if data is duplicate, you need to understand the context and the functionality behind it.

Note: All SQL examples below use Oracle SQL syntax. However, they should work across most relational databases with minimal changes.

Related post: Apache Spark SQL date functions

1. SQL Find Duplicates using Count

The most common method to find duplicates in sql is using the count function in a select statement. There are two other clauses that are key to finding duplicates: GROUP BY and HAVING.

Let us continue using the database table (USER_DIET) from the previous example and see if we can find duplicates for the NAME column.

a. Duplicates in a single column

SELECT name,count(*)
FROM user_diet
GROUP BY name
HAVING count(*)>1;

Output from SQL statement: 
NAME COUNT(*)
John    3
Sam     3

In this second example, let us look at finding duplicates in multiple columns: NAME and FRUIT.

Lets think this thru and put things in context before diving into our select statement. As yourself, what am I trying to find here ?

We are trying to find if any of the users, in this case, Sam/John had the same fruit twice. That it ! This context is based on the two fields NAME and FRUIT.

b. Duplicates in multiple columns

SELECT name, fruit, count(*)
FROM user_diet
GROUP BY  name, fruit
HAVING count(*)>1;

Output from SQL statement: 
NAME   FRUIT    COUNT(*)
Sam    Banana   2

Key to remember, the columns in the select statement, excluding the count(*) should be the exact same in the group by clause as well.

Also note that using the count(*) function gives you a count of the number of occurrences of a value. In this case, “Sam” + “Banana” occurs twice in the table, but in actuality we only have one duplicate row.

c. SQL to find duplicate rows

The SQL to find duplicate rows in a table is not the same as checking for duplicates in a column.

Ideally, if the database table has the right combination of key columns, you should not have duplicate rows. Regardless, if you are suspicious that your table has duplicate rows, perform the below steps.

Determine they Key columns on your table.
If the table does not have keys defined, determine which column(s) makes a row unique. Often times this depends on the functional use case of the data.
Add the fields from Step 1 or Step 2 to your SQL COUNT(*) clause.

Using the USER_DIET table above, lets assume no keys were defined on the table. Our next option would be determining which column(s) makes a row unique.

Note that the table has 3 rows. If Sam or Jon had the same fruit more than once on the same day, this would create a duplicate row.

Could Sam or Jon eating different fruits on the same day be considered a duplicate row?

The answer – Maybe! It depends on the functional use case of the data.

The SQL to find duplicate rows syntax is as shown below.

SELECT name, fruit, day, count(*) from user_diet
GROUP BY name, fruit, day
HAVING count(*)>1;

2. SQL Find Duplicates using MINUS function

The MINUS function works on two tables ( or datasets) and returns rows from the first table that does not belong in the second table. This option using the MINUS function in SQL, to find duplicates, is specific to Oracle. Use it for awareness and to validate your results using the count(*) method.

Find duplicates using MINUS function and rowid

SELECT name, rowid FROM user_diet
MINUS
SELECT name, MIN(rowid) FROM user_diet
GROUP BY  name;

Output from SQL statement: 
NAME   COUNT(*)
Sam    2

ROWID is a pseudo column in Oracle and contains a distinct ID for each row in a table.

The first select statement (before the MINUS function) returns 6 rows containing NAME and a distinct value for the ROWID column. The second select statement on the other hand returns 2 rows, one for Sam and one for John. Why do you think that is ?

It’s because of the min function on the ROWID column.

The final output contains the “actual” number of duplicate rows, and not the total number of rows like the count(*) function.

Find duplicates using MINUS function and rownum

SELECT name, rownum FROM user_diet 
MINUS
SELECT name, rownum FROM 
            (SELECT DISTINCT name FROM user_diet);

Output from SQL statement: 
NAME   COUNT(*)
Sam    2

In this second example, we used ROWNUM, which is a pseudo column used to uniquely identify the order of each row in a select statement.

So, what’s the difference between ROWNUM and ROWID in our example?

They are both pseudo columns in Oracle.

ROWNUM is a number and is generated on the result of the SQL statement. ROWID on the other hand is associated with each row of a table.

3. Find Duplicates in SQL using Analytic functions

Analytic functions are used to perform calculations on a grouping of data, normally called a “window”. This technique can be a bit confusing if you are just starting off with SQL, but it’s definitely worth knowing.

SELECT name, ROW_NUMBER() OVER ( PARTITION BY ssn ORDER BY ssn) AS rnum 
FROM user_diet;

Output from SQL statement: 
NAME   RNUM
John   1
John   2
John   3
Sam    1
Sam    2
Sam    3

What are we doing here?

We are attempting to find if any duplicates exist for the column NAME.

Let’s break down this SQL and make sense of it.

The function ROW_NUMBER() assigns a number starting at 1 to the rows returned by the PARTITION window.

In our case, since we partitioned our dataset on the NAME column, we have 2 datasets: one for Sam and one for John. ROW_NUMBER() now assigns a unique number to each of the 3 rows for Sam, resets the counter and then does the same for John.

The resulting output is as shown on the right side of the query.

One of the reasons I love this technique is because I can turn the above SQL into a nested subquery and get a distinct set of records as shown below.

SELECT name FROM (
SELECT name, ROW_NUMBER() OVER ( PARTITION BY name ORDER BY name) AS rnum FROM user_diet)
WHERE rnum = 1;

Conclusion

A final tidbit, SQL is not limited to transactional databases.

Apache Spark has a module called Spark SQL to handle structured data. AWS Athena even lets you write SQL against files!

The demand for SQL skills is endless. So play around with what you learned here. Try selecting multiple columns, switch the PARTITIONS, change the SORT order. Practice is the best way to master something !

SQL helpful links

Interested in our services ?

email us at : info@obstkel.com

We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept”, you consent to the use of ALL the cookies.

Источник

Приветствую всех на сайте Info-Comp.ru! В этой небольшой заметке я покажу, как можно на SQL вывести повторяющиеся значения в столбце таблицы в Microsoft SQL Server. Все будет рассмотрено очень подробно и с примерами.

Заметка! Профессиональный видеокурс по T-SQL для начинающих.

Содержание

Исходные данные для примеров
Выводим повторяющиеся значения в столбце на T-SQL
Выводим все строки с повторяющимися значениями на T-SQL

Исходные данные для примеров

Сначала давайте я расскажу, какие данные я буду использовать в статье, чтобы Вы четко понимали и видели, какие результаты будут возвращаться, если выполнять те или иные действия.

Сразу скажу, что все данные тестовые.

Следующей инструкцией мы создаем таблицу Goods и добавляем в нее несколько строк, в некоторых из которых значение столбца Price будет повторяться.

Останавливаться на том, что делает та или иная инструкция, я не буду, так как это другая тема, если Вам интересно, можете более подробно посмотреть в следующих статьях:

Создание таблиц в Microsoft SQL Server (CREATE TABLE);
Добавление данных в таблицы Microsoft SQL Server (INSERT INTO).

  
   --Создание таблицы Goods
  CREATE TABLE Goods (
        ProductId       INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_ProductId PRIMARY KEY,
        ProductName     VARCHAR(100) NOT NULL,
        Price           MONEY NULL,
  );
   GO

  --Добавление строк в таблицу Goods
   INSERT INTO Goods(ProductName, Price)
        VALUES ('Системный блок', 100),
                   ('Монитор', 200),
                   ('Сканер', 150),
                   ('Принтер', 200),
                   ('Клавиатура', 50),
                   ('Смартфон', 300),
                   ('Мышь', 20),
                   ('Планшет', 300),
                   ('Процессор', 200);
   GO

   --Выборка данных
   SELECT ProductId, ProductName, Price
   FROM Goods;

Вы видите, какие данные есть, именно к ним я буду посылать SQL запрос, который будет определять и выводить повторяющиеся значения в столбце Price.

Основной алгоритм определения повторяющихся значений в столбце состоит в том, что нам нужно сгруппировать все строки по столбцу, в котором необходимо найти повторяющиеся значения, и подсчитать количество строк в каждой сгруппированной строке, а затем просто поставить фильтр (>1) на итоговое количество, отбросив тем самым строки со значением 1, т.е. если значение встречается всего один раз, значит, оно не повторяется, и нам не нужно.

Вот пример всего вышесказанного.

   
   --Определяем повторяющиеся значения в столбце
   SELECT Price, COUNT(*) AS CNT
   FROM Goods
   GROUP BY Price
   HAVING COUNT(*) > 1;

Мы видим, что у нас есть всего два значения, которые повторяются — это 200 и 300. Первое значение, т.е. 200, повторяется 3 раза, второе — 2 раза.

Данные сгруппировали мы конструкцией GROUP BY, подсчитали количество значений встроенной функцией COUNT, а отфильтровали сгруппированные строки конструкцией HAVING.

Выводим все строки с повторяющимися значениями на T-SQL

Но в большинстве случаев просто узнать повторяющиеся в столбце значения недостаточно, иногда необходимо вывести все записи в этой таблице, которые содержат эти повторяющиеся значения.

Это можно реализовать с помощью подзапроса, но использовать подзапрос, в котором будет группировка, не очень удобно, и уж точно неудобочитаемо. Поэтому мне нравится в каких-то подобных случаях использовать CTE (обобщённое табличное выражение) для повышения читабельности кода. Также чтобы сделать результирующий набор данных более наглядным, его можно отсортировать по целевому столбцу, тем самым мы сразу увидим строки с повторяющимися значениями.

Вот пример, в котором мы выводим все строки с повторяющимися значениями в столбце, отсортированные по столбцу Price.

   
   --Выводим все строки с повторяющимися значениями
   WITH DuplicateValue AS (
        SELECT Price, COUNT(*) AS CNT
        FROM Goods
        GROUP BY Price
        HAVING COUNT(*) > 1
   )
   SELECT ProductId, ProductName, Price 
   FROM Goods
   WHERE Price IN (SELECT Price FROM DuplicateValue)
   ORDER BY Price, ProductId;

Как видим, сначала у нас идут все строки со значением 200, а затем строки со значением 300. Сортировку мы осуществили конструкцией ORDER BY. Если у Вас возникает вопрос, что такое DuplicateValue, то это всего лишь название CTE выражения, в принципе Вы его можете назвать и по-другому.

Заметка!

Для комплексного изучения языка T-SQL рекомендую почитать мои книги и пройти курсы:

SQL код – самоучитель по языку SQL для начинающих;
Стиль программирования на T-SQL – основы правильного написания кода. Книга, направленная на повышение качества T-SQL кода;
Профессиональные видеокурсы по T-SQL.

У меня на этом все, надеюсь, материал был Вам полезен. Удачи Вам, пока!

Источник

Поиск дубликатов

Подсчет дублей

Подсчет дубликатов по нескольким полям:

Все записи с одинаковыми значениями

Получить только дубликаты

Для нескольких полей:

Уникализация записей

По нескольким полям:

Удаление дубликатов

По нескольким полям:

1. SQL Find Duplicates using Count

a. Duplicates in a single column

b. Duplicates in multiple columns

c. SQL to find duplicate rows

2. SQL Find Duplicates using MINUS function

Find duplicates using MINUS function and rowid

Find duplicates using MINUS function and rownum

3. Find Duplicates in SQL using Analytic functions

Conclusion

SQL helpful links

Table of Contents

Interested in our services ?

email us at : info@obstkel.com

Copyright 2022 © OBSTKEL LLC. All rights Reserved

Исходные данные для примеров

Выводим все строки с повторяющимися значениями на T-SQL