Customize Consent Preferences

We use cookies to help you navigate efficiently and perform certain functions. You will find detailed information about all cookies under each consent category below.

The cookies that are categorized as "Necessary" are stored on your browser as they are essential for enabling the basic functionalities of the site. ... 

Always Active

Necessary cookies are required to enable the basic features of this site, such as providing secure log-in or adjusting your consent preferences. These cookies do not store any personally identifiable data.

No cookies to display.

Functional cookies help perform certain functionalities like sharing the content of the website on social media platforms, collecting feedback, and other third-party features.

No cookies to display.

Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics such as the number of visitors, bounce rate, traffic source, etc.

No cookies to display.

Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.

No cookies to display.

Advertisement cookies are used to provide visitors with customized advertisements based on the pages you visited previously and to analyze the effectiveness of the ad campaigns.

No cookies to display.

SQL Server – Finding duplicate records

Finding rows on a table is straight forward when filtering on columns, but what if you want to find duplicate data.

Let’s say I have a table that represents parents and children. The key to the table is the parent guid which is unique. Each row also has a child guid key that unique to the child, but is not unique in the sense that the child could have accidentally been added to the table twice.

Let’s look at the table:

SELECT * FROM parent_child

1

As you can see, David Doe was entered twice. How do you narrow your query to group like rows?

SELECT CHILD_GUID, FIRSTNAME, LASTNAME
FROM parent_child
GROUP BY CHILD_GUID, FIRSTNAME,LASTNAME

The SQL statement is selecting those columns that would be unique among duplicated rows. These columns are then added to a ‘GROUP BY’ clause to group like rows together. As you can see above, David Doe is in the table twice. We want to see David Doe once. The results are as follows:

3

But now you only want to return rows that are duplicate only. This is where the ‘HAVING’ clause comes in to play. By adding HAVING COUNT(*) > 1, only rows that are duplicated more than once will return in the result set.

SELECT CHILD_GUID, FIRSTNAME, LASTNAME
FROM parent_child
GROUP BY CHILD_GUID, FIRSTNAME,LASTNAME
HAVING COUNT(*) > 1

2

Now we see only the duplicate records, but you can take it a step further by saying show me the duplicates of the duplicate records. By placing the latter SQL code into a temporary table, you can then select the temporary table and join the original table to it where the child guid’s are equal as follows:

DECLARE @ParentChild TABLE
 (child_guid uniqueidentifier, FIRSTNAME varchar(50), LASTNAME varchar(50))
INSERT INTO @ParentChild(child_guid, FIRSTNAME, LASTNAME)
SELECT CHILD_GUID, FIRSTNAME, LASTNAME
FROM parent_child
GROUP BY CHILD_GUID, FIRSTNAME,LASTNAME
HAVING COUNT(*) > 1

SELECT parent_child.PARENT_GUID, temptable.child_guid, 
       temptable.FIRSTNAME, temptable.LASTNAME
FROM @ParentChild temptable
LEFT OUTER JOIN parent_child
     on parent_child.CHILD_GUID = temptable.child_guid

The results are as follows:

11

Leave a Reply