MySQL MariaDB

MySQL Find Duplicate Values in Table

Redundant data can be kept in the table by the database program, influencing the database’s output in MySQL. Data replication, though, happens for different purposes, and it is an important job to identify the duplicate values in the table when dealing with a MySQL database. Broadly speaking, it is smart to use clear restrictions on a table often to store information that prevents redundant rows. Sometimes, in a MySQL database, you might like to calculate the number of repeated values. We addressed this question in this topic, in which you will learn about how to locate duplicate values through different ways and how to count duplicate values.

To get started, you must have MySQL installed on your system with its utilities: MySQL workbench and command-line client shell. After that, you should have some data or values in your database tables as duplicates. Let’s explore this with some examples. First of all, open your command-line client shell from your desktop taskbar and type your MySQL password upon asked.

We have found different methods to find duplicated in a table. Have a look at them one by one.

Search Duplicates in a Single Column

First, you must know about the syntax of the query used to check and count duplicates for a single column.

>> SELECT col COUNT(col) FROM table GROUP BY col HAVING COUNT(col) > 1;

Here is the explanation of the above query:

  • Column: Name of the column to be checked.
  • COUNT(): the function used to count many duplicate values.
  • GROUP BY: the clause used to group all rows according to that particular column.

We have created a new table called ‘animals’ in our MySQL database ‘data’ having duplicate values. It has six columns with different values in it, e.g., id, Name, Species, Gender, Age, and Price providing information regarding different pets. Upon calling this table using the SELECT query, we get the below output on our MySQL command-line client shell.

>> SELECT * FROM data.animals;

Now, we will try to find the redundant and repeated values from the above table by using the COUNT and GROUP BY clause in the SELECT query. This query will count the Names of pets which are located less than 3 times in the table. After that, it will display those Names as below.

>> SELECT Name COUNT(Name) FROM data.animals GROUP BY Name HAVING COUNT(Name) < 3;

Using the same query to get different results while changing the COUNT number for Names of pets as shown below.

>> SELECT Name COUNT(Name) FROM data.animals GROUP BY Name HAVING COUNT(Name) > 3;

To get results for a total of 3 duplicate values for Names of pets as shown below.

>> SELECT Name COUNT(Name) FROM data.animals GROUP BY Name HAVING COUNT(Name) = 3;

Search Duplicates in Multiple Columns

The syntax of the query to check or count duplicates for multiple columns is as follows:

>> SELECT col1, COUNT(col1), col2, COUNT(col2) FROM table GROUP BY col1, col2 HAVING COUNT(col1) > 1 AND  COUNT(col2) > 1;

Here is the explanation of the above query:

  • col1, col2: name of the columns to be checked.
  • COUNT(): the function used to count several duplicate values.
  • GROUP BY: the clause used to group all rows according to that specific column.

We have been using the same table called ‘animals’ having duplicate values. We got the below output while utilizing the above query for checking the duplicate values in multiple columns. We have been checking and counting the duplicate values for columns Gender and Price while grouped by the column Price. It will show the pet genders and their prices which are residing in the table as duplicates not more than 5.

>> SELECT Gender, COUNT(Gender), Price, COUNT(Price) FROM data.animals GROUP BY Price HAVING COUNT(Price) < 5 AND  COUNT(Gender) < 5;

Search Duplicates in Single Table Using INNER JOIN

Here is the basic syntax for finding duplicates in a single table:

>> SELECT col1, col2, table.col FROM table INNER JOIN(SELECT col FROM table GROUP BY col HAVING COUNT(col1) > 1) temp ON table.col= temp.col;

Here is the narrative of the overhead query:

  • Col: the name of the column to be checked and selected for duplicates.
  • Temp: keyword to apply inner join on a column.
  • Table: name of the table to be checked.

We have a new table, ‘order2’ with duplicate values in the column OrderNo as shown below.

>> SELECT * FROM data.order2;

We are selecting three columns: Item, Sales, OrderNo to be shown in the output. While the column OrderNo is used to check duplicates. The inner join will select the values or rows having the values of Items more than one in a table. Upon executing, we will get the results below.

>> SELECT Item, Sales, order2.OrderNo FROM data.order2 INNER JOIN(SELECT OrderNo FROM data.order2 GROUP BY OrderNo HAVING COUNT(Item) > 1) temp ON order2.OrderNo= temp.OrderNo;

Search Duplicates in Multiple Tables Using INNER JOIN

Here is the simplified syntax for finding duplicates in multiple tables:

>> SELECT col FROM table1 INNER JOIN table2 ON table1.col = table2.col;

Here is the description of the overhead query:

  • col: name of the columns to be checked and selected.
  • INNER JOIN: the function used to Join two tables.
  • ON: used to join two tables according to provided columns.

We have two tables, ‘ order1’ and ‘order2’, in our database having the ‘OrderNo’ column in both as displayed below.

We will be using the INNER join to combine the duplicates of two tables according to a specified column. The INNER JOIN clause will get all the data from both the tables by joining them, and the ON clause will relate the same name columns from both tables, e.g., OrderNo.

>> SELECT * FROM data.order1 INNER JOIN data.order2 ON order1.OrderNo = order2.OrderNO;

To get the particular columns in an output, try the below command:

>> SELECT Region, Status, Item, Sales FROM data.order1 INNER JOIN data.order2 ON order1.OrderNo = order2.OrderNO;

Conclusion

We could now search for multiple copies in one or several tables of MySQL information and recognize the GROUP BY, COUNT, and INNER JOIN function. Make sure that you have built the tables properly and also that the right columns are chosen.

About the author

Aqsa Yasin

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.