February 17, 2011
Getting rid of duplicate table rows in mysql seems to be a very often issue. And there are few well-known solutions like e.g. creating temporary table and copying there only unique ones with group by and then drop the old and rename the newly one. But this solution is bad because we lose auto_increment counters and we might not have all grant privileges to do that. Another one is creating unique index on the field(s) that are considered to be the “key”. But it’s not always possible, assume you have a column varchar(1024) and it is in utf-8 so it’s more than 1024 bytes – the maximum allowed key length in mysql.
I found a nice article https://www.xaprb.com/blog/2006/10/11/how-to-delete-duplicate-rows-with-sql/ and the first option worked for me. The idea is that you select unique records in inner query (with gouping) , then join it on the same table again but with condition minimum PK of inner != PK of outer one, that gives us the list of redundant rows which we just delete.
In my case I have
title varchar(1024) column that is considered to be unique key. So the query would look like:
DELETE bad_rows.* FROM offers AS bad_rows
INNER JOIN (
SELECT `title`,min(`id`) AS min_id FROM `offers` GROUP BY `title`,`provider_id`,`city_id` HAVING COUNT(*) > 1
) AS good_rows ON good_rows.title = bad_rows.title AND good_rows.min_id != bad_rows.id