关于表中重复数据的处理--MySQL
2018-03-09 本文已影响0人
yulong_vip
表结构及数据如下
SELECT * FROM test!
test.png
一. 根据字段查询表中重复的数据
1.根据字段test1,test2,查询表中重复的数据
SELECT * FROM test a WHERE (a.test1,a.test2) IN (SELECT test1,test2 FROM test GROUP BY test1,test2 HAVING COUNT(*)>1)
结果如下
1.png
2.根据字段test1,test3,查询表中重复的数据
SELECT * FROM test a WHERE (a.test1,a.test3) IN (SELECT test1,test3 FROM test GROUP BY test1,test3 HAVING COUNT(*)>1)
结果如下
2.png
二.删除表中重复数据,只保留一条
1.Mysql使用变量方法实现行号功能 (类似于Oracle中的rowid函数)
SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test
结果如下
2.1.png
2.结合以上方法,即可实现删除表中重复数据,只保留rowid最小的一行
根据字段test1,test3,查出需要删除的数据rowid:
SELECT * FROM
(SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test) c
WHERE rowid not in
(select min(rowid) from (SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test)b group by b.test1,b.test3 having count(*)>=1)
结果如下
2.2.png
根据字段test1,test3,删除重复数据
DELETE FROM
(SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test) t
WHERE rowid not in
(select min(rowid) from (SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test)b group by b.test1,b.test3 having count(*)>=1)
尴尬!设想中的以上写法为错误写法:原因是:更新数据时使用了查询,而查询的数据又做了更新的条件,mysql不支持这种方式。使用虚拟表的情况下再加一层封装发现仍无法规避这个问题(实体表则可以)?
尝试多次后,无奈使用折中解决办法,直接创建新表,把去除重复后的数据插入新表中,再删除旧表;
create temporary table temp1 as SELECT test1,test2,test3,test4 FROM
(SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test) t
WHERE rowid in
(select min(rowid) from (SELECT @rowid:=@rowid+1 as rowid,test.* FROM (select @rowid:=0) test,test)b group by b.test1,b.test3 having count(*)>=1)
SELECT * FROM temp1
结果如下
2.3.png
显然这个方法存在效率问题,如有更好解决方法请指教;