Finding duplicate values in multiple columns in a SQL table and count for chars

 

Question

https://stackoverflow.com/questions/70513319/finding-duplicate-values-in-multiple-colums-in-a-sql-table-and-count-for-chars

Referring to this question:

Finding duplicate values in multiple columns in a SQL table and count

I have the following table structure:

id  name1  name2  name3  ...
1   Hans   Peter  Frank
2   Hans   Frank  Peter
3   Hans   Peter  Frank
4   Paul   Peter  Hans
.
.
.

I use the following command to display the duplicates and the counts:

SELECT COUNT(name1), name1, name2, name3
FROM table
GROUP BY name1, name2, name3
HAVING (COUNT(name1) > 1) AND (COUNT(name2) > 1) AND (COUNT(name3) > 1)

This command gives me a count of 2. I would like to know how the second line could also be counted as a duplicate.

Unfortunately, the solution to the original question (Finding duplicate values in multiple columns in a SQL table and count) does not work for char values.

Answer

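Sort the values within each row, then group the rows and count each group, and finally pick the group with the largest count. This kind of computation is cumbersome in SQL: it needs nested queries combined with window functions, and the resulting statement is long and hard to read. The usual approach is to read the data out and process it in Python or SPL. SPL (an open-source Java package) is easier to integrate into Java applications, and the code is a little simpler, taking only three lines: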


    A
1   =sqlite.query("select name1,name2,name3 from names")
2   =A1.group([#1,#2,#3].sort();~.len()).maxp(#2)
3   =create(count,name1,name2,name3).record([A2.#2,A2.#1(1),A2.#1(2),A2.#1(3)])
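
For comparison, the same three steps (sort within each row, group and count, take the largest group) can be sketched in Python, the other option mentioned above. This is only an illustrative sketch, not part of the original answer: the in-memory SQLite database, the `names` table definition, and the helper `most_common_combination` are assumptions made for the example, and it assumes the table is non-empty and contains no NULL values.

```python
import sqlite3
from collections import Counter

# Hypothetical in-memory table mirroring the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER, name1 TEXT, name2 TEXT, name3 TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?)",
    [
        (1, "Hans", "Peter", "Frank"),
        (2, "Hans", "Frank", "Peter"),
        (3, "Hans", "Peter", "Frank"),
        (4, "Paul", "Peter", "Hans"),
    ],
)


def most_common_combination(connection):
    """Return (count, names) for the most frequent name combination,
    treating rows as equal when they hold the same names in any column order."""
    rows = connection.execute("SELECT name1, name2, name3 FROM names").fetchall()
    # Sorting the values inside each row makes the column order irrelevant,
    # so (Hans, Peter, Frank) and (Hans, Frank, Peter) fall into one group.
    counts = Counter(tuple(sorted(row)) for row in rows)
    # most_common(1) returns the single largest group as [(key, count)].
    names, count = counts.most_common(1)[0]
    return count, names


count, names = most_common_combination(conn)
print(count, *names)  # prints: 3 Frank Hans Peter
```

Because the names are sorted before grouping, the result lists them in sorted order rather than in the column order of any particular row; rows 1, 2 and 3 of the sample data are all counted together.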

SPL source code: https://github.com/SPLWare/esProc
