Assign unique value to field in duplicate records group during groupingBy

 

Question

https://stackoverflow.com/questions/68703671/assign-unique-value-to-field-in-duplicate-records-group-during-groupingby

Following the reply provided by devReddit here, I grouped the CSV records (same client names) of the following test file (fake data):

CSV test file

id,name,mother,birth,center
1,AntonioCarlosdaSilva,AnadaSilva,2008/03/31,1
2,CarlosRobertodeSouza,AmáliaMariadeSouza,2004/12/10,1
3,PedrodeAlbuquerque,MariadeAlbuquerque,2006/04/03,2
4,DanilodaSilvaCardoso,SôniadePaulaCardoso,2002/08/10,3
5,RalfodosSantosFilho,HelenadosSantos,2012/02/21,4
6,PedrodeAlbuquerque,MariadeAlbuquerque,2006/04/03,2
7,AntonioCarlosdaSilva,AnadaSilva,2008/03/31,1
8,RalfodosSantosFilho,HelenadosSantos,2012/02/21,4
9,RosanaPereiradeCampos,IvanaMariadeCampos,2002/07/16,3
10,PaulaCristinadeAbreu,CristinaPereiradeAbreu,2014/10/25,2
11,PedrodeAlbuquerque,MariadeAlbuquerque,2006/04/03,2
12,RalfodosSantosFilho,HelenadosSantos,2012/02/21,4

Client Entity

package entities;

public class Client {

    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;

    public Client() {
    }

    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getMother() {
        return mother;
    }

    public void setMother(String mother) {
        this.mother = mother;
    }

    public String getBirth() {
        return birth;
    }

    public void setBirth(String birth) {
        this.birth = birth;
    }

    public String getCenter() {
        return center;
    }

    public void setCenter(String center) {
        this.center = center;
    }

    @Override
    public String toString() {
        return "Client[id=" + id + ",name=" + name + ",mother=" + mother + ",birth=" + birth + ",center=" + center
                + "]";
    }

}

Program

package application;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import entities.Client;

public class Program {

    public static void main(String[] args) throws IOException {

        Pattern pattern = Pattern.compile(",");

        // Read the CSV, skip the header row, and map each line to a Client.
        List<Client> file;
        try (Stream<String> lines = Files.lines(Paths.get("src/Client.csv"))) {
            file = lines
                    .skip(1)
                    .map(line -> {
                        String[] fields = pattern.split(line);
                        return new Client(fields[0], fields[1], fields[2], fields[3], fields[4]);
                    })
                    .collect(Collectors.toList());
        }

        // Keep only records that have at least one duplicate, then group them by center.
        Map<String, List<Client>> grouped = file
                .stream()
                .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                .collect(Collectors.groupingBy(Client::getCenter, LinkedHashMap::new, Collectors.toList()));

        grouped.entrySet().forEach(System.out::println);
    }

    // Two records are duplicates when their ids differ but name, mother and birth all match.
    private static boolean isDuplicate(Client x, Client y) {

        return !x.getId().equals(y.getId())
                && x.getName().equals(y.getName())
                && x.getMother().equals(y.getMother())
                && x.getBirth().equals(y.getBirth());
    }
}

Final Result (Grouped by Center)

1=[Client[id=1,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1],
Client[id=7,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1]]
2=[Client[id=3,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[id=5,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[id=6,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[id=8,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[id=11,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[id=12,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2]]

What I Need

I need to assign a unique number to each group of repeated records, restarting the numbering each time the center value changes. The records of each group must also stay together, which a map does not guarantee. The example below shows the result I am after:

The numbers on the left show the grouping by center (1 and 2). Repeated names share the same inner group number, starting from "1". When the center number changes, the inner group numbers restart from "1", and so on.

1=[Client[group=1,id=1,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1],
Client[group=1,id=7,name=AntonioCarlosdaSilva,mother=AnadaSilva,birth=2008/03/31,center=1]]

// CENTER CHANGED (2) - Restart inner group number to "1" again.

2=[Client[group=1,id=3,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[group=1,id=6,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
Client[group=1,id=11,name=PedrodeAlbuquerque,mother=MariadeAlbuquerque,birth=2006/04/03,center=2],
 
// NAME CHANGED, BUT STILL THE SAME CENTER - so it increases by "1" (group=2)
 
Client[group=2,id=5,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[group=2,id=8,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2],
Client[group=2,id=12,name=RalfodosSantosFilho,mother=HelenadosSantos,birth=2012/02/21,center=2]]

Answer

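This problem requires grouping the CSV data by center and then assigning consecutive ranks to name within each group. Implementing it in Java takes fairly long code.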
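For comparison, here is a minimal plain-Java sketch of one way to do it. It is not the asker's code or an official solution: the class `DenseRankSketch`, the `Rec` type (a trimmed stand-in for `Client` with an added `group` field), and the helper names are all hypothetical. It also makes two assumptions that differ slightly from the SPL script further down: duplicates are detected within each center on name, mother and birth, and groups are numbered in order of first appearance rather than sorted by name. The sample rows follow the question's expected output, where the RalfodosSantosFilho records appear under center 2.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DenseRankSketch {

    // Hypothetical stand-in for the question's Client entity, plus a "group" field.
    static final class Rec {
        final String id, name, mother, birth, center;
        int group; // inner group number, filled in by groupAndNumber

        Rec(String id, String name, String mother, String birth, String center) {
            this.id = id; this.name = name; this.mother = mother;
            this.birth = birth; this.center = center;
        }

        // Assumption: records are duplicates when name, mother and birth all match.
        String key() { return name + "|" + mother + "|" + birth; }

        @Override
        public String toString() {
            return "Client[group=" + group + ",id=" + id + ",name=" + name + ",center=" + center + "]";
        }
    }

    // Groups by center, keeps only duplicated records, and numbers each
    // duplicate group from 1 within its center. Mutates Rec.group.
    static Map<String, List<Rec>> groupAndNumber(List<Rec> records) {
        // Step 1: group by center, preserving the first-seen order of centers.
        Map<String, List<Rec>> byCenter = records.stream()
                .collect(Collectors.groupingBy(r -> r.center, LinkedHashMap::new, Collectors.toList()));

        Map<String, List<Rec>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Rec>> e : byCenter.entrySet()) {
            // Step 2: inside one center, bucket records by their duplicate key.
            Map<String, List<Rec>> byKey = e.getValue().stream()
                    .collect(Collectors.groupingBy(Rec::key, LinkedHashMap::new, Collectors.toList()));

            // Step 3: number each bucket that has at least two records; the
            // counter is local to the loop body, so it restarts for every center.
            List<Rec> numbered = new ArrayList<>();
            int group = 0;
            for (List<Rec> same : byKey.values()) {
                if (same.size() < 2) {
                    continue; // not a duplicate, drop it
                }
                group++;
                for (Rec r : same) {
                    r.group = group;
                    numbered.add(r); // members of one group stay adjacent
                }
            }
            if (!numbered.isEmpty()) {
                result.put(e.getKey(), numbered);
            }
        }
        return result;
    }

    // Hypothetical sample rows shaped like the question's data.
    static List<Rec> sampleRows() {
        return List.of(
                new Rec("1", "AntonioCarlosdaSilva", "AnadaSilva", "2008/03/31", "1"),
                new Rec("2", "CarlosRobertodeSouza", "AmáliaMariadeSouza", "2004/12/10", "1"),
                new Rec("3", "PedrodeAlbuquerque", "MariadeAlbuquerque", "2006/04/03", "2"),
                new Rec("5", "RalfodosSantosFilho", "HelenadosSantos", "2012/02/21", "2"),
                new Rec("6", "PedrodeAlbuquerque", "MariadeAlbuquerque", "2006/04/03", "2"),
                new Rec("7", "AntonioCarlosdaSilva", "AnadaSilva", "2008/03/31", "1"),
                new Rec("8", "RalfodosSantosFilho", "HelenadosSantos", "2012/02/21", "2"),
                new Rec("11", "PedrodeAlbuquerque", "MariadeAlbuquerque", "2006/04/03", "2"),
                new Rec("12", "RalfodosSantosFilho", "HelenadosSantos", "2012/02/21", "2"));
    }

    public static void main(String[] args) {
        groupAndNumber(sampleRows())
                .forEach((center, list) -> System.out.println(center + "=" + list));
    }
}
```

Using `LinkedHashMap` at both levels is what keeps centers and duplicate groups in a predictable order. If the groups should be numbered alphabetically by name instead, sorting the records by center and name before calling `groupAndNumber` would be one way to get there.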

With SPL, an open-source package for Java, this is easy to write and takes only one statement:


     A
1    =file("client.csv":"UTF-8").import@ct().sort(center,name).derive(ranki(name;center):group)

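SPL provides a JDBC driver that Java can call. Save the script above as dense_rank.splx, then invoke the script file from Java the same way you would call a stored procedure: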

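// Load the esProc JDBC driver and open a local connection (types are from java.sql).
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the saved script by name, as with a stored procedure.
PreparedStatement st = con.prepareCall("call dense_rank ()");
st.execute();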

Alternatively, execute the SPL string directly from Java, the way you would run an SQL statement:

st = con.prepareStatement("==file(\"client.csv\":\"UTF-8\").import@ct().sort(center,name).derive(ranki(name;center):group)");
st.execute();

SPL source code: https://github.com/SPLWare/esProc
