Structuring text that follows a pattern but has a variable number of lines per group

【Question】

I have a CSV file with non-standardized content. It looks something like this:

John, 001
01/01/2015, hamburger
02/01/2015, pizza
03/01/2015, ice cream
Mary, 002
01/01/2015, hamburger
02/01/2015, pizza
John, 003
04/01/2015, chocolate

Now, what I'm trying to do is write some logic in Java to separate them. I would like "John, 001" to be the header, with all the rows under John and before Mary treated as John's.

Will this be possible? Or should I just do it manually?

Edit: 
For the input, even though it is not standardized, a noticeable pattern is that a row that does not have a name will always start with a date.
My output goal is a Java object that I can eventually store in the database in the format below.

Name, hamburger, pizza, ice cream, chocolate
John, 01/01/2015, 02/01/2015, 03/01/2015, NA
Mary, 01/01/2015, 02/01/2015, NA, NA
John, NA, NA, NA, 04/01/2015

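【Answer】

Solving this takes a fair amount of structured computation. Java has no class library aimed at this kind of work, so a pure Java implementation tends to be long and hard to read. SPL can be used alongside Java here, and the resulting code is more intuitive: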


    A                                        B
1   =file("D:\\noneStand.csv").cursor@c()    =["hamburger","pizza","ice cream","chocolate"]
2   =create(name,${foodlist})
3   for A1;!isdigit(left(#1,1))              =A3.to(2,).align(B1,#2)
4                                            =A2.record(A3.#1 | B3.(#1))

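A1: Reads the file noneStand.csv through a cursor, using the comma as the field separator.

A2: Creates the two-dimensional table that holds the result. ${foodlist} dynamically parses the parameter into an expression. foodlist is a parameter whose value is hamburger,pizza,'ice cream',chocolate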

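A3: Loops over A1, loading one complete group of rows into A3 on each pass. When the first character of a row's first field is not a digit (that is, the row starts with a name), the rows before it are closed off as one group. B3 and B4 are the body of the loop.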
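For readers who want to see the same grouping rule in plain Java, here is a minimal sketch. It assumes the file has already been read into a list of lines; `GroupSplitter`, `isHeader` and `split` are hypothetical names chosen for illustration, not part of SPL or of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits raw lines into groups, opening a new group
// at every line whose first character is not a digit (a name row).
class GroupSplitter {

    // A header row such as "John, 001" starts with a non-digit;
    // data rows such as "01/01/2015, pizza" start with a date.
    static boolean isHeader(String line) {
        return !line.isEmpty() && !Character.isDigit(line.charAt(0));
    }

    static List<List<String>> split(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                      // skip blank lines
            }
            // Assumption: if the input starts with data rows and no header,
            // they are still collected into a first group.
            if (isHeader(line) || current == null) {
                current = new ArrayList<>();   // open a new group
                groups.add(current);
            }
            current.add(line);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "John, 001", "01/01/2015, hamburger", "02/01/2015, pizza", "03/01/2015, ice cream",
            "Mary, 002", "01/01/2015, hamburger", "02/01/2015, pizza",
            "John, 003", "04/01/2015, chocolate");
        for (List<String> group : split(lines)) {
            System.out.println(group);
        }
    }
}
```

Each resulting group has the name row first and its date rows after it, which is the shape the loop variable A3 is described as having on each pass.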

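B3: Takes the rows of A3 (the loop variable) from the second one onward and aligns them to the food list by their second field. For example, the aligned result for the Mary group is:

01/01/2015, hamburger
02/01/2015, pizza
NA,NA
NA,NA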

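B4: Appends a record to A2. A3.#1 returns the first field of the first record in A3 (for example, Mary). B3.(#1) is the set formed from the first field of each record in B3, that is, [01/01/2015, 02/01/2015, NA, NA]. "|" means concatenation.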
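The alignment and append steps can also be sketched in plain Java, as a rough counterpart to B3 and B4 rather than a translation of SPL's own functions. The sketch assumes each group is a non-empty list whose first element is the name row; `FoodTable`, `FOODS` and `toRow` are hypothetical names, and "NA" is used as the placeholder because that is what the desired output shows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns one group (name row + "date, food" rows)
// into one output row aligned to a fixed food list.
class FoodTable {

    // Column order of the output, taken from the desired result in the question.
    static final List<String> FOODS = List.of("hamburger", "pizza", "ice cream", "chocolate");

    // Assumption: group.get(0) is the header such as "Mary, 002";
    // every following element is a "date, food" row.
    static List<String> toRow(List<String> group) {
        Map<String, String> dateByFood = new HashMap<>();
        for (String line : group.subList(1, group.size())) {
            String[] parts = line.split(",", 2);
            if (parts.length < 2) {
                continue;                       // ignore malformed rows
            }
            dateByFood.put(parts[1].trim(), parts[0].trim());
        }
        List<String> row = new ArrayList<>();
        row.add(group.get(0).split(",", 2)[0].trim());    // the name, e.g. "Mary"
        for (String food : FOODS) {
            row.add(dateByFood.getOrDefault(food, "NA")); // "NA" where the food is missing
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println("Name, " + String.join(", ", FOODS));
        List<String> mary = List.of("Mary, 002", "01/01/2015, hamburger", "02/01/2015, pizza");
        System.out.println(String.join(", ", toRow(mary)));
        // prints: Mary, 01/01/2015, 02/01/2015, NA, NA
    }
}
```

Calling `toRow` once per group and collecting the results gives the rows of the target table. Mapping them onto a database entity is left out, since the question does not specify the schema.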