
swan
707 views • 4 years ago

Structuring text that follows a pattern but has a variable number of rows per group


【Question】

I have a CSV file with non-standardized content. It looks something like this:

John, 001
01/01/2015, hamburger
02/01/2015, pizza
03/01/2015, ice cream
Mary, 002
01/01/2015, hamburger
02/01/2015, pizza
John, 003
04/01/2015, chocolate

Now, what I'm trying to do is write some logic in Java to separate them. I would like "John, 001" to be the header, and all the rows under John and before Mary to belong to John.

Will this be possible? Or should I just do it manually?

Edit:
For the input, even though it is not standardized, a noticeable pattern is that rows without a name always start with a date.
My output goal is a Java object that I can eventually store in the database, in the format below.

Name, hamburger, pizza, ice cream, chocolate
John, 01/01/2015, 02/01/2015, 03/01/2015, NA
Mary, 01/01/2015, 02/01/2015, NA, NA
John, NA, NA, NA, 04/01/2015

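【Answer】

This problem needs a fair amount of structured computation. Java has no ready-made class library for it, so a Java implementation tends to be long and hard to read. In a case like this, SPL can be used alongside Java, and the code is more intuitive: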

	A	B
1	=file("D:\\noneStand.csv").cursor@c()	=["hamburger","pizza","ice cream","chocolate"]
2	=create(name,${foodlist})
3	for A1;!isdigit(left(#1,1))	=A3.to(2,).align(B1,#2)
4		=A2.record(A3.#1 | B3.(#1))

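A1: Reads the file noneStand.csv through a cursor, using a comma as the separator.

A2: Creates the two-dimensional table that holds the result. ${foodlist} is a macro that expands the parameter into the expression at run time. foodlist is a parameter whose value is hamburger,pizza,'ice cream',chocolate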


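A3: Loops over A1, loading one complete group of rows into A3 on each iteration. Whenever the first character of a row's first field is a letter (that is, not a digit), the rows before that row are closed off as one group. B3 and B4 form the body of the loop.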
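For readers who want to see the grouping rule outside SPL, here is a minimal plain-Java sketch of the same idea. It is not the SPL implementation. The class and method names (`GroupRows`, `group`) are made up for illustration, and it assumes the whole file fits in memory as a list of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupRows {
    // Splits the lines into groups. A line whose first character is not a digit
    // (a name row such as "John, 001") opens a new group; date rows join the
    // group that is currently open.
    static List<List<String>> group(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            boolean isHeader = !Character.isDigit(line.charAt(0));
            if (isHeader || groups.isEmpty()) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(line);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "John, 001", "01/01/2015, hamburger", "02/01/2015, pizza", "03/01/2015, ice cream",
            "Mary, 002", "01/01/2015, hamburger", "02/01/2015, pizza",
            "John, 003", "04/01/2015, chocolate");
        // Prints one list per group, header row first.
        for (List<String> g : group(lines)) {
            System.out.println(g);
        }
    }
}
```

Unlike the cursor in A1, this version holds every group in memory at once, which is fine for small files but not for very large ones.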

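B3: Takes the rows of A3 (the loop variable) from the second one onward and aligns them to foodlist. For the Mary group, for example, the aligned result is: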

01/01/2015, hamburger

02/01/2015, pizza

NA,NA
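The alignment step can be pictured in plain Java roughly as follows. This is only a sketch: `AlignToFoods` and `align` are hypothetical names, missing foods are filled with the string "NA" to match the output format in the question, and if a food appears twice in one group the first date is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AlignToFoods {
    // Given the date rows of one group ("date, food") and the ordered food list,
    // returns one date per food, in food-list order, with "NA" for missing foods.
    static List<String> align(List<String> detailRows, List<String> foods) {
        Map<String, String> dateByFood = new HashMap<>();
        for (String row : detailRows) {
            String[] parts = row.split(",", 2);
            if (parts.length < 2) {
                continue; // skip rows without a food field
            }
            // putIfAbsent keeps the first date seen for a food
            dateByFood.putIfAbsent(parts[1].trim(), parts[0].trim());
        }
        List<String> aligned = new ArrayList<>();
        for (String food : foods) {
            aligned.add(dateByFood.getOrDefault(food, "NA"));
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> foods = Arrays.asList("hamburger", "pizza", "ice cream", "chocolate");
        List<String> mary = Arrays.asList("01/01/2015, hamburger", "02/01/2015, pizza");
        System.out.println(align(mary, foods)); // prints [01/01/2015, 02/01/2015, NA, NA]
    }
}
```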

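B4: Appends a record to A2. A3.#1 returns the first field of the first record in A3 (for example, Mary). B3.(#1) is the sequence formed from the first field of each row in B3, that is, [01/01/2015, 02/01/2015, NA, NA]. "|" concatenates the two.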
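For comparison, the whole flow (read, group, align, append) can also be written as a single pass in plain Java. The following is a sketch under assumptions, not a translation of the SPL grid: `FoodPivot` and `pivot` are illustrative names, the food list is hard-coded rather than passed as a parameter, and each output row is a plain `String[]` rather than a database-ready object.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FoodPivot {
    // Builds the result table: a header row, then one row per name row in the
    // input, with each date placed in the column of its food and "NA" elsewhere.
    static List<String[]> pivot(List<String> lines, List<String> foods) {
        List<String[]> table = new ArrayList<>();
        String[] header = new String[foods.size() + 1];
        header[0] = "Name";
        for (int i = 0; i < foods.size(); i++) {
            header[i + 1] = foods.get(i);
        }
        table.add(header);

        String[] current = null; // the record of the group being filled
        for (String raw : lines) {
            String[] parts = raw.trim().split(",", 2);
            String first = parts[0].trim();
            if (first.isEmpty()) {
                continue; // blank line or empty first field
            }
            if (!Character.isDigit(first.charAt(0))) {
                // Name row: start a new record, pre-filled with "NA"
                current = new String[foods.size() + 1];
                Arrays.fill(current, "NA");
                current[0] = first;
                table.add(current);
            } else if (current != null && parts.length == 2) {
                // Date row: put the date under its food, keeping the first one seen
                int col = foods.indexOf(parts[1].trim());
                if (col >= 0 && "NA".equals(current[col + 1])) {
                    current[col + 1] = first;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        List<String> foods = Arrays.asList("hamburger", "pizza", "ice cream", "chocolate");
        // Pass a file path as the first argument, or fall back to the sample data.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : Arrays.asList(
                "John, 001", "01/01/2015, hamburger", "02/01/2015, pizza", "03/01/2015, ice cream",
                "Mary, 002", "01/01/2015, hamburger", "02/01/2015, pizza",
                "John, 003", "04/01/2015, chocolate");
        for (String[] row : pivot(lines, foods)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

With the sample data from the question, the printed rows match the desired output format: one row per name row, so the two John groups stay separate.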
