Structuring text that uses two kinds of delimiters
【Question】
I have a .txt file called readings.txt with the following data in it:
-10,3NW,15cm,4:38
5,15SW,8mm,2:8
8,8ENE,2mm,:25
-5,0,7cm,1
-3,0,3mm
The positions represent, in order, the temperature, speed, precipitation and time (hours and minutes).
I want to split the time with tokens = line.split(":"); but only if the fourth token exists. Here is my code, which splits each line on commas but does not yet split on the ':' delimiter:
try {
    input = new BufferedReader(new FileReader("readings.txt"));
    line = input.readLine();
    while (line != null) {
        tokens = line.split(",");
        // Column 1: temperature
        temperature = Integer.parseInt(tokens[0].trim());
        // Column 2: speed, optionally followed by a direction such as "NW"
        tokens[1] = tokens[1].trim();
        separation = firstNonNumericPosition(tokens[1]);
        if (separation == 0 || (separation < 0 && Integer.parseInt(tokens[1]) != 0)) {
            speed = -1; // marks the speed as invalid
        } else {
            if (separation < 0) {
                // purely numeric value equal to 0: no speed, no direction
                speed = 0;
                direction = "";
            } else {
                numeric = tokens[1].substring(0, separation);
                speed = Integer.parseInt(numeric.trim());
                direction = tokens[1].substring(separation).trim();
            }
            // Column 3 (optional): precipitation followed by its unit (cm or mm)
            if (tokens.length > 2) {
                tokens[2] = tokens[2].trim();
                separation = firstNonNumericPosition(tokens[2]);
                if (separation <= 0) {
                    precipitation = -1; // marks the precipitation as invalid
                } else {
                    numeric = tokens[2].substring(0, separation);
                    precipitation = Integer.parseInt(numeric.trim());
                    unit = tokens[2].substring(separation).trim();
                }
            } else {
                precipitation = 0;
                unit = "";
            }
        }
        if (speed < 0 || precipitation < 0) {
            System.out.println("Error in input: " + line);
        } else {
            readings[size] = new Reading(temperature, speed, direction,
                    precipitation, unit.equalsIgnoreCase("cm"));
            size++;
        }
        line = input.readLine();
    }
    input.close();
} catch (NumberFormatException ex) {
    System.out.println(ex.getMessage());
} catch (IOException ioe) {
    System.out.println(ioe.getMessage());
} catch (ArrayIndexOutOfBoundsException ar) {
    System.out.println(ar.getMessage());
}
I tried the following logic, but it threw an ArrayIndexOutOfBoundsException for index 3:
if (tokens.length > 3) {
    tokens = line.split(":");
    hours = Integer.parseInt(tokens[3].trim());
    minutes = Integer.parseInt(tokens[4].trim());
}
How can I split the time only when the fourth token exists?
These are just parts of my code; I can explain further what the question means if I'm not clear enough. Thanks in advance!
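A note on the exception in the question: line.split(":") re-splits the whole line, so "-10,3NW,15cm,4:38" yields only two tokens ("-10,3NW,15cm,4" and "38"), and tokens[3] is out of range. A minimal sketch of the usual fix is to split only the fourth comma token, and only when it exists. The class and method names below are hypothetical, and reading an empty hour part (":25") or a missing minute part ("1") as 0 is an assumption about what the data means.

```java
import java.util.Arrays;

class TimeSplitDemo {
    // Returns {hours, minutes} parsed from the fourth comma-separated token,
    // or null when the line has no fourth token.
    // Assumption: an empty part (":25") or a missing part ("1") counts as 0.
    static int[] parseTime(String line) {
        String[] tokens = line.split(",");
        if (tokens.length <= 3) {
            return null; // no time column on this line
        }
        // Split only the fourth token, not the whole line.
        // The limit -1 keeps empty parts, so ":25" gives ["", "25"].
        String[] hm = tokens[3].trim().split(":", -1);
        String h = hm[0].trim();
        String m = hm.length > 1 ? hm[1].trim() : "";
        int hours = h.isEmpty() ? 0 : Integer.parseInt(h);
        int minutes = m.isEmpty() ? 0 : Integer.parseInt(m);
        return new int[] {hours, minutes};
    }

    public static void main(String[] args) {
        String line = "-10,3NW,15cm,4:38";
        // Splitting the whole line on ':' gives only 2 tokens, so index 3 is out of range.
        System.out.println(line.split(":").length);                       // prints 2
        System.out.println(Arrays.toString(parseTime(line)));             // prints [4, 38]
        System.out.println(Arrays.toString(parseTime("8,8ENE,2mm,:25"))); // prints [0, 25]
        System.out.println(Arrays.toString(parseTime("-5,0,7cm,1")));     // prints [1, 0]
        System.out.println(Arrays.toString(parseTime("-3,0,3mm")));       // prints null
    }
}
```

Inside the question's loop, the same idea means replacing tokens = line.split(":") with a split of tokens[3] alone, guarded by tokens.length > 3.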
【Answer】
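Structure the text as a 5-column two-dimensional table. The 4th column of the source file is irregular and has to be split on the second delimiter (the colon) into the 4th and 5th columns. The task involves ordered and structured computation; unless there are special requirements, it can be done in SPL with short, easy-to-read code: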
|   | A |
|---|---|
| 1 | =file("d:\\source.txt").import@c() |
| 2 | =A1.new(#1:Temperature, #2:speed, #3:precipitation, (t=#4.array(":")).m(1):hours ,t.m(2):minutes) |
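A1: Read the contents of source.txt; the result is a 4-column table sequence.
A2: Split the 4th column on the colon into two columns, producing a two-dimensional table with the fields Temperature, speed, precipitation, hours and minutes.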
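For comparison, here is a rough plain-Java equivalent of A1 and A2. It is a hypothetical sketch, not what the SPL engine does internally. It assumes Java 16+ (for the record), keeps every field as a string for simplicity, skips blank lines, and returns null for a missing part, which is our assumption about how m() behaves for an out-of-range position. The class, record and method names are made up for illustration, and source.txt is a placeholder path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class StructureReadings {
    // The five fields named in A2; null means the value is absent on that line.
    record Row(String temperature, String speed, String precipitation,
               String hours, String minutes) {}

    // 1-based access that returns null when i is out of range (assumed m() behavior).
    static String member(String[] parts, int i) {
        return (parts != null && i >= 1 && i <= parts.length) ? parts[i - 1] : null;
    }

    static List<Row> structure(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            // A1: split on commas, trimming whitespace around each cell.
            String[] c = line.trim().split("\\s*,\\s*");
            // A2: split the 4th cell (if present) on ':'; -1 keeps empty parts.
            String[] t = c.length > 3 ? c[3].split(":", -1) : null;
            rows.add(new Row(member(c, 1), member(c, 2), member(c, 3),
                    member(t, 1), member(t, 2)));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass another file name as the first argument.
        String path = args.length > 0 ? args[0] : "source.txt";
        for (Row r : structure(Files.readAllLines(Paths.get(path)))) {
            System.out.println(r);
        }
    }
}
```

With the question's data, the line ":25" gives an empty hours string and the line "1" gives hours "1" with null minutes, so any numeric conversion can decide separately how to treat those cases.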
The code above is easy to integrate into Java; see "How to call an SPL script in Java".