"[Question] I've got a question about how to process a delimited file with a large number of co .."

swan
What leaves is the scenery; what stays is life.
703 views • 7 years ago

Extracting specified columns for processing

Application computing

[Question]

I've got a question about how to process a delimited file with a large number of columns (>3000). I tried to extract the fields with the standard delimited-file input component, but creating the schema takes hours, and when I run the job I get an error because the toString() method exceeds the 65535-byte limit. At that point I can run the job, but all the columns are messed up and I can't really work with them anymore.

Is it possible to split that .csv file with Talend? Is there any other way to handle it, maybe with some sort of Java code? If you have any further questions, don't hesitate to comment.

Cheers!

[Answer]

Your requirements are:
1. Process a large CSV file.
2. Simplify access to its 3000-plus fields (the schema).
3. Access the data from Java.
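Requirement 2 essentially means addressing columns by name instead of declaring every one of them in a schema. As a minimal plain-Java illustration (not SPL; the class `HeaderIndex` and its methods are hypothetical names), the header row can be turned into a name-to-position map once, after which each data row is split once and read by column name. It assumes simple comma-separated lines with no quoted fields containing commas.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: maps header names to column positions so a very wide
 * CSV can be read by column name without hand-declaring a schema.
 * Assumption: plain comma-separated values, no quoted fields with commas.
 */
public class HeaderIndex {
    private final Map<String, Integer> positions = new HashMap<>();

    public HeaderIndex(String headerLine) {
        // The -1 limit keeps trailing empty column names instead of dropping them.
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i].trim(), i);
        }
    }

    /** Returns the position of a column, or throws if the header lacks it. */
    public int indexOf(String name) {
        Integer i = positions.get(name);
        if (i == null) {
            throw new IllegalArgumentException("No such column: " + name);
        }
        return i;
    }

    /** Reads one field by name from a row that the caller has already split once. */
    public String get(String[] fields, String name) {
        return fields[indexOf(name)];
    }

    public static void main(String[] args) {
        HeaderIndex idx = new HeaderIndex("id,name,amount");
        String[] row = "7,alice,9.5".split(",", -1); // split each row only once
        System.out.println(idx.get(row, "amount")); // prints 9.5
    }
}
```

Splitting each row once and then looking up several columns avoids re-parsing a 3000-field line for every field you read.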

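In this case you can use SPL. SPL can read large files through a cursor, provides a rich set of structured-computation functions, and is easy to integrate with Java. For example, to read only a few columns of a large file: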

	A
1	=file("d:\\data.csv").cursor@tc(field,fieldYouNeed)

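The code above is easy to integrate with Java (see "How to call an SPL script in Java"). Here `@t` treats the first row as the column names, `@c` uses a comma as the separator, and `field,fieldYouNeed` stand for the names of the columns you want to read.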
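For comparison, here is a rough plain-Java sketch of what the one-line cursor does, assuming a comma-separated file whose first line holds the column names and whose fields contain no quoted commas or embedded newlines. `ColumnProjector.project` is a hypothetical name, not an SPL or Talend API. It streams the input line by line, so memory use stays flat no matter how many rows there are, and it writes out only the requested columns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: copy only selected columns of a wide CSV from source to sink.
 * Assumptions: header in the first line, comma separator, no quoted commas
 * or newlines inside fields, every data row has all header columns.
 */
public class ColumnProjector {

    public static void project(Reader source, Writer sink, List<String> wanted) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String header = reader.readLine();
        if (header == null) {
            return; // empty input: nothing to project
        }

        // Resolve the wanted names to positions once, using the header row.
        Map<String, Integer> positions = new HashMap<>();
        String[] names = header.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i], i);
        }
        int[] keep = new int[wanted.size()];
        for (int k = 0; k < keep.length; k++) {
            Integer pos = positions.get(wanted.get(k));
            if (pos == null) {
                throw new IllegalArgumentException("No such column: " + wanted.get(k));
            }
            keep[k] = pos;
        }

        sink.write(String.join(",", wanted));
        sink.write('\n');

        // Stream the data rows: only the current line is held in memory.
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",", -1);
            StringBuilder out = new StringBuilder();
            for (int k = 0; k < keep.length; k++) {
                if (k > 0) {
                    out.append(',');
                }
                out.append(fields[keep[k]]);
            }
            sink.write(out.toString());
            sink.write('\n');
        }
        sink.flush();
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name,amount,note\n1,alice,9.5,x\n2,bob,3.0,y\n";
        StringWriter out = new StringWriter();
        project(new StringReader(csv), out, Arrays.asList("id", "amount"));
        System.out.print(out); // prints the lines: id,amount / 1,9.5 / 2,3.0
    }
}
```

Against a real file, the `Reader` could come from `java.nio.file.Files.newBufferedReader(Paths.get("d:/data.csv"))`. Files with quoted fields would need a real CSV parser in place of `String.split`.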
