"[Question] My Data looks like this: 549 648.077 0.01 552 648.141 0.45 554 .."

swan
1,099 views • 5 years ago

Segmented grouping

Desktop processing

text(12)

[Question]

My Data looks like this:

549 648.077 0.01

552 648.141 0.45

554 647.167 0.1

572 648.141 0.3

530 630.213 0.69

560 670.312 0.70

There are a few thousand lines in the file.

The 1st-column values range from 0-1100.

The 2nd-column values range from 600-700.

The 3rd-column values range from 0-1.

I need to plot the data and therefore need to sort and modify it:

I need to split the 3rd-column values (normal range 0.0-1.0) into segments 0.0-0.20, 0.21-0.40, 0.41-0.60, 0.61-0.80, 0.81-1.00.

Next I need to split the 1st-column values (normal range 0-1100) into segments like 0-10, 11-20, 21-30 and so on up to 1100. What I want to do is find all 2nd-column values within a region 0.0-0.20 and 0-10, 0.0-0.20 and 11-20, 0.0-0.20 and 21-30.

When found, I want to add them all together and divide the sum by the number of appearances to get the mean value, so that I get one value for the region between 0.0-0.20 and 0-10. I'm fairly new to Python and I think that this is some kind of approach:

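import csv

dataList = []

with open("table.csv") as csv_file:
    # Read the file as tab-separated text
    data_reader = csv.reader(csv_file, dialect='excel-tab')
    for rows in data_reader:
        # 3rd column falls in the first segment (0.0-0.20)
        if float(rows[2]) <= 0.20:
            # 1st column falls in 0-10
            if float(rows[0]) <= 10:
                print(rows)
            # 1st column falls in 11-20
            if 10 < float(rows[0]) <= 20:
                print(rows)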

That should work (without the print of course) to get the values, repeated then for if 0.20 < float(rows[2]) <= 0.40: ..... That should bring me the values I want, but is there an easy way to set a range going from 0-1100 in steps of 10 units?

P.S.: I am aware that I gave lots of info for a relatively short question, and that's because I don't really know if Python is the best way to do this and if my approach is reasonable. Maybe I should go with pandas, but I have only just installed it. So in case anyone knows an easier (maybe not coding-related) way to solve a problem like this, I'd really appreciate it.

[Answer]

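This is essentially a simple group-and-aggregate task, except that you cannot group directly by the 3rd and 1st columns; you have to group by segment (fixed-width interval). For example, the interval for the 3rd column can be set like this: multiply the 3rd-column value by 100, integer-divide by 20, and group by the integer part. That way 0.1 and 0.15 both fall into the 0-0.2 group.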
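The bin-key computation described above can be sketched in Python, the language used in the question. This is only an illustration: the helper name `bin_index` and its `scale` parameter are hypothetical, and rounding the scaled value to six decimals is an assumption added here to keep floating-point error (for example `0.29 * 100`) from pushing a value into the wrong bin.

```python
import math


def bin_index(value, width, scale=1):
    """Return the zero-based index of the fixed-width bin that holds value.

    scale converts the value to whole units first (scale=100 turns 0.15
    into 15), and width is the bin width in those units.
    """
    scaled = round(value * scale, 6)  # absorb tiny floating-point error
    return math.floor(scaled) // width


# 3rd column: multiply by 100, integer-divide by 20
print(bin_index(0.10, 20, scale=100))  # 0
print(bin_index(0.15, 20, scale=100))  # 0
print(bin_index(0.45, 20, scale=100))  # 2

# 1st column: integer-divide by 10
print(bin_index(549, 10))  # 54
```

Note that with this rule each bin includes its lower bound and excludes its upper bound, so 0.20 lands in the 0.2-0.4 group rather than in 0.0-0.20 as listed in the question.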

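Python's support for computations on structured data is limited, and the corresponding code would be rather complex. If there are no special requirements, this can be done in SPL as follows: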

	A
1	=file("d:\\source.txt").import()
2	=A1.group(#3*100\20:3rd,#1\10:1st;~.avg(#2):avg)
3	=A2.run(3rd=3rd*20/100,1st=1st*10)

A1: Reads the contents of the text file source.txt.

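A2: Computes the bin keys from the 3rd and 1st columns, then groups and aggregates by them. After this step, 3rd holds the group numbers 0, 1, 2, and so on.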

A3: Converts the group numbers back to the actual interval bounds 0, 0.2, 0.4.
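For readers who want to stay in Python, as in the question, the same three steps (read rows, group by the two bin keys while averaging the 2nd column, convert the keys back to interval bounds) can be sketched with the standard library alone. This is a rough equivalent under stated assumptions, not a translation of the SPL script: the function name `group_means` and its parameters are hypothetical, the rows are given inline rather than read from a file, and the rounding step is an addition to guard against floating-point error.

```python
import math
from collections import defaultdict


def group_means(rows, width3=20, width1=10):
    """Average the 2nd value per (3rd-column bin, 1st-column bin).

    rows: iterable of (first, second, third) numeric tuples.
    Returns a dict mapping (third_lower_bound, first_lower_bound)
    to the mean of the 2nd values that fall in that region.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for first, second, third in rows:
        key = (
            math.floor(round(third * 100, 6)) // width3,  # like #3*100\20
            math.floor(first) // width1,                  # like #1\10
        )
        sums[key] += second
        counts[key] += 1
    # Turn group numbers back into interval lower bounds, as A3 does
    return {
        (k3 * width3 / 100, k1 * width1): sums[(k3, k1)] / counts[(k3, k1)]
        for (k3, k1) in sums
    }


# The six sample rows from the question
sample = [
    (549, 648.077, 0.01),
    (552, 648.141, 0.45),
    (554, 647.167, 0.10),
    (572, 648.141, 0.30),
    (530, 630.213, 0.69),
    (560, 670.312, 0.70),
]
for region, mean in sorted(group_means(sample).items()):
    print(region, mean)
```

Each key in the result is the pair of lower bounds for a region, so `(0.0, 540)` stands for 3rd-column values in 0.0-0.2 and 1st-column values in 540-550. Rows from a real file could be fed in by parsing each line into three floats first.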
