Grouping by Segments

【Question】

My data looks like this:

549  648.077  0.01
552  648.141  0.45
554  647.167  0.1
572  648.141  0.3
530  630.213  0.69
560  670.312  0.70

There are a few thousand lines in the file.

The 1st column values range from 0 to 1100.

The 2nd column values range from 600 to 700.

The 3rd column values range from 0 to 1. I need to plot the data and therefore need to sort and modify it:

I need to split the 3rd column values (normal range 0.0-1.0) into the segments 0.0-0.20, 0.21-0.40, 0.41-0.60, 0.61-0.80 and 0.81-1.00.

Next, I need to split the 1st column (normal range 0-1100) into segments like 0-10, 11-20, 21-30 and so on up to 1100. What I want to do is find all 2nd column values within each region: 0.0-0.20 and 0-10, 0.0-0.20 and 11-20, 0.0-0.20 and 21-30, and so on.
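A minimal sketch of the segment lookup described above, assuming the question's upper-inclusive boundaries (0-10, 11-20, ... and 0.0-0.20, 0.21-0.40, ...) and 3rd-column values with at most two decimal places. The function names third_col_segment and first_col_segment are hypothetical, chosen only for illustration.

```python
import math


def third_col_segment(value: float) -> str:
    """Label a 3rd-column value with the segment 0.0-0.20, 0.21-0.40, ...

    Assumes at most two decimal places: the value is converted to whole
    hundredths first so the boundaries are compared as integers.
    """
    hundredths = round(value * 100)
    index = max(0, (hundredths + 19) // 20 - 1)  # ceil(hundredths / 20) - 1
    lo = 0 if index == 0 else index * 20 + 1
    hi = (index + 1) * 20
    return f"{lo / 100:.2f}-{hi / 100:.2f}"


def first_col_segment(value: float) -> str:
    """Label a 1st-column value with the segment 0-10, 11-20, ..., 1091-1100."""
    index = max(0, math.ceil(value / 10) - 1)
    lo = 0 if index == 0 else index * 10 + 1
    hi = (index + 1) * 10
    return f"{lo}-{hi}"


# The row "552  648.141  0.45" belongs to the region 0.41-0.60 and 551-560
region = (third_col_segment(0.45), first_col_segment(552))
```

The pair of labels can then serve directly as the key of the region whose 2nd-column values are averaged.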

Once found, I want to add them all together and divide the sum by the number of values to get the mean, so that I get one value for each region, e.g. one value for 0.0-0.20 and 0-10. I'm fairly new to Python, and I think this is roughly the approach:

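import csv

dataList = []  # meant to collect the matching rows (not used yet)

with open("table.csv") as csv_file:
    # dialect='excel-tab' expects the columns to be separated by tabs
    data_reader = csv.reader(csv_file, dialect='excel-tab')
    for rows in data_reader:
        if float(rows[2]) <= 0.20:               # 3rd column in 0.0-0.20
            if float(rows[0]) <= 10:             # 1st column in 0-10
                print(rows)
            elif 10 < float(rows[0]) <= 20:      # 1st column in 11-20
                print(rows)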

That should work (without the print, of course) to get the values, and I would then repeat it for 0.20 < float(rows[2]) <= 0.40 and so on. That should give me the values I want, but is there an easy way to set up a range going from 0 to 1100 in steps of 10?
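On the question of stepping through 0-1100 in steps of 10: a minimal sketch, assuming the question's inclusive upper bounds, that builds the segment list with range() and replaces the hand-written if chain with one lookup per row. first_segments, third_segments and find_segment are hypothetical names; since range() only accepts integers, the 3rd-column segments are built in hundredths and then divided by 100.

```python
# Build the 1st-column segments 0-10, 11-20, ..., 1091-1100 with range()
# instead of writing one if-statement per segment (upper bounds inclusive).
first_segments = [(0 if lo == 0 else lo + 1, lo + 10) for lo in range(0, 1100, 10)]

# range() only accepts integers, so step through hundredths for the 3rd column:
# (0.0, 0.2), (0.21, 0.4), (0.41, 0.6), (0.61, 0.8), (0.81, 1.0)
third_segments = [((0 if lo == 0 else lo + 1) / 100, (lo + 20) / 100)
                  for lo in range(0, 100, 20)]


def find_segment(value, segments):
    """Return the (lo, hi) pair that contains value, or None if it falls in a gap."""
    for lo, hi in segments:
        if lo <= value <= hi:
            return (lo, hi)
    return None


# The nested ifs collapse into a single lookup per row:
row = [549.0, 648.077, 0.01]
key = (find_segment(row[2], third_segments), find_segment(row[0], first_segments))
```

A linear scan over 110 segments is cheap for a few thousand rows. Note that the question's segments leave gaps (a value such as 0.205 matches neither 0.0-0.20 nor 0.21-0.40, so find_segment returns None); the group-number arithmetic in the answer has no such gaps.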

P.S.: I'm aware that I gave a lot of information for a relatively short question. That's because I don't really know whether Python is the best tool for this or whether my approach is reasonable. Maybe I should go with pandas, which I have just installed. So if anyone knows an easier (maybe not coding-related) way to solve a problem like this, I'd really appreciate it.

【Answer】

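This is really just a simple group-and-aggregate task. The only twist is that you can't group directly by the 3rd and 1st columns; you have to group by segments (fixed intervals). For the 3rd column, for example, the group can be computed as 3rd*100\20 (multiply by 100, then integer-divide by 20), i.e. grouping by the integer part. That way 0.1 and 0.15 both fall into the 0-0.2 group.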
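The bucketing rule described above can be sketched in Python with floor division; third_group and first_group are hypothetical names. Multiplying by 100 before dividing mirrors the 3rd*100\20 formula and sidesteps a floating-point trap: 0.2 has no exact binary representation, and in Python 0.6 // 0.2 evaluates to 2.0 rather than 3.0.

```python
def third_group(value: float) -> int:
    # 3rd*100\20: scale to hundredths, then floor-divide by 20.
    # 0.0 <= v < 0.2 -> 0, 0.2 <= v < 0.4 -> 1, ..., and exactly 1.0 -> 5
    return int(value * 100 // 20)


def first_group(value: float) -> int:
    # 1st\10: 0-9 -> 0, 10-19 -> 1, ..., 540-549 -> 54
    return int(value // 10)


# 0.1 and 0.15 share group 0, i.e. the interval starting at 0
print(third_group(0.1), third_group(0.15))  # prints: 0 0
```

The boundaries differ slightly from the question's segments: here 0.20 goes to the 0.2-0.4 group, 10 goes to the 10-19 group, and a 3rd-column value of exactly 1.00 gets a group of its own (5). If the question's upper-inclusive segments matter, the key has to be adjusted.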

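Python has limited support for computations on structured data, so the corresponding code would be quite complex. Unless there are special requirements, this can be done in SPL as follows: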


     A
1    =file("d:\\source.txt").import()
2    =A1.group(#3*100\20:3rd,#1\10:1st;~.avg(#2):avg)
3    =A2.run(3rd=3rd*20/100,1st=1st*10)

A1: Read the contents of the text file source.txt.
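For comparison, step A1 has a straightforward Python counterpart. This sketch assumes the file holds three whitespace- or tab-separated numbers per line, as in the sample data, with no header row; parse_rows and read_rows are hypothetical helpers, and the file name in the usage comment is only an example.

```python
from typing import Iterable, List, Tuple

Row = Tuple[float, float, float]


def parse_rows(lines: Iterable[str]) -> List[Row]:
    """Turn lines like '549  648.077  0.01' into (549.0, 648.077, 0.01)."""
    rows = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if fields:             # skip blank lines
            first, second, third = (float(x) for x in fields[:3])
            rows.append((first, second, third))
    return rows


def read_rows(path: str) -> List[Row]:
    with open(path) as f:
        return parse_rows(f)


# Usage (file name is illustrative): rows = read_rows("source.txt")
```

Splitting on arbitrary whitespace means the same code works whether the file uses tabs or runs of spaces.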


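A2: Compute group numbers from the 3rd and 1st columns, then group and aggregate, taking the average of the 2nd column in each group. After this step, 3rd holds group numbers such as 0, 1, 2.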
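Step A2 can be sketched in plain Python with two dictionaries keyed by the pair of group numbers. group_means is a hypothetical name, and rows are assumed to be (1st, 2nd, 3rd) tuples of numbers like those in the question's sample.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[int, int]


def group_means(rows: Iterable[Tuple[float, float, float]]) -> Dict[Key, float]:
    """Average the 2nd column per (3rd-column group, 1st-column group).

    Group numbers follow the answer's formulas: 3rd*100 integer-divided
    by 20, and 1st integer-divided by 10.
    """
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for first, second, third in rows:
        key = (int(third * 100 // 20), int(first // 10))
        sums[key] += second
        counts[key] += 1
    # one mean per non-empty group, like ~.avg(#2) in the SPL cell
    return {key: sums[key] / counts[key] for key in sums}


sample = [
    (549, 648.077, 0.01), (552, 648.141, 0.45), (554, 647.167, 0.1),
    (572, 648.141, 0.3), (530, 630.213, 0.69), (560, 670.312, 0.70),
]
means = group_means(sample)
```

Only groups that actually contain rows appear in the result, so empty regions do not produce a division by zero.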


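A3: Convert the group numbers back into the actual interval starts: 3rd becomes 0, 0.2, 0.4, ..., and 1st becomes 0, 10, 20, ....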
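The conversion in A3 is plain arithmetic on the group numbers. A sketch in Python, where to_interval_starts is a hypothetical name and the input is assumed to be a dict of means keyed by (3rd-column group, 1st-column group):

```python
from typing import Dict, List, Tuple


def to_interval_starts(means: Dict[Tuple[int, int], float]) -> List[Tuple[float, int, float]]:
    """Map {(g3, g1): mean} to sorted (3rd start, 1st start, mean) rows.

    Mirrors A3: 3rd = g3*20/100 and 1st = g1*10, so group (2, 55)
    becomes the 3rd-column interval starting at 0.4 and the 1st-column
    interval starting at 550.
    """
    return sorted((g3 * 20 / 100, g1 * 10, mean) for (g3, g1), mean in means.items())


rows = to_interval_starts({(2, 55): 648.141, (0, 54): 648.077})
```

Each start marks a half-open interval: the 3rd column covers [start, start + 0.2) and the 1st column covers [start, start + 10), which is convenient for plotting one point or bar per region.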
