Grouping by segments
【Question】
My data looks like this:
549 648.077 0.01
552 648.141 0.45
554 647.167 0.1
572 648.141 0.3
530 630.213 0.69
560 670.312 0.70
There are a few thousand lines in the file.
The 1st column values range from 0 to 1100.
The 2nd column values range from 600 to 700.
The 3rd column values range from 0 to 1.
I need to plot the data, and therefore need to sort and modify it:
I need to split the 3rd column values (normal range 0.0-1.0) into segments 0.0-0.20, 0.21-0.40, 0.41-0.60, 0.61-0.80, 0.81-1.00.
Next I need to split the 1st column values (normal range 0-1100) into segments like 0-10, 11-20, 21-30 and so on up to 1100. What I want to do is find all 2nd column values within the regions 0.0-0.20 and 0-10, 0.0-0.20 and 11-20, 0.0-0.20 and 21-30, and so on.
When found, I want to add them all together and divide by the number of appearances to get the mean value, so that for the region between 0.0-0.20 and 0-10 I get one value. I'm fairly new to Python and I think that this is some kind of approach:
import os
import csv

dataList = []
with open("table.csv") as csv_file:
    data_reader = csv.reader(csv_file, dialect='excel-tab')
    for rows in data_reader:
        if float(rows[2]) <= 0.20:
            if float(rows[0]) <= 10:
                print(rows)
            if 10 < float(rows[0]) <= 20:
                print(rows)
That should work (without the print, of course) to get the values, and would then be repeated for if 0.20 < float(rows[2]) <= 0.40: and so on. That should bring me the values I want, but is there an easy way to set a range going from 0 to 1100 in steps of 10 units?
P.S.: I am aware that I gave a lot of info for a relatively short question. That's because I don't really know whether Python is the best way to do this, or whether my approach is reasonable. Maybe I should go with pandas, but I only just installed it. So in case anyone knows an easier (maybe not coding-related) way to solve a problem like this, I'd really appreciate it.
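【Answer】
This is really a simple group-and-aggregate task, except that you cannot group directly by the 3rd and 1st columns; you have to group by segment (fixed-width interval). For example, the segments for the 3rd column can be defined like this: multiply the value by 100, integer-divide by 20, and group by the integer result. That way 0.1 and 0.15 both fall into the 0-0.2 group.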
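A minimal Python sketch of this segment arithmetic, for illustration only. The function names third_col_segment and first_col_segment are hypothetical, and rounding to whole hundredths assumes the 3rd column has at most two decimal places, as in the sample data. Note that with integer division a value of exactly 0.20 lands in segment 1 (0.2-0.4) and a 1st column value of 10 lands in segment 1 (10-19), which differs slightly from the ranges listed in the question.

```python
def third_col_segment(value):
    """Segment number for a 3rd column value: multiply by 100, integer-divide by 20."""
    # Round to whole hundredths first so float noise (e.g. 0.29 * 100) cannot
    # push a value into the wrong segment. Assumes at most two decimal places.
    return int(round(value * 100)) // 20


def first_col_segment(value):
    """Segment number for a 1st column value: integer-divide by 10."""
    return int(value) // 10


# 0.1 and 0.15 share a segment, as the answer describes
print(third_col_segment(0.1), third_col_segment(0.15))
# 549 and 552 from the sample data fall into different 10-unit segments
print(first_col_segment(549), first_col_segment(552))
```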
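Python has limited support for computing on structured data, so the corresponding code would be fairly complex. Unless you have a specific requirement, this can be done in SPL, as follows: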
  | A
1 | =file("d:\\source.txt").import()
2 | =A1.group(#3*100\20:3rd,#1\10:1st;~.avg(#2):avg)
3 | =A2.run(3rd=3rd*20/100,1st=1st*10)
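A1: Reads the contents of the text file source.txt.
A2: Computes segment numbers from the 3rd and 1st columns, then groups by them and averages the 2nd column. After this step, 3rd shows segment numbers such as 0, 1, 2.
A3: Converts the segment numbers back to the actual interval values 0, 0.2, 0.4.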
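For comparison, the same three steps can be sketched in plain Python using only the standard library. This is an illustration under assumptions, not a line-by-line translation of the SPL: the function name segment_means is hypothetical, fields are assumed to be separated by whitespace or tabs, and the 3rd column is assumed to have at most two decimal places.

```python
from collections import defaultdict


def segment_means(lines):
    """Group rows by (3rd column segment, 1st column segment); average the 2nd column."""
    totals = defaultdict(lambda: [0.0, 0])  # segment key -> [running sum, count]
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        first, second, third = (float(f) for f in fields[:3])
        # Same arithmetic as step A2: segment numbers via integer division
        key = (int(round(third * 100)) // 20, int(first) // 10)
        totals[key][0] += second
        totals[key][1] += 1
    # Same idea as step A3: turn segment numbers back into interval lower bounds
    return {
        (seg3 * 20 / 100, seg1 * 10): total / count
        for (seg3, seg1), (total, count) in sorted(totals.items())
    }


# Sample rows from the question
sample = [
    "549 648.077 0.01",
    "552 648.141 0.45",
    "554 647.167 0.1",
    "572 648.141 0.3",
    "530 630.213 0.69",
    "560 670.312 0.70",
]
for (third_start, first_start), mean in segment_means(sample).items():
    print(third_start, first_start, mean)
```

Because the function accepts any iterable of lines, an open file object (for example the question's table.csv) can be passed to it directly.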