Reading and splitting text
【Question】
I have data in a csv file which looks like this:
fromaddress, toaddress, timestamp
sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015
sender2@email.com, recipient1@email.com, 8-2-2015
sender3@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, recipient4@email.com, 8-3-2015
sender1@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, 8-4-2015
Using Python, I would like to produce a txt file that looks like:
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com
sender3_email.com, recipient3_email.com
sender3_email.com, recipient4_email.com
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com
Ultimately, I imagine this whole process will take several steps. After reading in the csv file, I will need to create separate lists for fromaddress and toaddress (I am ignoring the timestamp column altogether). There is only 1 email address per row in the fromaddress column, but there can be any number of email addresses per row in the toaddress column. I need to duplicate the fromaddress email address for each toaddress email address listed in each row. Once this is done, I need to replace all of the @ symbols with underscore (_) symbols. Finally, when I write the txt file, I need to add an extra blank line between rows so that it is "double-spaced".
I have not gotten very far as I'm a Python newbie and I'm stuck on the first step. The following code is duplicating the fromaddress for each individual character in the toaddress column instead of each individual email address. I also need help with the toaddress list. Can anyone help?
import csv
fromaddress = []
toaddress = []
with open("filename.csv", 'r') as f:
    c = csv.reader(f, delimiter = ",")
    for row in c:
        for item in row[1]:
            fromaddress.append(row[0]);
print(fromaddress)
Everyone, thanks for all of your help! I tried all your code but unfortunately I'm not getting the output I need. Instead of getting this (what I want):
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com
I'm getting this:
sender1_email.com,"recipient1_email.com, recipient2_email.com, recipient3_email.com"
sender2_email.com,"recipient1_email.com"
sender3_email.com,"recipient1_email.com, recipient2_email.com"
There is only 1 element in each "fromaddress" row, but there are multiple elements in each "toaddress" row. Basically, I have to pair each recipient address with the correct sender address. I think I'm not getting the right output because of the double quotation marks (") in the csv file that surround all of the recipient addresses in each row.
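【Answer】
Take the data from row 2 through row N. For each row, use the first member as column 1 and turn the second through second-to-last members into column 2, producing a multi-row two-dimensional table, and replace "@" with "_" in the strings.
This involves quite a few set operations, which makes it somewhat cumbersome in Python; it is simpler in SPL: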
  | A
1 | =file("d:\\input.csv").read@n().(replace(~,"@","_"))
2 | =A1.to(2,).(~.array())
3 | =A2.news(~.to(2,~.len()-1);A2.~(1),~)
4 | =file("d:\\result.txt").export@c(A3)
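A1: Reads the contents of the csv file, with each line as a string forming one member of a sequence, and replaces "@" with "_" in each string.
A2: Takes the second through last members of sequence A1 to form a new sequence, then splits each member into a sequence, returning a sequence of sequences. This step extracts and prepares the data rows.
A3: For each member sequence in A2, takes the second through second-to-last members and pairs each with the first member to form two columns, returning a new table sequence.
A4: Exports the result of A3 to a comma-delimited text file.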
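For comparison, the same four steps could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested drop-in solution: the names expand_rows and write_double_spaced are made up for illustration, and it assumes the first line is a header, the first field is the sender, and the last field is the timestamp. It also splits each middle field on commas again, on the assumption that a quoted recipient list (as described in the follow-up to the question) arrives as a single field. Unlike the SPL script, it writes a blank line after every pair to get the "double-spaced" output the question asks for.

```python
import csv
import io


def expand_rows(lines):
    """Yield (sender, recipient) pairs from an iterable of CSV lines.

    Assumptions (not confirmed by the question): line 1 is a header,
    field 1 is the sender, the last field is the timestamp, and every
    field in between holds one or more recipients.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue  # blank or malformed line
        sender = row[0].strip().replace("@", "_")
        for field in row[1:-1]:
            # A quoted field such as "a@x.com, b@x.com" is one string,
            # so split it on commas once more.
            for recipient in field.split(","):
                recipient = recipient.strip().replace("@", "_")
                if recipient:
                    yield sender, recipient


def write_double_spaced(pairs, out):
    """Write one 'sender, recipient' line per pair, each followed by a blank line."""
    for sender, recipient in pairs:
        out.write(f"{sender}, {recipient}\n\n")


# In-memory sample taken from the question. For real files, pass
# open("input.csv", newline="") and open("result.txt", "w") instead
# (both file names are placeholders).
sample = io.StringIO(
    "fromaddress, toaddress, timestamp\n"
    "sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015\n"
    "sender2@email.com, recipient1@email.com, 8-2-2015\n"
)
result = io.StringIO()
write_double_spaced(expand_rows(sample), result)
print(result.getvalue(), end="")
```

Iterating over row[1:-1] rather than row[1] is what avoids the character-by-character duplication seen in the question, since looping over a single string yields its characters.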