"【问题】 I have data in a csv file which looks like this: fromaddress, toaddress, timestamp sender1@ .."

swan
998 views • 5 years ago


[Question]

I have data in a csv file which looks like this:

fromaddress, toaddress, timestamp
sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015
sender2@email.com, recipient1@email.com, 8-2-2015
sender3@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, recipient4@email.com, 8-3-2015
sender1@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, 8-4-2015

Using Python, I would like to produce a txt file that looks like:

sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com
sender3_email.com, recipient3_email.com
sender3_email.com, recipient4_email.com
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com

Ultimately, I imagine this whole process will take several steps. After reading in the csv file, I will need to create separate lists for fromaddress and toaddress (I am ignoring the timestamp column altogether). There is only 1 email address per row in the fromaddress column; however, there can be any number of email addresses per row in the toaddress column. I need to duplicate the fromaddress email address for each toaddress email address listed in each row. Once this is done, I need to replace all of the @ symbols with underscore (_) symbols. Finally, when I write the txt file, I need to add an extra blank line between rows so that it is "double-spaced".

I have not gotten very far, as I'm a Python newbie and I'm stuck on the first step. The following code duplicates the fromaddress for each individual character in the toaddress column instead of for each individual email address. I also need help with the toaddress list. Can anyone help?

import csv
fromaddress = []
toaddress = []

with open("filename.csv", 'r') as f:
    c = csv.reader(f, delimiter = ",")
    for row in c:
        for item in row[1]:
            fromaddress.append(row[0]);

print(fromaddress)

Everyone, thanks for all of your help! I tried all your code but unfortunately I'm not getting the output I need. Instead of getting this (what I want):

sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com

I'm getting this:

sender1_email.com,"recipient1_email.com, recipient2_email.com, recipient3_email.com"
sender2_email.com,"recipient1_email.com"
sender3_email.com,"recipient1_email.com, recipient2_email.com"

There is only 1 element in each "fromaddress" row, but there are multiple elements in each "toaddress" row. Basically, I have to pair each recipient address with the correct sender address. I think I'm not getting the right output because of the double quotation marks (") that surround all of the recipient addresses in each row of the csv file.

[Answer]

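Take the data from row 2 through row N. For each row, use the first member as column 1 and turn each of the second through second-to-last members into column 2, expanding the result into a multi-row two-dimensional table, and replace "@" with "_" in the strings.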
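The steps described above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not the method used in this answer: the function name expand_pairs and the sample list are hypothetical. It assumes the first line is a header, the last field of each row is the timestamp, and the recipients may arrive either as separate fields or packed into one quoted field (as the asker's follow-up output suggests). Slicing row[1:-1] sidesteps the problem in the question's code, where looping over row[1] walks a single string character by character.

```python
import csv


def expand_pairs(lines):
    """Yield (sender, recipient) pairs with "@" replaced by "_".

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # assumption: the first line is a header row
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        sender = row[0].strip().replace("@", "_")
        # Assumption: everything between the first field (sender) and the
        # last field (timestamp) holds recipient addresses.
        for field in row[1:-1]:
            # A quoted field may contain several comma-separated addresses.
            for addr in field.split(","):
                addr = addr.strip()
                if addr:
                    yield sender, addr.replace("@", "_")


# Hypothetical sample taken from the first rows of the question's data.
sample = [
    "fromaddress, toaddress, timestamp",
    "sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015",
    "sender2@email.com, recipient1@email.com, 8-2-2015",
]
pairs = list(expand_pairs(sample))
for sender, recipient in pairs:
    print(f"{sender}, {recipient}")
```

To run this against a real file, an open file object can be passed in place of the sample list, for example `with open("filename.csv", newline="") as f: pairs = list(expand_pairs(f))`.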

This task involves quite a few set operations, so implementing it in Python is somewhat cumbersome; it is simpler in SPL:

	A
1	=file("d:\\input.csv").read@n().(replace(~,"@","_"))
2	=A1.to(2,).(~.array())
3	=A2.news(~.to(2,~.len()-1);A2.~(1),~)
4	=file("d:\\result.txt").export@c(A3)

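A1: Read the contents of the csv file, with each line read as a string that becomes one member of a sequence, and replace "@" with "_" in each string.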

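A2: Take the second through last members of sequence A1 to form a new sequence, then split each member into a sequence, returning a sequence of sequences. This step extracts and processes the data rows.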


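A3: For each member sequence of A2, take its second through second-to-last members and expand them into two columns (the row's first member and one of those members), returning a new table sequence.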


A4: Export the result of A3 to a comma-separated text file.
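The question also asked for a "double-spaced" txt file with a comma and a space between the two addresses. A minimal Python sketch of that write step is given here for comparison, assuming the (sender, recipient) pairs have already been built. The names write_pairs, demo_pairs, and demo_path are hypothetical, and the temporary directory is used only so the example runs anywhere.

```python
import os
import tempfile


def write_pairs(pairs, path, double_spaced=True):
    """Write one "sender, recipient" line per pair.

    With double_spaced=True, a blank line follows every pair.
    """
    line_end = "\n\n" if double_spaced else "\n"
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for sender, recipient in pairs:
            out.write(f"{sender}, {recipient}{line_end}")


# Hypothetical pairs, already converted from "@" to "_".
demo_pairs = [
    ("sender2_email.com", "recipient1_email.com"),
    ("sender3_email.com", "recipient1_email.com"),
]
demo_path = os.path.join(tempfile.mkdtemp(), "result.txt")
write_pairs(demo_pairs, demo_path)

# Read the file back to inspect what was written.
with open(demo_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

Passing double_spaced=False gives the single-spaced layout shown in the question's sample output.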
