Excel 工作表做 JOIN
【问题】
I’m working on combining two excel worksheets. Before I start, I’d like to mention that I also have mysql workbench, so I’m open to working on this issue in either sql or vba (I should learn both). I’m working with .bed files, which are lists of genomic coordinates. In short, the data is indexed by chromosome number (ie:chr2) and then has a numerical start and stop location on the chromosome. These numerical locations can span a large range (ie:100-10,000) or be a single position (ie: 999-1000). I have a list of coordinates that cover a large range, and in a separate file I have a list of single positions.
Example of a file with ranges:
chromosome start stop
chr1 45616321
chr3 984211253
Example of file with single positions:
chromosome start stop
chr1 52135214
chr3 1025410255
I would like to combine these worksheets such that if a location in my list of single positions is found within the range in my list of ranges, the locations for both are listed in the same row. The lists are 1000s of locations long, so I’d also like this program to loop through every row. Using the example data listed above, I’d like my output to look like the following:
Example of desired output:
chromosome start stop chromosome start stop
chr1 45616321 chr1 52135214
chr3 984211253 chr3 1025410255
There is a high probability that multiple single positions will fall within a single range, and I would like these to be listed as separate rows.
有人给出 SQL: 正确,只是要出库入库,稍显麻烦。
select a.chromosome, a.start, a stop, b.chromosome, b.start, b.stop from ranges_table a, positions_table b where b.start >= a.start and b.stop <= a.stop
有人给出 VBA:正确,SQL+ 出库入库代码。
SubSqlJoin()
Dim oConnAsNew ADODB.Connection
Dim oRSAsNew ADODB.Recordset
Dim sPath
Dim sSQLAsString, wbAsWorkbook
Set wb =ThisWorkbook
sSQL ="select a.chromosome, a.start, a stop," & _
"b.chromosome, b.start, b.stop" & _
"from <ranges_table> a, <positions_table> b" & _
"where b.start >= a.start and b.stop <= a.stop"
sSQL =Replace(sSQL,“<ranges_table>”, _
Rangename(wb.Worksheets(“Ranges”).Range(“A1”).CurrentRegion))
sSQL =Replace(sSQL,“<positions_table>”, _
Rangename(wb.Worksheets(“Positions”).Range(“A1”).CurrentRegion))
If wb.Path <>""Then
sPath = wb.FullName
Else
MsgBox"The workbook must be saved first!"
ExitSub
EndIf
oConn.Open"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=‘" & sPath &"’;" & _
“Extended Properties=‘Excel 12.0;HDR=Yes;IMEX=1’;”
oRS.Open sSQL, oConn
IfNot oRS.EOFThen
wb.Worksheets(“Results”).Range(“A2”).CopyFromRecordset oRS
Else
MsgBox"No records found"
EndIf
oRS.Close
oConn.Close
EndSub
FunctionRangename(rAsRange)AsString
Rangename =“[” & r.Parent.Name &“$” & _
r.Address(False,False) &“]”
EndFunction
【回答】
用 VBA 太麻烦了,用 SPL 几句就算出来了
A | ||
---|---|---|
1 | =file(“D:\\range.xlsx”).xlsimport@t() | |
2 | =file(“D:\\position.xlsx”).xlsimport@t() | |
3 | =xjoin(A1;A2,start>A1.start && stop<A1.stop) | |
4 | =file(“D:\\result.xlsx”).xlsexport(A3,#1.chromosome,#1.start,#1.stop,#1.chromosome,#2.start,#2.stop) | |
运行结果:
A1:读取 excel 文件返回序表
A2:读取 excel 文件返回序表
A3:将两个序列进行叉乘
A4:将序表输出到新 excel 文件中