在一个记录中连接多个值而不重复

2024-09-30 08:26:57 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个如下所示的dbf表,它是两个表的一对多连接的结果。我想从一个Taxlot id字段获得唯一的区域值。在

表名:输入表
tid-----区域
1------A
1------A
1------B
1------C
2------D
2------E
3------C

期望输出表 表名:输入表
tid-----区域
1------A、B、C
2------D,E
3------C

我得到了一些帮助,但没能成功。在

inputTbl = r"C:\temp\input.dbf"
taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
   if searchRow.TID in taxIdZoningDict:
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   else:
      taxIdZoningDict[searchRow.TID] = set() #a set prevents dulpicates!
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   searchRow = searchRows.next()

outputTbl = r"C:\temp\output.dbf"
gp.CreateTable_management(r"C:\temp", "output.dbf")
gp.AddField_management(outputTbl, "TID", "LONG")
gp.AddField_management(outputTbl, "ZONES", "TEXT", "", "", "20")
tidList = taxIdZoningDict.keys()
tidList.sort() #sorts in ascending order
insertRows = gp.insertcursor(outputTbl)
for tid in tidList:
   concatString = ""
   for zone in taxIdZoningDict[tid]
      concatString = concatString + zone + ","
   insertRow = insertRows.newrow()
   insertRow.TID = tid
   insertRow.ZONES = concatString[:-1]
   insertRows.insertrow(insertRow)
del insertRow
del insertRows

Tags: in区域managementtempgptiddbfinsertrow
3条回答

我认为Morlock的答案不满足删除重复项的要求。我将使用defaultdict(set),它将自动省略dup,而不是defaultdict(list),因此使用.add()代替.append()。在

这对我同时使用microsoftaccessvba和microsoftexcelvba都有效。这不是很有效的代码,但它是有效的。我可以在Access和Excel中打开结果文件。在

设置sDBF*sOutDBF*变量,使其适合您自己的自定义路径。在

Sub VBASolution()
    Dim oRS
    Dim sConn
    Dim sDBFPath, sOutDBFPath
    Dim sDBFName, sOutDBFName
    Dim oDict
    Dim curTID, curZone, sZones
    Dim oConn
    Dim oFS
    Dim sTableName

    sDBFPath = "C:\Path\To\DBFs\"
    sDBFName = "Input.dbf"

    sOutDBFPath = "C:\Path\To\DBFs\"
    sOutDBFName = "RESULTS.dbf"

    sConn = "Driver={Microsoft dBASE Driver (*.dbf)}; DriverID=277; Dbq="
    Set oRS = CreateObject("ADODB.Recordset")


    oRS.Open "SELECT DISTINCT tid, zone FROM " & sDBFName, sConn & sDBFPath

    Set oDict = CreateObject("Scripting.Dictionary")

    Do While Not oRS.EOF
        curTID = oRS.Fields("tid").Value
        curZone = oRS.Fields("zone").Value
        If Not oDict.Exists(curTID) Then
            oDict.Add curTID, CreateObject("Scripting.Dictionary")
        End If
        If Not oDict(curTID).Exists(curZone) Then
            oDict(curTID).Add curZone, curZone
        End If
        oRS.MoveNext
    Loop
    oRS.Close

    Set oRS = Nothing

    Set oConn = CreateObject("ADODB.Connection")
    oConn.Open sConn & sOutDBFPath

    'Delete the resultant DBF file if it already exists.
    Set oFS = CreateObject("Scripting.FileSystemObject")
    With oFS
        If .FileExists(sOutDBFPath & "\" & sOutDBFName) Then
            .DeleteFile sOutDBFPath & "\" & sOutDBFName
        End If
    End With

    sTableName = oFS.GetBaseName(sOutDBFName)

    oConn.Execute "CREATE TABLE " & sTableName & " (tid int, zone varchar(80))"

    Dim i, j
    For Each i In oDict.Keys
        curTID = i
        sZones = ""
        For Each j In oDict(i)
            sZones = sZones & "," & j
        Next
        sZones = Mid(sZones, 2)
        oConn.Execute "INSERT INTO " & sTableName & " (tid, zone) VALUES ('" & curTID & "','" & sZones & "')"
    Next
    oConn.Close

    Set oConn = Nothing
    Set oDict = Nothing
    Set oFS = Nothing
End Sub

编辑:为了它的价值,我还把它插入到Windows XP中的VBScript.VBS文件(文本)中,并在文件底部添加以下行:

Call VBASolution()

我不知道它是否需要安装Office,或者Windows是否附带了相应的dbf驱动程序。在

我将使用my dbf moduledefaultdict来大大简化代码:

import dbf
from collections import defaltdict

inputTbl = dbf.Table(r'c:\temp\input.dbf')
taxIdZoning = defaultdict(set)

for record in inputTbl:
    taxIdZoning[record.tid].add(record.zone)
inputTbl.close()

outputTbl = dbf.Table(r'c:\temp\output.dbf', 'tid N(17.0), zones C(20)')
for tid in sorted(taxIdZoning):
    record = outputTbl.append()
    record.tid = tid
    record.zones = ','.join(sorted(taxIdZoning[tid]))
outputTbl.close()

注意:字段名是小写的,我不确定如何确切地表示LONG,但希望17位数字就足够了。:)对于任何错误我深表歉意——没有输入文件很难测试。在

相关问题 更多 >

    热门问题