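OK, I have two xlsx sheets, and the second column of each (index 1) holds a list of SIM card numbers. After extracting the data with xlrd, I have successfully printed both columns to my PowerShell terminal as two lists, along with the number of elements in each.
The first sheet (their sheet) has 454 entries and the second (our sheet) has 361. I need to find the 93 entries that are missing from the second sheet and put them into a list (unpaidSims). I could do this by hand, but I will inevitably have to do it again, so I am trying to write this Python script to automate it.
Since Python agrees that I have one list of 454 entries and one of 361, I assumed all I needed was a list comparison. I searched Stack Overflow and tried three different solutions, but each time the script builds the third list (unpaidSims) it reports 454 entries, meaning it did not remove the entries that also appear in the smaller list. Please advise.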
from os.path import join, dirname, abspath
import xlrd
theirBookFileName = join(dirname(dirname(abspath(__file__))), 'pycel', 'theirBook.xlsx')
ourBookFileName = join(dirname(dirname(abspath(__file__))), 'pycel', 'ourBook.xlsx')
theirBook = xlrd.open_workbook(theirBookFileName)
ourBook = xlrd.open_workbook(ourBookFileName)
theirSheet = theirBook.sheet_by_index(0)
ourSheet = ourBook.sheet_by_index(0)
theirSimColumn = theirSheet.col(1)
ourSimColumn = ourSheet.col(1)
numColsTheirSheet = theirSheet.ncols
numRowsTheirSheet = theirSheet.nrows
numColsOurSheet = ourSheet.ncols
numRowsOurSheet = ourSheet.nrows
# First Attempt at the comparison, but fails and returns 454 entries from the bigger list
unpaidSims = [d for d in theirSimColumn if d not in ourSimColumn]
print unpaidSims
lengthOfUnpaidSims = len(unpaidSims)
print lengthOfUnpaidSims
print "\nWe are expecting 93 entries in this new list"
# Second Attempt at the comparison, but fails and returns 454 entries from the bigger list
s = set(ourSimColumn)
unpaidSims = [x for x in theirSimColumn if x not in s]
print unpaidSims
lengthOfUnpaidSims = len(unpaidSims)
print lengthOfUnpaidSims
# Third Attempt at the comparison, but fails and returns 454 entries from the bigger list
unpaidSims = tuple(set(theirSimColumn) - set(ourSimColumn))
print unpaidSims
lengthOfUnpaidSims = len(unpaidSims)
print lengthOfUnpaidSims
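According to the xlrd documentation, the col method returns "a sequence of the Cell objects in the given column". It says nothing about how Cell objects are compared. Digging into the source, the class does not appear to define any comparison methods. As a result, per the Python documentation, the objects are compared by object identity. In other words, the comparison is False unless both operands are the very same instance of the Cell class, even if the values they contain are equal. That is why all three attempts return the full 454 entries: no Cell from their sheet is ever the same object as a Cell from our sheet. You need to compare the value attributes of the Cell objects instead.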