
Creating a dictionary from a list of special characters


I'm writing this small script: basically, it maps each element of a list (which contains special characters) to its index in order to build a dictionary.
<pre><code>#!/usr/bin/env python
#-*- coding: latin-1 -*-

ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\][=-#¢"

refStr = ln2+ln1
keyDict = {}
for i in range(0,len(refStr)):
    keyDict[refStr[i]] = i

print "-" * 32
print "Originl: ",refStr
print "KeyDict: ", keyDict

# added just to test a few special characters
tsChr = ['£','%','\\','¢']
for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else: print k, "\t", "not in the dic."
</code></pre>
It returns the following result:
<pre><code>Originl: (*&^%$£@!/`'\][=-#¢?0>9<8~7|65"4:3}2{1+_)
KeyDict: {'!': 9, '\xa3': 7, '\xa2': 20, '%': 4, '$': 5, "'": 12, '&': 2, ')': 42, '(': 0, '+': 40, '*': 1, '-': 17, '/': 10, '1': 39, '0': 22, '3': 35, '2': 37, '5': 31, '4': 33, '7': 28, '6': 30, '9': 24, '8': 26, ':': 34, '=': 16, '<': 25, '?': 21, '>': 23, '@': 8, '\xc2': 19, '#': 18, '"': 32, '[': 15, ']': 14, '\\': 13, '_': 41, '^': 3, '`': 11, '{': 38, '}': 36, '|': 29, '~': 27}
</code></pre>
That is all fine, except that the characters <code>£</code>, <code>¢</code> and <code>\</code> show up as <code>\xa3</code>, <code>\xa2</code> and <code>\\</code> respectively. Does anyone know why printing <code>ln1</code>/<code>ln2</code> works fine but printing the dictionary does not? How can I fix this? Any help is much appreciated. Cheers!!
<hr/>
Update 1

I added extra special characters, <code>#</code> and <code>¢</code>, and this is what I get when following @Duncan's suggestion:
<pre><code>! 9
? 7
? 20
% 4
$ 5
....
....
8 26
: 34
= 16
< 25
? 21
> 23
@ 8
? 19
....
....
</code></pre>
Note the 7th, 19th and 20th elements: they are not printed correctly at all. The 21st element is the actual <code>?</code> character. Cheers!!
<hr/>
Update 2

I just added this loop to my original post to test what I am ultimately after:
<pre><code>tsChr = ['£','%','\\','¢']
for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else: print k, "\t", "not in the dic."
</code></pre>
The result I get is:
<pre><code>£ not in the dic.
% 4
\ 13
¢ not in the dic.
</code></pre>
When the script runs, it thinks <code>£</code> and <code>¢</code> are not actually in the dictionary, and that is my problem. Does anyone know how to fix this, or what I am doing wrong?

Eventually I will be checking the characters of a file (or a line of text) against the dictionary to see whether they exist, and the text may well contain characters such as <code>é</code> or <code>£</code>. Cheers!!


1 Answer

Anonymous, 1 day ago

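In my opinion, you should learn about Unicode in general and <a href="http://docs.python.org/howto/unicode.html" rel="nofollow">its use in Python</a>. If you are not interested in why people had to complicate things so that you have to deal with a "\xa3" rather than a simple <code>£</code>, then Duncan's answer is perfect and tells you everything you want to know.
<h2>Update (regarding Update 2)</h2>
Make sure your file is actually saved in latin-1 encoding and not in utf-8, and your test will pass (or change <code>#-*- coding: latin-1 -*-</code> to <code>#-*- coding: utf-8 -*-</code>).

From the link above you can work out what is going on:

Your file is saved as utf-8, which means the character <code>£</code> takes 2 bytes. But since you told the Python interpreter that the encoding is latin-1, it treats each of those 2 utf-8 bytes as a separate character, and each one becomes its own key.

Indeed, I count 19 characters in <code>ln2</code>, but if you call <code>len(ln2)</code> it returns 21.

When you test <code>'£' in keyDict.keys()</code>, you are looking for a 2-character string, whereas each of those 2 characters has its own key in the dictionary. That is why it is not found.

You can also check <code>len(keyDict)</code> and see that it is larger than you would expect.

I think that covers it. Please understand that the whole story is not easy to explain on a single web page, but the link above is, in my opinion, a good starting point that mixes some background with some encoding examples.

Cheers

P.S.: I am using this code, saved as UTF-8, and it works perfectly:
<pre><code>#!/usr/bin/env python
#-*- coding: utf-8 -*-

# unicode literals: one element per character, not per byte
ln1 = u'?0>9<8~7|65"4:3}2{1+_)'
ln2 = u"(*&^%$£@!/`'\][=-#¢"

refStr = u"%s%s" % (ln2, ln1)
keyDict = {}
for idx, chr_ in enumerate(refStr):
    print chr_,
    keyDict[chr_] = idx

print u"-" * 32
print u"Originl: ", refStr
print u"KeyDict: ", keyDict

# the test characters must be unicode literals as well
tsChr = [u'£', u'%', u'\\', u'¢']
for k in tsChr:
    if k in keyDict.keys():
        print k, "\t", keyDict[k]
    else:
        print k, repr(k), "\t", "not in the dic."
</code></pre>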
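The byte-versus-character mismatch described above can be reproduced in a few lines. This is only a minimal sketch, and it assumes Python 3 (the scripts in this thread are Python 2), where bytes and text are separate types, so the misreading has to be simulated explicitly with `encode`/`decode`. The names `utf8_bytes`, `misread`, `key_dict` and `pound_as_typed` are illustrative and do not come from the original script.

```python
# Python 3 sketch (assumption: the thread's own code is Python 2).
# It simulates a UTF-8 file being read as if it were latin-1.

ln2 = "(*&^%$£@!/`'\\][=-#¢"          # 19 characters, same content as ln2 above

# What is stored on disk when the source file is saved as UTF-8:
# '£' and '¢' each take two bytes.
utf8_bytes = ln2.encode("utf-8")

# What the interpreter sees when told the file is latin-1:
# every single byte is turned into one character.
misread = utf8_bytes.decode("latin-1")

print(len(ln2))       # number of real characters
print(len(misread))   # number of bytes, two more than the line above

# Build the dictionary the same way the question does, from the misread text.
key_dict = {ch: idx for idx, ch in enumerate(misread)}

# A '£' typed in that same misread file is itself a 2-character string,
# so it can never match the single-character keys.
pound_as_typed = "£".encode("utf-8").decode("latin-1")
print(len(pound_as_typed))
print(pound_as_typed in key_dict)

# The two bytes of '£' are present, but as separate keys.
print("\xc2" in key_dict, "\xa3" in key_dict)

# Decoding with the encoding the file was really saved in gives one key per character.
good_dict = {ch: idx for idx, ch in enumerate(utf8_bytes.decode("utf-8"))}
print("£" in good_dict, "¢" in good_dict)
```

The same idea should carry over to the final goal of checking text read from a file: decode the bytes with the encoding the file was actually written in before looking characters up in the dictionary.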
