Answers to: Querying with a dynamic list in Pandas


1 answer

Anonymous, 1 day ago

<code>query</code> performs a full parse of a Python expression (with some limitations: for example, you cannot use <code>lambda</code> expressions or ternary <code>if</code>/<code>else</code> expressions). This means that any column you refer to in the query string must be a valid Python identifier (the more formal term for "variable name"). One way to check this is with the <code>Name</code> pattern tucked away in the <code>tokenize</code> module:
<pre><code>In [156]: tokenize.Name
Out[156]: '[a-zA-Z_]\\w*'

In [157]: def isidentifier(x):
   .....:     return re.match(tokenize.Name, x) is not None
   .....:

In [158]: isidentifier('adsf')
Out[158]: True

In [159]: isidentifier('1adsf')
Out[159]: False
</code></pre>
Because your column names contain spaces, each space-separated word is parsed as a separate identifier, so the expression ends up with two bare names side by side, which is not valid Python syntax. Try typing <code>annual rate</code> into a Python interpreter and you will get a <code>SyntaxError</code> exception.

Take-home message: rename your columns to valid variable names. Unless the column names follow some kind of structure, you will not be able to do this programmatically (at least, not easily). In your case, you could do
<pre><code>In [166]: cols
Out[166]: ['annual rate', '1/2 annual rate', 'monthly rate']

In [167]: list(map(lambda x: '_'.join(x.split()).replace('1/2', 'half'), cols))
Out[167]: ['annual_rate', 'half_annual_rate', 'monthly_rate']
</code></pre>
You can then format the query string as in @acushner's example:
<pre><code>In [173]: newcols
Out[173]: ['annual_rate', 'half_annual_rate', 'monthly_rate']

In [174]: ' or '.join('%s > 1' % c for c in newcols)
Out[174]: 'annual_rate > 1 or half_annual_rate > 1 or monthly_rate > 1'
</code></pre>
<h3>Note: you don't actually need to use <code>query</code> here:</h3>
<pre><code>In [180]: df = DataFrame(randn(10, 3), columns=cols)

In [181]: df
Out[181]:
   annual rate  1/2 annual rate  monthly rate
0      -0.6980           0.6322        2.5695
1      -0.1413          -0.3285       -0.9856
2       0.8189           0.7166       -1.4302
3       1.3300          -0.9596       -0.8934
4      -1.7545          -0.9635        2.8515
5      -1.1389           0.1055        0.5423
6       0.2788          -1.3973       -0.9073
7      -1.8570           1.3781        0.0501
8      -0.6842          -0.2012       -0.5083
9      -0.3270          -1.5280        0.2251

[10 rows x 3 columns]

In [182]: df.gt(1).any(axis=1)
Out[182]:
0     True
1    False
2    False
3     True
4     True
5    False
6    False
7     True
8    False
9    False
dtype: bool

In [183]: df[df.gt(1).any(axis=1)]
Out[183]:
   annual rate  1/2 annual rate  monthly rate
0      -0.6980           0.6322        2.5695
3       1.3300          -0.9596       -0.8934
4      -1.7545          -0.9635        2.8515
7      -1.8570           1.3781        0.0501

[4 rows x 3 columns]
</code></pre>
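The renaming and query-building steps above can be sketched as a small standalone script. This is only a minimal sketch: it assumes Python 3, where the built-in <code>str.isidentifier()</code> can stand in for the <code>tokenize.Name</code> check, and the helper names <code>is_valid_identifier</code> and <code>sanitize</code> are hypothetical, not part of pandas.

```python
import keyword
import re


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` can appear as a bare name in a query string."""
    # Keywords such as "class" pass isidentifier() but are still unusable.
    return name.isidentifier() and not keyword.iskeyword(name)


def sanitize(name: str) -> str:
    """Hypothetical helper: turn an arbitrary column label into an identifier."""
    cleaned = name.replace("1/2", "half")  # same special case as in the answer
    cleaned = re.sub(r"\W+", "_", cleaned).strip("_")  # non-word runs -> "_"
    if not is_valid_identifier(cleaned):
        # Labels that are empty, start with a digit, or are keywords get a prefix.
        cleaned = "col_" + cleaned
    return cleaned


cols = ["annual rate", "1/2 annual rate", "monthly rate"]
newcols = [sanitize(c) for c in cols]

# Build the dynamic query string from the sanitized names.
query_string = " or ".join(f"{c} > 1" for c in newcols)
print(newcols)
print(query_string)
```

To use the resulting string, the DataFrame's columns would first have to be renamed to <code>newcols</code> (for example with <code>df.columns = newcols</code>) before calling <code>df.query(query_string)</code>.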
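As @Jeff pointed out in the comments, you can refer to column names that are not valid identifiers, albeit in a somewhat clunky way: <pre><code>pd.eval('df[df["annual rate"]>0]') </code></pre> I wouldn't recommend writing this kind of code if you want to save the lives of kittens.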
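For readers on a more recent pandas, here is a hedged sketch of two ways to filter on a dynamic list of columns without renaming anything. It assumes a pandas version that accepts backtick-quoted column names inside <code>query</code> (to my knowledge, 0.25 or later); the sample data is made up for illustration and only the space-containing names are quoted.

```python
import pandas as pd

# Made-up sample data; the column names mirror the ones in the question.
df = pd.DataFrame(
    {
        "annual rate": [0.5, 1.5, -0.2],
        "1/2 annual rate": [0.1, 0.2, 1.2],
        "monthly rate": [0.3, 0.4, 0.5],
    }
)

# The dynamic list of columns to test.
cols = ["annual rate", "monthly rate"]

# Option 1: backtick-quote each name so query() accepts the spaces
# (assumes a pandas version with backtick support).
query_string = " or ".join(f"`{c}` > 1" for c in cols)
result = df.query(query_string)

# Option 2: skip query() entirely and build a boolean mask.
mask_result = df[df[cols].gt(1).any(axis=1)]

print(result)
```

Both options select the same rows here; the boolean-mask form avoids string building altogether, which is the approach the note above recommends.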
