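<p>I'm assuming this is what your data looks like once the code lines are expanded (and it would help if you could add some whitespace... ^^):</p>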
<pre class="lang-py prettyprint-override"><code>import pandas as pd

df = pd.DataFrame(
    [
        [1001, "27452.webp", "981.webp", "d92e.webp",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'detected': True, 'count': 1}"
        ],
        [1002, "27452.webp", "981.webp", "d92e.webp",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'detected': True, 'count': 1}"
        ],
        [1003, "27452.webp", "981.webp", "d92e.webp",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'detected': False, 'count': 1}"
        ],
    ],
    columns=['user_uid', 'bool1', 'bool2', 'bool3', 'bool1_res', 'bool2_res',
             'bool3_res'
    ]
)
</code></pre>
<h2>My answer</h2>
<p>The work splits into two parts: (1) parsing the strings; (2) processing the data to build the values of the "new" column.</p>
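<h3>Part 1: Parsing the dict strings</h3>
<p>This function is applied to every element of the dataframe via <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.applymap.html" rel="nofollow noreferrer">pd.DataFrame.applymap</a> and uses <code>ast.literal_eval</code>, as @jezrael correctly suggested. (On pandas 2.1 and later, <code>applymap</code> is deprecated in favor of <code>DataFrame.map</code>.)</p>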
<pre class="lang-py prettyprint-override"><code>import ast
from typing import Any


def str2dict(x: Any):
    """(Step 1) Parses argument using ast.literal_eval"""
    try:
        return ast.literal_eval(x.strip())
    # if x is not a string, or is not parsable, return x as-is
    except (AttributeError, ValueError, SyntaxError):
        return x
</code></pre>
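<p>To see why <code>ast.literal_eval</code> fits here, the following is a small sketch (standard library only, values copied from the sample data) of what happens to a dict string and to a file name. The variable names are only for illustration.</p>

```python
import ast
import json

raw = "{'is_doc1': False, 'is_doc2': True}"

# ast.literal_eval understands Python literals (single quotes, True/False)
parsed = ast.literal_eval(raw)

# json.loads rejects the same string: JSON requires double quotes and true/false
try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# a plain file name is not a Python literal, so literal_eval raises;
# this is the case where str2dict returns its input unchanged
try:
    ast.literal_eval("27452.webp")
    filename_ok = True
except (ValueError, SyntaxError):
    filename_ok = False

print(parsed["is_doc1"], json_ok, filename_ok)
```

<p>This is also why non-string cells (such as <code>user_uid</code>) and the file-name columns should come through the parsing step untouched.</p>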
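<h3>Part 2: Processing the data (i.e. making the "new" column)</h3>
<p>This function is applied to each row of the dataframe (via <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html" rel="nofollow noreferrer">pd.DataFrame.agg</a>):</p>
<p>Following the logic in the function you posted, I:</p>
<ol>
<li><p>Check whether <code>bool3_res['detected']</code> is False (your first two conditions both test detected == True); if it is, raise a <code>ValueError</code></p></li>
<li><p>Check whether is_doc1 is True for bool1 and, if not, for bool2</p></li>
<li><p>If neither is True, raise a <code>ValueError</code></p></li>
</ol>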
<pre class="lang-py prettyprint-override"><code>def make_newcol_entry(x: pd.Series):
    """(Step 2) Constructs the "new" column value for one dataframe row"""
    try:
        if x.bool3_res['detected'] is False:
            raise ValueError
        # check is_doc1 properties
        elif x.bool1_res['is_doc1'] is True:
            document1 = x.bool1
        elif x.bool2_res['is_doc1'] is True:
            document1 = x.bool2
        else:
            raise ValueError
    except ValueError:
        entry = "not valid"
    # if there is an `is_doc1` that is True, construct your entry.
    else:
        entry = {
            "task_id": "uid",
            "group_id": "uid",
            "data": {"document1": document1, "document2": x.bool3}
        }
    return entry
</code></pre>
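<p>If you want to check the decision logic without pandas, here is a rough pure-Python sketch of the same three steps on plain dicts. <code>build_entry</code> is a hypothetical name, it uses early returns instead of raising <code>ValueError</code>, and it tests truthiness rather than <code>is True</code>, so treat it as an illustrative variant rather than a drop-in replacement.</p>

```python
def build_entry(row: dict):
    """Hypothetical variant of make_newcol_entry that works on a plain dict."""
    # step 1: nothing detected in the third file -> not valid
    if not row["bool3_res"]["detected"]:
        return "not valid"
    # step 2: take the first file whose result flags it as document 1
    for col in ("bool1", "bool2"):
        if row[f"{col}_res"]["is_doc1"]:
            return {
                "task_id": "uid",
                "group_id": "uid",
                "data": {"document1": row[col], "document2": row["bool3"]},
            }
    # step 3: neither file is document 1 -> not valid
    return "not valid"


# the three sample rows, with the *_res strings already parsed into dicts
files = {"bool1": "27452.webp", "bool2": "981.webp", "bool3": "d92e.webp"}
rows = [
    {**files, "bool1_res": {"is_doc1": False}, "bool2_res": {"is_doc1": True},
     "bool3_res": {"detected": True}},
    {**files, "bool1_res": {"is_doc1": True}, "bool2_res": {"is_doc1": False},
     "bool3_res": {"detected": True}},
    {**files, "bool1_res": {"is_doc1": True}, "bool2_res": {"is_doc1": False},
     "bool3_res": {"detected": False}},
]
entries = [build_entry(r) for r in rows]
print(entries)
```

<p>The first row picks <code>981.webp</code>, the second picks <code>27452.webp</code>, and the third is rejected because nothing was detected.</p>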
<h3>To execute, run:</h3>
<pre class="lang-py prettyprint-override"><code>df = df.assign(new=lambda x: x.applymap(str2dict)
                              .agg(make_newcol_entry, axis=1))
</code></pre>
<p>Note that this parses <em>every</em> element in the dataframe.</p>
<p>To parse <em>only</em> the <code>bool#_res</code> columns, you can do it in two steps:</p>
<pre class="lang-py prettyprint-override"><code># select and parse only res cols ('bool#_res'), then apply
df.update(df.filter(regex=r'_res$', axis=1).applymap(str2dict))
df = df.assign(new=lambda x: x.agg(make_newcol_entry, axis=1))
</code></pre>
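<p>As a quick sanity check on the regex, <code>DataFrame.filter(regex=...)</code> keeps the labels for which <code>re.search</code> matches, so the selection can be previewed with the standard library alone. This is only a sketch over the sample column names.</p>

```python
import re

columns = ['user_uid', 'bool1', 'bool2', 'bool3',
           'bool1_res', 'bool2_res', 'bool3_res']

# `_res$` matches only labels that END in "_res"
pattern = re.compile(r'_res$')
res_cols = [c for c in columns if pattern.search(c)]

print(res_cols)
```

<p>Only the three result columns are selected, so the file-name columns and <code>user_uid</code> are never passed to <code>str2dict</code>.</p>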
<h2>Result</h2>
<pre class="lang-py prettyprint-override"><code>$ df
user_uid bool1 bool2 bool3 bool1_res bool2_res bool3_res new
0 1001 27452.webp 981.webp d92e.webp {'is_doc1': False, 'is_doc2': True} {'is_doc1': True, 'is_doc2': True} {'detected': True, 'count': 1} {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '981.webp', 'document2': 'd92e.webp'}}
1 1002 27452.webp 981.webp d92e.webp {'is_doc1': True, 'is_doc2': True} {'is_doc1': False, 'is_doc2': True} {'detected': True, 'count': 1} {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '27452.webp', 'document2': 'd92e.webp'}}
2 1003 27452.webp 981.webp d92e.webp {'is_doc1': True, 'is_doc2': True} {'is_doc1': False, 'is_doc2': True} {'detected': False, 'count': 1} not valid
</code></pre>
<pre class="lang-py prettyprint-override"><code>$ df['new']
0 {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '981.webp', 'document2': 'd92e.webp'}}
1 {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '27452.webp', 'document2': 'd92e.webp'}}
2 not valid
Name: new, dtype: object
</code></pre>