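Below is a table in which the columns TST1 to TST5 can either be empty or take one of the following values: NOT_DONE, INCOMP, UNTESTED, or a number from 30 to 50 in steps of 5.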
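I need to count the number of validated elements (rows) in the table below.
An element is considered validated when its rightmost value is a number between 30 and 50 (in steps of 5, so 30, 35, 40, ...). This means that a row with no value in any of TST1 to TST5 is not counted. A numeric value found to the left of NOT_DONE, INCOMP or UNTESTED is not validated either.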
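In other words, I need to read each row from right to left.
For example, in the table below only 6 elements are considered validated.
Finally, I need to count how many of them belong to group A and how many to group B.
My initial idea was to create a new column flagging the validated elements, but I really don't know how to do that.
I'm using python 2.7 and pandas 0.24.2. I'm a beginner, so any help and guidance is much appreciated.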
+-------+----------+----------+----------+--------+----------+
| Group | TST1     | TST2     | TST3     | TST4   | TST5     |
+-------+----------+----------+----------+--------+----------+
| A     |          | NOT_DONE |          |        | 50       |
+-------+----------+----------+----------+--------+----------+
| A     |          |          | 35       |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| A     |          |          | INCOMP   |        |          |
+-------+----------+----------+----------+--------+----------+
| B     | UNTESTED |          | 50       | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| B     |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          | 30       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| A     |          | INCOMP   | 40       |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          |          |          |        | UNTESTED |
+-------+----------+----------+----------+--------+----------+
| A     |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          | INCOMP   |          |        |          |
+-------+----------+----------+----------+--------+----------+
| A     |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          | 50       |          |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          |          | UNTESTED | 35     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
| B     |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| A     |          | 40       |          | INCOMP |          |
+-------+----------+----------+----------+--------+----------+
| A     |          |          |          | 30     |          |
+-------+----------+----------+----------+--------+----------+
| B     |          |          |          |        |          |
+-------+----------+----------+----------+--------+----------+
| B     |          | NOT_DONE |          | 30     | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
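Edit: here is what I have tried, but it counts every row that contains a numeric value anywhere, rather than only the rows whose rightmost value is numeric. I really don't know how to start the selection from the right.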
filter1 = df.loc[:, 'TST1':'TST5']\
.apply(lambda x: x.astype(str).str.match(r'\d+\.*\d*'), axis=0)\
.any(axis=1)
number_validated = filter1.sum()
print "Number of validated items: ", number_validated
The expected output should be just a short text summary:
Number of validated items: 6
Number of group A validated items: 4
Number of group B validated items: 2
Another option, tested on python 2.7.18 and pandas 0.24.2 (it also works fine on python 3):
Extract the rightmost non-empty value in each row and coerce it to numeric:
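The function names in the step above did not survive, so the following is only a minimal sketch of one way to do it, assuming `DataFrame.ffill(axis=1)` for picking the rightmost value and `pd.to_numeric(..., errors="coerce")` for the conversion. The sample frame is a made-up three-column excerpt, not the full table.

```python
import numpy as np
import pandas as pd

# Hypothetical small sample; empty cells are represented as NaN.
df = pd.DataFrame({
    "Group": ["A", "B", "A", "B"],
    "TST1": [np.nan, "UNTESTED", np.nan, np.nan],
    "TST2": ["NOT_DONE", 50, 40, np.nan],
    "TST3": [50, "INCOMP", np.nan, np.nan],
}, columns=["Group", "TST1", "TST2", "TST3"])

tests = df.loc[:, "TST1":"TST3"]
# Forward-fill along each row, so the last column ends up holding
# the rightmost non-empty cell of that row.
rightmost = tests.ffill(axis=1).iloc[:, -1]
# Markers such as INCOMP cannot be parsed and become NaN.
rightmost_num = pd.to_numeric(rightmost, errors="coerce")
print(rightmost_num)
```

Rows that are entirely empty, and rows whose rightmost cell is a text marker, both end up as NaN here, which is what lets the later range check reject them.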
Then check that it lies between 30 and 50 (inclusive):
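As an illustration of the range check, here is a small sketch that assumes `Series.between` is the method meant; the input values are hypothetical stand-ins for the coerced rightmost values.

```python
import numpy as np
import pandas as pd

# Hypothetical rightmost values after numeric coercion
# (NaN stands for an empty row or a non-numeric marker).
rightmost_num = pd.Series([50.0, np.nan, 30.0, 55.0, 35.0])

# between() includes both endpoints by default; NaN compares as False.
validated = rightmost_num.between(30, 50)
print(validated.tolist())
```

If the step of 5 also has to be enforced, the mask could additionally be combined with `rightmost_num % 5 == 0`.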
Finally, group by Group and count the rows flagged as validated.
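Putting the steps together on the table from the question, a possible end-to-end sketch looks like this. The choice of `ffill`, `pd.to_numeric`, `between` and `groupby(...).sum()` is an assumption about what the answer used, and `N` is just a local shorthand for an empty cell.

```python
import pandas as pd

N = None  # shorthand for an empty cell
rows = [
    ("A", N, "NOT_DONE", N, N, 50),
    ("A", N, N, 35, N, N),
    ("B", N, N, N, N, N),
    ("A", N, N, "INCOMP", N, N),
    ("B", "UNTESTED", N, 50, "INCOMP", N),
    ("B", N, N, N, N, N),
    ("B", N, 30, N, N, N),
    ("A", N, "INCOMP", 40, N, N),
    ("B", N, N, N, N, "UNTESTED"),
    ("A", N, N, N, N, N),
    ("B", N, "INCOMP", N, N, N),
    ("A", N, N, N, N, N),
    ("B", N, 50, N, N, N),
    ("B", N, N, "UNTESTED", 35, "NOT_DONE"),
    ("B", N, N, N, N, N),
    ("A", N, 40, N, "INCOMP", N),
    ("A", N, N, N, 30, N),
    ("B", N, N, N, N, N),
    ("B", N, "NOT_DONE", N, 30, "NOT_DONE"),
]
cols = ["Group", "TST1", "TST2", "TST3", "TST4", "TST5"]
df = pd.DataFrame(rows, columns=cols)

# Rightmost non-empty cell per row, coerced to a number (markers -> NaN).
rightmost = df.loc[:, "TST1":"TST5"].ffill(axis=1).iloc[:, -1]
validated = pd.to_numeric(rightmost, errors="coerce").between(30, 50)

# Summing a boolean mask counts the True entries, overall and per group.
per_group = validated.groupby(df["Group"]).sum()

print("Number of validated items: %d" % validated.sum())
print("Number of group A validated items: %d" % per_group["A"])
print("Number of group B validated items: %d" % per_group["B"])
```

The `validated` mask can also be stored as a new column (for example `df["validated"] = validated`), which matches the asker's original idea of flagging validated rows.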