findall function returns a value, but it is not written to the pandas DataFrame

def find_donation_orgs(x):
    text = nltk.Text(nltk.word_tokenize(x))
    donation = text.findall(r"<\.> <.*>{,15}? <donat.*|contrib.*|Donat.*|Contrib.*> <.*>*? <to> (<.*>+?) <\.|\,|\;> ")
    return donation

text = df.text.iloc[1]
textfindall = text.findall(r"<\.> <.*>{,15}? <donat.*|contrib.*|Donat.*|Contrib.*> <.*>*? <to> (<.*>+?) <\.|\,|\;> ")
print('text is ' + str(type(text)))
print('textfindall is ' + str(type(textfindall)))
print(textfindall)

1 answer

Answer 1 · posted 2024-09-30 08:18:23

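Try debugging the code by checking what the function actually receives and returns. You can use a debugger (available in most IDEs), or you can use the function's return value to find out whether the problem lies in the function itself or in the pandas call.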

def find_donation_orgs(x):
    return x

Make sure the input is what you expect.
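One way to check inputs and outputs without a debugger is to wrap the function so that every call reports the types involved. This is only a sketch: `trace_types` and the sample rows are hypothetical and not part of the original code. Missing values in a pandas text column typically arrive as a float NaN or None rather than a string, which is the kind of surprise this would reveal.

```python
import functools


def trace_types(func):
    """Hypothetical helper: report the type of the argument and of the result."""
    @functools.wraps(func)
    def wrapper(x):
        result = func(x)
        print(f"{func.__name__}: got {type(x).__name__}, "
              f"returned {type(result).__name__}")
        return result
    return wrapper


@trace_types
def find_donation_orgs(x):
    return x  # placeholder body: hand the input straight back


# Made-up rows standing in for the values of a DataFrame column.
rows = ["They donated to the Red Cross.", None, 3.14]
results = [find_donation_orgs(row) for row in rows]
```

The same decorated function can be passed to `Series.apply`, so the check runs on the real data as well.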


Then look at what the tokenization produces, for example by returning the tokenizer output instead of x.

def find_donation_orgs(x):
    text = nltk.Text(nltk.word_tokenize(x))
    all_occurrences = text.findall(r"<\.> <.*>{,15}? <donat.*|contrib.*|Donat.*|Contrib.*> <.*>*? <to> (<.*>+?) <\.|\,|\;> ")
    if all_occurrences is None:
        return "no occurrences"
    else:
        return all_occurrences

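Check whether something is wrong with your regular expression. If so, return the tokenizer output and try to fix the regular expression against it.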

Update

Looking at the source code of the Text object, the findall method does not actually return anything; it prints the result instead:

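def findall(self, regexp):
    if "_token_searcher" not in self.__dict__:
        self._token_searcher = TokenSearcher(self)

    hits = self._token_searcher.findall(regexp)
    hits = [' '.join(h) for h in hits]
    print(tokenwrap(hits, "; "))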

This is because the Text object is only meant to be used from the interactive console:

A wrapper around a sequence of simple (string) tokens, which is intended to support initial exploration of texts (via the interactive console). [...] If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead.
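The difference between printing and returning can be seen in isolation with a small sketch. `print_hits` and `return_hits` are hypothetical stand-ins, not NLTK functions, and the sample hits are made up. A function that only prints hands None back to its caller, so None is what ends up stored when its result is assigned.

```python
import contextlib
import io


def print_hits(hits):
    """Console-style helper: shows the result but returns nothing."""
    print("; ".join(hits))


def return_hits(hits):
    """Same formatting, but the string is handed back to the caller."""
    return "; ".join(hits)


hits = ["the Red Cross", "a local food bank"]  # made-up sample data

# Capture stdout to show that the text was printed rather than returned.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    printed_result = print_hits(hits)

returned_result = return_hits(hits)

print(repr(printed_result))
print(repr(returned_result))
```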

Your function should look like this:

import nltk
from nltk.util import tokenwrap

def find_donation_orgs(x):
    searcher = nltk.TokenSearcher(nltk.word_tokenize(x))
    hits = searcher.findall(r"<\.> <.*>{,15}? <donat.*|contrib.*|Donat.*|Contrib.*> <.*>*? <to> (<.*>+?) <\.|\,|\;> ")

    hits = [' '.join(h) for h in hits]
    donation = tokenwrap(hits, "; ")
    return donation

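This replicates the original behavior, except that it actually returns the value. Of course, once you have the hits list, you may want to format the output differently.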
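To get the results into the DataFrame, one option is to return the hits as a list and assign the output of `Series.apply` to a new column. The following is a minimal sketch, not a definitive implementation. Only the column name `text` comes from the question; the sample rows and the `donation_orgs` column name are made up. It also swaps in `wordpunct_tokenize` for `word_tokenize`, so that no tokenizer data has to be downloaded, which may split some tokens differently.

```python
import pandas as pd
from nltk.text import TokenSearcher
from nltk.tokenize import wordpunct_tokenize

PATTERN = (r"<\.> <.*>{,15}? <donat.*|contrib.*|Donat.*|Contrib.*> "
           r"<.*>*? <to> (<.*>+?) <\.|\,|\;> ")


def find_donation_orgs(x):
    # wordpunct_tokenize is a stand-in for word_tokenize (an assumption).
    searcher = TokenSearcher(wordpunct_tokenize(x))
    hits = searcher.findall(PATTERN)
    # Return one string per hit instead of printing them.
    return [" ".join(h) for h in hits]


# Made-up sample data for illustration.
df = pd.DataFrame({
    "text": [
        "We thank our sponsors. Last year the company donated money "
        "to the Red Cross. It was nice.",
        "Nothing relevant here.",
    ]
})

# Each cell of the new column holds the list of matches for that row.
df["donation_orgs"] = df["text"].apply(find_donation_orgs)
print(df["donation_orgs"].tolist())
```

Returning a list keeps rows without a match as an empty list, which is easier to filter on than an empty string.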
