Compare elements in a string with rows in a DataFrame

Posted 2024-10-02 12:35:47


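I have some rows in a DataFrame, and I want to select the rows that contain any of the numbers/numeric values that appear in the following string: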

text = "The previous low was 27,523. The 1.35 trillion ($22.5 million ) program could start in October. The number of people who left the country plunged 99.8 percent from a year earlier to 2,750, according to the data from the agency."

DataFrame:

Account   Sentences
51343     The subsidies are expected to form part of a second budget.
6376      The subsidies, totalling 2.35tn, are expected to form part of a second budget. New plans to allocate $22.5 billion to a new reimbursement programme.
31        The subsidies, totalling 1.35tn, are expected to form part of a second budget. New plans to allocate $22.5 billion to a new reimbursement programme.
2624      The way to a sports fan’s heart? Behind-the-scenes content from their favourite teams.
613       The subsidies, totalling 1.43 tn, are expected to form part of a second budget. New plans to allocate $21.5 billion to a new reimbursement programme.
764       The subsidies, totalling 1.35tn, are expected to form part of a second budget. New plans to allocate $22.5 billion to a new reimbursement programme.

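The desired output is three new columns:

  • one containing all the numbers that a row has in common with text
  • one containing all the numbers that differ from text
  • one with a match score (1 if all values match; 0 if nothing matches; 0.5 if there is at least one value in common)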
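The scoring rule in the third bullet can be written as a small function. This is a minimal sketch that assumes the common numbers and the numbers from text are already available as sets of strings; the name `match_score` is hypothetical.

```python
def match_score(common: set, reference: set) -> float:
    """Hypothetical helper: score a row by the numbers it shares with text."""
    if reference and common == reference:
        return 1.0   # every number from text appears in the row
    if common:
        return 0.5   # at least one number in common
    return 0.0       # no number in common


reference = {"27.523", "1.35", "22.5", "2.750", "99.8"}
print(match_score({"1.35", "22.5"}, reference))  # 0.5
```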

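The first thing I tried was converting all commas to periods, so that the numbers in text and in the rows of the Sentences column are not confused. Then I extracted all the numbers from text to compare them with each numeric value in the rows: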

import re

numb = re.findall(r"\d+[,.\d]\d+", text)
for i in df['Sentences']:
    print(re.findall(r"\d+[,.\d]\d+", i))
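Replacing every comma also turns the commas between clauses (e.g. "The subsidies, totalling") into periods. A narrower variant, sketched here under the assumption that only a comma with a digit on each side should change, uses a lookbehind and a lookahead; `DIGIT_COMMA` and `normalize_commas` are hypothetical names.

```python
import re

# Matches a comma only when it has a digit on both sides, e.g. "27,523".
DIGIT_COMMA = re.compile(r"(?<=\d),(?=\d)")


def normalize_commas(sentence: str) -> str:
    """Turn "27,523" into "27.523" without touching clause commas."""
    return DIGIT_COMMA.sub(".", sentence)


print(normalize_commas("The subsidies, totalling 2,750, are expected."))
# The subsidies, totalling 2.750, are expected.
```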

The numbers from text to compare against each row of Sentences are: 27.523, 1.35, 22.5, 2.750, 99.8 (note that the commas have to be converted to periods).
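Instead of typing these five numbers by hand, they can be pulled out of text with the regex from the question once the digit commas are converted. A sketch, assuming `text` holds the string from the question; `NUMBER` and `reference` are illustrative names.

```python
import re

text = ("The previous low was 27,523. The 1.35 trillion ($22.5 million ) program could "
        "start in October. The number of people who left the country plunged 99.8 percent "
        "from a year earlier to 2,750, according to the data from the agency.")

NUMBER = r"\d+[,.\d]\d+"  # regex from the question (needs at least three characters)

# Convert only commas between digits, then collect the distinct numbers as strings.
reference = set(re.findall(NUMBER, re.sub(r"(?<=\d),(?=\d)", ".", text)))
print(sorted(reference))  # ['1.35', '2.750', '22.5', '27.523', '99.8']
```

Keeping the numbers as strings means "2.750" stays distinct from "2.75", which matches how the question writes them.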

Now I need to create new columns containing the results I want to obtain:

Account   Common          Difference                                         Match?
51343                     {27.523, 1.35, 22.5, 2.750, 99.8}                  0
6376      22.5            2.35                                               0.5
31        {1.35, 22.5}    {27.523, 2.750, 99.8}                              0.5
2624                      {27.523, 1.35, 22.5, 2.750, 99.8}                  0
613                       {27.523, 1.35, 22.5, 2.750, 99.8}, {1.43, 21.5}    0
764       {1.35, 22.5}    {27.523, 2.750, 99.8}                              0.5
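For rows such as 31 and 764, the Common and Difference values above are plain set operations: the intersection of the row's numbers with the numbers from text, and the numbers from text that the row lacks. A sketch for row 31, with both sets typed in by hand for illustration:

```python
reference = {"27.523", "1.35", "22.5", "2.750", "99.8"}  # numbers from text
row_31 = {"1.35", "22.5"}                                # numbers in row 31's sentence

common = reference & row_31       # present in both
difference = reference - row_31   # in text but not in the row

print(sorted(common), sorted(difference))
```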

Do you think this is feasible? Could you give me some suggestions on how to get these results?


1 answer
Anonymous user
#1 · Posted 2024-10-02 12:35:47

You can do it like this:

import re

# define variables
regex = r"\d+[.,]?\d*"  # digits, optionally followed by "," or "." and more digits
numbers = {"27.523", "1.35", "22.5", "2.750", "99.8"}  # numbers from text, commas already converted

# define needed functions
def normalize(numbers):
    # converts "," in numbers to "."
    return {number.replace(",", ".") for number in numbers}

def extract_numbers(sentence):
    # all numbers found in a sentence, normalized
    return normalize(re.findall(regex, sentence))

def get_difference(sentence):
    # numbers in the sentence that are not among the numbers from text
    return extract_numbers(sentence).difference(numbers)

def get_common(sentence):
    # numbers found both in the sentence and among the numbers from text
    return extract_numbers(sentence).intersection(numbers)

def get_match_ratio(sentence):
    common = get_common(sentence)
    if len(common) == len(numbers):
        return 1    # all numbers match
    elif common:
        return 0.5  # at least one number in common
    else:
        return 0    # no number in common

# apply the functions and generate new columns
df["Common"] = df["Sentences"].apply(get_common)
df["Difference"] = df["Sentences"].apply(get_difference)
df["Match?"] = df["Sentences"].apply(get_match_ratio)
df.drop(columns=["Sentences"], inplace=True)
print(df.to_string())
#   Account        Common    Difference  Match?
#0    51343            {}            {}     0.0
#1     6376        {22.5}        {2.35}     0.5
#2       31  {1.35, 22.5}            {}     0.5
#3     2624            {}            {}     0.0
#4      613            {}  {21.5, 1.43}     0.0
#5      764  {1.35, 22.5}            {}     0.5
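For a self-contained check, the same approach can be run against the sample rows from the question. This sketch assumes pandas is installed, retypes the rows by hand (four of them share one sentence template), and folds the three helpers into one extraction step; `template`, `numbers_in` and `reference` are illustrative names, not part of the answer above.

```python
import re

import pandas as pd

NUMBER = r"\d+[.,]?\d*"  # digits, optionally followed by "," or "." and more digits
reference = {"27.523", "1.35", "22.5", "2.750", "99.8"}  # numbers from text

# Rows copied from the question; four of them share the same sentence template.
template = ("The subsidies, totalling {}, are expected to form part of a second budget. "
            "New plans to allocate ${} billion to a new reimbursement programme.")
df = pd.DataFrame({
    "Account": [51343, 6376, 31, 2624, 613, 764],
    "Sentences": [
        "The subsidies are expected to form part of a second budget.",
        template.format("2.35tn", "22.5"),
        template.format("1.35tn", "22.5"),
        "The way to a sports fan’s heart? Behind-the-scenes content from their favourite teams.",
        template.format("1.43 tn", "21.5"),
        template.format("1.35tn", "22.5"),
    ],
})


def numbers_in(sentence):
    # Extract numbers and turn "," into "." so "2,750" compares equal to "2.750".
    return {n.replace(",", ".") for n in re.findall(NUMBER, sentence)}


found = df["Sentences"].apply(numbers_in)
df["Common"] = found.apply(lambda s: s & reference)
df["Difference"] = found.apply(lambda s: s - reference)
df["Match?"] = df["Common"].apply(lambda c: 1 if c == reference else (0.5 if c else 0))
print(df.drop(columns=["Sentences"]).to_string())
```

Extracting each sentence's numbers once and reusing the result avoids running the regex three times per row.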

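As for the Difference column, I don't understand how you got those values, so for now I went with what I guessed would work for you: the numbers in the row that do not appear in text.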
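Reading the expected table in the question, Difference seems to follow more than one rule: row 31 lists the numbers from text that the row lacks, row 6376 lists only the row's own extra number (2.35), and row 613 lists both. Python sets provide each option directly; which one is intended is an assumption left to the asker, so this sketch only shows the three candidates for row 6376.

```python
reference = {"27.523", "1.35", "22.5", "2.750", "99.8"}  # numbers from text
row_6376 = {"2.35", "22.5"}                              # numbers in row 6376

missing_from_row = reference - row_6376   # text numbers the row lacks (rows 31, 764)
extra_in_row = row_6376 - reference       # what the answer's code returns (row 6376)
either_side = reference ^ row_6376        # symmetric difference: both of the above (row 613)

print(sorted(extra_in_row))  # ['2.35']
```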
