Compare elements in a string with rows in a DataFrame

text_1 = "The previous low was 27,523. The 1.35 trillion ($22.5 million) program could start in October. The number of people who left the country plunged 99.8 percent from a year earlier to 2,750, according to the data from the agency."

Account  Sentences
51343    The subsidies are expected to form part of a second budget.
6376     The subsidies, totalling 2.35tn, are expected to form part of a second budget. New plans to allocate $22.5 billion to a new reimbursement programme.
31       The subsidies, totalling 1.35tn, are expected to form part of a second budget. New plans to allocate $22.5 billion to a new reimbursement programme.
2624     The way to a sports fan’s heart? Behind-the-scenes content from their favourite teams.
613      The subsidies, totalling 1.43 tn, are expected to form part of a second budget. New plans to allocate $21.5 billion to a new reimbursement programme.
764      The subsidies, totalling 1.35tn, are expected to form part of a second budget. New plans to allocate $22.5 billion to a new reimbursement programme.

Expected output:

Account  Common        Difference                                       Match?
51343    {}            {27.523, 1.35, 22.5, 2.750, 99.8}                0
6376     22.5          2.35                                             0.5
31       {1.35, 22.5}  {27.523, 2.750, 99.8}                            0.5
2624     {}            {27.523, 1.35, 22.5, 2.750, 99.8}                0
613      {}            {27.523, 1.35, 22.5, 2.750, 99.8}, {1.43, 21.5}  0
764      {1.35, 22.5}  {27.523, 2.750, 99.8}                            0.5

Answer (anonymous user, posted 2024-10-02 12:35:47)

You can extract the numbers with a regular expression and compare them as sets:

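import re

import pandas as pd

# text_1 and df as given in the question
text_1 = ("The previous low was 27,523. The 1.35 trillion ($22.5 million) program could start in October. "
          "The number of people who left the country plunged 99.8 percent from a year earlier to 2,750, "
          "according to the data from the agency.")
df = pd.DataFrame({
    "Account": [51343, 6376, 31, 2624, 613, 764],
    "Sentences": [
        "The subsidies are expected to form part of a second budget.",
        "The subsidies, totalling 2.35tn, are expected to form part of a second budget. "
        "New plans to allocate $22.5 billion to a new reimbursement programme.",
        "The subsidies, totalling 1.35tn, are expected to form part of a second budget. "
        "New plans to allocate $22.5 billion to a new reimbursement programme.",
        "The way to a sports fan’s heart? Behind-the-scenes content from their favourite teams.",
        "The subsidies, totalling 1.43 tn, are expected to form part of a second budget. "
        "New plans to allocate $21.5 billion to a new reimbursement programme.",
        "The subsidies, totalling 1.35tn, are expected to form part of a second budget. "
        "New plans to allocate $22.5 billion to a new reimbursement programme.",
    ],
})

# define variables
# digits, optionally followed by one "." or "," and more digits
# (inside [...] a "|" would be matched literally, so [.,] is all that is needed)
regex = r"(\d+[.,]?\d*)"

# define needed functions
def normalize(numbers):
    # treat "," as a decimal point, so "27,523" becomes "27.523" as in the expected output
    return {number.replace(",", ".") for number in numbers}

def extract_numbers(text):
    # every number in text, as a set of normalized strings
    return normalize(re.findall(regex, text))

# the numbers to look for, taken from text_1 rather than typed by hand:
# {"27.523", "1.35", "22.5", "2.750", "99.8"}
numbers = extract_numbers(text_1)

def get_difference(text):
    # numbers in the sentence that are not in text_1
    found_numbers = extract_numbers(text)
    return found_numbers.difference(numbers)

def get_common(text):
    # numbers that appear both in the sentence and in text_1
    found_numbers = extract_numbers(text)
    return found_numbers.intersection(numbers)

def get_match_ratio(text):
    # 1 if all numbers of text_1 are found, 0.5 if only some are, 0 if none are
    common = get_common(text)
    if len(common) == len(numbers):
        return 1
    elif len(common) > 0:
        return 0.5
    else:
        return 0

# apply the functions and generate new columns
df["Common"] = df["Sentences"].apply(get_common)
df["Difference"] = df["Sentences"].apply(get_difference)
df["Match?"] = df["Sentences"].apply(get_match_ratio)
df.drop(columns=["Sentences"], inplace=True)
print(df.to_string())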
#   Account        Common    Difference  Match?
#0    51343            {}            {}     0.0
#1     6376        {22.5}        {2.35}     0.5
#2       31  {1.35, 22.5}            {}     0.5
#3     2624            {}            {}     0.0
#4      613            {}  {21.5, 1.43}     0.0
#5      764  {1.35, 22.5}            {}     0.5
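The functions above run the regex three times per sentence (once each in the Common, Difference and Match? steps). A minimal alternative sketch, not the answer's own code, extracts each sentence's numbers once with pandas' `Series.str.findall` and derives all three columns from those sets. `PATTERN`, `to_number_set` and `add_comparison_columns` are hypothetical names, and the sample uses two rows and the string from the question:

```python
import re

import pandas as pd

# Same idea as the answer's regex: digits, optionally one "." or "," and more digits
PATTERN = r"\d+[.,]?\d*"


def to_number_set(found):
    # found is the list of matches for one row; "," is treated as a decimal point
    return {s.replace(",", ".") for s in found}


def add_comparison_columns(df, reference_text):
    # Numbers to look for, extracted from the reference string once
    reference = to_number_set(re.findall(PATTERN, reference_text))
    # Series.str.findall runs the regex over every sentence in one call;
    # each row becomes a list of matched strings, then a normalized set
    found = df["Sentences"].str.findall(PATTERN).map(to_number_set)
    out = df.drop(columns=["Sentences"])
    out["Common"] = found.map(lambda s: s & reference)
    out["Difference"] = found.map(lambda s: s - reference)
    # 1 if every reference number is present, 0.5 if some are, 0 if none are
    out["Match?"] = out["Common"].map(
        lambda common: 1.0 if common == reference else (0.5 if common else 0.0)
    )
    return out


# Two rows and the string taken from the question
text_1 = (
    "The previous low was 27,523. The 1.35 trillion ($22.5 million) program could start "
    "in October. The number of people who left the country plunged 99.8 percent from a "
    "year earlier to 2,750, according to the data from the agency."
)
df = pd.DataFrame({
    "Account": [31, 2624],
    "Sentences": [
        "The subsidies, totalling 1.35tn, are expected to form part of a second budget. "
        "New plans to allocate $22.5 billion to a new reimbursement programme.",
        "The way to a sports fan's heart? Behind-the-scenes content from their favourite teams.",
    ],
})
result = add_comparison_columns(df, text_1)
print(result.to_string())
```

Like the answer, this treats `,` as a decimal point, which is what the expected output does (27,523 becomes 27.523), even though in English text a comma there is usually a thousands separator.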

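As for the Difference column: I couldn't work out how you got those values, so I improvised. Here it holds the numbers found in the sentence that are not in the string. Let me know what works for you.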
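One possible reading of the question's Difference column (an assumption, since the question doesn't define it) is a two-way difference: the string's numbers that are missing from the sentence, followed by any sentence numbers that are not in the string. That reproduces the expected rows for 51343, 31, 2624, 613 and 764, but not 6376, where the question lists only 2.35. A minimal sketch with plain Python sets; `two_way_difference` is a hypothetical helper name:

```python
def two_way_difference(found, reference):
    # Hypothetical helper: returns (reference numbers missing from the sentence,
    # sentence numbers that are not in the reference)
    return reference - found, found - reference


# Numbers from the question's string, already normalized ("27,523" -> "27.523")
reference = {"27.523", "1.35", "22.5", "2.750", "99.8"}

# Account 613: its sentence contains 1.43 and 21.5
missing, extra = two_way_difference({"1.43", "21.5"}, reference)
print(sorted(missing), sorted(extra))
# ['1.35', '2.750', '22.5', '27.523', '99.8'] ['1.43', '21.5']
```

To get this in the DataFrame, `get_difference` would return both differences instead of only the sentence-minus-string one.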
