Python: truncate strings and combine them

What I want to achieve

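My code below scrapes a website and exports the data frame to an Excel file. However, I need to strip the unnecessary characters from the first column and combine what is left, so that I don't have to rename the months in the Excel file. Each row carries a contract name from the website, for example the ones for HOZ18 (December 2018) and HOZ19 (December 2019), and apart from the month and year it contains characters I'm not interested in, including a "\". So I would like to have just Dec18, Jan19, Feb20 and so on in the first column.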

Code

from urllib.request import urlopen
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ho&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')

Contracts = []
LastPrice = []
data_rows = soup.findAll('tr')[2:]
for td in data_rows:
    Contract = td.findAll('td')[0].text
    Contracts.append(Contract)
    LstPrice = td.findAll('td')[7].text
    LastPrice.append(LstPrice)

df = pd.DataFrame({'Contracts': Contracts, 'Previous Settled': LastPrice})

2 answers

User

Answer 1 · edited 2024-09-29 23:28:04

If you want to convert strings like Dec \'18 (HOZ18) into December 18, here is one solution.

1) Define a function to clean the string:

# define a dictionary to convert short month names to full ones
month_mapper = {
    'Jan': 'January',
    'Feb': 'February',
    'Mar': 'March',
    'Apr': 'April',
    'May': 'May',
    'Jun': 'June',
    'Jul': 'July',
    'Aug': 'August',
    'Sep': 'September',
    'Oct': 'October',
    'Nov': 'November',
    'Dec': 'December',
}

def clean_month_string(s):
    # replace the '\' char with empty string
    s = s.replace('\\', '')

    # split into three pieces on space
    # eg, "Dec '18 (HOZ18)" ->
    #   month = "Dec"
    #   year = "'18"
    #   code = "(HOZ18)"
    month, year, code = s.split(' ')

    # convert month using month mapper
    month = month_mapper[month]

    # remove the ' at the start of the year
    year = year.replace("'", "")

    # return new month and new year (dropping code)
    return ' '.join([month, year])

2) Use apply to run that function on every row of the DataFrame.

# drop that first row, which is not properly formatted
df = df.drop(0).reset_index(drop=True)

# apply the function to your 'Contracts' series.
df['Contracts'] = df['Contracts'].apply(clean_month_string)
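If the shorter Dec18 form from the question is what you are after, and you would rather not drop the malformed first row by hand, a regex-based variant might look like the sketch below. The names short_label and LABEL_RE are hypothetical, and the pattern assumes the labels look like the examples in this answer (three-letter month, an optional backslash, an apostrophe, a two-digit year).

```python
import re

# Assumed label format: "Dec '18 (HOZ18)" or "Dec \'18 (HOZ18)".
# Anything that does not start this way is treated as "not a contract label".
LABEL_RE = re.compile(r"^\s*([A-Za-z]{3})\s*\\?'(\d{2})\b")

def short_label(s):
    """Return e.g. 'Dec18' for "Dec '18 (HOZ18)", or None if s does not match."""
    m = LABEL_RE.match(s)
    if m is None:
        return None
    # Month abbreviation followed directly by the two-digit year.
    return m.group(1) + m.group(2)

samples = ["Dec \\'18 (HOZ18)", "Jan '19 (HOF19)", "Contract"]
labels = [short_label(s) for s in samples]
print(labels)
```

With the DataFrame from the question this could be applied as df['Contracts'].apply(short_label); rows that come back empty can then be removed with dropna instead of relying on the position of the bad row.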

User

Answer 2 · edited 2024-09-29 23:28:04

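Here is an option that doesn't need .apply(). It assumes we are dealing with years in the 21st century; I'm not sure whether that works for you. It also stores the month as a number, which may be useful; if not, you can drop that part.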

import pandas as pd
import re
import datetime

# Data setup.

data = pd.DataFrame(['Dec \'18 (HOZ18)', 'Jan \'19 (HOF19)', 'Feb \'19 (HOG19)'], columns = ['string'])

# Extract the month abbreviation using regex, then map it to a month number.

data['month_number'] = [datetime.datetime.strptime(re.sub(r"\s'.*", '', i), '%b').month for i in data['string']]

# Extract the year, prepend '20' and store as an integer.

data['year'] = [int('20' + re.search(r'\d\d', i).group(0)) for i in data['string']]

print(data)

Output:

            string  month_number  year
0  Dec '18 (HOZ18)            12  2018
1  Jan '19 (HOF19)             1  2019
2  Feb '19 (HOG19)             2  2019
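The same result can probably be reached without list comprehensions by using pandas string methods. The following is a sketch under a few assumptions: the column is named string as in the setup above, the labels follow the "Mon 'YY (CODE)" pattern, and month abbreviations are in English. The label column and the group names mon and yy are made up for illustration. Note that %y maps two-digit years 00 to 68 onto 2000 to 2068, so the 21st-century assumption still applies for these contracts.

```python
import pandas as pd

data = pd.DataFrame({"string": ["Dec '18 (HOZ18)", "Jan '19 (HOF19)", "Feb '19 (HOG19)"]})

# Pull the month abbreviation and the two-digit year into separate columns.
parts = data["string"].str.extract(r"^(?P<mon>[A-Za-z]{3})\s+\\?'(?P<yy>\d{2})")

# Parse "Dec 18" style text into timestamps (first day of each month).
dates = pd.to_datetime(parts["mon"] + " " + parts["yy"], format="%b %y")

data["month_number"] = dates.dt.month
data["year"] = dates.dt.year
# Compact label in the form the question asks for, e.g. "Dec18".
data["label"] = dates.dt.strftime("%b%y")

print(data)
```

Keeping the parsed dates around also makes it straightforward to sort the contracts chronologically before exporting.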
