urllib request.urlopen(url).read() stored with SQLAlchemy ends up as a hex string instead of HTML

# Imports and Base were not shown in the original snippet; SQLAlchemy 1.4+ is assumed.
from urllib import request

from sqlalchemy import Column, Integer, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class ParksTxState(Base):
    __tablename__ = 'parks_tx_state'
    id = Column(Integer, primary_key=True)
    park_name = Column(Text)
    url = Column(Text)
    html = Column(Text)


engine = create_engine("postgresql://<user>:<pass>@localhost/<db>", echo=False)
Session = sessionmaker(bind=engine)
session = Session()

url = 'https://tpwd.texas.gov/state-parks/abilene'
html = request.urlopen(url).read()
print(html)
# b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n...
# so far so good...

newpark = ParksTxState()
newpark.html = html
print(newpark.html)
# b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n...
# so we're still good here before committing....

session.add(newpark)
session.commit()
print(newpark.html)
# \x3c21444f43545950452068746d6c3e0a3...
# and here is where the garbage comes in.

1 answer

User

#1 · Posted 2024-05-19 21:14:36

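OK, it looks like request.urlopen(url).read() is returning a bytes object (see Methods of File Objects). Because the value assigned to the Text column is bytes rather than str, what comes back after the commit is the hex representation of those bytes, not the HTML text. The bytes need to be converted to a string with .decode('utf-8') before they are stored: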

html = request.urlopen(url).read()
html_string = html.decode('utf-8')
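
To check that the stored value really is just the hex form of the original bytes, the prefix printed in the question can be turned back into text by hand. This is a minimal sketch: `stored` is a hypothetical variable holding only the part of the output shown in the question (with the trailing partial digit dropped), not a full database row.

```python
# Hex text as it appeared after the commit (only the prefix shown in the question).
stored = r"\x3c21444f43545950452068746d6c3e0a"

# Strip the leading "\x" marker and turn the hex digits back into bytes.
raw = bytes.fromhex(stored[2:])

# Decode the bytes into text, as suggested above.
recovered = raw.decode("utf-8")
print(repr(recovered))  # '<!DOCTYPE html>\n'
```

The same two steps could be used to repair rows that were already saved in hex form, assuming the page was UTF-8 encoded.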

See also: Convert bytes to a string?
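
Hard-coding 'utf-8' works for this page, but not every server sends UTF-8. A possible variation, sketched under assumptions, reads the charset from the response headers and falls back to UTF-8 when none is declared. The helper names `decode_body` and `fetch_html` are hypothetical and not part of the original answer.

```python
from typing import Optional
from urllib import request


def decode_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode raw response bytes; UTF-8 is an assumed default when no charset is given."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Fetch a page and return its body as str, ready for a Text column."""
    with request.urlopen(url) as response:
        # get_content_charset() returns None if the Content-Type header has no charset.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

With this in place, `newpark.html = fetch_html(url)` would assign a str instead of bytes, so the column receives the HTML text itself.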
