Converting XLIFF files with BeautifulSoup

3 answers

User

Answer 1 · edited 2024-09-27 19:09:27

To extract both text entries from <source>, you can do this:

from bs4 import BeautifulSoup

html = """<source><x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"/>Choose your product\
<x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"/>From a list: </source>"""

soup = BeautifulSoup(html, 'lxml')
# stripped_strings yields every text node inside <source>, with surrounding whitespace removed
print(list(soup.source.stripped_strings))

This gives you:

['Choose your product', 'From a list:']
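The same call can be run once per segment of a complete file. A minimal sketch, assuming an XLIFF 1.2 document in which every segment is a <trans-unit> with an id attribute, and assuming lxml is installed so BeautifulSoup's 'xml' parser is available (XLIFF is XML, and that parser does not treat <source> as an HTML void element). The sample document and the helper name extract_sources are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical XLIFF 1.2 sample: one segment with inline <x/> placeholders, one plain segment.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="pl" datatype="html" original="page.html">
    <body>
      <trans-unit id="1">
        <source><x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"/>Choose your product<x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"/>From a list: </source>
      </trans-unit>
      <trans-unit id="2">
        <source>Continue</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def extract_sources(xliff_text):
    """Map each trans-unit id to the stripped text pieces of its <source> (hypothetical helper)."""
    soup = BeautifulSoup(xliff_text, "xml")  # XML parser; requires lxml
    return {
        unit["id"]: list(unit.find("source").stripped_strings)
        for unit in soup.find_all("trans-unit")
    }


print(extract_sources(XLIFF))
```

The inline <x/> tags are skipped by stripped_strings, so each segment yields only its translatable text pieces, in document order.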

User

Answer 2 · edited 2024-09-27 19:09:27

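I would advise against parsing XLIFF files with a general-purpose XML parser. Instead, try to find a dedicated XLIFF toolkit. There are some Python projects for this, but I have no experience with them (I mainly work in Java).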

User

Answer 3 · edited 2024-09-27 19:09:27

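You can use a for loop to process all children of <source>.
You can duplicate each child with copy.copy(child) and append the copy to <target>.
At the same time, you can check whether the child is a NavigableString and translate it.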

from bs4 import BeautifulSoup as BS
from bs4.element import NavigableString
import copy

text = '''<source><x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"/>Choose your product\
<x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"/>From a list: </source>'''

conversions = {
    'Choose your product': 'Wybierz swój produkt',
    'From a list: ': 'Z listy: ',
}

#soup = BS(text, 'html.parser')  # has problems parsing it (probably because <source> is a void element in HTML)
#soup = BS(text, 'html5lib')     # has problems parsing it (same reason)
soup = BS(text, 'lxml')

# create `<target>`
target = soup.new_tag('target')

# add `<target>` after `<source>`
source = soup.find('source')
source.insert_after(target)

# work with children in `<source>`
for child in source:
    print('type:', type(child))

    # duplicate child and add the copy to `<target>`
    # (appending the original would move it out of `<source>`)
    child = copy.copy(child)
    target.append(child)

    # translate the copied text inside `<target>`; text without a translation stays unchanged
    if isinstance(child, NavigableString) and str(child) in conversions:
        child.replace_with(conversions[str(child)])

print(' - target  -')
print(target)
print(' - source  -')
print(source)
print(' - soup  -')
print(soup)

Result (slightly reformatted for readability):

type: <class 'bs4.element.Tag'>
type: <class 'bs4.element.NavigableString'>
type: <class 'bs4.element.Tag'>
type: <class 'bs4.element.NavigableString'>

 - target  -

<target>
  <x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"></x>
  Wybierz swój produkt
  <x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"></x>
  Z listy: 
</target>

 - source  -

<source>
  <x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"></x>
  Choose your product
  <x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"></x>
  From a list: 
</source>

 - soup  -

<html><body>
<source>
  <x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"></x>
  Choose your product
  <x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"></x>
  From a list: 
</source>
<target>
  <x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"></x>
  Wybierz swój produkt
  <x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"></x>
  Z listy: 
</target>
</body></html>
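The loop above handles a single <source>. For a whole file it can be wrapped in a function that adds a <target> to every <trans-unit>. A minimal sketch under these assumptions: XLIFF 1.2 structure, lxml installed for BeautifulSoup's 'xml' parser, and a plain dict of translations keyed by the exact source text. The function name add_targets and the sample data are hypothetical. Text without a translation is copied unchanged, so the inline <x/> placeholders always survive.

```python
import copy

from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical translation table, keyed by the exact source text (including trailing spaces).
CONVERSIONS = {
    "Choose your product": "Wybierz swój produkt",
    "From a list: ": "Z listy: ",
}

# Hypothetical XLIFF 1.2 sample with a single segment.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file source-language="en" target-language="pl" datatype="html" original="page.html">
<body>
<trans-unit id="1"><source><x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"/>Choose your product<x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"/>From a list: </source></trans-unit>
</body>
</file>
</xliff>"""


def add_targets(xliff_text, conversions):
    """Return a soup in which every <trans-unit> gets a translated <target> (hypothetical helper)."""
    soup = BeautifulSoup(xliff_text, "xml")  # XML parser: <source> is not treated as an HTML void tag
    for unit in soup.find_all("trans-unit"):
        source = unit.find("source")
        if source is None or unit.find("target") is not None:
            continue  # nothing to translate, or a target already exists
        target = soup.new_tag("target")
        for child in source.contents:
            if isinstance(child, NavigableString):
                text = str(child)
                # append translated text (or the original when no translation is known)
                target.append(conversions.get(text, text))
            else:
                # detached copy of the inline tag; the original stays inside <source>
                target.append(copy.copy(child))
        source.insert_after(target)
    return soup


soup = add_targets(XLIFF, CONVERSIONS)
print(soup.find("target"))
```

Building the <target> from new strings and copied tags means <source> is never modified while it is being iterated; appending an original child would move it out of <source> instead.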
