How to remove a specific attribute when writing XML in PySpark

Posted 2024-10-02 14:30:17


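Below is a sample XML in which the attribute "Value" under AdditionalAttribute is empty. How can I remove this attribute when writing the XML in PySpark? I only want to remove it when it is empty.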

<ItemList>
    <Item Action="MANAGE" ItemGroupCode="PROD" ItemID="Item1" OrganizationCode="" UnitOfMeasure="EACH">
        <PrimaryInformation Description="Item1" ItemType="WEB" ProductLine="GM" IsPickupAllowed="Y" IsReturnable="Y" IsShippingAllowed="Y" ShortDescription="Item1:Black:Medium" IsModelItem="N" ModelItemUnitOfMeasure="EACH" ImageLocation="" ImageID=""></PrimaryInformation>
        <ItemAliasList Reset="Y"></ItemAliasList>
        <Extn ExtnColor="BLACK" ExtnColorDesc=""></Extn>
        <ClassificationCodes Model="Item1"></ClassificationCodes>
        <AdditionalAttributeList Reset="Y">
            <AdditionalAttribute AttributeDomainID="ItemAttribute" AttributeGroupID="ItemAttributeGroup1" Name="Size" Value=""></AdditionalAttribute>
            <AdditionalAttribute AttributeDomainID="ItemAttribute" AttributeGroupID="ItemAttributeGroup1" Name="Color" Value=""></AdditionalAttribute>
        </AdditionalAttributeList>
    </Item>

</ItemList>
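If the DataFrame itself is hard to change, one fallback is to post-process the XML after Spark has written it. The sketch below uses only Python's standard xml.etree.ElementTree; drop_empty_attribute is a hypothetical helper, and the sample is a trimmed copy of the XML above with the Color value filled in, so you can see that non-empty values are kept.

```python
import xml.etree.ElementTree as ET


def drop_empty_attribute(xml_text: str, tag: str = "AdditionalAttribute", attr: str = "Value") -> str:
    """Remove `attr` from every `tag` element where it is present but empty.

    Elements whose `attr` holds a non-empty value are left unchanged.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        # Delete the attribute only when it exists and its value is the empty string
        if elem.get(attr) == "":
            del elem.attrib[attr]
    return ET.tostring(root, encoding="unicode")


# Trimmed, hypothetical variant of the sample: Color has a value, Size does not
sample = """<ItemList>
  <Item ItemID="Item1">
    <AdditionalAttributeList Reset="Y">
      <AdditionalAttribute Name="Size" Value=""></AdditionalAttribute>
      <AdditionalAttribute Name="Color" Value="BLACK"></AdditionalAttribute>
    </AdditionalAttributeList>
  </Item>
</ItemList>"""

print(drop_empty_attribute(sample))
```

Spark writes a directory of part-* files under path, so in practice each file would be read, passed through the helper and written back. ElementTree loads the whole file into memory and does not preserve the original formatting exactly (for example, empty elements come out self-closing), which may or may not matter for the consumer of the file.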

I tried setting treatEmptyValuesAsNulls to 'true' when writing, but it has no effect:

df.write \
    .format('xml') \
    .options(rowTag='Item', rootTag='ItemList', treatEmptyValuesAsNulls='true') \
    .save(path)
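A likely reason the attempt above does nothing: in the spark-xml README, treatEmptyValuesAsNulls appears only among the read options, so the writer presumably ignores it. A commonly suggested alternative is to turn the empty string into null in the DataFrame before writing (in PySpark, a when(... == '', None) expression applied through functions.transform and Column.withField on the nested array, both available since Spark 3.1), relying on the writer to skip attributes whose value is null when nullValue is left unset, as I understand the spark-xml docs. Because that needs a Spark session, the block below is only a plain-Python toy model of the idea: the hypothetical helpers null_empty_value and to_element stand in for the column transform and the XML writer, assuming the default attribute prefix "_" and a row shape inferred from the sample XML.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: spark-xml's default attributePrefix "_" is in use.
ATTRIBUTE_PREFIX = "_"


def null_empty_value(item: dict, field: str = "_Value") -> dict:
    """Toy stand-in for the column transform: copy the row and turn an empty
    AdditionalAttribute `field` into None. Other empty attributes
    (e.g. _OrganizationCode) are deliberately left alone."""
    item = copy.deepcopy(item)
    for attr in item["AdditionalAttributeList"]["AdditionalAttribute"]:
        if attr.get(field) == "":
            attr[field] = None
    return item


def to_element(tag: str, row: dict) -> ET.Element:
    """Toy stand-in for the XML writer: prefixed keys become attributes and are
    skipped when None; dicts become child elements; lists of dicts become
    repeated child elements (plain scalar children are not handled here)."""
    elem = ET.Element(tag)
    for key, value in row.items():
        if key.startswith(ATTRIBUTE_PREFIX):
            if value is not None:  # null attribute -> not written at all
                elem.set(key[len(ATTRIBUTE_PREFIX):], str(value))
        elif isinstance(value, dict):
            elem.append(to_element(key, value))
        elif isinstance(value, list):
            for child in value:
                elem.append(to_element(key, child))
    return elem


# Hypothetical row shaped like one trimmed <Item> from the sample
row = {
    "_ItemID": "Item1",
    "_OrganizationCode": "",
    "AdditionalAttributeList": {
        "_Reset": "Y",
        "AdditionalAttribute": [
            {"_Name": "Size", "_Value": ""},
            {"_Name": "Color", "_Value": "BLACK"},
        ],
    },
}

xml_text = ET.tostring(to_element("Item", null_empty_value(row)), encoding="unicode")
print(xml_text)
```

Targeting only the Value field keeps other empty attributes such as OrganizationCode="" in the output, which matches the requirement of removing the attribute only where it is empty. On a real DataFrame it is worth checking a small output first, since the behaviour depends on the nullValue write option being left unset.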

Any help is appreciated.
