How do I extract data from an ASN.1 data file and load it into a DataFrame?

Posted 2024-10-01 07:35:46

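My end goal is to load metadata received from PubMed into a PySpark DataFrame. So far, I have managed to download the data I want from the PubMed database using a shell script. The downloaded data is in ASN.1 format. Here is a sample entry: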

Pubmed-entry ::= {
  pmid 31782536,
  medent {
    em std {
      year 2019,
      month 11,
      day 30,
      hour 6,
      minute 0
    },
    cit {
      title {
        name "Impact of CYP2C19 genotype and drug interactions on voriconazole
 plasma concentrations: a spain pharmacogenetic-pharmacokinetic prospective
 multicenter study."
      },
      authors {
        names std {
          {
            name ml "Blanco Dorado S",
            affil str "Pharmacy Department, University Clinical Hospital
 Santiago de Compostela (CHUS). Santiago de Compostela, Spain.; Clinical
 Pharmacology Group, University Clinical Hospital, Health Research Institute
 of Santiago de Compostela (IDIS). Santiago de Compostela, Spain.; Department
 of Pharmacology, Pharmacy and Pharmaceutical Technology, Faculty of Pharmacy,
 University of Santiago de Compostela (USC). Santiago de Compostela, Spain."
          },
          {
            name ml "Maronas O",
            affil str "Genomic Medicine Group, Centro Nacional de Genotipado
 (CEGEN-PRB3), CIBERER, CIMUS, University of Santiago de Compostela (USC),
 Santiago de Compostela, Spain."
          },
          {
            name ml "Latorre-Pellicer A",
            affil str "Genomic Medicine Group, Centro Nacional de Genotipado
 (CEGEN-PRB3), CIBERER, CIMUS, University of Santiago de Compostela (USC),
 Santiago de Compostela, Spain."
          },
          {
            name ml "Rodriguez Jato T",
            affil str "Pharmacy Department, University Clinical Hospital
 Santiago de Compostela (CHUS). Santiago de Compostela, Spain."
          },
          {
            name ml "Lopez-Vizcaino A",
            affil str "Pharmacy Department, University Hospital Lucus Augusti
 (HULA). Lugo, Spain."
          },
          {
            name ml "Gomez Marquez A",
            affil str "Pharmacy Department, University Hospital Ourense
 (CHUO). Ourense, Spain."
          },
          {
            name ml "Bardan Garcia B",
            affil str "Pharmacy Department, University Hospital Ferrol (CHUF).
 A Coruna, Spain."
          },
          {
            name ml "Belles Medall D",
            affil str "Pharmacy Department, General University Hospital
 Castellon (GVA). Castellon, Spain."
          },
          {
            name ml "Barbeito Castineiras G",
            affil str "Microbiology Department, University Clinical Hospital
 Santiago de Compostela (CHUS). Santiago de Compostela, Spain."
          },
          {
            name ml "Perez Del Molino Bernal ML",
            affil str "Microbiology Department, University Clinical Hospital
 Santiago de Compostela (CHUS). Santiago de Compostela, Spain."
          },
          {
            name ml "Campos-Toimil M",
            affil str "Department of Pharmacology, Pharmacy and Pharmaceutical
 Technology, Faculty of Pharmacy, University of Santiago de Compostela (USC).
 Santiago de Compostela, Spain."
          },
          {
            name ml "Otero Espinar F",
            affil str "Department of Pharmacology, Pharmacy and Pharmaceutical
 Technology, Faculty of Pharmacy, University of Santiago de Compostela (USC).
 Santiago de Compostela, Spain."
          },
          {
            name ml "Blanco Hortas A",
            affil str "Epidemiology Unit. Fundacion Instituto de Investigacion
 Sanitaria de Santiago de Compostela (FIDIS), University Hospital Lucus
 Augusti (HULA), Spain."
          },
          {
            name ml "Duran Pineiro G",
            affil str "Clinical Pharmacology Group, University Clinical
 Hospital, Health Research Institute of Santiago de Compostela (IDIS).
 Santiago de Compostela, Spain."
          },
          {
            name ml "Zarra Ferro I",
            affil str "Pharmacy Department, University Clinical Hospital
 Santiago de Compostela (CHUS). Santiago de Compostela, Spain.; Clinical
 Pharmacology Group, University Clinical Hospital, Health Research Institute
 of Santiago de Compostela (IDIS). Santiago de Compostela, Spain."
          },
          {
            name ml "Carracedo A",
            affil str "Genomic Medicine Group, Centro Nacional de Genotipado
 (CEGEN-PRB3), CIBERER, CIMUS, University of Santiago de Compostela (USC),
 Santiago de Compostela, Spain.; Galician Foundation of Genomic Medicine,
 Health Research Institute of Santiago de Compostela (IDIS), SERGAS, Santiago
 de Compostela, Spain."
          },
          {
            name ml "Lamas MJ",
            affil str "Clinical Pharmacology Group, University Clinical
 Hospital, Health Research Institute of Santiago de Compostela (IDIS).
 Santiago de Compostela, Spain."
          },
          {
            name ml "Fernandez-Ferreiro A",
            affil str "Pharmacy Department, University Clinical Hospital
 Santiago de Compostela (CHUS). Santiago de Compostela, Spain.; Clinical
 Pharmacology Group, University Clinical Hospital, Health Research Institute
 of Santiago de Compostela (IDIS). Santiago de Compostela, Spain.; Department
 of Pharmacology, Pharmacy and Pharmaceutical Technology, Faculty of Pharmacy,
 University of Santiago de Compostela (USC). Santiago de Compostela, Spain."
          }
        }
      },
      from journal {
        title {
          iso-jta "Pharmacotherapy",
          ml-jta "Pharmacotherapy",
          issn "1875-9114",
          name "Pharmacotherapy"
        },
        imp {
          date std {
            year 2019,
            month 11,
            day 29
          },
          language "eng",
          pubstatus aheadofprint,
          history {
            {
              pubstatus other,
              date std {
                year 2019,
                month 11,
                day 30,
                hour 6,
                minute 0
              }
            },
            {
              pubstatus pubmed,
              date std {
                year 2019,
                month 11,
                day 30,
                hour 6,
                minute 0
              }
            },
            {
              pubstatus medline,
              date std {
                year 2019,
                month 11,
                day 30,
                hour 6,
                minute 0
              }
            }
          }
        }
      },
      ids {
        pubmed 31782536,
        doi "10.1002/phar.2351",
        other {
          db "ELocationID doi",
          tag str "10.1002/phar.2351"
        }
      }
    },
    abstract "BACKGROUND: Voriconazole, a first-line agent for the treatment
 of invasive fungal infections, is mainly metabolized by cytochrome P450 (CYP)
 2C19. A significant portion of patients fail to achieve therapeutic
 voriconazole trough concentrations, with a consequently increased risk of
 therapeutic failure. OBJECTIVE: To show the association between
 subtherapeutic voriconazole concentrations and factors affecting voriconazole
 pharmacokinetics: CYP2C19 genotype and drug-drug interactions. METHODS:
 Adults receiving voriconazole for antifungal treatment or prophylaxis were
 included in a multicenter prospective study conducted in Spain. The
 prevalence of subtherapeutic voriconazole troughs were analyzed in the rapid
 metabolizer and ultra-rapid metabolizer patients (RMs and UMs, respectively),
 and compared with the rest of the patients. The relationship between
 voriconazole concentration, CYP2C19 phenotype, adverse events (AEs), and
 drug-drug interactions was also assessed. RESULTS: In this study 78 patients
 were included with a wide variability in voriconazole plasma levels with only
 44.8% of patients attaining trough concentrations within the therapeutic
 range of 1 and 5.5 microg/ml. The allele frequency of *17 variant was found
 to be 29.5%. Compared with patients with other phenotypes, RMs and UMs had a
 lower voriconazole plasma concentration (RM/UM: 1.85+/-0.24 microg/ml versus
 other phenotypes: 2.36+/-0.26 microg/ml, ). Adverse events were more common
 in patients with higher voriconazole concentrations (p<0.05). No association
 between voriconazole trough concentration and other factors (age, weight,
 route of administration, and concomitant administration of enzyme inducer,
 enzyme inhibitor, glucocorticoids, or proton pump inhibitors) was found.
 CONCLUSION: These results suggest the potential clinical utility of using
 CYP2C19 genotype-guided voriconazole dosing to achieve concentrations in the
 therapeutic range in the early course of therapy. Larger studies are needed
 to confirm the impact of pharmacogenetics on voriconazole pharmacokinetics.",
    pmid 31782536,
    pub-type {
      "Journal Article"
    },
    status publisher
  }
}

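This is where I am stuck. I don't know how to extract the information from the ASN.1 data and put it into a PySpark DataFrame. Can anyone give me a suggestion?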


2 Answers

Your problem may not be simple, but it is worth a try.

Approach 1:

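Since you have the specification, you could look for an ASN.1 tool (also known as an ASN.1 compiler) that can generate the data model from it. In your case, because what you downloaded is a textual ASN.1 value, you need the tool to provide an ASN.1 value decoder.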

If the tool generated Java code, it would look something like this:

// decode a Pubmed-entry
// input is your data
Asn1ValueReader reader = new Asn1ValueReader(input);
PubmedEntry obj = PubmedEntry.readPdu(reader);
// access the data
obj.getPmid();
obj.getMedent();

A few caveats:

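  • A tool that can do all of this will not be free (if you can find one at all). The difficulty is that you have a textual ASN.1 value, whereas tools usually provide binary decoders (BER, DER, etc.).
  • You will need to write a lot of glue code to create the records that go into the PySpark DataFrame.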
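
As a rough idea of what that glue code might look like, here is a minimal Python sketch. It assumes some decoder has already turned one entry into nested dicts and lists whose keys mirror the field names in the sample above. That shape, and the helper name `entry_to_row`, are assumptions made for illustration and do not come from any particular tool.

```python
def entry_to_row(entry):
    """Flatten one decoded Pubmed-entry (nested dicts/lists) into a flat record.

    The nested layout is an assumption that mirrors the sample data:
    CHOICE selectors such as `std`, `ml` and `journal` appear as dict keys.
    """
    medent = entry.get("medent", {})
    cit = medent.get("cit", {})
    journal = cit.get("from", {}).get("journal", {})
    pub_date = journal.get("imp", {}).get("date", {}).get("std", {})
    authors = cit.get("authors", {}).get("names", {}).get("std", [])
    return {
        "pmid": entry.get("pmid"),
        "title": cit.get("title", {}).get("name"),
        "journal": journal.get("title", {}).get("name"),
        "year": pub_date.get("year"),
        # Keep only authors given in the `name ml "..."` form.
        "authors": [a["name"]["ml"] for a in authors if "ml" in a.get("name", {})],
        "abstract": medent.get("abstract"),
    }


# A hand-written stand-in for decoder output, trimmed from the sample entry.
decoded = {
    "pmid": 31782536,
    "medent": {
        "cit": {
            "title": {"name": "Impact of CYP2C19 genotype ..."},
            "authors": {"names": {"std": [
                {"name": {"ml": "Blanco Dorado S"}},
                {"name": {"ml": "Maronas O"}},
            ]}},
            "from": {"journal": {
                "title": {"name": "Pharmacotherapy"},
                "imp": {"date": {"std": {"year": 2019, "month": 11, "day": 29}}},
            }},
        },
        "abstract": "BACKGROUND: ...",
    },
}

row = entry_to_row(decoded)
```

A list of such rows could then be handed to `SparkSession.createDataFrame`, ideally together with an explicit schema so that missing fields do not upset type inference.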

I wrote this a while ago, but it does not have a decoder for textual ASN.1 values.

Approach 2:

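If your data is simple enough, and since it is textual, you could try writing your own parser (using a tool such as ANTLR). This approach is not easy to evaluate if you are not familiar with parsers.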
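
To give a feel for the hand-written route, here is a minimal recursive-descent sketch in plain Python (no ANTLR). It is not a full ASN.1 value-notation parser and it ignores the schema entirely. It assumes only the constructs visible in the sample: quoted strings, numbers, identifiers, and brace-delimited blocks. It also assumes that line breaks inside a string are wrapping artifacts and can be dropped, and that a doubled quote inside a string stands for a literal quote. All function names are made up for this example.

```python
import re

# Token kinds: quoted string, number, identifier, punctuation.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<string>"(?:[^"]|"")*")
      | (?P<number>-?\d+(?:\.\d+)?)
      | (?P<ident>[A-Za-z][A-Za-z0-9-]*)
      | (?P<punct>::=|[{},])
    )''', re.VERBOSE)


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def _peek(tokens, pos):
    return tokens[pos] if pos < len(tokens) else ("eof", "")


def _parse_value(tokens, pos):
    kind, text = _peek(tokens, pos)
    if kind == "string":
        # Assumption: newlines inside strings are only line wrapping.
        return re.sub(r"\r?\n", "", text[1:-1].replace('""', '"')), pos + 1
    if kind == "number":
        return (float(text) if "." in text else int(text)), pos + 1
    if (kind, text) == ("punct", "{"):
        return _parse_block(tokens, pos + 1)
    if kind == "ident":
        # `name value` is a named field or CHOICE selector;
        # a lone identifier is treated as an enumerated name.
        if _peek(tokens, pos + 1) in (("punct", ","), ("punct", "}"), ("eof", "")):
            return text, pos + 1
        inner, pos = _parse_value(tokens, pos + 1)
        return {text: inner}, pos
    raise ValueError(f"unexpected token {text!r}")


def _parse_block(tokens, pos):
    named, unnamed = {}, []
    while _peek(tokens, pos) != ("punct", "}"):
        starts_with_ident = _peek(tokens, pos)[0] == "ident"
        value, pos = _parse_value(tokens, pos)
        if starts_with_ident and isinstance(value, dict):
            named.update(value)   # note: repeated field names overwrite
        else:
            unnamed.append(value)
        if _peek(tokens, pos) == ("punct", ","):
            pos += 1
    if named and unnamed:
        raise ValueError("mixed named and unnamed elements are not supported")
    return (named if named else unnamed), pos + 1


def parse_entry(text):
    """Parse `TypeName ::= value` and return (type_name, value)."""
    tokens = tokenize(text)
    if len(tokens) < 3 or tokens[0][0] != "ident" or tokens[1] != ("punct", "::="):
        raise ValueError("expected 'TypeName ::= value'")
    value, pos = _parse_value(tokens, 2)
    if pos != len(tokens):
        raise ValueError("trailing tokens after value")
    return tokens[0][1], value


SAMPLE = '''Pubmed-entry ::= {
  pmid 31782536,
  medent {
    em std { year 2019, month 11, day 30 },
    cit {
      title { name "Impact of CYP2C19 genotype on voriconazole
 plasma concentrations." },
      authors { names std {
        { name ml "Blanco Dorado S", affil str "Pharmacy Department" },
        { name ml "Maronas O" }
      } }
    },
    pub-type { "Journal Article" },
    status publisher
  }
}'''

type_name, entry = parse_entry(SAMPLE)
```

Blocks whose elements carry names become dicts and blocks of unnamed elements become lists. One known limitation of this sketch is that a block repeating the same field name keeps only the last occurrence, so a real implementation would need to collect repeats into a list.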

Edit: Unfortunately, the specification is not valid.

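The data above is definitely in "ASN.1 format". The format is called ASN.1 value notation and is used to represent ASN.1 values as text. (This format predates the standardization of the JSON Encoding Rules. Today one could use JSON for the same purpose, but JSON is handled somewhat differently from ASN.1 value notation.)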

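As YaFred himself pointed out, the ASN.1 schema that YaFred posted above contains some errors. The comment you posted yourself also seems to contain some errors. I looked at the full set of NCBI's ASN.1 files and noticed that they contain several errors. So, unless they are fixed, they cannot be processed with standards-conforming ASN.1 tools (such as the ASN.1 Playground). Some of these errors are easy to fix, but fixing others requires knowing what the authors of the files intended. This situation probably arises because the NCBI project uses its own ASN.1 toolkit, which may use ASN.1 in some non-standard way.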

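I would imagine that the NCBI toolkit has some means of decoding the value notation above, so if I were you, I would look into that toolkit. I cannot give you better advice because I do not know the NCBI toolkit.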
