
Posted 2024-06-26 00:02:10

text = "F.N. Freitas, C. Singulani, G. Vila-Verde, Linea Science Server,: The Dark Energy Survey Data Release 2. Ap._J._Supp._Ser. 255, (2021).Alam S., A. de Mattia, A. Tamone, S. {\' A}vila, J.A. Peacock, V. Gonzalez-Perez, A. Smith, A. Raichoor, A.J. Ross, J.E. Bautista, E. Burtin, J. Comparat, K.S. Dawson, H. du Mas des Bourboux, S. Escoffier, H. Gil-Mar{\'\i}n, S. Habib, K. Heitmann, J. Hou, F.G. Mohammad, E.M. Mueller, R. Neveux, R. Paviot, W.J. Percival, G. Rossi, V. Ruhlmann-Kleider, R. Tojeiro, M. Vargas Maga{\~n}a, C. Zhao, G.B. Zhao: The completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: N-body mock challenge for the eBOSS emission line galaxy sample. Mon._Not._R._Astron._Soc. 504, (2021).Alam S., J.A. Peacock, D.J. Farrow, J. Loveday, A.M. Hopkins: Using GAMA to probe the impact of small-scale galaxy physics on nonlinear redshift-space distortions. Mon._Not._R._Astron._Soc. 503, (2021).Alam S., M. Aubert, S. Avila, C. Balland, J.E. Bautista, M.A. Bershady, D. Bizyaev, M.R. Blanton, A.S. Bolton, J. Bovy, J. Brinkmann, J.R. Brownstein, E. Burtin, S. Chabanier, M.J. Chapman, P.D. Choi, C.H. Chuang, J. Comparat, M.C. Cousinou, A. Cuceu, K.S. Dawson, S. de la Torre, A. de Mattia, V.S. Agathe, H.M. des Bourboux, S. Escoffier, T. Etourneau, J. Farr, A. Font-Ribera, P.M. Frinchaboy, S. Fromenteau, H. Gil-Mar{\'\i}n, J.M. Le Goff, A.X. Gonzalez-Morales, V. Gonzalez-Perez, K. Grabowski, J. Guy, A.J. Hawken, J. Hou, H. Kong, J. Parker, M. Klaene, J.P. Kneib, S. Lin, D. Long, B.W. Lyke, A. de la Macorra, P. Martini, K. Masters, F.G. Mohammad, J. Moon, E.M. Mueller, A. Mu{\~n}oz-Guti{\'e}rrez, A.D. Myers, S. Nadathur, R. Neveux, J.A. Newman, P. Noterdaeme, A. Oravetz, D. Oravetz, N. Palanque-Delabrouille, K. Pan, R. Paviot, W.J. Percival, I. P{\'e}rez-R{\`a}fols, P. Petitjean, M.M. Pieri, A. Prakash, A. Raichoor, C. Ravoux, M. Rezaie, J. Rich, A.J. Ross, G. Rossi, R. Ruggeri, V. Ruhlmann-Kleider, A.G. S{\'a}nchez, F.J. S{\'a}nchez, J.R. 
S{\'a}nchez-Gallego, C. Sayres, D.P. Schneider, H.J. Seo, A. Shafieloo, A. Slosar, A. Smith, J. Stermer, A. Tamone, J.L. Tinker, R. Tojeiro, M. Vargas-Maga{\~n}a, A. Variu, Y. Wang, B.A. Weaver, A.M. Weijmans, C. Y{\`e}che, P. Zarrouk, C. Zhao, G.B. Zhao, Z. Zheng: Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Cosmological implications from two decades of spectroscopic surveys at the Apache Point Observatory. Physical_Review_D 103, (2021).Alam S., N.P. Ross, S. Eftekharzadeh, J.A. Peacock, J. Comparat, A.D. Myers, A.J. Ross: Quasars at intermediate redshift are not special; but they are often satellites. Mon._Not._R._Astron._Soc. 504, (2021).Alonso-Herrero A., S. Garc{\'\i}a-Burillo, S.F. H{\"o}nig, I. Garc{\'\i}a-Bernete, C. Ramos Almeida, O. Gonz{\'a}lez-Mart {'hallo}"

encodings = {
    "'": u'\u0300',
    "'\\": u'\u0301',
    "^": u'\u0302',
    "~": u'\u0303',
    "o":  u'\u00D8',
    "ss": 'ß'
}

# remove the encoding and replace it with its corresponding character
def repl(m):
    string = m.group()
    get_open_bracket_idx = string.find('{')
    get_close_bracket_idx = string.find('}')
    encoding = substring.substringByChar(
        string, startChar=string[get_open_bracket_idx + 1], endChar=string[get_close_bracket_idx - 2])
    string_content = string[get_close_bracket_idx - 1]
    string_and_encoding = encoding + string
    string_content = encodings.get(encoding, string_content) + string_content
    print(f'encoding: {encoding}')
    print(f'string content: {string_content}')
    return string_content

# This nearly works, but it also matches {'some_text}, which it shouldn't
changed_text = re.sub(r'\{\\?[^{}]*}', repl, text)

Answer 1 · posted 2024-06-26 00:02:10




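Details of the pattern \{([^\w\s]+|_)\s*(\w)}:

  • \{ - a { character
  • ([^\w\s]+|_) - Group 1: one or more special characters (anything that is neither a word character nor whitespace), or an underscore
  • \s* - zero or more whitespace characters
  • (\w) - Group 2: any single word character
  • } - a } character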
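To see what the two groups capture, here is a small sketch that lists the matches in a short sample string. The sample string and the variable names are made up for illustration; only the pattern comes from the answer.

```python
import re

# Hypothetical sample, shortened from the kind of text in the question
sample = r"S. {\' A}vila, C. Y{\`e}che, {'hallo}"

pattern = re.compile(r'\{([^\w\s]+|_)\s*(\w)}')

# findall returns one (group 1, group 2) tuple per match:
# group 1 is the accent command, group 2 is the accented letter
captures = pattern.findall(sample)
for accent, letter in captures:
    print(repr(accent), repr(letter))
```

{'hallo} is skipped because the pattern allows only a single word character before the closing brace.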

Sample implementation in Python:

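import re
text = r"{\' A}vila,  Y{\`e}che, {'hallo}"

# LaTeX accent commands mapped to Unicode combining marks
encodings = {
    "\\'": u'\u0301',  # \' is the acute accent
    "\\`": u'\u0300',  # \` is the grave accent
}

def repl(m):
    encoding = m.group(1)        # the accent command, e.g. \'
    string_content = m.group(2)  # the letter that carries the accent
    if encoding in encodings:
        # a combining mark goes after its base letter
        return string_content + encodings[encoding]
    return string_content

changed_text = re.sub(r'\{([^\w\s]+|_)\s*(\w)}', repl, text)
# => Ávila,  Yèche, {'hallo}   (each accent is a combining mark after its letter)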
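The text in the question also contains forms such as {\'\i}, {\~n} and {\"o}, which the sample above does not cover. The following is a rough sketch of one way to extend it, not part of the original answer: the ACCENTS table, the to_unicode helper and the stricter pattern (which requires a backslash after the opening brace) are assumptions made for illustration. It uses unicodedata.normalize from the standard library to compose each letter and its combining mark into a single character.

```python
import re
import unicodedata

# Assumed mapping from LaTeX accent commands to Unicode combining marks
ACCENTS = {
    "'": "\u0301",  # \'  acute
    "`": "\u0300",  # \`  grave
    "^": "\u0302",  # \^  circumflex
    "~": "\u0303",  # \~  tilde
    '"': "\u0308",  # \"  diaeresis
}

# {\<accent><optional whitespace><a letter, or \i for dotless i>}
PATTERN = re.compile(r"""\{\\(['`^~"])\s*(\\i|[^\W\d_])\}""")

def to_unicode(text):
    """Replace LaTeX accent groups with precomposed Unicode characters."""
    def repl(m):
        # Treat the dotless i (\i) as a plain i before adding the accent
        letter = "i" if m.group(2) == r"\i" else m.group(2)
        # NFC composes the base letter and the combining mark
        return unicodedata.normalize("NFC", letter + ACCENTS[m.group(1)])
    return PATTERN.sub(repl, text)

result = to_unicode(r"Gil-Mar{\'\i}n, Maga{\~n}a, H{\"o}nig, {\' A}vila, {'hallo}")
print(result)
```

This should print Gil-Marín, Magaña, Hönig, Ávila, {'hallo}. Note that it expects the backslashes to be present in the input, so the text has to be read from a file or written as a raw string.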
