How to loop through regex patterns stored in a MySQL table using Python

def parse(self, response):
    for mbuh in response.xpath('//body'):
        Item = ParsingerbotItem()
        Item['ling'] = str(response.url)
        ngaliase = re.findall("\w+.com", str(response.url))[0]
        mmhtml = mbuh.xpath('//body').extract()
        cur.execute("select aliase, pattern, seq, opsi, replacer from tb_bersihin where aliase='"+ngaliase+"\' order by seq asc")
        for filde in cur.fetchall():
            faliase = filde[0]
            fpattern = filde[1]
            fseq = filde[2]
            fopsi = filde[3]
            freplacer = filde[4]
            print "faliase=%s,fpattern=%s,furutan=%d,fopsi=%s,freplacer=%s" % \
                (faliase, fpattern, fseq, fopsi, freplacer )
            if ( freplacer == "NO" ) :
                freplacer=""
            if ( fopsi == "NL" ) :
                fopsi="re.DOTALL"
            k1 = re.sub(fpattern , freplacer, str(mmhtml), re.DOTALL)
            print k1

3 answers

User

#1 · Edited 2024-09-29 23:27:24

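The loop for mbuh in response.xpath('//body') runs only once, because that XPath returns a single selector. After that, mmhtml = mbuh.xpath(etc) always returns the same data no matter what mbuh contains, because the XPath starts with "//", which means "search from the start of the page". It also extracts the whole page as text.

I can see why you loop over cur.fetchall, but why loop over mbuh = response.xpath()? What do you expect the XPath to return?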
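
The scoping point can be sketched without Scrapy. This is only an illustration under stated assumptions: it uses the standard-library xml.etree.ElementTree on a made-up, well-formed page, and ElementTree accepts only relative paths, so the absolute "//" case itself is not reproduced here. What it shows is the relative form, ".//", which searches below the current node only; Scrapy selectors support the same relative form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """
<html><body>
  <div id="a"><p>first</p></div>
  <div id="b"><p>second</p><p>third</p></div>
</body></html>
"""

root = ET.fromstring(PAGE)


def texts_per_div(tree):
    """Map each div id to the <p> texts found inside that div."""
    result = {}
    for div in tree.iter("div"):
        # './/p' is relative: it searches only below the current div.
        result[div.get("id")] = [p.text for p in div.findall(".//p")]
    return result


scoped = texts_per_div(root)
# A search from the root sees every <p>, whichever div it sits in.
everything = [p.text for p in root.iter("p")]

print(scoped)
print(everything)
```

With a path anchored at the page root, every iteration of an outer loop would see the same full list, which is the behaviour described in the answer.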

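User

#2 · Edited 2024-09-29 23:27:24

@instete, take a look at this nice little spider. It reads this CSV file and does generic parsing of the pages. Use it as a starting point and change the CSV read to a database read. Most likely you don't have 1000 URLs, so you only need to read from the database once and keep the XPath expressions in memory. Does that help?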
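
A minimal sketch of the "read once, keep in memory" idea, applied to the pattern table from the question. Assumptions: the standard-library sqlite3 module stands in for the MySQL connection so that the example is self-contained, the sample rows are invented, and load_rules is a hypothetical helper name. Only the table and column names (tb_bersihin, aliase, pattern, seq, opsi, replacer) come from the question.

```python
import sqlite3
from collections import defaultdict


def load_rules(conn):
    """Read every rule once and group the rows by alias, ordered by seq."""
    rules = defaultdict(list)
    cur = conn.execute(
        "SELECT aliase, pattern, seq, opsi, replacer "
        "FROM tb_bersihin ORDER BY aliase, seq"
    )
    for aliase, pattern, seq, opsi, replacer in cur:
        rules[aliase].append((pattern, opsi, replacer))
    return dict(rules)


# Stand-in database with invented rows (sqlite3 instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_bersihin "
    "(aliase TEXT, pattern TEXT, seq INTEGER, opsi TEXT, replacer TEXT)"
)
conn.executemany(
    "INSERT INTO tb_bersihin VALUES (?, ?, ?, ?, ?)",
    [
        ("example.com", r"\s+", 2, "", " "),
        ("example.com", r"<script.*?</script>", 1, "NL", "NO"),
        ("other.com", r"<!--.*?-->", 1, "NL", "NO"),
    ],
)

# Loaded once, e.g. when the spider starts; parse() then does a dict lookup.
RULES = load_rules(conn)
print(RULES["example.com"])
```

Inside parse(), the per-response query would then become a lookup such as RULES.get(ngaliase, []), which also avoids building SQL by string concatenation.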

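User

#3 · Edited 2024-09-29 23:27:24

I think I solved my own problem. Maybe I was not good at describing it above: I just wanted to use the result of the first pattern as the subject for the second pattern, and then carry on to the next pattern from the MySQL table.

All I did was change

k1 = re.sub(fpattern , freplacer, str(mmhtml), re.DOTALL)

to

k1 = re.sub(fpattern , freplacer, k1)

and declare k1 = str(mmhtml) before the cur.fetchall() loop.

Thanks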
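
The chaining described in this answer can be sketched as a small helper. Assumptions: apply_rules is a hypothetical name, the rules and sample HTML are invented, and the "NL" and "NO" markers are interpreted the way the question's code interprets them (DOTALL flag and empty replacement). One detail worth noting: the fourth positional argument of re.sub is count, not flags, so the original call re.sub(fpattern, freplacer, str(mmhtml), re.DOTALL) limited the number of replacements instead of enabling DOTALL. The sketch passes flags by keyword.

```python
import re


def apply_rules(text, rules):
    """Apply each (pattern, opsi, replacer) in order, feeding each result
    into the next substitution."""
    for pattern, opsi, replacer in rules:
        flags = re.DOTALL if opsi == "NL" else 0
        replacement = "" if replacer == "NO" else replacer
        # flags must be a keyword argument: the 4th positional one is count.
        text = re.sub(pattern, replacement, text, flags=flags)
    return text


# Invented rules, in the order they would be returned by "order by seq".
rules = [
    (r"<script.*?</script>", "NL", "NO"),  # drop script blocks across lines
    (r"<[^>]+>", "", "NO"),                # drop remaining tags
    (r"\s+", "", " "),                     # collapse whitespace
]

html = "<p>Hello</p>\n<script>\nvar x = 1;\n</script>\n<p>world</p>"
cleaned = apply_rules(html, rules).strip()
print(cleaned)
```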
