
Python XPath matching returns an empty result

When writing a Python crawler, you can save time by pressing F12 to open the browser's developer tools, right-clicking the element in the Elements panel, and choosing Copy > Copy XPath.

Google Chrome has a pitfall here that cost me half a day.


The copied XPath is:

//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()


Testing it with the XPath Helper browser extension shows that it matches successfully.


In Python code, however, the same expression matches nothing:

xxx.xpath('//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()')

The matching result is an empty list.


Reason:

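The XPath copied from the browser describes the browser's DOM, not the page's HTML source. When Chrome parses a table whose rows are not wrapped in a tbody element, it inserts a tbody automatically, as the HTML parsing rules require. A crawler, however, downloads the raw HTML, which has no tbody, and Python parsers such as lxml do not add one. Any XPath step that goes through tbody therefore matches nothing in Python.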
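A minimal sketch, using only the standard library, of how you might check this yourself: feed the raw HTML to `html.parser.HTMLParser` and record the start tags exactly as they appear in the source. The sample page and the names `RAW_HTML`, `TagCollector` and `source_tags` are hypothetical; in a real crawler you would pass in the response body you downloaded. Chrome's DevTools would display a tbody inside this table, but the source contains none.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML as a crawler would download it: there is no <tbody>.
RAW_HTML = """
<div id="mainFrame">
  <div>
    <table>
      <tr><td>first cell</td><td>second cell</td></tr>
    </table>
  </div>
</div>
"""


class TagCollector(HTMLParser):
    """Records every start tag in the order it appears in the source."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def source_tags(html):
    """Return the list of start tags found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


tags = source_tags(RAW_HTML)
print(tags)             # ['div', 'div', 'table', 'tr', 'td', 'td']
print("tbody" in tags)  # False: the tbody shown in DevTools was added by the browser
```

If `"tbody"` is missing from the source tags, any XPath copied from the browser that contains `/tbody` will not match in Python.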

Solution:

Delete the extra tbody. The code is as follows:

# Copied from the browser: the tbody step matches nothing
xxx.xpath('//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()')
# With tbody removed, the expression matches successfully
xxx.xpath('//*[@id="mainFrame"]/div/table/tr/td[1]//text()')
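To compare both expressions without a browser, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml. This is an assumption for illustration: ElementTree supports only a subset of XPath, so `//text()` is replaced by reading `.text` from the matched td elements, and the sample page must be well-formed. The page content and the helper names `strip_tbody` and `first_column` are hypothetical. `strip_tbody` automates the manual fix described above by deleting `/tbody` steps (including indexed ones such as `/tbody[1]`) from a copied XPath.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page source: well-formed and, like most raw HTML, without <tbody>.
PAGE = """
<html><body>
  <div id="mainFrame">
    <div>
      <table>
        <tr><td>row 1</td><td>a</td></tr>
        <tr><td>row 2</td><td>b</td></tr>
      </table>
    </div>
  </div>
</body></html>
"""


def strip_tbody(xpath):
    """Remove '/tbody' steps (optionally indexed, e.g. '/tbody[1]') from an XPath."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


def first_column(root, path):
    # ElementTree has no text() step, so read .text from each matched <td>.
    return [td.text for td in root.findall(path)]


root = ET.fromstring(PAGE.strip())

copied = ".//*[@id='mainFrame']/div/table/tbody/tr/td[1]"
fixed = strip_tbody(copied)

print(first_column(root, copied))  # [] because the source has no <tbody>
print(fixed)                       # .//*[@id='mainFrame']/div/table/tr/td[1]
print(first_column(root, fixed))   # ['row 1', 'row 2']
```

With lxml you would pass `strip_tbody(copied_xpath)` to `.xpath()` in the same way. Only strip tbody when the raw source lacks it, since some pages do include tbody in their HTML. An alternative is to write `table//tr`, which matches rows with or without a tbody, at the cost of also matching rows of any nested tables.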