When writing Python crawlers, a common time-saver is to open the browser's developer tools (F12), right-click the element, and copy its XPath.
Chrome has a pitfall here that cost me half a day.
The copied XPath is as follows:
//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()
Testing it with the XPath Helper browser extension matches successfully!
In Python code, however, the same expression matches nothing:
xxx.xpath('//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()')
The matching result is an empty list.
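Reason:
The browser "optimizes" the page: when a table in the HTML source has no tbody, the browser inserts one while building the DOM. The XPath copied from the developer tools describes that DOM, not the raw HTML that the Python code downloads and parses, so the copied path can't be used directly in Python.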
Solution:
Delete the extra tbody from the path. The code is as follows:
# Copied from the browser: contains an extra tbody, so it returns an empty list.
xxx.xpath('//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()')
# With tbody removed, the expression matches successfully.
xxx.xpath('//*[@id="mainFrame"]/div/table/tr/td[1]//text()')
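The behavior can be reproduced with a small self-contained sketch. It assumes the page is parsed with lxml (the original code only shows `xxx.xpath`, so the parser is a guess), and the HTML snippet and variable names are made up for illustration. It also shows a more tolerant variant, `table//tr`, which should work whether or not the source contains a tbody, at the cost of also matching rows in nested tables.

```python
from lxml import html

# Hypothetical page source: the table has no <tbody>, as is common in raw HTML.
RAW_HTML = """
<html><body>
  <div id="mainFrame"><div>
    <table>
      <tr><td>first</td><td>second</td></tr>
    </table>
  </div></div>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# XPath as copied from the browser: the tbody step does not exist in the source.
with_tbody = tree.xpath('//*[@id="mainFrame"]/div/table/tbody/tr/td[1]//text()')

# Same path with the tbody step removed.
without_tbody = tree.xpath('//*[@id="mainFrame"]/div/table/tr/td[1]//text()')

# Tolerant variant: // between table and tr skips over an optional tbody.
tolerant = tree.xpath('//*[@id="mainFrame"]/div/table//tr/td[1]//text()')

print(with_tbody)
print(without_tbody)
print(tolerant)
```

If the page source really does contain a tbody, removing it from the path would break the match instead, so it is worth checking the raw HTML (for example with "View page source") rather than the Elements panel before editing the expression.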