Getting an image captcha with Python + Chrome


We'll start by importing the libraries the code uses:

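import re  # regular expressions
import time  # pause execution
from selenium import webdriver  # drives the browser and opens the target site
from PIL import Image  # image handling; PIL is installed through the Pillow package
import pytesseract  # OCR: image to text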

(If any of these libraries is not installed, install it with pip from the terminal or through PyCharm. For the Selenium installation steps, see https://blog.csdn.net/YuanLiYin079/article/details/108726138.)

To get the captcha, we first need to open the target page in a browser (in this case, Google Chrome):

# Path to chromedriver.exe (adjust it to match your own machine)
chrome_driver = r"C:\Users\Admin\AppData\Local\Programs\Python\Python37\Lib\site-packages\selenium\webdriver\chrome\chromedriver.exe"
# executable_path is the Selenium 3 form; Selenium 4 passes the path through a Service object
driver = webdriver.Chrome(executable_path=chrome_driver)
driver.maximize_window()
driver.implicitly_wait(3)  # wait up to 3 seconds when locating elements
login_url = 'https://put-the-login-page-address-here.com'
# Open the login page
driver.get(login_url)
time.sleep(3)

Once the page has loaded, we can start capturing the captcha:

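# Get the image captcha
# 1. Take a screenshot of the browser window and choose where to save it
driver.save_screenshot(r'D:\Python_work\images\image.png')
# 2. Get the captcha image's coordinates and size
# (find_element_by_class_name is the Selenium 3 API; Selenium 4 uses find_element(By.CLASS_NAME, ...))
code_image = driver.find_element_by_class_name('verifyCodeImg')
code_location = code_image.location
code_image_size = code_image.size
time.sleep(2)
print("Captcha coordinates:", code_location)  # console shows e.g. {'x': 716, 'y': 475}
print("Captcha size:", code_image_size)  # image size, e.g. {'height': 48, 'width': 140}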

# 3. Coordinates of the captcha's four edges
left = code_image.location['x']  # left edge (x)
top = code_image.location['y']  # top edge (y)
right = left + code_image.size['width']  # right edge
Rdown = top + code_image.size['height']   # bottom edge
image = Image.open(r'D:\Python_work\images\image.png')

# 4. Crop the captcha out of the screenshot
code_image = image.crop((left, top, right, Rdown))
code_image.save(r'D:\Python_work\images\image1.png')  # save the cropped captcha as a new file
codeStr = pytesseract.image_to_string(code_image)  # OCR: image to text
# 5. Strip special characters from the recognized text
codeStrS = re.sub(u"([^\u4e00-\u9fa5\u0030-\u0039\u0041-\u005a\u0061-\u007a])", "", codeStr)
result_four = codeStrS[0:4]  # keep only the first 4 characters
print(codeStrS)  # print the recognized captcha
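
The crop box from step 3 can also be wrapped in a small helper, which makes it easier to check the arithmetic without a browser. This is only a sketch: `crop_box` and its `scale` parameter are hypothetical names, not part of Selenium or Pillow. The `scale` argument is an assumption for the case where the screenshot is larger than the page coordinates (for example, with display scaling turned on), which is one possible reason for a crop that lands in the wrong place.

```python
def crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    `location` and `size` are dicts shaped like Selenium's element.location
    and element.size. `scale` is a hypothetical parameter: the ratio of
    screenshot pixels to page pixels (1.0 when no display scaling applies).
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = left + size["width"] * scale
    bottom = top + size["height"] * scale
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Values taken from the console output shown in the code comments above
box = crop_box({"x": 716, "y": 475}, {"height": 48, "width": 140})
print(box)  # (716, 475, 856, 523)
```

The returned tuple is in the order that `Image.crop` expects, so it could be passed to it directly.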

Now the recognized captcha is printed in the console. Enter it into the page and see what happens!
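
The cleanup in step 5 can be tried on its own, without a browser or Tesseract. The sketch below reuses the same character class as the code above; `clean_captcha` is a hypothetical helper name, and the sample string is made up to stand in for noisy OCR output.

```python
import re

# Same character class as step 5: CJK ideographs, digits, and ASCII letters
_NOT_ALLOWED = re.compile(r"[^\u4e00-\u9fa5\u0030-\u0039\u0041-\u005a\u0061-\u007a]")


def clean_captcha(raw, length=4):
    """Drop every character outside the allowed set, then keep `length` characters."""
    return _NOT_ALLOWED.sub("", raw)[:length]


# Made-up OCR output with stray whitespace and punctuation
print(clean_captcha(" 7k-Q9 \n"))  # 7kQ9
```

The default of 4 matches the 4-character slice in the code above; a site with longer captchas would need a different `length`.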


Install pytesseract,
then download the Tesseract OCR installer from https://github.com/UB-Mannheim/tesseract/wiki and install it.
Remember the installation path, because it will be needed later.


If running the code then raises an error, open the pytesseract.py file, find tesseract_cmd, comment out the original line, and add a new one: tesseract_cmd = "path/tesseract.exe", where path is the installation path noted earlier. Then run the code again, and it should execute successfully.
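
As an alternative to editing the installed pytesseract.py file, the executable path can be set from your own script through `pytesseract.pytesseract.tesseract_cmd`. The sketch below is a minimal example under that approach: `resolve_tesseract` and `configure_pytesseract` are hypothetical helper names, and the path you pass in is whatever location you noted during installation.

```python
import os
import shutil


def resolve_tesseract(explicit_path=None):
    """Return a usable tesseract executable path, or None if none is found.

    `explicit_path` is the location noted during installation; when it is
    missing or does not exist, fall back to searching PATH.
    """
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path=None):
    """Point pytesseract at the executable without modifying the library."""
    import pytesseract  # imported here so resolve_tesseract works without it

    cmd = resolve_tesseract(explicit_path)
    if cmd is None:
        raise FileNotFoundError("tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract(...)` once, before `pytesseract.image_to_string`, should be enough, and the setting survives reinstalling or upgrading the library because nothing inside site-packages is changed.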