An Attempt at Simulating Taobao Login with Selenium

Last updated: the evening of January 25, 2021

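I've been studying Python web scraping lately. The tutorial I'm following is a few years old and the sites have changed since, so the hands-on parts throw up plenty of new problems. When a problem comes up, solve it. Go~

Today, let's start with what to do when Selenium gets detected.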

Keeping Selenium from being detected

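  • Sites usually detect Selenium from JavaScript on the page, typically by checking navigator.webdriver. Registering the script in the execute_cdp_cmd call below hides that property: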
    from selenium import webdriver

    # Create the driver
    driver = webdriver.Chrome()
    # Avoid detection: run this script before any page script on every new document
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        """
    })

    Selenium options configuration

    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')  # works around the "DevToolsActivePort file doesn't exist" error
    chrome_options.add_argument('--window-size=1920,1080')  # browser window size (Chrome expects width,height)
    chrome_options.add_argument('--disable-gpu')  # Google's docs mention this flag as a workaround for a bug
    chrome_options.add_argument('--hide-scrollbars')  # hide scrollbars, for some unusual pages
    chrome_options.add_argument('--blink-settings=imagesEnabled=false')  # don't load images, to speed things up
    chrome_options.add_argument('--headless')  # no visible window; on Linux without a display, startup fails without this

    Selenium configuration for a deployed crawler

    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--blink-settings=imagesEnabled=false')
    chrome_options.add_argument('--headless')

    driver = webdriver.Chrome(
        # Path to chromedriver; can be omitted when chromedriver is on PATH (e.g. off Windows).
        # Raw string so the backslashes are not treated as escape sequences.
        # Note: Selenium 4 replaces executable_path with a Service object.
        executable_path=r'C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe',
        options=chrome_options,
    )

    # Avoid detection: hide navigator.webdriver on every new document
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        """
    })
  • The content above comes from https://blog.csdn.net/kzl_knight/article/details/106613495
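
To check that the patch actually took effect, one option is to ask the page what it sees. The following is a minimal sketch, not part of the original post: the helper names hide_webdriver_flag and webdriver_flag_hidden are made up for illustration, and both only assume a driver object that offers execute_cdp_cmd and execute_script, as Selenium's Chrome driver does.

```python
# JavaScript taken from the snippets above: make navigator.webdriver read as undefined.
STEALTH_JS = """
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined
})
"""


def hide_webdriver_flag(driver):
    """Register the script so it runs before page scripts on every new document.

    Hypothetical helper name; it only wraps the execute_cdp_cmd call shown above.
    """
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": STEALTH_JS}
    )


def webdriver_flag_hidden(driver):
    """Return True if the current page sees navigator.webdriver as undefined/null.

    Hypothetical helper name. execute_script converts JavaScript undefined
    and null to Python None.
    """
    return driver.execute_script("return navigator.webdriver") is None


# Usage with a real browser (needs Chrome and chromedriver installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   hide_webdriver_flag(driver)
#   driver.get('https://login.taobao.com/member/login.jhtml')
#   print(webdriver_flag_hidden(driver))
```

The script is only injected into documents loaded after it is registered, so the check is meaningful only after a driver.get() call that follows hide_webdriver_flag().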

Implementing the simulated login

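  • With the above in hand (honestly, I don't fully understand it; I just copied and pasted), simulating the login is fairly straightforward: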
    # -*- coding: utf-8 -*-
    """
    @Time : 2021/1/25 19:28
    @Auth : Ne-21
    @File : taobao_login.py
    @IDE : PyCharm
    @Motto: Another me.
    """
    import time

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    browser = webdriver.Chrome(options=options)
    # Avoid detection: hide navigator.webdriver on every new document
    browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": '''
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        '''
    })
    wait = WebDriverWait(browser, 5)


    def do_slider():
        """
        Handle the slider CAPTCHA (untested).
        :return:
        """
        slider_go = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '#nc_1_n1z'))
        )
        # Create an action chain bound to the browser
        action = ActionChains(browser)
        action.reset_actions()
        # Press and hold the slider handle
        action.click_and_hold(slider_go)
        # Drag the handle to the right, then let go of the mouse button
        action.move_by_offset(xoffset=258, yoffset=0)
        action.release().perform()
        time.sleep(1)


    def login(username, password):
        browser.get('https://login.taobao.com/member/login.jhtml')
        input_username = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '#fm-login-id'))
        )
        input_password = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, '#fm-login-password'))
        )
        submit = wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, '#login-form > div.fm-btn > button'))
        )
        input_username.send_keys(username)
        time.sleep(2)
        input_password.send_keys(password)
        time.sleep(2)
        submit.click()
        time.sleep(3)  # wait for the slider check to appear

        # Check whether a slider CAPTCHA appeared; wait.until raises
        # TimeoutException if the element never shows up
        try:
            wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '#nc_1__scale_text > span'))
            )
        except TimeoutException:
            print('No slider CAPTCHA found')
        else:
            print('Slider CAPTCHA found')
            time.sleep(2)
            do_slider()
            submit.click()
        # The original printed "login succeeded" unconditionally; success is not verified here
        print('Login flow finished')


    def main():
        login(username='', password='')


    if __name__ == '__main__':
        main()
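
The do_slider function above drags the handle the full 258 px in a single move and is untested. A possible variation, sketched here purely as an assumption and not as the author's method, is to split the distance into several smaller moves so the drag looks less like one jump. The names build_track and drag_slider are hypothetical, and the step count of 20 is arbitrary. drag_slider expects an ActionChains-like object and the slider element.

```python
import random


def build_track(distance, steps=20, seed=None):
    """Split `distance` pixels into `steps` non-negative integer offsets.

    Hypothetical helper: the offsets always sum to exactly `distance`.
    """
    rng = random.Random(seed)
    # Uneven weights so the individual moves are not all the same size
    weights = [rng.uniform(0.5, 1.5) for _ in range(steps)]
    total = sum(weights)
    offsets = [int(distance * w / total) for w in weights]
    # Rounding down loses a few pixels; give the remainder to the last move
    offsets[-1] += distance - sum(offsets)
    return offsets


def drag_slider(action, handle, distance=258, steps=20, seed=None):
    """Press `handle`, move right by `distance` px in small steps, then release.

    `action` is assumed to be a selenium ActionChains instance (or anything
    with the same chainable click_and_hold/move_by_offset/release/perform).
    """
    action.click_and_hold(handle)
    for dx in build_track(distance, steps=steps, seed=seed):
        action.move_by_offset(xoffset=dx, yoffset=0)
    action.release().perform()
```

Inside do_slider this could replace the single move_by_offset call, for example drag_slider(ActionChains(browser), slider_go). Whether Taobao's check accepts such a drag is not something the original post tested.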