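The Most Complete Selenium Tutorial

Author: 隨遇而安, posted in Photography, 2021-01-13

Selenium is a web automation testing toolkit, but it is also widely used for web scraping, where it handles pages that are hard to scrape with plain HTTP requests.

When used for scraping, Selenium effectively simulates a person operating a browser.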

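Browser drivers

Selenium needs a browser driver installed before it can be used. It supports more than a dozen browsers; the common ones are:

chrome (Google Chrome): download the matching driver (ChromeDriver). The driver version must match the browser version; the same applies to the browsers below.

firefox (Mozilla Firefox): download the matching driver (geckodriver).

ie (Internet Explorer): awkward to use; download the matching driver (IEDriverServer).

phantomjs: a headless browser (no UI) whose selling point is speed. I will cover it in a separate blog post. Note that PhantomJS is no longer maintained and Selenium 4 dropped support for it; headless Chrome or Firefox is the usual replacement.

safari: Apple's browser, available on macOS as well as on iPhone and iPad.

Put the driver executable in a directory that is already on the PATH environment variable (for example C:\python2), or add the driver's directory to PATH.

For installation details, search for "selenium browser driver download".

On Linux, install the Linux build of the browser driver.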
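
Before starting a browser it can help to confirm that the driver really is reachable through PATH. The following is a minimal sketch using only the standard library; the helper name find_driver is made up for this example, and chromedriver / geckodriver are the usual executable names for Chrome and Firefox.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    for candidate in ("chromedriver", "geckodriver"):
        location = find_driver(candidate)
        print(candidate, "->", location or "not found on PATH")
```

If the result is None, Selenium will most likely fail to start that browser until the driver's directory is added to PATH.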

Basic usage

1. Declare the browser object

from selenium import webdriver

# Create the browser instance that Selenium will drive
# firefox_login = webdriver.Ie()  # or webdriver.Firefox()
firefox_login = webdriver.Chrome()

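Headless mode can be enabled at this step, which hides the browser window while it is being driven.

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # optional: run without a visible window
firefox_login = webdriver.Chrome(options=options)  # the older chrome_options= keyword is deprecated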
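
The option handling above can be wrapped in a small helper so that headless mode becomes a switch. This is only a sketch: the function names are invented for illustration, and the extra --window-size switch is an assumption added here because a fixed size tends to keep layouts predictable when no window is shown.

```python
def build_chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Return the command-line switches to pass to ChromeOptions.add_argument."""
    arguments = []
    if headless:
        arguments.append("--headless")
    width, height = window_size
    # A fixed window size keeps page layout predictable when no window is shown.
    arguments.append(f"--window-size={width},{height}")
    return arguments


def make_chrome(headless=True):
    """Start Chrome with the switches above (requires selenium and chromedriver)."""
    from selenium import webdriver  # imported here so the helper above has no dependencies

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(headless=headless):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Calling make_chrome(headless=False) while debugging and make_chrome() on a server is one way to keep the rest of the script unchanged.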

2. Visit a page

firefox_login.get('http://www.renren.com/')
# firefox_login.maximize_window()  # maximize the window; optional, depending on the case
firefox_login.minimize_window()

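3. Find elements and interact

from selenium.webdriver.common.by import By

# find_element_by_id('email') is the Selenium 3 spelling; it was removed in Selenium 4.3
firefox_login.find_element(By.ID, 'email').clear()
firefox_login.find_element(By.ID, 'email').send_keys('xxx@sina.com')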

Summary of element lookup methods

find_element_by_name

find_element_by_id

find_element_by_xpath

find_element_by_link_text

find_element_by_partial_link_text

find_element_by_tag_name

find_element_by_class_name

find_element_by_css_selector

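The methods above find a single element; to find multiple elements, change element to elements in the method name. These find_element_by_* helpers belong to Selenium 3: they were deprecated in Selenium 4 and removed in 4.3.

A more general approach, which works in both Selenium 3 and 4, passes the lookup strategy as an argument:

from selenium import webdriver
from selenium.webdriver.common.by import By  # this import is required

browser = webdriver.Chrome()
browser.get("http://www.taobao.com")
input_first = browser.find_element(By.ID, "q")  # By.ID can be replaced by any other By strategy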
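
The By constants are plain strings, so each of the eight helper methods listed earlier corresponds to one strategy string. The sketch below spells out that mapping; the table name and the find helper are hypothetical, written only to show how old-style calls translate to the general form.

```python
# Strategy strings behind the By constants, keyed by the legacy method suffix.
LEGACY_SUFFIX_TO_STRATEGY = {
    "id": "id",                                # By.ID
    "name": "name",                            # By.NAME
    "xpath": "xpath",                          # By.XPATH
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "tag_name": "tag name",                    # By.TAG_NAME
    "class_name": "class name",                # By.CLASS_NAME
    "css_selector": "css selector",            # By.CSS_SELECTOR
}


def find(driver, suffix, value, many=False):
    """Call driver.find_element(s) with the strategy for a legacy method suffix."""
    strategy = LEGACY_SUFFIX_TO_STRATEGY[suffix]
    if many:
        return driver.find_elements(strategy, value)
    return driver.find_element(strategy, value)
```

For example, find(browser, "id", "q") stands in for the old browser.find_element_by_id("q").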

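4. Operate the browser

from selenium.webdriver.common.by import By

firefox_login.find_element(By.ID, 'login').click()  # Selenium 3 spelling: find_element_by_id('login')

Operations can also be queued in an action chain and executed in sequence.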

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable"
browser.get(url)
browser.switch_to.frame('iframeResult')  # the demo lives inside this iframe
source = browser.find_element(By.CSS_SELECTOR, '#draggable')
target = browser.find_element(By.CSS_SELECTOR, '#droppable')
actions = ActionChains(browser)
actions.drag_and_drop(source, target)  # queued only; nothing happens yet
actions.perform()  # run the queued actions

The code above drags one element onto another.

Executing JS commands

The browser can also be driven directly with JS commands.

from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://www.zhihu.com/explore")
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')  # scroll to the bottom of the page
browser.execute_script('alert("To Bottom")')
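
On pages that load more content as you scroll, a single scrollTo is often not enough. A possible extension, sketched here under the assumption that the page grows document.body.scrollHeight when new content arrives, is to keep scrolling until the height stops changing. The function name and its parameters are invented for this example.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number
        last_height = new_height
    return max_rounds
```

The max_rounds cap is there so that an endless feed cannot keep the loop running forever.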

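5. Output and close

print(firefox_login.current_url)
print(firefox_login.page_source)

# Exit the browser
# firefox_login.close()  # closes only the current window
firefox_login.quit()  # closes every window and ends the driver session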
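
If the script raises an exception before reaching quit(), the browser and driver processes can be left running. One way to guard against that, shown as a sketch with a made-up helper name, is a context manager that always calls quit().

```python
from contextlib import contextmanager


@contextmanager
def browser_session(driver):
    """Yield the driver and always call quit() afterwards, even on error."""
    try:
        yield driver
    finally:
        driver.quit()


# Typical use (requires selenium and a driver on PATH):
#
#     with browser_session(webdriver.Chrome()) as browser:
#         browser.get('http://www.renren.com/')
#         print(browser.current_url)
```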

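Getting element attributes

Use get_attribute('class') to read an attribute:

from selenium.webdriver.common.by import By

logo = browser.find_element(By.ID, 'zh-top-link-logo')
print(logo.get_attribute('class'))

Get the text: logo.text

Get the id: logo.id (Selenium's internal reference to the element, not the HTML id attribute)

Get the position: logo.location

Get the tag name: logo.tag_name

Get the size: logo.size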
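
When scraping, these properties are usually wanted together. The sketch below gathers them into a dictionary; describe and its attributes parameter are names chosen for this example, and the element argument is assumed to be a Selenium WebElement.

```python
def describe(element, attributes=("class", "href")):
    """Collect the commonly used properties of a WebElement-like object into a dict."""
    summary = {
        "tag_name": element.tag_name,
        "text": element.text,
        "location": element.location,  # {'x': ..., 'y': ...}
        "size": element.size,          # {'height': ..., 'width': ...}
    }
    for name in attributes:
        summary[name] = element.get_attribute(name)  # None when the attribute is absent
    return summary
```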

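Advanced methods

Beyond the basic operations, many special scenarios need handling.

frame tags

Many pages contain frame tags. To work with content inside a frame, first switch into it, and switch back out once done.

Switch in with switch_to.frame and out with switch_to.parent_frame.

Example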

# encoding: utf-8
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')  # enter the frame; iframeResult is the iframe's id
source = browser.find_element(By.CSS_SELECTOR, '#draggable')
print(source)
try:
    # The logo is in the parent page, so it cannot be found from inside the frame
    logo = browser.find_element(By.CLASS_NAME, 'logo')
except NoSuchElementException:
    print('NO LOGO')
browser.switch_to.parent_frame()  # leave the frame
logo = browser.find_element(By.CLASS_NAME, 'logo')
print(logo)
print(logo.text)
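
Forgetting to switch back out is an easy mistake, as the NO LOGO branch above illustrates. A small context manager can pair the two calls; this is a sketch with an invented name, assuming the driver exposes switch_to.frame and switch_to.parent_frame as in the example.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then switch back out."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.parent_frame()


# Typical use:
#
#     with inside_frame(browser, 'iframeResult'):
#         source = browser.find_element(By.CSS_SELECTOR, '#draggable')
#     # back in the parent page here
```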

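Waits

Browser automation often has to wait for the page. Selenium provides two kinds of wait: explicit and implicit.

Implicit wait

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.implicitly_wait(100)  # every element lookup now waits up to 100 seconds
browser.get('https://www.zhihu.com/explore')
element = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(element)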

Explicit wait

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get('https://www.taobao.com/')
wait = WebDriverWait(browser, 100)  # wait up to 100 seconds for each condition
search_input = wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(search_input, button)

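Neither kind of wait sleeps for the full timeout: both return as soon as the element or condition is ready, and raise an exception only when the timeout expires (NoSuchElementException for an implicit wait, TimeoutException for an explicit one). The difference is that an explicit wait needs a condition to wait for, such as a particular element being present, while an implicit wait applies to every element lookup.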
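
The "return as soon as it is ready" behaviour comes from polling. The following is a simplified illustration of that idea in plain Python, not Selenium's actual implementation; poll_until and its parameters are names chosen for this sketch (WebDriverWait itself polls every 0.5 seconds by default).

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # return immediately; the remaining timeout is not slept
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a 100-second timeout and a condition that becomes true after 2 seconds, the call returns after roughly 2 seconds, which is why generous timeouts do not slow down a script on fast pages.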

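Common conditions

title_is: checks whether the current page title equals the expected value

title_contains: checks whether the current page title contains the expected string

presence_of_element_located: checks whether an element has been added to the DOM tree; this does not mean the element is visible

visibility_of_element_located: checks whether an element is visible, meaning it is not hidden and its width and height are both non-zero

visibility_of: does the same as the previous condition, except that it takes an already located element instead of a locator

presence_of_all_elements_located: checks whether at least one matching element exists in the DOM tree. For example, if n elements on the page have the class 'column-md-3', the condition is satisfied as soon as one of them exists, and it returns the list of matching elements

text_to_be_present_in_element: checks whether an element's text contains the expected string

text_to_be_present_in_element_value: checks whether an element's value attribute contains the expected string

frame_to_be_available_and_switch_to_it: checks whether the frame can be switched into; if so, switches into it and returns True, otherwise returns False

invisibility_of_element_located: checks whether an element is either absent from the DOM tree or invisible

element_to_be_clickable: checks whether an element is visible (displayed) and enabled, which is what clickable means here

staleness_of: waits until an element is removed from the DOM tree; note that this one also returns True or False

element_to_be_selected: checks whether an element is selected, typically used with drop-down lists

element_located_to_be_selected: same as the previous condition, but takes a locator instead of an element

element_selection_state_to_be: checks whether an element's selection state matches the expected one

element_located_selection_state_to_be: same as the previous condition, except that it takes a locator instead of an already located element

alert_is_present: checks whether an alert is present on the page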

Another example:

wait.until(EC.text_to_be_present_in_element_value(('id', 'inputSearchCity'), '西安'))
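
When none of the built-in conditions fits, wait.until also accepts any callable that takes the driver and returns a truthy value once satisfied. The sketch below builds such a condition; element_count_at_least is a hypothetical name, and the '.item' selector in the usage comment is only a placeholder.

```python
def element_count_at_least(locator, minimum):
    """Build a wait condition: at least `minimum` elements match `locator`.

    `locator` is a (strategy, value) tuple such as (By.CSS_SELECTOR, '.item').
    """
    def condition(driver):
        elements = driver.find_elements(*locator)
        # Return the elements when satisfied; False tells the wait to keep polling.
        return elements if len(elements) >= minimum else False

    return condition


# Typical use with the wait object from the explicit-wait example:
#
#     items = wait.until(element_count_at_least((By.CSS_SELECTOR, '.item'), 5))
```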

Browser forward and back

Use forward() and back():

import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com/')
browser.get('https://www.taobao.com/')
browser.back()  # back to baidu.com
time.sleep(1)
browser.forward()  # forward to taobao.com again
browser.close()

Cookie operations

The relevant methods are get_cookies(), delete_all_cookies() and add_cookie():

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
print(browser.get_cookies())
browser.add_cookie({'name': 'name', 'domain': 'www.zhihu.com', 'value': 'zhaofan'})
print(browser.get_cookies())  # now includes the added cookie
browser.delete_all_cookies()
print(browser.get_cookies())  # should now be empty
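
A common use of these methods in scraping is to keep a logged-in session between runs. The sketch below stores cookies in a JSON file and restores them later; the function names and the file layout are assumptions made for this example. Note that the browser must already be on a page of the cookie's domain before add_cookie is called.

```python
import json


def save_cookies(driver, path):
    """Write the driver's current cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(driver.get_cookies(), handle)


def load_cookies(driver, path):
    """Add every cookie stored in the JSON file to the driver; return how many."""
    with open(path, encoding="utf-8") as handle:
        cookies = json.load(handle)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

After load_cookies, reloading the page is usually needed for the site to pick up the restored session.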

Tab management

Omitted for now.
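
As a starting point, Selenium exposes tabs as window handles: driver.window_handles lists them and driver.switch_to.window(handle) selects one. The following is a minimal sketch, assuming a new tab can be opened with window.open through execute_script; the helper name is invented for this example.

```python
def open_tab_and_switch(driver, url):
    """Open `url` in a new tab via JavaScript and switch the driver to it."""
    known = set(driver.window_handles)
    driver.execute_script("window.open(arguments[0]);", url)
    new_handles = [handle for handle in driver.window_handles if handle not in known]
    if not new_handles:
        raise RuntimeError("no new tab appeared")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

Comparing the handles before and after avoids relying on the order of window_handles, which is not something to count on.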

Exception handling

Omitted for now.
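
One pattern that comes up often is retrying an action that fails transiently. The sketch below is a generic helper, not part of Selenium; with Selenium one would typically pass exception classes from selenium.common.exceptions, such as the NoSuchElementException used in the frame example. The name retry and its parameters are assumptions of this example.

```python
import time


def retry(action, retry_on, attempts=3, delay=0.5):
    """Run action(); if it raises one of `retry_on`, wait and try again.

    `retry_on` is an exception class or a tuple of classes. The last failure
    is re-raised once `attempts` is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Typical use:
#
#     logo = retry(lambda: browser.find_element(By.CLASS_NAME, 'logo'),
#                  NoSuchElementException)
```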

