
Playwright: a powerful tool for web automation

playwright

Advantages

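  • Selenium drives the browser through WebDriver, whereas Playwright talks to the browser over its developer-tools protocol. Installation is simple, and no separate driver has to be installed.

  • Playwright has official bindings for Python, JavaScript/TypeScript, Java and .NET. Because it launches its own bundled browsers instead of going through a driver, startup is faster.

  • Selenium is based on HTTP (one-way communication); Playwright is based on WebSocket (two-way communication), so it can pick up the browser's actual state automatically.

  • With Selenium, for example, you have to add waits for each element you operate on, whereas Playwright waits automatically:

    • it waits for the element to appear (when locating an element it waits up to 30 s by default; the timeout is configurable, in milliseconds)
    • it waits for events to occur
  • Playwright is generally much faster than Selenium, and it also offers an asynchronous API.

  • Requests can be sent directly through its API.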
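
The last point can be sketched as follows. This is a minimal sketch, assuming the endpoint returns JSON; the helper name get_json and the URL in the usage note are hypothetical, while get, ok, status and json are members of Playwright's APIRequestContext and APIResponse.

```python
def get_json(request_context, url, timeout_ms=30_000):
    """Send a GET request through a Playwright APIRequestContext and decode JSON.

    request_context is expected to be page.request, context.request, or the
    object returned by playwright.request.new_context().
    """
    response = request_context.get(url, timeout=timeout_ms)
    if not response.ok:
        # ok is False for any status outside the 200-299 range.
        raise RuntimeError(f"GET {url} failed with HTTP {response.status}")
    return response.json()
```

With a real page this could be called as get_json(page.request, "https://example.com/api/items"). Requests sent through page.request share cookies with the page's browser context.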

Limitations

  • Legacy Microsoft Edge and IE11 are not supported. The new Chromium-based Microsoft Edge is supported.
  • No testing on real mobile devices: Playwright uses desktop browsers to emulate mobile devices.

Installation

# Upgrade pip
pip install --upgrade pip
# Install the playwright package
pip install playwright
# Download the browsers Playwright drives; this may take a while
playwright install
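
A quick way to confirm that the pip step worked is to ask Python for the installed version. This is only a sketch: the helper name installed_version is made up for illustration, and the check covers the Python package only, not the browser binaries downloaded by playwright install.

```python
from importlib import metadata


def installed_version(distribution="playwright"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string when the package is installed, otherwise None.
print(installed_version())
```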

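Testing

  • Record code: run the command below to open a browser together with a code recorder. Every action you then perform in the browser is recorded as code in the recorder.
python -m playwright codegen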

[Screenshot: image-20221011105539960]

  • The recorded code looks like this:
from playwright.sync_api import Playwright, sync_playwright, expect


def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.baidu.com/")
    page.locator("input[name=\"wd\"]").click()
    page.locator("input[name=\"wd\"]").fill("python")
    page.get_by_role("button", name="百度一下").click()
    page.wait_for_url("https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=python&fenlei=256&rsv_pq=0xc3da98d700012600&rsv_t=a363ozUooWOMdrOI3S3PH3JauszohenVsQYNmRX6SyweDX91MOi0p89Sb4HG&rqlang=en&rsv_enter=0&rsv_dl=tb&rsv_sug3=6&rsv_sug1=1&rsv_sug7=100&rsv_btype=i&inputT=1846&rsv_sug4=1847&rsv_jmp=fail")

    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)

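The code above shows that:

  • Playwright supports both synchronous and asynchronous usage

  • there is no need to download a WebDriver for each browser

  • compared with Selenium there is an extra layer of abstraction, the context

  • headless mode is supported and recommended (headless defaults to True)

  • traditional locators (CSS, XPath and so on) work, and there are also new locating strategies such as locating by text

  • Selenium locates an element first and then acts on it; Playwright's action methods can take the selector directly, so locating and acting happen in one call (Playwright also provides standalone locator methods as an option)

  • many methods are used with the with context-manager syntax

  • of course, many people prefer to write test cases by hand in PyCharm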
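
The recorded script uses the synchronous API. For the asynchronous usage mentioned in the first point, a rough equivalent might look like the sketch below. The function names search_baidu and main are invented for illustration; async_playwright, locator, get_by_role, fill, click and title are Playwright's own API.

```python
import asyncio


async def search_baidu(page, keyword):
    """Repeat the recorded steps with the async API; every action is awaited."""
    await page.goto("https://www.baidu.com/")
    await page.locator('input[name="wd"]').fill(keyword)
    await page.get_by_role("button", name="百度一下").click()
    return await page.title()


async def main():
    # Imported here so that search_baidu can be read and tested without a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()  # headless by default
        page = await browser.new_page()
        print(await search_baidu(page, "python"))
        await browser.close()
```

To run the sketch, call asyncio.run(main()). Note that locator and get_by_role are not awaited, because they only build a locator; the actions on the locator are the coroutines.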

Playwright basic concepts

Playwright's core concepts include:

Browser

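  • A Browser is an instance of Chromium, Firefox or WebKit (the three browsers Playwright supports). A Playwright script usually starts by launching a browser instance and ends by closing it. A browser instance can be launched in headless mode (no GUI) or in headed mode. Creating a Browser instance:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    browser.close()
  • Launching a browser instance is relatively expensive, so Playwright is designed to get the most out of a single browser instance by running multiple BrowserContexts on it.
  • API: Browser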
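
Since the three browser types share the same launch interface, the browser can be chosen by name, for example from a configuration value. A minimal sketch, in which the helper launch_browser and the constant SUPPORTED_BROWSERS are hypothetical names:

```python
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def launch_browser(playwright, name="chromium", headless=True):
    """Launch one of the three supported browsers by name."""
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(
            f"unsupported browser {name!r}; expected one of {SUPPORTED_BROWSERS}"
        )
    browser_type = getattr(playwright, name)  # e.g. playwright.chromium
    return browser_type.launch(headless=headless)
```

Inside a with sync_playwright() as p block this would be used as browser = launch_browser(p, "firefox"), followed by browser.close() at the end.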

BrowserContext

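  • A BrowserContext is like an independent incognito session: very lightweight, yet fully isolated.

  • (Translator's note: each browser instance can have multiple BrowserContexts, fully isolated from one another. For example, you can log in to two different accounts in two BrowserContexts, or use a different proxy in each context.)

  • Creating a context:

browser = playwright.chromium.launch()
context = browser.new_context()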
  • Contexts can also be used to emulate multi-page scenarios involving mobile devices, permissions, locale and color scheme. Creating a mobile context, for example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    iphone_11 = p.devices['iPhone 11 Pro']
    browser = p.webkit.launch(headless=False)
    context = browser.new_context(
        **iphone_11,
        locale='de-DE',
        geolocation={'longitude': 12.492507, 'latitude': 41.889938},
        permissions=['geolocation']
    )
    browser.close()
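
The isolation between contexts (for instance, logging in to two accounts at once) can be sketched as below. The helper names open_isolated_sessions and close_sessions are made up for illustration; only new_context, new_page and close are Playwright calls.

```python
def open_isolated_sessions(browser, count=2):
    """Create count contexts on one browser, each with its own page."""
    sessions = []
    for _ in range(count):
        context = browser.new_context()  # own cookies and local storage
        page = context.new_page()
        sessions.append((context, page))
    return sessions


def close_sessions(sessions):
    """Close every context; this also closes the pages that belong to it."""
    for context, _page in sessions:
        context.close()
```

Each page returned here can log in with a different account without the sessions affecting each other, while only one browser process is running.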

API:

Page and Frame

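  • A BrowserContext can have multiple pages; each page represents a tab or a popup window. A page is used to navigate to URLs and to interact with the page content. Creating a page: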
page = context.new_page()

# Navigate explicitly, similar to entering a URL in the browser.
page.goto('http://example.com')
# Fill an input.
page.fill('#search', 'query')

# Navigate implicitly by clicking a link.
page.click('#submit')
# Expect a new url.
print(page.url)

# Page can navigate from the script - this will be picked up by Playwright.
# window.location.href = 'https://example.com'
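  • A page can have multiple frame objects but only one main frame. All page-level operations (such as click) act on the main frame. The other frames are attached to the page through the iframe HTML tag, and they can be accessed to interact with the content inside them.
import re

# Get a frame by its name attribute
frame = page.frame('frame-login')

# Get a frame by URL; pass a compiled pattern, because a plain string is treated as a glob
frame = page.frame(url=re.compile(r'.*domain.*'))

# Get a frame through another selector
frame_element_handle = page.query_selector('.frame-class')
frame = frame_element_handle.content_frame()

# Interact with the frame
frame.fill('#username-input', 'John')
  • Recording mode automatically detects whether an action happens inside a frame, so when a frame is hard to locate you can use recording mode to find it.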
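
Recent Playwright releases also offer frame_locator, which is typically what the recorder emits for actions inside an iframe. A minimal sketch, with fill_in_frame as a hypothetical helper name and the selectors reused from the example above:

```python
def fill_in_frame(page, frame_selector, field_selector, value):
    """Fill a field inside an iframe through a FrameLocator."""
    frame = page.frame_locator(frame_selector)  # selector of the iframe element
    frame.locator(field_selector).fill(value)   # auto-waits like any locator action
```

For example, fill_in_frame(page, ".frame-class", "#username-input", "John") does the same job as the query_selector and content_frame pair above, without holding an element handle.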

API:

Selector

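  • Playwright can locate elements by CSS selector, XPath selector, HTML attribute (such as id or data-test-id) or text content.

  • Except for XPath selectors, all selectors pierce the shadow DOM by default. To restrict a selector to the regular (light) DOM, use the :light variant of the engine, for example css:light. This is usually unnecessary.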

# Using data-test-id= selector engine
page.click('data-test-id=foo')

# CSS and XPath selector engines are automatically detected
page.click('div')
page.click('//html/body/div')

# Find node by text substring
page.click('text=Hello w')

# Explicit CSS and XPath notation
page.click('css=div')
page.click('xpath=//html/body/div')

# Only search light DOM, outside WebComponent shadow DOM:
page.click('css:light=div')

# Different selectors can be combined by chaining them with >>
# Click an element with text 'Sign Up' inside of a #free-month-promo.
page.click('#free-month-promo >> text=Sign Up')

# Capture textContent of a section that contains an element with text 'Selectors'.
section_text = page.eval_on_selector('*css=section >> text=Selectors', 'e => e.textContent')
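
The same lookups can also be written with the locator methods that the recorded script uses (get_by_role and friends). The sketch below is an assumption about how one might rewrite two of the examples above; the function names are invented, and get_by_test_id matches the data-testid attribute by default, which differs from the data-test-id attribute used above.

```python
def click_sign_up(page):
    """Click the 'Sign Up' text inside #free-month-promo using locators."""
    promo = page.locator("#free-month-promo")
    promo.get_by_text("Sign Up").click()


def click_by_test_id(page, test_id):
    """Click the element whose test id attribute equals test_id."""
    # Matches data-testid unless the attribute name has been reconfigured.
    page.get_by_test_id(test_id).click()
```

Chaining locators this way plays the same role as the >> operator in a selector string.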

Details:

Element selectors | Playwright Python

Auto-waiting

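  • Before performing an action, Playwright runs a series of actionability checks on the element to make sure the action behaves as expected. It auto-waits until all relevant checks pass and only then performs the requested action. If the checks do not pass within the given timeout, the action fails with a TimeoutError.

  • Actions such as page.click(selector, **kwargs) and page.fill(selector, value, **kwargs) auto-wait for the element to become visible and actionable. For example, click will:

    • wait for the element matching the selector to appear in the DOM

    • wait for it to become visible: it has a non-empty bounding box and no visibility:hidden

    • wait for it to stop moving, for example until a CSS transition finishes

    • scroll the element into view

    • wait for it to receive pointer events at the action point, for example until it is no longer obscured by other elements

    • retry if the element is detached during any of the checks above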
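
Conceptually, auto-waiting amounts to re-running a check until it passes or a deadline expires. The sketch below is a plain-Python illustration of that idea only; it is not Playwright's implementation, and the name wait_until and its parameters are invented here.

```python
import time


def wait_until(check, timeout_ms=30_000, interval_ms=100):
    """Call check() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_ms} ms")
        time.sleep(interval_ms / 1000)
```

In Playwright the checks are the actionability conditions listed above, and the deadline is the action's timeout.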

# Playwright waits for #search element to be in the DOM
page.fill('#search', 'query')

# Playwright waits for element to stop animating
# and accept clicks.
page.click('#search')

# You can also wait explicitly

# Wait for #search to appear in the DOM.
page.wait_for_selector('#search', state='attached')
# Wait for #promo to become visible, for example with `visibility:visible`.
page.wait_for_selector('#promo')

# Wait for #details to become hidden, for example with `display:none`.
page.wait_for_selector('#details', state='hidden')
# Wait for #promo to be removed from the DOM.
page.wait_for_selector('#promo', state='detached')
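
The 30 s default that governs these waits can be changed per page or per call. A small sketch, assuming the values shown are suitable for the site under test; the helper names apply_timeouts and click_with_timeout are hypothetical, while set_default_timeout, set_default_navigation_timeout and the timeout keyword are Playwright's API:

```python
def apply_timeouts(page, default_ms=10_000, navigation_ms=60_000):
    """Override the default timeouts for actions and navigations on one page."""
    page.set_default_timeout(default_ms)
    page.set_default_navigation_timeout(navigation_ms)


def click_with_timeout(page, selector, timeout_ms):
    """Click with a timeout that applies to this single action only."""
    page.click(selector, timeout=timeout_ms)
```

All values are in milliseconds. A per-call timeout takes precedence over the page default for that action.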

Execution context

  • The page.evaluate(expression, **kwargs) API runs a JavaScript function in the context of the web page and returns the result to the Playwright environment. Browser globals such as window and document are available inside evaluate.
href = page.evaluate('() => document.location.href')

# If the result is a Promise, or if the function is asynchronous, evaluate automatically waits until it resolves
status = page.evaluate("""async () => {
    const response = await fetch(location.href)
    return response.status
}""")
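
Values returned from evaluate are serialized, so a JavaScript object arrives in Python as a dict. A minimal sketch, with viewport_size as a made-up helper name:

```python
def viewport_size(page):
    """Return the inner window size of the page as a Python dict."""
    return page.evaluate(
        "() => ({ width: window.innerWidth, height: window.innerHeight })"
    )
```

The parentheses around the object literal are required; without them JavaScript parses the braces as a function body and the function returns undefined.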

Evaluation Argument

result = page.evaluate("([x, y]) => Promise.resolve(x * y)", [7, 8])
print(result) # prints "56"


print(page.evaluate("1 + 2")) # prints "3"
x = 10
print(page.evaluate(f"1 + {x}")) # prints "11"


body_handle = page.query_selector("body")
html = page.evaluate("([body, suffix]) => body.innerHTML + suffix", [body_handle, "hello"])
body_handle.dispose()


# A primitive value.
page.evaluate('num => num', 42)

# An array.
page.evaluate('array => array.length', [1, 2, 3])

# An object.
page.evaluate('object => object.foo', { 'foo': 'bar' })

# A single handle.
button = page.query_selector('button')
page.evaluate('button => button.textContent', button)

# Alternative notation using elementHandle.evaluate.
button.evaluate('(button, from) => button.textContent.substring(from)', 5)

# Object with multiple handles.
button1 = page.query_selector('.button1')
button2 = page.query_selector('.button2')
page.evaluate("""o => o.button1.textContent + o.button2.textContent""",
{ 'button1': button1, 'button2': button2 })

# Object destructuring works. Note that property names must match
# between the destructured object and the argument.
# Also note the required parenthesis.
page.evaluate("""
({ button1, button2 }) => button1.textContent + button2.textContent""",
{ 'button1': button1, 'button2': button2 })

# Array works as well. Arbitrary names can be used for destructuring.
# Note the required parenthesis.
page.evaluate("""
([b1, b2]) => b1.textContent + b2.textContent""",
[button1, button2])

# Any non-cyclic mix of serializables and handles works.
page.evaluate("""
x => x.button1.textContent + x.list[0].textContent + String(x.foo)""",
{ 'button1': button1, 'list': [button2], 'foo': None })

Using Playwright with pytest

  • testcase\conftest.py
import pytest
from playwright.sync_api import sync_playwright
from py._xmlgen import html


@pytest.fixture()
def browser():
    playwright = sync_playwright().start()
    browser = playwright.chromium.launch(headless=False)

    # Hand the browser to the test
    yield browser

    # Teardown after the test
    browser.close()
    playwright.stop()


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    # Use the test function's docstring as the description shown in the report
    report.description = str(item.function.__doc__)
    # Decode escaped non-ASCII characters in the node id
    report.nodeid = report.nodeid.encode("utf-8").decode("unicode_escape")


def pytest_html_results_table_header(cells):
    cells.insert(1, html.th('Test case name'))
    cells.insert(2, html.th('Test_nodeid'))
    # pop(2) removes the 'Test_nodeid' header that was just inserted at index 2
    cells.pop(2)


def pytest_html_results_table_row(report, cells):
    cells.insert(1, html.td(report.description))
    cells.insert(2, html.td(report.nodeid))
    # pop(2) removes the node id cell that was just inserted at index 2
    cells.pop(2)


def pytest_html_results_table_html(report, data):
    # Replace the captured log of passed tests with a short note
    if report.passed:
        del data[:]
        data.append(html.div('No log output captured for passed tests.', class_='empty log'))


def pytest_html_report_title(report):
    report.title = "pytest sample project test report"


  • testcase\test1.py; page.request.get can send an HTTP request directly
import pytest


class TestClassName:
    @pytest.mark.usefixtures("browser")
    def test_func_name1(self, browser):
        context = browser.new_context()
        page = context.new_page()
        # Send an HTTP request
        resp = page.request.get("http://www.kuaidi100.com/query?type=")
        print(resp.text())
        page.goto("https://www.baidu.com/")
        assert page.title() == "百度一下,你就知道"

        page.locator("input[name=\"wd\"]").click()
        page.locator("input[name=\"wd\"]").fill("python")
        page.get_by_role("button", name="百度一下").click()
        context.close()

    @pytest.mark.usefixtures("browser")
    def test_func_name1_1(self, browser):
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://www.baidu.com/")
        # The expected title has a trailing "1", so this assertion fails as written
        assert page.title() == "百度一下,你就知道1"
        page.locator("input[name=\"wd\"]").click()
        page.locator("input[name=\"wd\"]").fill("python")
        page.get_by_role("button", name="百度一下").click()
        context.close()
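
In both tests a failing assertion skips the final context.close() call (the browser fixture still closes the browser afterwards). One way to make the cleanup unconditional is a small context manager; this is a sketch, and the name new_page is chosen here for illustration.

```python
from contextlib import contextmanager


@contextmanager
def new_page(browser, **context_options):
    """Yield a page in a fresh context and always close the context afterwards."""
    context = browser.new_context(**context_options)
    try:
        yield context.new_page()
    finally:
        context.close()
```

A test body then becomes with new_page(browser) as page: followed by the steps and assertions. The same pattern could be wrapped in a pytest fixture that depends on the browser fixture.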


  • Run the test cases
# Run all test cases
pytest -s testcase\ --html=report.html --self-contained-html --capture=sys

# Run test cases with multiple threads
pip install pytest-multithreading -i https://pypi.douban.com/simple
pytest -s testcase/ --th 10 --html=report.html --self-contained-html --capture=sys
  • View the results

[Screenshot: image-20221011152104439]

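Checking element visibility

  • When locating elements, it often happens that an element is present but still cannot be located. In that case, check whether the DOM element is visible:
def find_el(page, el, timeout=10000):
    try:
        # Wait for the element to be attached to the DOM
        element = page.wait_for_selector(el, state="attached", timeout=timeout)
        # Wait for the element to become visible
        element.wait_for_element_state("visible", timeout=timeout)
        return element
    except Exception:
        # Any failure, including a timeout, is treated as "not found"
        return None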
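
A locator-based variant of the same idea is sketched below. It catches only Playwright's TimeoutError, so other errors (a malformed selector, for instance) are not silently swallowed. The name wait_visible is hypothetical, and the import fallback exists only so the sketch can be read without Playwright installed.

```python
try:
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    # Fallback so the sketch can be exercised without Playwright installed.
    PlaywrightTimeoutError = TimeoutError


def wait_visible(page, selector, timeout_ms=10_000):
    """Return a Locator once selector is visible, or None on timeout."""
    locator = page.locator(selector)
    try:
        locator.wait_for(state="visible", timeout=timeout_ms)
    except PlaywrightTimeoutError:
        return None
    return locator
```

Unlike an element handle, the returned locator re-resolves the element on every action, so it keeps working if the page re-renders the node.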

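Other

  • Integrating with CI is still to be tried out
  • For parallel runs across machines, you can start multiple processes, or create several nodes on the CI and run on those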
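
For the multi-process option, one simple approach is to split the test files into groups and start one pytest process (or CI node) per group. The sketch below only builds the command lines and makes no claim about how the author intends to run them; split_round_robin and pytest_commands are invented names, and the file names in the usage note are placeholders.

```python
def split_round_robin(test_files, workers):
    """Distribute test files over up to workers groups, dropping empty groups."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    groups = [[] for _ in range(workers)]
    for index, path in enumerate(test_files):
        groups[index % workers].append(path)
    return [group for group in groups if group]


def pytest_commands(test_files, workers):
    """Build one pytest command line (as an argument list) per group."""
    return [["pytest", *group] for group in split_round_robin(test_files, workers)]
```

For example, pytest_commands(["test1.py", "test2.py", "test3.py"], 2) yields two argument lists, each of which can be passed to subprocess.Popen or assigned to a separate CI node.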