Inputs

The __init__ method of a page object class must declare its input parameters with type hints that point to input classes.

Those input classes may be built-in input classes or custom input classes.

Based on the target URL and parameter type hints, frameworks automatically build the required objects at run time, and pass them to the __init__ method of the corresponding page object class.

For example, if a page object class has an __init__ parameter of type HttpResponse, and the target URL is https://example.com, your framework would send an HTTP request to https://example.com, download the response, build an HttpResponse object with the response data, and pass it to the __init__ method of the page object class being used.
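The mechanism described above can be sketched with the standard library alone. This is a toy illustration, not how any particular framework is implemented: FakeHttpResponse, TitlePage, fake_download, PROVIDERS and build_page are hypothetical names, and the "download" returns canned data instead of sending a request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, get_type_hints


@dataclass
class FakeHttpResponse:
    """Stand-in for an input class; not the real web_poet HttpResponse."""

    url: str
    body: bytes


class TitlePage:
    """Stand-in page object class that declares its input via a type hint."""

    def __init__(self, response: FakeHttpResponse):
        self.response = response


def fake_download(url: str) -> FakeHttpResponse:
    # A real framework would send an HTTP request here; this returns canned data.
    return FakeHttpResponse(url=url, body=b"<h1>Title</h1>")


# Hypothetical registry: maps each input class to a function that builds it.
PROVIDERS: Dict[type, Callable[[str], object]] = {FakeHttpResponse: fake_download}


def build_page(page_cls: type, url: str):
    """Build every input requested by page_cls.__init__, then instantiate it."""
    hints = get_type_hints(page_cls.__init__)
    hints.pop("return", None)  # ignore a return annotation, if any
    kwargs = {name: PROVIDERS[hint](url) for name, hint in hints.items()}
    return page_cls(**kwargs)


page = build_page(TitlePage, "https://example.com")
print(page.response.url)
```

The page object class never asks for a download; it only names the type it needs, and the code that instantiates it decides how to produce that type.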

Built-in input classes

Warning

Not all frameworks support all web-poet built-in input classes.

The web_poet.page_inputs module defines multiple classes that you can use as inputs for a page object class, including HttpResponse and BrowserResponse.

Working with HttpResponse

HttpResponse has many attributes and methods.

Tip

You can use WebPage instead of ItemPage to have HttpResponse as input and get convenient shortcuts for working with it.

To get the entire response body, you can use body for the raw bytes, text for the str (decoded with the detected encoding), or json() to load a JSON response as a Python data structure:

>>> response.body
b'{"foo": "bar"}'
>>> response.text
'{"foo": "bar"}'
>>> response.json()
{'foo': 'bar'}

There are also methods to select content from responses: jmespath() for JSON and css() and xpath() for HTML and XML:

>>> response.jmespath("foo")
[<Selector query='foo' data='bar'>]
>>> response.css("h1::text")
[<Selector query='descendant-or-self::h1/text()' data='Title'>]
>>> response.xpath("//h1/text()")
[<Selector query='//h1/text()' data='Title'>]

Working with BrowserResponse

BrowserResponse is similar to HttpResponse, but for browser-rendered pages. Its html attribute contains the rendered HTML (as a str) after JavaScript execution.

Like HttpResponse, it provides css() and xpath() methods to select content from the rendered page:

>>> response.html
'<html><head>...</head><body><h1>Title</h1>...</body></html>'
>>> response.css("h1::text")
[<Selector query='descendant-or-self::h1/text()' data='Title'>]
>>> response.xpath("//h1/text()")
[<Selector query='//h1/text()' data='Title'>]

Custom input classes

You may define your own input classes if you are using a framework that supports it.

However, note that custom input classes may make your page object classes less portable across frameworks.

Input annotations

A type hint that points to an input class can be wrapped in typing.Annotated to attach metadata to it. For example:

from typing import Annotated
from web_poet.page_inputs.http import HttpResponse
from web_poet.pages import WebPage


class MyPage(WebPage):
    def __init__(self, response: Annotated[HttpResponse, "my-metadata"]): ...
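To show how such metadata can be read back, here is a minimal standard-library sketch. FakeHttpResponse is a hypothetical stand-in class, and the way a given framework actually inspects annotations may differ.

```python
from typing import Annotated, get_args, get_type_hints


class FakeHttpResponse:
    """Stand-in input class; not the real web_poet HttpResponse."""


class MyPage:
    def __init__(self, response: Annotated[FakeHttpResponse, "my-metadata"]):
        self.response = response


# include_extras=True keeps the Annotated wrapper instead of stripping it.
hints = get_type_hints(MyPage.__init__, include_extras=True)

# For an Annotated hint, get_args() returns the wrapped type followed by
# the metadata values.
input_cls, *metadata = get_args(hints["response"])

print(input_cls.__name__)
print(metadata)
```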

web-poet requires annotations to be JSON-serializable, for fixture support. Because Annotated requires annotations to be hashable, web-poet provides annotation_encode() to support list and dict structures in annotations. For example:

from typing import Annotated
from web_poet import annotation_encode
from web_poet.page_inputs.http import HttpResponse
from web_poet.pages import WebPage


class MyPage(WebPage):
    def __init__(
        self, response: Annotated[HttpResponse, annotation_encode({"foo": ["bar"]})]
    ): ...
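The reason an encoding step helps can be seen without web-poet: dict and list objects are not hashable, while tuples of hashable values are. The sketch below uses a hypothetical to_hashable helper to illustrate the general idea. It is not the implementation of annotation_encode(), whose actual encoding may differ.

```python
from typing import Any


def to_hashable(value: Any) -> Any:
    """Hypothetical helper: recursively turn dicts and lists into tuples."""
    if isinstance(value, dict):
        # Sort by key so that equal dicts always produce the same tuple.
        items = sorted(value.items(), key=lambda pair: pair[0])
        return tuple((key, to_hashable(item)) for key, item in items)
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(item) for item in value)
    return value


raw = {"foo": ["bar"]}
encoded = to_hashable(raw)

try:
    hash(raw)
    raw_is_hashable = True
except TypeError:
    # dict objects do not define a hash.
    raw_is_hashable = False

print(raw_is_hashable)
print(encoded)
```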