protocol.http.web Module

Advanced HTTP Client handling.

class wpull.protocol.http.web.LoopType[source]

Bases: enum.Enum

Indicates the type of request and response.

authentication = None

Response to an HTTP authentication challenge.

normal = None

Normal response.

redirect = None

Redirect.

robots = None

Response to a robots.txt request.
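
As a rough illustration of how these four values might be told apart, the sketch below classifies a response with a local stand-in enum. The `classify` helper, its rules, and the numeric member values are assumptions made for this example; the reference above does not state how wpull assigns a loop type.

```python
import enum
from urllib.parse import urlsplit


class LoopType(enum.Enum):
    """Local stand-in mirroring the member names above; values are arbitrary."""
    normal = 1
    redirect = 2
    authentication = 3
    robots = 4


def classify(url, status_code):
    """Hypothetical helper: guess a loop type from the URL and status code."""
    # Assumption: a robots.txt fetch is recognised by its path alone.
    if urlsplit(url).path == '/robots.txt':
        return LoopType.robots
    # Simplification: every 3xx status is treated as a redirect here.
    if 300 <= status_code < 400:
        return LoopType.redirect
    # 401 Unauthorized is the HTTP authentication challenge status.
    if status_code == 401:
        return LoopType.authentication
    return LoopType.normal
```

A caller could branch on the result, for example to save the body only when the value is `LoopType.normal`.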

class wpull.protocol.http.web.WebClient(http_client: typing.Union=None, request_factory: typing.Callable=<class 'wpull.protocol.http.request.Request'>, redirect_tracker_factory: typing.Union=<class 'wpull.protocol.http.redirect.RedirectTracker'>, cookie_jar: typing.Union=None)[source]

Bases: object

A web client that handles redirects, cookies, and basic authentication.

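Parameters:
  • http_client – The HTTP client to use. Optional.
  • request_factory – A callable that creates a new Request. Defaults to wpull.protocol.http.request.Request.
  • redirect_tracker_factory – A callable that creates a new RedirectTracker. Defaults to wpull.protocol.http.redirect.RedirectTracker.
  • cookie_jar – The cookie jar. Optional.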
close()[source]
cookie_jar

Return the Cookie Jar.

http_client

Return the HTTP Client.

redirect_tracker_factory

Return the Redirect Tracker factory.

request_factory

Return the Request factory.

session(request: wpull.protocol.http.request.Request) → wpull.protocol.http.web.WebSession[source]

Return a fetch session.

Parameters:request – The request to be fetched.

Example usage:

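from wpull.protocol.http.request import Request
from wpull.protocol.http.web import WebClient

# Run this inside a coroutine; start() and download() are coroutines.
client = WebClient()
session = client.session(Request('http://www.example.com'))

with session:
    while not session.done():
        request = session.next_request()
        print(request)

        response = yield from session.start()
        print(response)

        if session.done():
            # Final response: open the file for binary writing and save the body.
            with open('myfile.html', 'wb') as file:
                yield from session.download(file)
        else:
            # Intermediate response (such as a redirect): no file is given,
            # so the body is not saved.
            yield from session.download()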
Returns:WebSession
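
The control flow of the example usage above can be tried without a network by driving it against a fake session. This is only a sketch: `FakeSession`, `fetch`, and `run` are hypothetical names, the fake treats any non-3xx status as final, and it omits the context-manager support used above. It is not wpull's implementation.

```python
import io


class FakeSession:
    """Hypothetical stand-in for WebSession; performs no network I/O."""

    def __init__(self, responses):
        # Each entry is (status_code, body); 3xx entries simulate redirects.
        self._responses = list(responses)
        self._index = 0
        self._done = False

    def done(self):
        return self._done

    def start(self):
        yield  # pretend to wait for network I/O
        status, _ = self._responses[self._index]
        # Assumption: a non-redirect status ends the session.
        self._done = not (300 <= status < 400)
        return status

    def download(self, file=None):
        yield  # pretend to wait for network I/O
        _, body = self._responses[self._index]
        if file is not None:
            file.write(body)
        self._index += 1


def fetch(session, file):
    """Same loop shape as the usage example; returns the statuses seen."""
    statuses = []
    while not session.done():
        status = yield from session.start()
        statuses.append(status)
        if session.done():
            yield from session.download(file)
        else:
            yield from session.download()
    return statuses


def run(coroutine):
    """Drive a generator-based coroutine to completion and return its result."""
    try:
        while True:
            next(coroutine)
    except StopIteration as stop:
        return stop.value


buffer = io.BytesIO()
session = FakeSession([(301, b''), (302, b''), (200, b'<html>ok</html>')])
statuses = run(fetch(session, buffer))
```

Only the body of the final response reaches `buffer`; the two simulated redirects are downloaded without a file.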
class wpull.protocol.http.web.WebSession(request: wpull.protocol.http.request.Request, http_client: wpull.protocol.http.client.Client, redirect_tracker: wpull.protocol.http.redirect.RedirectTracker, request_factory: typing.Callable, cookie_jar: typing.Union=None)[source]

Bases: object

A web session.

done() → bool[source]

Return whether the session has finished.

Returns:If True, the document has been fully fetched.
Return type:bool
download(file: typing.Union=None, duration_timeout: typing.Union=None)[source]

Download content.

Parameters:
  • file – An optional file object for the document contents.
  • duration_timeout – Maximum time, in seconds, within which the entire file must be read.
Returns:An instance of http.request.Response.
Return type:Response

See WebClient.session() for proper usage of this function.

Coroutine.
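
To make the meaning of duration_timeout concrete, the sketch below bounds the total time of a simulated body read with the standard library's `asyncio.wait_for`. The names `read_body`, `download_with_timeout`, and `demo` are hypothetical, the example uses `async`/`await` rather than `yield from`, and the reference above does not say how wpull enforces the limit internally.

```python
import asyncio


async def read_body(chunks, delay):
    """Simulate reading a body chunk by chunk, pausing before each chunk."""
    data = b''
    for chunk in chunks:
        await asyncio.sleep(delay)
        data += chunk
    return data


async def download_with_timeout(chunks, delay, duration_timeout=None):
    """Read the whole body, failing if it takes longer than duration_timeout.

    A timeout of None means no limit, matching the optional parameter above.
    """
    return await asyncio.wait_for(read_body(chunks, delay),
                                  timeout=duration_timeout)


def demo():
    # A fast read finishes well inside the limit.
    body = asyncio.run(download_with_timeout([b'a', b'b'], 0.0, 1.0))
    # A slow read exceeds the limit and raises asyncio.TimeoutError.
    try:
        asyncio.run(download_with_timeout([b'a', b'b'], 0.2, 0.05))
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return body, timed_out
```

The limit applies to the read as a whole, not to each chunk, which is the distinction the parameter description draws.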

loop_type() → wpull.protocol.http.web.LoopType[source]

Return the type of response.

Seealso:LoopType.
next_request() → typing.Union[source]

Return the next Request to be fetched.

redirect_tracker

Return the Redirect Tracker.

start()[source]

Begin fetching the next request.