`document.base` Module

Document bases.

class wpull.document.base.BaseDocumentDetector

Bases: object

Base class for classes that detect document types.

classmethod is_file(file)

Return whether the reader is likely able to read the file.

Parameters:	file – A file object containing the document.
Returns:	bool

classmethod is_request(request)

Return whether the request is likely supported.

Parameters:	request (`http.request.Request`) – An HTTP request.
Returns:	bool

classmethod is_response(response)

Return whether the response is likely able to be read.

Parameters:	response (`http.request.Response`) – An HTTP response.
Returns:	bool

classmethod is_supported(file=None, request=None, response=None, url_info=None)

Given the hints, return whether the document is supported.

Parameters:

file – A file object containing the document.
request (`http.request.Request`) – An HTTP request.
response (`http.request.Response`) – An HTTP response.
url_info (`url.URLInfo`) – A URLInfo.

Returns:	If True, the reader should be able to read it.
Return type:	bool

classmethod is_url(url_info)

Return whether the URL is likely to be supported.

Parameters:	url_info (`url.URLInfo`) – A URLInfo.
Returns:	bool
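
As a rough illustration of how a subclass might implement these hooks, the sketch below uses a local stand-in for the base class so that it runs without wpull installed. The `DetectorSketch`, `CSSDetector`, and `FakeURLInfo` names are hypothetical, and the fallback logic in `is_supported` is an assumption about how the hints could be combined, not wpull's actual implementation. The request and response hooks are omitted for brevity.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for url.URLInfo; only a `path` attribute is assumed.
FakeURLInfo = namedtuple('FakeURLInfo', ['path'])


class DetectorSketch(object):
    '''Local stand-in mirroring the documented classmethod signatures.'''

    @classmethod
    def is_file(cls, file):
        raise NotImplementedError()

    @classmethod
    def is_url(cls, url_info):
        raise NotImplementedError()

    @classmethod
    def is_supported(cls, file=None, request=None, response=None,
                     url_info=None):
        # Assumed behavior: return True as soon as any provided hint matches.
        checks = ((file, cls.is_file), (url_info, cls.is_url))
        for hint, check in checks:
            if hint is None:
                continue
            try:
                if check(hint):
                    return True
            except NotImplementedError:
                pass
        return False


class CSSDetector(DetectorSketch):
    '''Hypothetical detector for CSS documents.'''

    @classmethod
    def is_url(cls, url_info):
        return url_info.path.lower().endswith('.css')

    @classmethod
    def is_file(cls, file):
        # Peek at the start of the file, then restore the position.
        position = file.tell()
        head = file.read(64)
        file.seek(position)
        return b'@import' in head
```

A caller could then pass whichever hints it has, for example `CSSDetector.is_supported(url_info=FakeURLInfo('/main.css'))`.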

class wpull.document.base.BaseExtractiveReader

Bases: object

Base class for document readers that can only extract links.

iter_links(file, encoding=None)

Return links from file.

Returns:	Each item is a str which represents a link.
Return type:	iterator
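
A minimal sketch of a reader with this interface might look like the following. The `PlainTextLinkReader` class, its regular expression, and the UTF-8 fallback are assumptions made for illustration; the class does not inherit from wpull so that the example stays self-contained.

```python
import io
import re


class PlainTextLinkReader(object):
    '''Hypothetical reader that yields http(s) URLs found in a text file.'''

    URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')

    def iter_links(self, file, encoding=None):
        # Assumption: fall back to UTF-8 when no encoding is given.
        text = file.read().decode(encoding or 'utf-8', errors='replace')
        for match in self.URL_PATTERN.finditer(text):
            yield match.group(0)
```

Because `iter_links` is a generator, links are produced lazily as the caller iterates.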

class wpull.document.base.BaseHTMLReader

Bases: object

Base class for document readers for handling SGML-like documents.

iter_elements(file, encoding=None)

Return an iterator of elements found in the document.

Parameters:

file – A file object containing the document.
encoding (str) – The encoding of the document.

Returns:	Each item is an element from `document.htmlparse.element`.
Return type:	iterator

class wpull.document.base.BaseTextStreamReader

Bases: object

Base class for document readers that filter link and non-link text.

iter_links(file, encoding=None, context=False)

Return the links.

This function is a convenience function for calling iter_text() and returning only the links.

iter_text(file, encoding=None)

Return the file text and links.

Parameters:

file – A file object containing the document.
encoding (str) – The encoding of the document.

Returns:

Each item is a tuple:

str: The text
bool (or truthy value): Whether the text is likely a link. A truthy value other than True may be provided to carry additional context about the link.

Return type:

iterator

The links returned are raw text and will require further processing.
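
The relationship between `iter_text()` and `iter_links()` can be sketched roughly as below. The `TextStreamSketch` class and its URL pattern are hypothetical, and the handling of `context` (yielding the tuple instead of the bare string) is an assumption rather than confirmed wpull behavior.

```python
import io
import re


class TextStreamSketch(object):
    '''Hypothetical reader splitting text into link and non-link pieces.'''

    URL_PATTERN = re.compile(r'(https?://\S+)')

    def iter_text(self, file, encoding=None):
        # Assumption: fall back to UTF-8 when no encoding is given.
        text = file.read().decode(encoding or 'utf-8', errors='replace')
        # re.split with a capturing group keeps the matched URLs in the result.
        for piece in self.URL_PATTERN.split(text):
            if piece:
                yield piece, bool(self.URL_PATTERN.fullmatch(piece))

    def iter_links(self, file, encoding=None, context=False):
        # Assumed meaning of ``context``: yield (text, flag) pairs instead of
        # bare strings.
        for text, is_link in self.iter_text(file, encoding):
            if is_link:
                yield (text, is_link) if context else text
```

Keeping the filtering in `iter_links` means a subclass only has to implement `iter_text`.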

wpull.document.base.VeryFalse = <wpull.document.base.VeryFalseType object>

Document is definitely not supported.

class wpull.document.base.VeryFalseType

Bases: object
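
One way such a sentinel could work is sketched below. This is an assumption-laden illustration: the `VeryFalseSketch` type, the `VERY_FALSE` instance, and the `describe` helper are hypothetical, and the sketch assumes the sentinel is falsy so that ordinary boolean checks still treat it as a negative result.

```python
class VeryFalseSketch(object):
    '''Hypothetical falsy sentinel type.'''

    def __bool__(self):
        return False


VERY_FALSE = VeryFalseSketch()


def describe(result):
    # Check identity first, because the sentinel is also falsy.
    if result is VERY_FALSE:
        return 'definitely unsupported'
    return 'supported' if result else 'no positive match'
```

An identity check (`is`) is what lets a caller tell the stronger negative apart from a plain `False`.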
