pipeline.session Module
-
class wpull.pipeline.session.ItemSession(app_session: wpull.pipeline.app.AppSession, url_record: wpull.pipeline.item.URLRecord)
Bases: object
Item for a URL that needs to be processed.
-
add_child_url(url: str, inline: bool=False, link_type: typing.Union=None, post_data: typing.Union=None, level: typing.Union=None, replace: bool=False)
Add links scraped from the document with automatic values.
Parameters:
- url – A full URL. (It can’t be a relative path.)
- inline – Whether the URL is an embedded object.
- link_type – Expected link type.
- post_data – URL encoded form data. The request will be made using POST. (Don’t use this to upload files.)
- level – The child depth of this URL.
- replace – Whether to replace the existing entry in the database table so that the URL will be downloaded again.
This function provides values automatically for:
- inline
- level
- parent: The referring page.
- root
See also add_url().
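To make the automatic values listed above concrete, here is a minimal, self-contained sketch. It does not use wpull itself: `FakeItemSession` and `FakeURLRecord` are hypothetical stand-ins, and the way `level` and `root` are derived is an assumption for illustration, not wpull's confirmed behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeURLRecord:
    """Hypothetical stand-in for a URL record; not wpull's URLRecord."""
    url: str
    level: int = 0
    inline: bool = False
    parent_url: Optional[str] = None
    root_url: Optional[str] = None


class FakeItemSession:
    """Hypothetical stand-in that mimics only the idea of add_child_url."""

    def __init__(self, url_record: FakeURLRecord):
        self.url_record = url_record
        self.added: List[FakeURLRecord] = []

    def add_child_url(self, url: str, inline: bool = False,
                      level: Optional[int] = None) -> FakeURLRecord:
        # The documentation requires a full URL, not a relative path.
        if '://' not in url:
            raise ValueError('url must be a full URL, not a relative path')
        parent = self.url_record
        child = FakeURLRecord(
            url=url,
            inline=inline,
            # Assumption: a child defaults to one level below its parent.
            level=parent.level + 1 if level is None else level,
            # The parent is the referring page.
            parent_url=parent.url,
            # Assumption: the root is inherited, or is the parent at the top.
            root_url=parent.root_url or parent.url,
        )
        self.added.append(child)
        return child


session = FakeItemSession(FakeURLRecord('http://example.com/'))
child = session.add_child_url('http://example.com/style.css', inline=True)
print(child)
```

In this sketch the caller supplies only the URL and the `inline` flag. The stand-in session fills in the remaining fields from the record it is currently processing.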
-
child_url_record(url: str, inline: bool=False, link_type: typing.Union=None, post_data: typing.Union=None, level: typing.Union=None)
Return a child URLRecord.
This function is useful for testing filters against a URL before adding it to the table.
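The "test first, add later" pattern can be sketched as follows. This example does not use wpull: `Candidate` is a hypothetical stand-in for the record that child_url_record() returns, and `passes_filters` is an illustrative predicate, not a wpull filter class.

```python
from collections import namedtuple
from urllib.parse import urlsplit

# Hypothetical stand-in for a child URL record; not wpull's URLRecord.
Candidate = namedtuple('Candidate', ['url', 'level'])


def passes_filters(record, allowed_host, max_level):
    """Illustrative filter: accept only same-host URLs within a depth limit."""
    host = urlsplit(record.url).hostname
    return host == allowed_host and record.level <= max_level


candidates = [
    Candidate('http://example.com/a', 1),
    Candidate('http://other.example.net/b', 1),
    Candidate('http://example.com/deep', 5),
]

# Only candidates that pass every check would then be added to the table.
accepted = [c.url for c in candidates
            if passes_filters(c, 'example.com', 2)]
print(accepted)
```

Building the candidate record before touching the table means a rejected URL never needs to be written and then removed.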
-
is_processed
Return whether the item has been processed.
-
is_virtual
-
request
-
response
-