processor.coprocessor.phantomjs Module

PhantomJS page loading and scrolling.

class wpull.processor.coprocessor.phantomjs.PhantomJSCoprocessor(phantomjs_driver_factory: typing.Callable, processing_rule: wpull.processor.rule.ProcessingRule, phantomjs_params: wpull.processor.coprocessor.phantomjs.PhantomJSParamsType, warc_recorder=None, root_path='.')[source]

Bases: object

PhantomJS coprocessor.

Parameters:
  • phantomjs_driver_factory – Callback function that accepts a params argument and returns a PhantomJSDriver.
  • processing_rule – Processing rule.
  • phantomjs_params – PhantomJS parameters.
  • warc_recorder – WARC recorder.
  • root_path (str) – Root directory path for temp files.

process(item_session: wpull.pipeline.session.ItemSession, request, response, file_writer_session)[source]

Process PhantomJS.

Coroutine.
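
The phantomjs_driver_factory callback described in the parameter list can be pictured with the minimal sketch below. It is not wpull code: StubDriver, make_driver_factory, and the 'phantomjs' executable name are hypothetical stand-ins, and the sketch assumes the callback receives params as a keyword argument.

```python
import functools
from typing import Any, Callable


class StubDriver:
    """Hypothetical stand-in for PhantomJSDriver; not the wpull class."""

    def __init__(self, exe_path: str, params: Any):
        self.exe_path = exe_path
        self.params = params


def make_driver_factory(exe_path: str) -> Callable[..., StubDriver]:
    # Bind every argument except ``params`` ahead of time, so the caller
    # only has to supply the per-page parameters.
    return functools.partial(StubDriver, exe_path)


factory = make_driver_factory('phantomjs')

# The coprocessor side would then only need to pass ``params``.
driver = factory(params={'url': 'http://example.com/'})
```

Binding the fixed arguments with functools.partial keeps the coprocessor unaware of how the driver is configured; it only supplies the parameters for the page being processed.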

class wpull.processor.coprocessor.phantomjs.PhantomJSCoprocessorSession(phantomjs_driver_factory, root_path, processing_rule, file_writer_session, request, response, item_session: wpull.pipeline.session.ItemSession, params, warc_recorder)[source]

Bases: object

PhantomJS coprocessor session.

close()[source]

Clean up.

run()[source]

exception wpull.processor.coprocessor.phantomjs.PhantomJSCrashed[source]

Bases: Exception

PhantomJS exited with non-zero code.
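
As a rough illustration of the condition this exception describes, the sketch below raises a locally defined PhantomJSCrashed when a child process exits with a non-zero code. The class is a stand-in that only mirrors the documented name, run_checked is a hypothetical helper, and a Python child process substitutes for the PhantomJS binary; how wpull itself detects the exit code is not shown here.

```python
import subprocess
import sys


class PhantomJSCrashed(Exception):
    """Local stand-in mirroring the documented exception."""


def run_checked(args):
    # Run the child process and wait for it to exit.
    completed = subprocess.run(args)
    if completed.returncode != 0:
        raise PhantomJSCrashed(
            'process exited with code {}'.format(completed.returncode))
    return completed.returncode


# A Python child process stands in for the PhantomJS binary here.
try:
    run_checked([sys.executable, '-c', 'import sys; sys.exit(3)'])
except PhantomJSCrashed as error:
    message = str(error)
```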

wpull.processor.coprocessor.phantomjs.PhantomJSParams

PhantomJS parameters.

wpull.processor.coprocessor.phantomjs.snapshot_type

list

Snapshot file types. Accepted values are html, pdf, png, and gif.

wpull.processor.coprocessor.phantomjs.wait_time

float

Time between page scrolls.

wpull.processor.coprocessor.phantomjs.num_scrolls

int

Maximum number of scrolls.

wpull.processor.coprocessor.phantomjs.smart_scroll

bool

Whether to stop scrolling when the number of requests and responses stops changing.

wpull.processor.coprocessor.phantomjs.snapshot

bool

Whether to take snapshot files.

wpull.processor.coprocessor.phantomjs.viewport_size

tuple

Width and height of the page viewport.

wpull.processor.coprocessor.phantomjs.paper_size

tuple

Width and height of the paper.

wpull.processor.coprocessor.phantomjs.load_time

float

Maximum time to wait for page load.

wpull.processor.coprocessor.phantomjs.custom_headers

dict

Default HTTP headers.

wpull.processor.coprocessor.phantomjs.page_settings

dict

Page settings.

alias of PhantomJSParamsType
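
To make the interaction between num_scrolls and smart_scroll concrete, here is a small sketch. SketchParams is a hypothetical named tuple whose field names follow the attributes listed above; its default values are illustrative assumptions, not wpull's defaults. count_scrolls is likewise a hypothetical helper that models the scrolling rule on a list of made-up activity counts instead of driving a real browser.

```python
import collections

# Field names follow the documented attributes; the defaults are
# illustrative assumptions only.
SketchParams = collections.namedtuple(
    'SketchParams',
    ['snapshot_type', 'wait_time', 'num_scrolls', 'smart_scroll',
     'snapshot', 'viewport_size', 'paper_size', 'load_time',
     'custom_headers', 'page_settings'],
    defaults=(['html'], 1.0, 10, True, True,
              (1024, 768), (2048, 1536), 60.0, {}, {}),
)


def count_scrolls(params, activity_counts):
    """Return how many scrolls are performed.

    ``activity_counts[i]`` is the hypothetical total number of requests
    and responses observed after scroll ``i``.
    """
    previous = None
    scrolls = 0
    # Never scroll more than num_scrolls times.
    for count in activity_counts[:params.num_scrolls]:
        scrolls += 1
        # A real driver would wait params.wait_time seconds here.
        if params.smart_scroll and count == previous:
            # Nothing new was requested or received: stop early.
            break
        previous = count
    return scrolls


params = SketchParams()
counts = [4, 9, 12, 12, 15]
smart = count_scrolls(params, counts)
plain = count_scrolls(params._replace(smart_scroll=False), counts)
capped = count_scrolls(params._replace(num_scrolls=2), counts)
```

In this model the smart run stops at the fourth scroll, where the count repeats, while the run with smart_scroll disabled consumes all five entries and the run with num_scrolls set to 2 stops after two.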