Plugin Scripting Hooks
Wpull’s scripting support is modelled after alard’s Wget with Lua hooks.
Scripts are installed using the YAPSY plugin architecture. To create your plugin script, subclass wpull.application.plugin.WpullPlugin and load it with the --plugin-script option.
The plugin interface provides two types of callbacks: hooks and events.
Hook
Hooks change the behavior of the program. A callback registered to a hook is required to provide a return value, typically one of wpull.application.hook.Actions. Only one callback may be registered to a hook.
To register your callback, decorate it with wpull.application.plugin.hook().
Event
Events are points in the program's execution at which registered listeners are notified.
To register your callback, decorate it with wpull.application.plugin.event().
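The difference between the two callback types can be illustrated with a small standalone sketch. ToyRegistry and its methods are hypothetical names invented for this illustration; this is not Wpull's dispatcher, only a toy model of the rules stated above: a hook takes a single callback whose return value is used, while an event takes any number of listeners whose return values are ignored.

```python
from typing import Callable, Dict, List


class ToyRegistry:
    """Toy stand-in for a plugin dispatcher; NOT Wpull's implementation."""

    def __init__(self) -> None:
        self._hooks: Dict[str, Callable] = {}
        self._events: Dict[str, List[Callable]] = {}

    def connect_hook(self, name: str, callback: Callable) -> None:
        # A hook accepts exactly one callback; a second registration fails.
        if name in self._hooks:
            raise ValueError('hook {!r} already has a callback'.format(name))
        self._hooks[name] = callback

    def connect_event(self, name: str, callback: Callable) -> None:
        # An event accepts any number of listeners.
        self._events.setdefault(name, []).append(callback)

    def call_hook(self, name: str, *args, default=None):
        # The hook's return value is handed back to the caller.
        callback = self._hooks.get(name)
        if callback is None:
            return default
        return callback(*args)

    def notify(self, name: str, *args) -> None:
        # Listeners are only informed; their return values are discarded.
        for callback in self._events.get(name, []):
            callback(*args)


registry = ToyRegistry()
seen = []

registry.connect_hook('accept_url', lambda url: 'dog' not in url)
registry.connect_event('queued_url', lambda url: seen.append(('log', url)))
registry.connect_event('queued_url', lambda url: seen.append(('count', url)))

accepted = registry.call_hook('accept_url', 'http://example.com/cat', default=True)
rejected = registry.call_hook('accept_url', 'http://example.com/dog', default=True)
registry.notify('queued_url', 'http://example.com/cat')
```

In this toy model, registering a second callback for 'accept_url' raises ValueError, whereas both 'queued_url' listeners run on every notification.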
Interfaces
The global hook and event constants are located at wpull.application.plugin.PluginFunctions.
- PluginFunctions.accept_url: hook. Interface: FetchRule.plugin_accept_url
- PluginFunctions.dequeued_url: event. Interface: URLTableHookWrapper.dequeued_url
- PluginFunctions.exit_status: hook. Interface: AppStopTask.plugin_exit_status
- PluginFunctions.finishing_statistics: event. Interface: StatsStopTask.plugin_finishing_statistics
- PluginFunctions.get_urls: event. Interface: ProcessingRule.plugin_get_urls
- PluginFunctions.handle_error: hook. Interface: ResultRule.plugin_handle_error
- PluginFunctions.handle_pre_response: hook. Interface: ResultRule.plugin_handle_pre_response
- PluginFunctions.handle_response: hook. Interface: ResultRule.plugin_handle_response
- PluginFunctions.queued_url: event. Interface: URLTableHookWrapper.queued_url
- PluginFunctions.resolve_dns: hook. Interface: Resolver.resolve_dns
- PluginFunctions.resolve_dns_result: event. Interface: Resolver.resolve_dns_result
- PluginFunctions.wait_time: hook. Interface: ResultRule.plugin_wait_time
Example
Here is an example Python script. It:
- Prints hello on start up
- Refuses to download anything with the word “dog” in the URL
- Scrapes URLs on a hypothetical homepage
- Stops the program execution when the server returns HTTP 429
import datetime
import re

from wpull.application.hook import Actions
from wpull.application.plugin import WpullPlugin, PluginFunctions, hook, event
from wpull.protocol.abstract.request import BaseResponse
from wpull.pipeline.session import ItemSession


class MyExamplePlugin(WpullPlugin):
    def activate(self):
        super().activate()
        print('Hello world!')

    def deactivate(self):
        super().deactivate()
        print('Goodbye world!')

    # Hook: the returned bool decides whether the URL is fetched.
    @hook(PluginFunctions.accept_url)
    def my_accept_func(self, item_session: ItemSession, verdict: bool, reasons: dict) -> bool:
        return 'dog' not in item_session.request.url

    # Event: scrape profile links, but only from the homepage.
    @event(PluginFunctions.get_urls)
    def my_get_urls(self, item_session: ItemSession):
        if item_session.request.url_info.path != '/':
            return

        matches = re.finditer(
            r'<div id="profile-(\w+)"', item_session.response.body.content
        )
        for match in matches:
            url = 'http://example.com/profile.php?username={}'.format(
                match.group(1)
            )
            item_session.add_child_url(url)

    # Hook: stop the whole program when the server answers HTTP 429.
    @hook(PluginFunctions.handle_response)
    def my_handle_response(self, item_session: ItemSession):
        if item_session.response.response_code == 429:
            return Actions.STOP

        # Hooks must return a value; otherwise continue as usual.
        return Actions.NORMAL
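The URL filter and the scraping logic in the example can be tried out without running Wpull at all. The following is a minimal sketch under that assumption: extract_profile_urls, is_accepted, and sample_html are hypothetical names introduced here, and the markup is made up to match the "hypothetical homepage" of the example.

```python
import re
from typing import List

# Same pattern as in the example plugin: capture the username after "profile-".
PROFILE_PATTERN = re.compile(r'<div id="profile-(\w+)"')


def extract_profile_urls(html: str) -> List[str]:
    """Build profile URLs from the usernames found in the page markup."""
    return [
        'http://example.com/profile.php?username={}'.format(match.group(1))
        for match in PROFILE_PATTERN.finditer(html)
    ]


def is_accepted(url: str) -> bool:
    """Mirror the accept_url rule: reject any URL containing 'dog'."""
    return 'dog' not in url


# Made-up markup standing in for the hypothetical homepage.
sample_html = '<div id="profile-alice"></div><div id="profile-bob_42"></div>'
profile_urls = extract_profile_urls(sample_html)
```

Keeping the matching logic in plain functions like these makes it possible to check a pattern against saved pages before wiring it into a plugin callback.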