converter Module

Document content post-processing.

class wpull.converter.BaseDocumentConverter

Bases: object

Base class for classes that convert links within a document.

convert(input_filename, output_filename, base_url=None)
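
To illustrate the shape of the convert() interface, here is a minimal, self-contained sketch. It does not import wpull: SketchBaseConverter is a stand-in that only mirrors the signature above, and MappingTextConverter, its url_to_path argument, and demo_convert are hypothetical names introduced for this example.

```python
import os
import tempfile


class SketchBaseConverter:
    """Stand-in base class mirroring the convert() signature (not wpull's class)."""

    def convert(self, input_filename, output_filename, base_url=None):
        raise NotImplementedError()


class MappingTextConverter(SketchBaseConverter):
    """Hypothetical converter that replaces known URLs with local paths."""

    def __init__(self, url_to_path):
        self._url_to_path = dict(url_to_path)

    def convert(self, input_filename, output_filename, base_url=None):
        with open(input_filename, encoding='utf-8') as in_file:
            text = in_file.read()

        # Replace longer URLs first so a shorter prefix URL does not clobber them.
        for url in sorted(self._url_to_path, key=len, reverse=True):
            text = text.replace(url, self._url_to_path[url])

        with open(output_filename, 'w', encoding='utf-8') as out_file:
            out_file.write(text)


def demo_convert():
    """Run the hypothetical converter on a temporary file and return the output."""
    converter = MappingTextConverter({
        'http://example.com/a.css': 'a.css',
        'http://example.com/': 'index.html',
    })
    with tempfile.TemporaryDirectory() as temp_dir:
        in_path = os.path.join(temp_dir, 'in.txt')
        out_path = os.path.join(temp_dir, 'out.txt')
        with open(in_path, 'w', encoding='utf-8') as in_file:
            in_file.write('See http://example.com/a.css and http://example.com/')
        converter.convert(in_path, out_path)
        with open(out_path, encoding='utf-8') as out_file:
            return out_file.read()
```

A real converter would parse the document rather than use plain string replacement; the sketch only shows how an input file, an output file, and an optional base URL fit together.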

class wpull.converter.BatchDocumentConverter(html_parser, element_walker, url_table, backup=False)

Bases: object

Convert all documents in URL table.

Parameters:
  • url_table – An instance of database.URLTable.
  • backup (bool) – Whether backup files are created.

convert_all()

Convert all links in URL table.

convert_by_record(url_record)

Convert the document for the given URL record.
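
The backup option can be pictured with the following sketch, which copies a file aside before converting it in place. This is not wpull's implementation: convert_with_backup, convert_func, and demo_backup are hypothetical names, and the `.orig` suffix is an assumption borrowed from Wget's --backup-converted convention.

```python
import os
import shutil
import tempfile


def convert_with_backup(filename, convert_func, backup=False, suffix='.orig'):
    """Convert a file in place, optionally keeping a copy of the original.

    convert_func is a hypothetical callable taking (input_filename,
    output_filename); the '.orig' suffix is an assumption.
    """
    if backup:
        # copy2 also preserves file metadata such as modification time.
        shutil.copy2(filename, filename + suffix)

    # Write to a temporary name first so a failed conversion
    # does not leave a half-written document behind.
    temp_filename = filename + '.new'
    convert_func(filename, temp_filename)
    os.replace(temp_filename, filename)


def demo_backup():
    """Convert a temporary file and return (converted, original) contents."""
    def upper_case(input_filename, output_filename):
        with open(input_filename, encoding='utf-8') as in_file:
            text = in_file.read()
        with open(output_filename, 'w', encoding='utf-8') as out_file:
            out_file.write(text.upper())

    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, 'page.html')
        with open(path, 'w', encoding='utf-8') as out_file:
            out_file.write('<a href="x">link</a>')
        convert_with_backup(path, upper_case, backup=True)
        with open(path, encoding='utf-8') as in_file:
            converted = in_file.read()
        with open(path + '.orig', encoding='utf-8') as in_file:
            original = in_file.read()
    return converted, original
```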

class wpull.converter.CSSConverter(url_table)

Bases: wpull.scraper.css.CSSScraper, wpull.converter.BaseDocumentConverter

CSS converter.

convert(input_filename, output_filename, base_url=None)

convert_text(text, base_url=None)

get_new_url(url, base_url=None)
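
As a rough illustration of what converting links in CSS text involves, the sketch below rewrites url(...) references using a regular expression and urllib.parse.urljoin. It is not the CSSConverter implementation: sketch_convert_text, sketch_get_new_url, and the local_paths dict (standing in for a URL table lookup) are hypothetical, and the fallback to the absolute URL for unknown links is an assumption.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the link.
CSS_URL_PATTERN = re.compile(r'''url\(\s*(['"]?)(.*?)\1\s*\)''')


def sketch_get_new_url(url, local_paths, base_url=None):
    """Return the local path for a link, or its absolute URL if unknown.

    local_paths is a hypothetical dict mapping absolute URLs to filenames.
    """
    absolute_url = urljoin(base_url, url) if base_url else url
    return local_paths.get(absolute_url, absolute_url)


def sketch_convert_text(text, local_paths, base_url=None):
    """Rewrite every url(...) reference in the given CSS text."""
    def replace(match):
        quote, url = match.group(1), match.group(2)
        new_url = sketch_get_new_url(url, local_paths, base_url)
        # Keep whatever quoting style the original reference used.
        return 'url({0}{1}{0})'.format(quote, new_url)

    return CSS_URL_PATTERN.sub(replace, text)
```

For example, with a base URL of http://example.com/css/site.css, the relative link img/bg.png resolves to http://example.com/css/img/bg.png before it is looked up in the mapping.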

class wpull.converter.HTMLConverter(html_parser, element_walker, url_table)

Bases: wpull.scraper.html.HTMLScraper, wpull.converter.BaseDocumentConverter

HTML converter.

convert(input_filename, output_filename, base_url=None)