robotstxt Module

Robots.txt exclusion directives.

class wpull.robotstxt.RobotsTxtPool

Bases: object

Pool of robots.txt parsers.
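The idea of a parser pool can be illustrated with a minimal sketch built only on the Python standard library. This is not wpull's implementation: the class name SketchRobotsPool, the helper key_for, and the choice of a (scheme, hostname, port) key are assumptions made for illustration, and plain URL strings stand in for wpull.url.URLInfo objects. The method names mirror the ones documented for RobotsTxtPool.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class SketchRobotsPool:
    """Hypothetical stand-in for a pool of robots.txt parsers (not wpull's code)."""

    def __init__(self):
        # Maps a per-site key to the parser loaded for that site.
        self._parsers = {}

    @classmethod
    def key_for(cls, url):
        # Assumption: one parser per (scheme, hostname, port) triple.
        parts = urlsplit(url)
        return (parts.scheme, parts.hostname, parts.port)

    def has_parser(self, url):
        # True once robots.txt text has been loaded for the URL's site.
        return self.key_for(url) in self._parsers

    def load_robots_txt(self, url, text):
        # Parse the robots.txt body and store the parser under the site key.
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        self._parsers[self.key_for(url)] = parser

    def can_fetch(self, url, user_agent):
        # Raises KeyError if no robots.txt has been loaded for the site.
        return self._parsers[self.key_for(url)].can_fetch(user_agent, url)


pool = SketchRobotsPool()
pool.load_robots_txt(
    "http://example.com/robots.txt",
    "User-agent: *\nDisallow: /private/\n",
)
allowed = pool.can_fetch("http://example.com/index.html", "MyBot")
blocked = pool.can_fetch("http://example.com/private/x", "MyBot")
```

In this sketch a caller would check has_parser first, fetch and load robots.txt for the site if it returns False, and only then call can_fetch.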

can_fetch(url_info: wpull.url.URLInfo, user_agent: str)

Return whether the URL can be fetched by the given user agent.

has_parser(url_info: wpull.url.URLInfo)

Return whether a parser has been created for the URL.

load_robots_txt(url_info: wpull.url.URLInfo, text: str)

Load the robots.txt file.

classmethod url_info_key(url_info: wpull.url.URLInfo) → tuple