protocol.http.robots Module

Robots.txt file logistics.

exception wpull.protocol.http.robots.NotInPoolError

Bases: Exception

The URL is not in the pool.

class wpull.protocol.http.robots.RobotsTxtChecker(web_client: wpull.protocol.http.web.WebClient=None, robots_txt_pool: wpull.robotstxt.RobotsTxtPool=None)

Bases: object

Robots.txt file fetcher and checker.

Parameters:
  • web_client – Web Client.
  • robots_txt_pool – Robots.txt Pool.

can_fetch(request: wpull.protocol.http.request.Request, file=None) → bool

Return whether the request can be fetched.

Parameters:
  • request – Request.
  • file – A file object to which the robots.txt contents are written.

Coroutine.

can_fetch_pool(request: wpull.protocol.http.request.Request)

Return whether the request can be fetched based on the pool.
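As a rough illustration of what a pool-based check involves, the sketch below uses the standard library's urllib.robotparser rather than wpull itself. SketchPool and its methods are hypothetical stand-ins for RobotsTxtPool, whose interface is not shown on this page. That a missing entry raises NotInPoolError is an assumption inferred from the exception's description above.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class NotInPoolError(Exception):
    """Stand-in for wpull.protocol.http.robots.NotInPoolError."""


class SketchPool:
    """Hypothetical pool: one parsed robots.txt per (scheme, host) key."""

    def __init__(self):
        self._parsers = {}

    @staticmethod
    def _key(url):
        parts = urlsplit(url)
        return (parts.scheme, parts.netloc)

    def load(self, url, robots_txt_text):
        # Parse the robots.txt text and store it under the URL's site key.
        parser = RobotFileParser()
        parser.parse(robots_txt_text.splitlines())
        self._parsers[self._key(url)] = parser

    def can_fetch(self, url, user_agent='*'):
        # Assumption: a site with no stored robots.txt is an error,
        # not an implicit "allow".
        try:
            parser = self._parsers[self._key(url)]
        except KeyError:
            raise NotInPoolError(url) from None
        return parser.can_fetch(user_agent, url)


pool = SketchPool()
pool.load('http://example.com/', 'User-agent: *\nDisallow: /private/\n')
allowed = pool.can_fetch('http://example.com/index.html')
blocked = pool.can_fetch('http://example.com/private/data')
```

Here allowed is True and blocked is False. A URL on a host that was never loaded raises the stand-in NotInPoolError instead of returning a value.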

fetch_robots_txt(request: wpull.protocol.http.request.Request, file=None)

Fetch the robots.txt file for the request.

Coroutine.
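The three methods above suggest a check-then-fetch flow: consult the pool, and on a miss download robots.txt before checking again. The sketch below shows one plausible arrangement of that flow. It is not wpull's implementation: SketchChecker, fake_fetch, and the use of URL strings instead of Request objects are all hypothetical, and the async/await syntax may differ from the coroutine style wpull actually uses.

```python
import asyncio
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class NotInPoolError(Exception):
    """Stand-in for wpull.protocol.http.robots.NotInPoolError."""


class SketchChecker:
    """Hypothetical checker mirroring the documented method names."""

    def __init__(self, fetch_text):
        # fetch_text: hypothetical coroutine function returning
        # robots.txt text for a given robots.txt URL.
        self._fetch_text = fetch_text
        self._parsers = {}

    def can_fetch_pool(self, url):
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc)
        if key not in self._parsers:
            raise NotInPoolError(url)
        return self._parsers[key].can_fetch('*', url)

    async def fetch_robots_txt(self, url):
        # Download and parse robots.txt for the URL's site.
        parts = urlsplit(url)
        robots_url = '{}://{}/robots.txt'.format(parts.scheme, parts.netloc)
        text = await self._fetch_text(robots_url)
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        self._parsers[(parts.scheme, parts.netloc)] = parser

    async def can_fetch(self, url):
        # Try the pool first; fetch robots.txt only on a miss.
        try:
            return self.can_fetch_pool(url)
        except NotInPoolError:
            await self.fetch_robots_txt(url)
            return self.can_fetch_pool(url)


async def fake_fetch(robots_url):
    # Canned response so the sketch runs without network access.
    return 'User-agent: *\nDisallow: /admin/\n'


async def main():
    checker = SketchChecker(fake_fetch)
    first = await checker.can_fetch('http://example.com/admin/login')
    second = await checker.can_fetch('http://example.com/about')
    return first, second


results = asyncio.run(main())
```

In this sketch the first call misses the pool and triggers a fetch, while the second call for the same host is answered from the stored parser.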

robots_txt_pool

Return the RobotsTxtPool.

web_client

Return the WebClient.