protocol.http.robots Module
Robots.txt file logistics.
-
exception
wpull.protocol.http.robots.NotInPoolError
Bases: Exception
The URL is not in the pool.
-
class
wpull.protocol.http.robots.RobotsTxtChecker(web_client: wpull.protocol.http.web.WebClient=None, robots_txt_pool: wpull.robotstxt.RobotsTxtPool=None)
Bases: object
Robots.txt file fetcher and checker.
Parameters: - web_client – The WebClient used to fetch robots.txt files.
- robots_txt_pool – The RobotsTxtPool consulted when checking requests.
-
can_fetch(request: wpull.protocol.http.request.Request, file=None) → bool
Return whether the request can be fetched.
Parameters: - request – Request.
- file – A file object to which the robots.txt contents are written.
Coroutine.
-
can_fetch_pool(request: wpull.protocol.http.request.Request)
Return whether the request can be fetched based on the pool.
-
fetch_robots_txt(request: wpull.protocol.http.request.Request, file=None)
Fetch the robots.txt file for the request.
Coroutine.
-
robots_txt_pool
Return the RobotsTxtPool.
-
web_client
Return the WebClient.
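
The methods above suggest a check-the-pool-first flow: answer from the pool when a robots.txt file for the host is already known, otherwise fetch it and check again. The following is a minimal, standard-library sketch of that pattern, not wpull's implementation. The names SketchPool, SketchChecker, NotInPool, and the fetch_text callable are hypothetical and exist only for illustration. It also assumes that entries are keyed by scheme and host, which this page does not state.

```python
import asyncio
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class NotInPool(Exception):
    """Hypothetical stand-in for NotInPoolError: no robots.txt known for the host."""


class SketchPool:
    """Hypothetical robots.txt pool, keyed by (scheme, host) as an assumption."""

    def __init__(self):
        self._parsers = {}

    @staticmethod
    def _key(url):
        parts = urlsplit(url)
        return (parts.scheme, parts.netloc)

    def load(self, url, robots_txt_text):
        # Parse the robots.txt text and remember it for the URL's host.
        parser = RobotFileParser()
        parser.parse(robots_txt_text.splitlines())
        self._parsers[self._key(url)] = parser

    def can_fetch(self, url, user_agent):
        # Answer only from what is already pooled; signal a miss otherwise.
        try:
            parser = self._parsers[self._key(url)]
        except KeyError:
            raise NotInPool(url) from None
        return parser.can_fetch(user_agent, url)


class SketchChecker:
    """Hypothetical checker: try the pool, fetch robots.txt on a miss, retry."""

    def __init__(self, fetch_text, pool=None):
        # fetch_text is an assumed async callable: robots.txt URL -> text.
        self._fetch_text = fetch_text
        self._pool = pool if pool is not None else SketchPool()

    async def can_fetch(self, url, user_agent):
        try:
            return self._pool.can_fetch(url, user_agent)
        except NotInPool:
            pass  # Fall through and fetch robots.txt for this host.

        parts = urlsplit(url)
        robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
        self._pool.load(url, await self._fetch_text(robots_url))
        return self._pool.can_fetch(url, user_agent)
```

In this sketch the network access is injected as fetch_text, so the pool logic can be exercised with a fake fetcher that returns canned robots.txt text. A second check against the same host is then answered from the pool without another fetch.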