protocol.http.robots Module
Robots.txt file logistics.
exception wpull.protocol.http.robots.NotInPoolError
Bases: Exception
The URL is not in the pool.
class wpull.protocol.http.robots.RobotsTxtChecker(web_client: wpull.protocol.http.web.WebClient=None, robots_txt_pool: wpull.robotstxt.RobotsTxtPool=None)
Bases: object
Robots.txt file fetcher and checker.
Parameters:
- web_client – Web Client.
- robots_txt_pool – Robots.txt Pool.
can_fetch(request: wpull.protocol.http.request.Request, file=None) → bool
Return whether the request can be fetched.
Parameters:
- request – Request.
- file – A file object to which the robots.txt contents are written.
Coroutine.
can_fetch_pool(request: wpull.protocol.http.request.Request)
Return whether the request can be fetched based on the pool.
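A pool-based check of this kind can be sketched with the standard library alone. The example below does not use wpull: SimpleRobotsPool and its methods are hypothetical names, the local NotInPoolError is only a stand-in for the exception documented above, and raising it on a cache miss is an assumption about how such a check might behave.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class NotInPoolError(Exception):
    """Stand-in for the documented exception (not imported from wpull)."""


class SimpleRobotsPool:
    """Hypothetical pool holding one parsed robots.txt per (scheme, host)."""

    def __init__(self):
        self._parsers = {}

    @staticmethod
    def _key(url):
        parts = urlsplit(url)
        return (parts.scheme, parts.netloc)

    def load(self, url, robots_txt_text):
        # Parse the robots.txt text and cache the parser for this site.
        parser = RobotFileParser()
        parser.parse(robots_txt_text.splitlines())
        self._parsers[self._key(url)] = parser

    def can_fetch(self, url, user_agent):
        # Assumption: a site with no cached parser is reported as an error
        # rather than silently allowed or denied.
        try:
            parser = self._parsers[self._key(url)]
        except KeyError:
            raise NotInPoolError(url) from None
        return parser.can_fetch(user_agent, url)


pool = SimpleRobotsPool()
pool.load("http://example.com/", "User-agent: *\nDisallow: /private/\n")
```

Here a caller can distinguish "disallowed" (a False return) from "not yet known" (the exception), which is what makes a fetch-on-miss strategy possible.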
fetch_robots_txt(request: wpull.protocol.http.request.Request, file=None)
Fetch the robots.txt file for the request.
Coroutine.
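Taken together, the three methods above suggest a cache-then-fetch flow: consult the pool, fetch robots.txt on a miss, then consult the pool again. The sketch below illustrates that pattern only and is not wpull's implementation. SketchChecker, fake_fetch, and the prefix-based rule matching are hypothetical, the methods take a host and path instead of a Request, and modern async/await syntax is assumed for the coroutines.

```python
import asyncio


class NotInPoolError(Exception):
    """Stand-in for the documented exception (not imported from wpull)."""


class SketchChecker:
    """Hypothetical checker: consult the cache, fetch on a miss, retry."""

    def __init__(self, fetch):
        self._fetch = fetch   # async callable: host -> disallowed path prefixes
        self._pool = {}       # host -> disallowed path prefixes
        self.fetch_count = 0

    def can_fetch_pool(self, host, path):
        # Answer from cached rules only; signal a miss with an exception.
        if host not in self._pool:
            raise NotInPoolError(host)
        return not any(path.startswith(prefix) for prefix in self._pool[host])

    async def fetch_robots_txt(self, host):
        # Retrieve the rules for the host and store them in the cache.
        self.fetch_count += 1
        self._pool[host] = await self._fetch(host)

    async def can_fetch(self, host, path):
        try:
            return self.can_fetch_pool(host, path)
        except NotInPoolError:
            pass
        await self.fetch_robots_txt(host)
        return self.can_fetch_pool(host, path)


async def fake_fetch(host):
    # Stands in for a network request; returns disallowed prefixes.
    return ("/private/",)


async def demo():
    checker = SketchChecker(fake_fetch)
    first = await checker.can_fetch("example.com", "/private/data")
    second = await checker.can_fetch("example.com", "/index.html")
    return first, second, checker.fetch_count


result = asyncio.run(demo())
```

The second call is answered from the cache, so the stubbed fetch runs only once per host.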
robots_txt_pool
Return the RobotsTxtPool.
web_client
Return the WebClient.