writer Module

Document writers.

class wpull.writer.AntiClobberFileWriter(path_namer: wpull.path.PathNamer, file_continuing: bool=False, headers_included: bool=False, local_timestamping: bool=True, adjust_extension: bool=False, content_disposition: bool=False, trust_server_names: bool=False)[source]

Bases: wpull.writer.BaseFileWriter

File writer that downloads to a new filename if the original exists.

session_class
class wpull.writer.AntiClobberFileWriterSession(path_namer: wpull.path.PathNamer, file_continuing: bool, headers_included: bool, local_timestamping: bool, adjust_extension: bool, content_disposition: bool, trust_server_names: bool)[source]

Bases: wpull.writer.BaseFileWriterSession

class wpull.writer.BaseFileWriter(path_namer: wpull.path.PathNamer, file_continuing: bool=False, headers_included: bool=False, local_timestamping: bool=True, adjust_extension: bool=False, content_disposition: bool=False, trust_server_names: bool=False)[source]

Bases: wpull.writer.BaseWriter

Base class for saving documents to disk.

Parameters:
  • path_namer – The path namer.
  • file_continuing – If True, the writer will modify requests to fetch the remaining portion of the file
  • headers_included – If True, the writer will include the HTTP header responses on top of the document
  • local_timestamping – If True, the writer will set the Last-Modified timestamp on downloaded files
  • adjust_extension – If True, HTML or CSS file extension will be added whenever it is detected as so.
  • content_disposition – If True, the filename is extracted from the Content-Disposition header.
  • trust_server_names – If True and there is redirection, use the last given response for the filename.
session() → wpull.writer.BaseFileWriterSession[source]

Return the File Writer Session.

session_class

Return the class of File Writer Session.

This should be overridden by subclasses.

class wpull.writer.BaseFileWriterSession(path_namer: wpull.path.PathNamer, file_continuing: bool, headers_included: bool, local_timestamping: bool, adjust_extension: bool, content_disposition: bool, trust_server_names: bool)[source]

Bases: wpull.writer.BaseWriterSession

Base class for File Writer Sessions.

discard_document(response: wpull.protocol.abstract.request.BaseResponse)[source]
extra_resource_path(suffix: str) → str[source]
classmethod open_file(filename: str, response: wpull.protocol.abstract.request.BaseResponse, mode='wb+')[source]

Open a file object on to the Response Body.

Parameters:
  • filename – The path where the file is to be saved
  • response – Response
  • mode – The file mode

This function will create the directories if not exist.

process_request(request: wpull.protocol.abstract.request.BaseRequest)[source]
process_response(response: wpull.protocol.abstract.request.BaseResponse)[source]
save_document(response: wpull.protocol.abstract.request.BaseResponse)[source]
classmethod save_headers(filename: str, response: wpull.protocol.http.request.Response)[source]

Prepend the HTTP response header to the file.

Parameters:
  • filename – The path of the file
  • response – Response
classmethod set_timestamp(filename: str, response: wpull.protocol.http.request.Response)[source]

Set the Last-Modified timestamp onto the given file.

Parameters:
  • filename – The path of the file
  • response – Response
class wpull.writer.BaseWriter[source]

Bases: object

Base class for document writers.

session() → 'BaseWriterSession'[source]

Return a session for a document.

class wpull.writer.BaseWriterSession[source]

Bases: object

Base class for a single document to be written.

discard_document(response: wpull.protocol.abstract.request.BaseResponse)[source]

Don’t save the document.

This function is called by a Processor once the Processor deemed the document should be deleted (i.e., a “404 Not Found” response).

extra_resource_path(suffix: str) → typing.Union[source]

Return a filename suitable for saving extra resources.

process_request(request: wpull.protocol.abstract.request.BaseRequest) → wpull.protocol.abstract.request.BaseRequest[source]

Rewrite the request if needed.

This function is called by a Processor after it has created the Request, but before submitting it to a Client.

Returns:The original Request or a modified Request
process_response(response: wpull.protocol.abstract.request.BaseResponse)[source]

Do any processing using the given response if needed.

This function is called by a Processor before any response or error handling is done.

save_document(response: wpull.protocol.abstract.request.BaseResponse) → str[source]

Process and save the document.

This function is called by a Processor once the Processor deemed the document should be saved (i.e., a “200 OK” response).

Returns:The filename of the document.
class wpull.writer.IgnoreFileWriter(path_namer: wpull.path.PathNamer, file_continuing: bool=False, headers_included: bool=False, local_timestamping: bool=True, adjust_extension: bool=False, content_disposition: bool=False, trust_server_names: bool=False)[source]

Bases: wpull.writer.BaseFileWriter

File writer that ignores files that already exist.

session_class
class wpull.writer.IgnoreFileWriterSession(path_namer: wpull.path.PathNamer, file_continuing: bool, headers_included: bool, local_timestamping: bool, adjust_extension: bool, content_disposition: bool, trust_server_names: bool)[source]

Bases: wpull.writer.BaseFileWriterSession

process_request(request)[source]
class wpull.writer.MuxBody(stream: typing.BinaryIO, **kwargs)[source]

Bases: wpull.body.Body

Writes data into a second file.

close()[source]
flush()[source]
write(data: bytes) → int[source]
writelines(lines)[source]
class wpull.writer.NullWriter[source]

Bases: wpull.writer.BaseWriter

File writer that doesn’t write files.

session() → wpull.writer.NullWriterSession[source]
class wpull.writer.NullWriterSession[source]

Bases: wpull.writer.BaseWriterSession

discard_document(response)[source]
extra_resource_path(suffix)[source]
process_request(request)[source]
process_response(response)[source]
save_document(response)[source]
class wpull.writer.OverwriteFileWriter(path_namer: wpull.path.PathNamer, file_continuing: bool=False, headers_included: bool=False, local_timestamping: bool=True, adjust_extension: bool=False, content_disposition: bool=False, trust_server_names: bool=False)[source]

Bases: wpull.writer.BaseFileWriter

File writer that overwrites files.

session_class
class wpull.writer.OverwriteFileWriterSession(path_namer: wpull.path.PathNamer, file_continuing: bool, headers_included: bool, local_timestamping: bool, adjust_extension: bool, content_disposition: bool, trust_server_names: bool)[source]

Bases: wpull.writer.BaseFileWriterSession

class wpull.writer.SingleDocumentWriter(stream: typing.BinaryIO, headers_included: bool=False)[source]

Bases: wpull.writer.BaseWriter

Writer that writes all the data into a single file.

session() → wpull.writer.SingleDocumentWriterSession[source]
class wpull.writer.SingleDocumentWriterSession(stream: typing.BinaryIO, headers_included: bool)[source]

Bases: wpull.writer.BaseWriterSession

Write all data into stream.

discard_document(response)[source]
extra_resource_path(suffix)[source]
process_request(request)[source]
process_response(response: wpull.protocol.abstract.request.BaseResponse)[source]
save_document(response)[source]
class wpull.writer.TimestampingFileWriter(path_namer: wpull.path.PathNamer, file_continuing: bool=False, headers_included: bool=False, local_timestamping: bool=True, adjust_extension: bool=False, content_disposition: bool=False, trust_server_names: bool=False)[source]

Bases: wpull.writer.BaseFileWriter

File writer that only downloads newer files from the server.

session_class
class wpull.writer.TimestampingFileWriterSession(path_namer: wpull.path.PathNamer, file_continuing: bool, headers_included: bool, local_timestamping: bool, adjust_extension: bool, content_disposition: bool, trust_server_names: bool)[source]

Bases: wpull.writer.BaseFileWriterSession

process_request(request: wpull.protocol.abstract.request.BaseRequest)[source]