path Module

File names and paths.

class wpull.path.BasePathNamer

Bases: object

Base class for path namers.

get_filename(url_info)

Return the appropriate filename based on given URLInfo.

class wpull.path.PathNamer(root, index='index.html', use_dir=False, cut=None, protocol=False, hostname=False, os_type='unix', no_control=True, ascii_only=True, case=None, max_filename_length=None)

Bases: wpull.path.BasePathNamer

Path namer that creates a directory hierarchy based on the URL.

Parameters:
  • root (str) – The base path.
  • index (str) – The filename to use when the URL path does not indicate one.
  • use_dir (bool) – Include directories based on the URL path.
  • cut (int) – Number of leading directories to cut from the file path.
  • protocol (bool) – Include the URL scheme in the directory structure.
  • hostname (bool) – Include the hostname in the directory structure.
  • os_type, no_control, ascii_only, case, max_filename_length – Options for safe_filename(); max_filename_length corresponds to its max_length argument.

See also: url_to_filename(), url_to_dir_path(), safe_filename().

get_filename(url_info)

safe_filename(part)

Return a safe filename or file part.
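The way the PathNamer parameters combine can be illustrated with a small standalone sketch. It is not wpull's implementation: the function name sketch_path_for_url is hypothetical, it takes a URL string rather than a URLInfo, it ignores the query string and filename sanitizing, and it assumes that cut is applied after the scheme and hostname parts have been added.

```python
import posixpath
from urllib.parse import urlsplit


def sketch_path_for_url(url, root, index='index.html', use_dir=False,
                        cut=None, protocol=False, hostname=False):
    """Illustrative only: map a URL to a file path under ``root``."""
    split = urlsplit(url)
    segments = split.path.split('/')

    # The last path segment is the filename; fall back to the index name
    # when the URL path ends in a slash or is empty.
    filename = segments[-1] or index

    dir_parts = []
    if use_dir:
        if protocol:
            dir_parts.append(split.scheme)
        if hostname:
            dir_parts.append(split.hostname)
        # Directory segments from the URL path, skipping empty ones.
        dir_parts.extend(s for s in segments[:-1] if s)
        # Assumption: ``cut`` drops leading parts of the combined list.
        dir_parts = dir_parts[cut or 0:]

    return posixpath.join(root, *dir_parts, filename)


print(sketch_path_for_url('http://example.com/a/b/c.png', 'out'))
print(sketch_path_for_url('http://example.com/a/b/c.png', 'out',
                          use_dir=True, hostname=True))
```

With use_dir left at False the directory options have no effect in this sketch, which matches the parameter descriptions above: protocol, hostname and cut only describe the directory structure.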

class wpull.path.PercentEncoder(unix=False, control=False, windows=False, ascii_=False)

Bases: collections.defaultdict

Percent encoder.

quote(bytes_string)
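Because the class derives from collections.defaultdict, one plausible design is a lazily filled lookup table from byte values to their encoded form. The sketch below illustrates that idea only. SketchPercentEncoder and its unsafe argument are hypothetical, and the rule for which bytes get escaped is an assumption rather than wpull's actual behavior.

```python
import collections


class SketchPercentEncoder(collections.defaultdict):
    """Illustrative only: a byte -> text table that fills itself on demand."""

    def __init__(self, unsafe=b'/'):
        super().__init__()
        # Hypothetical parameter: byte values that must always be escaped.
        self.unsafe = set(unsafe)

    def __missing__(self, byte):
        # Assumption: escape unsafe bytes, control bytes and non-ASCII bytes.
        if byte in self.unsafe or byte < 0x20 or byte > 0x7E:
            value = '%{:02X}'.format(byte)
        else:
            value = chr(byte)
        # Cache the result so each byte value is computed only once.
        self[byte] = value
        return value

    def quote(self, bytes_string):
        # Iterating over bytes yields integers, which are the table keys.
        return ''.join(self[byte] for byte in bytes_string)


encoder = SketchPercentEncoder()
print(encoder.quote(b'a/b'))
```

The table approach keeps quote() to a single dictionary lookup per byte after the first occurrence of each value.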

wpull.path.anti_clobber_dir_path(dir_path, suffix='.d')

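Return a directory path free of filenames. If a component of dir_path is an existing file, suffix is appended to that component in the returned path.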

Parameters:
  • dir_path (str) – A directory path.
  • suffix (str) – The suffix to append to the part of the path that is a file.
Returns: str
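The idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: sketch_anti_clobber and demo are hypothetical names, and the sketch renames only the first path component that turns out to be an existing file.

```python
import os
import tempfile


def sketch_anti_clobber(dir_path, suffix='.d'):
    """Illustrative only: avoid using an existing file as a directory."""
    parts = os.path.normpath(dir_path).split(os.sep)

    for i in range(len(parts)):
        # Check each prefix of the path, shortest first.
        candidate = os.sep.join(parts[:i + 1])
        if os.path.isfile(candidate):
            # A file is in the way; use a differently named directory.
            parts[i] += suffix
            break

    return os.sep.join(parts)


def demo():
    """Create a file named 'a', then ask for the directory 'a/b'."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'a'), 'w'):
            pass
        result = sketch_anti_clobber(os.path.join(tmp, 'a', 'b'))
        expected = os.path.join(tmp, 'a.d', 'b')
        return result, expected
```

In demo(), the file a blocks the creation of the directory a/b, so the sketch returns a path through a.d instead.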

wpull.path.parse_content_disposition(text)

Parse a Content-Disposition header value.
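For readers unfamiliar with the header, the following sketch shows what extracting a filename from a Content-Disposition value can look like. It relies on the standard library's email.message.Message instead of wpull's own parser, and the helper name filename_from_content_disposition is hypothetical. What wpull's function returns is not stated above, so this is not a description of it.

```python
from email.message import Message


def filename_from_content_disposition(text):
    """Illustrative only: return the filename parameter, or None."""
    message = Message()
    message['Content-Disposition'] = text
    # get_filename() reads the ``filename`` parameter of the header and
    # returns None when the header does not carry one.
    return message.get_filename()


print(filename_from_content_disposition('attachment; filename="report.pdf"'))
```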

wpull.path.safe_filename(filename, os_type='unix', no_control=True, ascii_only=True, case=None, encoding='utf8', max_length=None)

Return a safe filename or path part.

Parameters:
  • filename (str) – The filename or path component.
  • os_type (str) – If unix, escape the slash. If windows, escape extra Windows characters.
  • no_control (bool) – If True, escape control characters.
  • ascii_only (bool) – If True, escape non-ASCII characters.
  • case (str) – If lower, lowercase the string. If upper, uppercase the string.
  • encoding (str) – The character encoding.
  • max_length (int) – The maximum length of the filename.

This function assumes that filename has not already been percent-encoded.

Returns: str
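The parameters above can be read as a set of escaping rules. The sketch below is one possible interpretation and not the library's implementation: sketch_safe_filename is a hypothetical name, the set of characters escaped for windows is an assumption, case folding is applied before escaping, and max_length is handled by naive truncation.

```python
def sketch_safe_filename(filename, os_type='unix', no_control=True,
                         ascii_only=True, case=None, encoding='utf8',
                         max_length=None):
    """Illustrative only: percent-encode characters that are unsafe."""
    # Assumed character sets; the real library may escape a different set.
    if os_type == 'windows':
        unsafe = set('\\/:*?"<>|')
    else:
        unsafe = {'/'}

    if case == 'lower':
        filename = filename.lower()
    elif case == 'upper':
        filename = filename.upper()

    out = []
    for char in filename:
        code = ord(char)
        escape = (
            char in unsafe
            or (no_control and (code < 0x20 or code == 0x7F))
            or (ascii_only and code > 0x7F)
        )
        if escape:
            # Encode the character and escape every resulting byte.
            out.append(''.join(
                '%{:02X}'.format(byte) for byte in char.encode(encoding)))
        else:
            out.append(char)

    name = ''.join(out)
    if max_length is not None:
        # Naive truncation; it may cut through a percent escape.
        name = name[:max_length]
    return name


print(sketch_safe_filename('a/b.txt'))
```

Because the input is assumed not to be percent-encoded already, the sketch leaves any existing % characters untouched.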

wpull.path.url_to_dir_parts(url, include_protocol=False, include_hostname=False, alt_char=False)

Return a list of directory parts from a URL.

Parameters:
  • url (str) – The URL.
  • include_protocol (bool) – If True, the scheme from the URL will be included.
  • include_hostname (bool) – If True, the hostname from the URL will be included.
  • alt_char (bool) – If True, the character for the port delimiter will be + instead of :.

This function does not include the filename and the paths are not sanitized.

Returns: list
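A rough equivalent built on urllib.parse shows how the options interact. It is a sketch and not wpull's code: sketch_url_to_dir_parts is a hypothetical name, and it assumes that the port is included in the hostname part only when the URL states one explicitly.

```python
from urllib.parse import urlsplit


def sketch_url_to_dir_parts(url, include_protocol=False,
                            include_hostname=False, alt_char=False):
    """Illustrative only: list the directory parts of a URL."""
    split = urlsplit(url)
    parts = []

    if include_protocol:
        parts.append(split.scheme)

    if include_hostname:
        host = split.hostname
        # Assumption: append the port only if the URL names one.
        if split.port is not None:
            delimiter = '+' if alt_char else ':'
            host = '{}{}{}'.format(host, delimiter, split.port)
        parts.append(host)

    # Everything before the last slash is a directory; the final
    # segment is the filename and is left out. Nothing is sanitized.
    segments = split.path.split('/')
    parts.extend(s for s in segments[:-1] if s)
    return parts


print(sketch_url_to_dir_parts('http://example.com:8080/a/b/c.html',
                              include_hostname=True, alt_char=True))
```

The + delimiter is useful on file systems where a colon is not allowed in a directory name.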

wpull.path.url_to_filename(url, index='index.html', alt_char=False)

Return a filename from a URL.

Parameters:
  • url (str) – The URL.
  • index (str) – If a filename could not be derived from the URL path, use index instead. For example, /images/ will return index.html.
  • alt_char (bool) – If True, the character for the query delimiter will be @ instead of ?.

This function does not include the directories and does not sanitize the filename.

Returns: str
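The behavior described above can be approximated with urllib.parse. This is a minimal sketch under assumptions, not the library's implementation: sketch_url_to_filename is a hypothetical name, and it assumes the query string is appended to the filename whenever the URL has one.

```python
from urllib.parse import urlsplit


def sketch_url_to_filename(url, index='index.html', alt_char=False):
    """Illustrative only: derive an unsanitized filename from a URL."""
    split = urlsplit(url)

    # The text after the last slash is the filename; use the index
    # name when the path ends in a slash or is empty.
    filename = split.path.split('/')[-1] or index

    if split.query:
        delimiter = '@' if alt_char else '?'
        filename = '{}{}{}'.format(filename, delimiter, split.query)

    return filename


print(sketch_url_to_filename('http://example.com/images/'))
print(sketch_url_to_filename('http://example.com/a/page.php?id=1',
                             alt_char=True))
```

The @ delimiter is useful on file systems where a question mark is not allowed in a filename.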