path Module
File names and paths.
class wpull.path.PathNamer(root, index='index.html', use_dir=False, cut=None, protocol=False, hostname=False, os_type='unix', no_control=True, ascii_only=True, case=None, max_filename_length=None)
Bases: wpull.path.BasePathNamer
Path namer that creates a directory hierarchy based on the URL.
Parameters:
- root (str) – The base path.
- index (str) – The filename to use when the URL path does not indicate one.
- use_dir (bool) – Include directories based on the URL path.
- cut (int) – Number of leading directories to cut from the file path.
- protocol (bool) – Include the URL scheme in the directory structure.
- hostname (bool) – Include the hostname in the directory structure.
- safe_filename_args (dict) – Keyword arguments for safe_filename.
See also: url_to_filename(), url_to_dir_parts(), safe_filename().
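The sketch below shows how root, index, use_dir, cut, protocol, and hostname might combine into a local path. It is not wpull's implementation. sketch_pathname is a hypothetical helper, cut is assumed to drop leading URL path directories only (as wget's --cut-dirs does), the query string is ignored, and the safe_filename() sanitization step is skipped.

```python
import os.path
import posixpath
from urllib.parse import urlsplit


def sketch_pathname(url, root, index='index.html', use_dir=False, cut=None,
                    protocol=False, hostname=False):
    """Hypothetical sketch: map a URL to a local file path (no sanitizing)."""
    parts = urlsplit(url)
    dir_parts = []

    if use_dir:
        # Directories from the URL path, excluding the final filename segment.
        path_dirs = [p for p in posixpath.dirname(parts.path).split('/') if p]
        # Assumption: cut drops leading path directories only; None keeps all.
        path_dirs = path_dirs[cut:]
        if protocol:
            dir_parts.append(parts.scheme)
        if hostname:
            dir_parts.append(parts.netloc)
        dir_parts.extend(path_dirs)

    # Fall back to index when the URL path ends in a slash or is empty.
    filename = posixpath.basename(parts.path) or index
    return os.path.join(root, *dir_parts, filename)
```

For example, sketch_pathname('http://example.com/a/b/c.html', 'out', use_dir=True, hostname=True) yields out/example.com/a/b/c.html on a POSIX system.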
class wpull.path.PercentEncoder(unix=False, control=False, windows=False, ascii_=False)
Bases: collections.defaultdict
Percent encoder.
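Because the class derives from collections.defaultdict, one plausible design is a per-character cache: __missing__ computes a character's escaped form on its first lookup and stores it, so later lookups of the same character are plain dict hits. The sketch below illustrates only that pattern. SketchPercentEncoder, its quote() method, and the character set tied to each flag (including always escaping %) are assumptions, not wpull's API.

```python
import collections


class SketchPercentEncoder(collections.defaultdict):
    """Hypothetical sketch of a caching percent encoder.

    The flags mirror the documented signature; the character sets are assumptions.
    """

    def __init__(self, unix=False, control=False, windows=False, ascii_=False):
        super().__init__()
        self.unsafe = {'%'}  # assumption: always escape '%' so output is decodable
        if unix:
            self.unsafe.add('/')
        if windows:
            self.unsafe.update('\\:*?"<>|')
        self.control = control
        self.ascii_ = ascii_

    def __missing__(self, char):
        # Called on the first lookup of char; the result is cached in the dict.
        code = ord(char)
        if (char in self.unsafe
                or (self.control and (code < 32 or code == 127))
                or (self.ascii_ and code > 127)):
            value = ''.join('%{:02X}'.format(b) for b in char.encode('utf8'))
        else:
            value = char
        self[char] = value
        return value

    def quote(self, text):
        # Hypothetical helper: escape a whole string via the cache.
        return ''.join(self[char] for char in text)
```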
wpull.path.anti_clobber_dir_path(dir_path, suffix='.d')
Return a directory path in which no component is an existing file.
Parameters:
- dir_path (str) – A directory path.
- suffix (str) – The suffix to append to the part of the path that is a file.
Returns: str
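A minimal sketch of this behavior, assuming the function checks each successively longer prefix of the path and appends suffix to the first component that already exists as a regular file. sketch_anti_clobber_dir_path is a hypothetical name, and stopping at the first collision is an assumption.

```python
import os


def sketch_anti_clobber_dir_path(dir_path, suffix='.d'):
    """Hypothetical sketch: avoid using an existing file as a directory."""
    dir_path = os.path.normpath(dir_path)
    parts = dir_path.split(os.sep)

    for index in range(len(parts)):
        # Check each successively longer prefix: a, a/b, a/b/c, ...
        prefix = os.sep.join(parts[:index + 1])
        if prefix and os.path.isfile(prefix):
            # Rename this component so it no longer collides with the file.
            parts[index] += suffix
            return os.sep.join(parts)

    return dir_path
```

For example, if out/a is an existing file, out/a/b becomes out/a.d/b.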
wpull.path.safe_filename(filename, os_type='unix', no_control=True, ascii_only=True, case=None, encoding='utf8', max_length=None)
Return a safe filename or path part.
Parameters:
- filename (str) – The filename or path component.
- os_type (str) – If 'unix', escape the slash. If 'windows', escape extra Windows characters.
- no_control (bool) – If True, escape control characters.
- ascii_only (bool) – If True, escape non-ASCII characters.
- case (str) – If 'lower', lowercase the string. If 'upper', uppercase the string.
- encoding (str) – The character encoding.
- max_length (int) – The maximum length of the filename.
This function assumes that filename has not already been percent-encoded.
Returns: str
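The options above can be illustrated with a minimal sketch. sketch_safe_filename is hypothetical, and several details are assumptions rather than documented behavior: which characters each os_type escapes, that % is always escaped, that case is applied after escaping, and that an overlong name is cut and given a short SHA-1 suffix so distinct long names stay distinct.

```python
import hashlib


def sketch_safe_filename(filename, os_type='unix', no_control=True,
                         ascii_only=True, case=None, encoding='utf8',
                         max_length=None):
    """Hypothetical sketch of the documented options; not wpull's code."""
    unsafe = {'%'}  # assumption: escape '%' so the result is unambiguous
    if os_type == 'unix':
        unsafe.add('/')
    elif os_type == 'windows':
        unsafe.update('/\\:*?"<>|')  # assumption: the "extra" Windows set

    out = []
    for char in filename:
        code = ord(char)
        if (char in unsafe
                or (no_control and (code < 32 or code == 127))
                or (ascii_only and code > 127)):
            # Percent-encode every byte of the character in the given encoding.
            out.append(''.join('%{:02X}'.format(b) for b in char.encode(encoding)))
        else:
            out.append(char)
    result = ''.join(out)

    if case == 'lower':
        result = result.lower()
    elif case == 'upper':
        result = result.upper()

    if max_length and len(result) > max_length:
        # Assumption: keep an 8-character hash so distinct long names differ.
        digest = hashlib.sha1(result.encode(encoding)).hexdigest()[:8]
        result = (result[:max(0, max_length - 8)] + digest)[:max_length]
    return result
```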
wpull.path.url_to_dir_parts(url, include_protocol=False, include_hostname=False, alt_char=False)
Return a list of directory parts from a URL.
Parameters:
- url (str) – The URL.
- include_protocol (bool) – If True, the scheme from the URL will be included.
- include_hostname (bool) – If True, the hostname from the URL will be included.
- alt_char (bool) – If True, use + instead of : as the port delimiter.
The returned list does not include the filename, and its parts are not sanitized.
Returns: list
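A minimal sketch using urllib.parse.urlsplit. sketch_url_to_dir_parts is hypothetical. Three details are assumptions: the port is included only when the URL states it explicitly, the hostname is lowercased (a side effect of SplitResult.hostname), and empty path segments are dropped.

```python
import posixpath
from urllib.parse import urlsplit


def sketch_url_to_dir_parts(url, include_protocol=False,
                            include_hostname=False, alt_char=False):
    """Hypothetical sketch: directory parts of a URL, unsanitized."""
    parts = urlsplit(url)
    dir_parts = []

    if include_protocol:
        dir_parts.append(parts.scheme)

    if include_hostname:
        hostname = parts.hostname or ''
        if parts.port is not None:
            # The port delimiter is ':' by default, '+' when alt_char is set.
            hostname += ('+' if alt_char else ':') + str(parts.port)
        dir_parts.append(hostname)

    # Every path segment except the last one (the filename) is a directory.
    path_dirs = posixpath.dirname(parts.path).split('/')
    dir_parts.extend(segment for segment in path_dirs if segment)
    return dir_parts
```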
wpull.path.url_to_filename(url, index='index.html', alt_char=False)
Return a filename from a URL.
Parameters:
- url (str) – The URL.
- index (str) – If a filename cannot be derived from the URL path, use index instead. For example, /images/ returns index.html.
- alt_char (bool) – If True, use @ instead of ? as the query delimiter.
The returned filename does not include the directories and is not sanitized.
Returns: str
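A minimal sketch, assuming the query string is appended to the last path segment (the alt_char option implies the query delimiter can appear in the filename). sketch_url_to_filename is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def sketch_url_to_filename(url, index='index.html', alt_char=False):
    """Hypothetical sketch: last URL path segment plus query, unsanitized."""
    parts = urlsplit(url)
    # An empty last segment (path ends in '/' or is empty) falls back to index.
    filename = posixpath.basename(parts.path) or index

    if parts.query:
        # The query delimiter is '?' by default, '@' when alt_char is set.
        filename += ('@' if alt_char else '?') + parts.query
    return filename
```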