Wget-compatible web downloader and crawler.

usage: wpull [-h] [-V] [--plugin-script FILE] [--plugin-args PLUGIN_ARGS]
             [--database FILE | --database-uri URI] [--concurrent N]
             [--debug-console-port PORT] [--debug-manhole]
             [--ignore-fatal-errors] [--monitor-disk MONITOR_DISK]
             [--monitor-memory MONITOR_MEMORY] [-o FILE | -a FILE]
             [-d | -v | -nv | -q | -qq] [--ascii-print]
             [--report-speed TYPE={bits}] [-i FILE] [-F] [-B URL]
             [--http-proxy HTTP_PROXY] [--https-proxy HTTPS_PROXY]
             [--proxy-user USER] [--proxy-password PASS] [--no-proxy]
             [--proxy-domains LIST] [--proxy-exclude-domains LIST]
             [--proxy-hostnames LIST] [--proxy-exclude-hostnames LIST]
             [-t NUMBER] [--retry-connrefused] [--retry-dns-error] [-O FILE]
             [-nc] [-c] [--progress TYPE={bar,dot,none}] [-N]
             [--no-use-server-timestamps] [-S] [-T SECONDS]
             [--dns-timeout SECS] [--connect-timeout SECS]
             [--read-timeout SECS] [--session-timeout SECS] [-w SECONDS]
             [--waitretry SECONDS] [--random-wait] [-Q NUMBER]
             [--bind-address ADDRESS] [--limit-rate RATE] [--no-dns-cache]
             [--rotate-dns] [--no-skip-getaddrinfo]
             [--restrict-file-names MODES=<ascii,lower,nocontrol,unix,upper,windows>]
             [-4 | -6 | --prefer-family FAMILY={IPv4,IPv6,none}] [--user USER]
             [--password PASSWORD] [--no-iri] [--local-encoding ENC]
             [--remote-encoding ENC] [--max-filename-length NUMBER] [-nd | -x]
             [-nH] [--protocol-directories] [-P PREFIX] [--cut-dirs NUMBER]
             [--http-user HTTP_USER] [--http-password HTTP_PASSWORD]
             [--no-cache] [--default-page NAME] [-E] [--ignore-length]
             [--header STRING] [--max-redirect NUMBER] [--referer URL]
             [--save-headers] [-U AGENT] [--no-robots] [--no-http-keep-alive]
             [--no-cookies] [--load-cookies FILE] [--save-cookies FILE]
             [--keep-session-cookies] [--post-data STRING | --post-file FILE]
             [--content-disposition] [--content-on-error] [--http-compression]
             [--html-parser {html5lib,libxml2-lxml}]
             [--link-extractors <css,html,javascript>] [--escaped-fragment]
             [--secure-protocol PR={SSLv3,TLSv1,TLSv1.1,TLSv1.2,auto}]
             [--https-only] [--no-check-certificate] [--no-strong-crypto]
             [--certificate FILE] [--certificate-type TYPE={PEM}]
             [--private-key FILE] [--private-key-type TYPE={PEM}]
             [--ca-certificate FILE] [--ca-directory DIR]
             [--no-use-internal-ca-certs] [--random-file FILE]
             [--egd-file FILE] [--ftp-user USER] [--ftp-password PASS]
             [--no-remove-listing] [--no-glob] [--preserve-permissions]
             [--retr-symlinks [{0,1,no,off,on,yes}]] [--warc-file FILENAME]
             [--warc-append] [--warc-header STRING] [--warc-max-size NUMBER]
             [--warc-move DIRECTORY] [--warc-cdx] [--warc-dedup FILE]
             [--no-warc-compression] [--no-warc-digests] [--no-warc-keep-log]
             [--warc-tempdir DIRECTORY] [-r] [-l NUMBER] [--delete-after] [-k]
             [-K] [-p] [--page-requisites-level NUMBER] [--sitemaps] [-A LIST]
             [-R LIST] [--accept-regex REGEX] [--reject-regex REGEX]
             [--regex-type TYPE={pcre}] [-D LIST] [--exclude-domains LIST]
             [--hostnames LIST] [--exclude-hostnames LIST] [--follow-ftp]
             [--follow-tags LIST] [--ignore-tags LIST]
             [-H | --span-hosts-allow LIST=<linked-pages,page-requisites>]
             [-L] [-I LIST] [--trust-server-names] [-X LIST] [-np]
             [--no-strong-redirects] [--proxy-server]
             [--proxy-server-address ADDRESS] [--proxy-server-port PORT]
             [--phantomjs] [--phantomjs-exe PATH]
             [--phantomjs-max-time PHANTOMJS_MAX_TIME]
             [--phantomjs-scroll NUM] [--phantomjs-wait SEC]
             [--no-phantomjs-snapshot] [--no-phantomjs-smart-scroll]
             [--youtube-dl] [--youtube-dl-exe PATH]
             [URL [URL ...]]
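
For example, a minimal invocation downloads the given URL and writes program messages to a log file (the URL and log file name below are purely illustrative):

    wpull https://example.com/index.html -o wpull.log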

Positional arguments:
urls the URL to be downloaded

Optional arguments:
-h, --help show this help message and exit
-V, --version show program’s version number and exit
--plugin-script load plugin script from FILE
--plugin-args arguments for the plugin
--database save database tables into FILE instead of memory
--database-uri save database tables at SQLAlchemy URI instead of memory
--concurrent run at most N downloads at the same time
--debug-console-port run a web debug console at given port number
--debug-manhole install Manhole debugging socket
--ignore-fatal-errors ignore all internal fatal exception errors
--monitor-disk pause if minimum free disk space is exceeded
--monitor-memory pause if minimum free memory is exceeded
-o, --output-file
 write program messages to FILE
-a, --append-output
 append program messages to FILE
-d, --debug print debugging messages
-v, --verbose print informative program messages and detailed progress
-nv, --no-verbose
 print informative program messages and errors
-q, --quiet print program error messages
-qq, --very-quiet
 do not print program messages unless critical
--ascii-print print program messages in ASCII only
--report-speed print speed in bits only instead of human formatted units (possible choices: bits)
-i, --input-file
 download URLs listed in FILE
-F, --force-html
 read URL input files as HTML files
-B, --base resolves input relative URLs to URL
--http-proxy HTTP proxy for HTTP requests
--https-proxy HTTP proxy for HTTPS requests
--proxy-user username for proxy “basic” authentication
--proxy-password password for proxy “basic” authentication
--no-proxy disable proxy support
--proxy-domains use proxy only from LIST of hostname suffixes
--proxy-exclude-domains don’t use proxy for LIST of hostname suffixes
--proxy-hostnames use proxy only from LIST of hostnames
--proxy-exclude-hostnames don’t use proxy for LIST of hostnames
-t, --tries try NUMBER of times on transient errors
--retry-connrefused retry even if the server does not accept connections
--retry-dns-error retry even if DNS fails to resolve hostname
-O, --output-document
 stream every document into FILE
-nc, --no-clobber
 don’t use anti-clobbering filenames
-c, --continue resume downloading a partially-downloaded file
--progress choose the type of progress indicator (possible choices: dot, bar, none)
-N, --timestamping
 only download files that are newer than local files
--no-use-server-timestamps don’t set the last-modified time on files
-S, --server-response
 print the protocol responses from the server
-T, --timeout set DNS, connect, read timeout options to SECONDS
--dns-timeout timeout after SECS seconds for DNS requests
--connect-timeout timeout after SECS seconds for connection requests
--read-timeout timeout after SECS seconds for reading requests
--session-timeout timeout after SECS seconds for downloading files
-w, --wait wait SECONDS seconds between requests
--waitretry wait up to SECONDS seconds on retries
--random-wait randomly perturb the time between requests
-Q, --quota stop after downloading NUMBER bytes
--bind-address bind to ADDRESS on the local host
--limit-rate limit download bandwidth to RATE
--no-dns-cache disable caching of DNS lookups
--rotate-dns use different resolved IP addresses on requests
--no-skip-getaddrinfo always use the OS’s name resolver interface
--restrict-file-names list of safe filename modes to use (possible choices: ascii, lower, nocontrol, unix, upper, windows)
-4, --inet4-only
 connect to IPv4 addresses only
-6, --inet6-only
 connect to IPv6 addresses only
--prefer-family prefer to connect to FAMILY IP addresses (possible choices: none, IPv6, IPv4)
--user username for both FTP and HTTP authentication
--password password for both FTP and HTTP authentication
--no-iri use ASCII encoding only
--local-encoding use ENC as the encoding of input files and options
--remote-encoding force decoding documents using codec ENC
--max-filename-length limit filename length to NUMBER characters
-nd, --no-directories
 don’t create directories
-x, --force-directories
 always create directories
-nH, --no-host-directories
 don’t create directories for hostnames
--protocol-directories create directories for URL schemes
-P, --directory-prefix
 save everything under the directory PREFIX
--cut-dirs don’t make NUMBER of leading directories
--http-user username for HTTP authentication
--http-password password for HTTP authentication
--no-cache request server to not use cached version of files
--default-page use NAME as index page if not known
-E, --adjust-extension
 append HTML or CSS file extension if needed
--ignore-length ignore any Content-Length provided by the server
--header adds STRING to the HTTP header
--max-redirect follow only up to NUMBER document redirects
--referer always use URL as the referrer
--save-headers include server header responses in files
-U, --user-agent
 use AGENT instead of Wpull’s user agent
--no-robots ignore robots.txt directives
--no-http-keep-alive disable persistent HTTP connections
--no-cookies disables HTTP cookie support
--load-cookies load Mozilla cookies.txt from FILE
--save-cookies save Mozilla cookies.txt to FILE
--keep-session-cookies include session cookies when saving cookies to file
--post-data use POST for all requests with query STRING
--post-file use POST for all requests with query in FILE
--content-disposition use filename given in Content-Disposition header
--content-on-error keep error pages
--http-compression request servers to use HTTP compression
--html-parser select HTML parsing library and strategy (possible choices: libxml2-lxml, html5lib)
--link-extractors specify which link extractors to use (possible choices: html, css, javascript)
--escaped-fragment rewrite links with hash fragments to escaped fragments
--strip-session-id remove session ID tokens from links
--secure-protocol specify the version of the SSL protocol to use (possible choices: SSLv3, TLSv1, TLSv1.1, TLSv1.2, auto)
--https-only download only HTTPS URLs
--no-check-certificate don’t validate SSL server certificates
--no-strong-crypto don’t use secure protocols/ciphers
--certificate use FILE containing the local client certificate
--certificate-type type of the local client certificate (possible choices: PEM)
--private-key use FILE containing the local client private key
--private-key-type type of the local client private key (possible choices: PEM)
--ca-certificate load and use CA certificate bundle from FILE
--ca-directory load and use CA certificates from DIR
--no-use-internal-ca-certs don’t use CA certificates included with Wpull
--random-file use data from FILE to seed the SSL PRNG
--egd-file connect to entropy gathering daemon using socket FILE
--ftp-user username for FTP login
--ftp-password password for FTP login
--no-remove-listing keep directory file listings
--no-glob don’t use filename glob patterns on FTP URLs
--preserve-permissions apply server’s Unix file permissions on downloaded files
--retr-symlinks if disabled, preserve symlinks and run with security risks (possible choices: yes, on, 1, off, no, 0)
--warc-file save WARC file to filename prefixed with FILENAME
--warc-append append instead of overwrite the output WARC file
--warc-header include STRING in WARC file metadata
--warc-max-size write sequential WARC files sized about NUMBER bytes
--warc-move move WARC files to DIRECTORY as they complete
--warc-cdx write CDX file along with the WARC file
--warc-dedup write revisit records using digests in FILE
--no-warc-compression do not compress the WARC file
--no-warc-digests do not compute and save SHA1 hash digests
--no-warc-keep-log do not save a log into the WARC file
--warc-tempdir use temporary DIRECTORY for preparing WARC files
-r, --recursive
 follow links and download them
-l, --level limit recursion depth to NUMBER
--delete-after download files temporarily and delete them after
-k, --convert-links
 rewrite links in files that point to local files
-K, --backup-converted
 save original files before converting their links
-p, --page-requisites
 download objects embedded in pages
--page-requisites-level limit page-requisites recursion depth to NUMBER
--sitemaps download Sitemaps to discover more links
-A, --accept download only files with suffix in LIST
-R, --reject don’t download files with suffix in LIST
--accept-regex download only URLs matching REGEX
--reject-regex don’t download URLs matching REGEX
--regex-type use regex TYPE (possible choices: pcre)
-D, --domains download only from LIST of hostname suffixes
--exclude-domains don’t download from LIST of hostname suffixes
--hostnames download only from LIST of hostnames
--exclude-hostnames don’t download from LIST of hostnames
--follow-ftp follow links to FTP sites
--follow-tags follow only links contained in LIST of HTML tags
--ignore-tags don’t follow links contained in LIST of HTML tags
-H, --span-hosts
 follow links and page requisites to other hostnames
--span-hosts-allow selectively span hosts for resource types in LIST (possible choices: linked-pages, page-requisites)
-L, --relative follow only relative links
-I, --include-directories
 download only paths in LIST
--trust-server-names use the last given URL for filename during redirects
-X, --exclude-directories
 don’t download paths in LIST
-np, --no-parent
 don’t follow to parent directories on URL path
--no-strong-redirects don’t implicitly allow span hosts for redirects
--proxy-server run HTTP proxy server for capturing requests
--proxy-server-address bind the proxy server to ADDRESS
--proxy-server-port bind the proxy server port to PORT
--phantomjs use PhantomJS for loading dynamic pages
--phantomjs-exe path of PhantomJS executable
--phantomjs-max-time maximum duration of PhantomJS session
--phantomjs-scroll scroll the page up to NUM times
--phantomjs-wait wait SEC seconds between page interactions
--no-phantomjs-snapshot don’t take dynamic page snapshots
--no-phantomjs-smart-scroll always scroll the page to maximum scroll count option
--youtube-dl use youtube-dl for downloading videos
--youtube-dl-exe path of youtube-dl executable
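
As a sketch of how these options combine, the following hypothetical command mirrors a site a few levels deep, fetches page requisites, writes a WARC archive with a CDX index, and keeps the crawl table on disk (the URL, file names, and limits are illustrative, not recommendations):

    wpull https://example.com/ \
        --recursive --level 3 --page-requisites --no-parent \
        --wait 1 --random-wait --tries 3 \
        --warc-file example-crawl --warc-cdx \
        --database crawl.db -o crawl.log

Using --database here saves the URL tables into a file instead of memory, which keeps large crawls from exhausting RAM.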

Defaults may differ depending on the operating system. Use --help to see them.

This is only a programmatically generated listing from the program. In most cases, you can follow Wget’s documentation for these options; Wpull follows Wget’s behavior, so please check Wget’s online documentation and resources before asking questions.