database.sqlmodel Module

Database SQLAlchemy model.

wpull.database.sqlmodel.DBBase

alias of Base

class wpull.database.sqlmodel.QueuedURL(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

filename

Local filename of the item.

id
inline_level

Depth of the page requisite object. 0 is the object, 1 is the object’s dependency, etc.

level

Recursive depth of the item. 0 is root, 1 is child of root, etc.

Expected content type of extracted link.

parent_url

A descriptor that presents a read/write view of an object attribute.

parent_url_string
parent_url_string_id

Optional referral URL

post_data

Additional percent-encoded data for POST.

priority

Priority of item.

root_url

A descriptor that presents a read/write view of an object attribute.

root_url_string
root_url_string_id

Optional root URL

status

Status of the completion of the item.

status_code

HTTP status code or FTP reply code.

to_plain() → wpull.pipeline.item.URLRecord
try_count

Number of attempts made in order to process the item.

url

A descriptor that presents a read/write view of an object attribute.

url_string
url_string_id

Target URL to fetch

classmethod watch_urls_inserted(session)
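
The url, parent_url and root_url attributes are each described as a descriptor that presents a read/write view of an object attribute, and each sits next to a *_string and *_string_id attribute. The following is a minimal plain-Python sketch of that pattern, not the SQLAlchemy implementation: ProxyAttribute, URLStringSketch and QueuedURLSketch are hypothetical names used only for illustration, and creating a new string object when a URL is assigned is an assumption.

```python
class URLStringSketch:
    """Hypothetical stand-in for one row of the URL strings table."""

    def __init__(self, url):
        self.url = url


class ProxyAttribute:
    """Descriptor giving a read/write view of an attribute on a related object."""

    def __init__(self, target, attr):
        self.target = target  # name of the attribute holding the related object
        self.attr = attr      # name of the attribute on the related object

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        related = getattr(obj, self.target)
        return None if related is None else getattr(related, self.attr)

    def __set__(self, obj, value):
        related = getattr(obj, self.target)
        if related is None:
            # Assumption: assigning through the proxy creates the related object.
            setattr(obj, self.target, URLStringSketch(value))
        else:
            setattr(related, self.attr, value)


class QueuedURLSketch:
    """Hypothetical item exposing plain strings over related string objects."""

    url = ProxyAttribute('url_string', 'url')
    parent_url = ProxyAttribute('parent_url_string', 'url')

    def __init__(self, url, parent_url=None):
        self.url_string = URLStringSketch(url)
        self.parent_url_string = (
            URLStringSketch(parent_url) if parent_url is not None else None
        )


item = QueuedURLSketch('http://example.com/a', 'http://example.com/')
print(item.url)         # reads through to item.url_string.url
print(item.parent_url)  # reads through to item.parent_url_string.url
```

Reading item.url returns the string stored on the related object, and assigning to it writes back to that object, so callers can work with plain strings without touching the string table directly.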
class wpull.database.sqlmodel.URLString(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Table containing the URL strings.

QueuedURL references this table through its url_string_id, parent_url_string_id and root_url_string_id attributes.

classmethod add_urls(session, urls: typing.Iterable)
id
url
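
To illustrate how a separate URL string table can be referenced by ID from the queue, here is a minimal sketch using the standard-library sqlite3 module instead of SQLAlchemy. The table names, column constraints, the url_id helper, and the "ignore duplicates" behaviour of add_urls are assumptions made for the example; the real classmethod takes a SQLAlchemy session.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE url_strings (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT NOT NULL UNIQUE
    );
    CREATE TABLE queued_urls (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url_string_id INTEGER NOT NULL UNIQUE REFERENCES url_strings(id),
        parent_url_string_id INTEGER REFERENCES url_strings(id),
        level INTEGER NOT NULL DEFAULT 0
    );
''')


def add_urls(conn, urls):
    """Insert URL strings, skipping any that are already stored (assumed behaviour)."""
    conn.executemany(
        'INSERT OR IGNORE INTO url_strings (url) VALUES (?)',
        [(url,) for url in urls],
    )


def url_id(conn, url):
    """Hypothetical helper: return the ID of a stored URL string, or None."""
    row = conn.execute(
        'SELECT id FROM url_strings WHERE url = ?', (url,)
    ).fetchone()
    return row[0] if row else None


add_urls(conn, ['http://example.com/', 'http://example.com/a'])
add_urls(conn, ['http://example.com/a', 'http://example.com/b'])  # one duplicate
count = conn.execute('SELECT COUNT(*) FROM url_strings').fetchone()[0]

# Queue a child item that refers to its own URL and its parent URL by ID.
conn.execute(
    'INSERT INTO queued_urls (url_string_id, parent_url_string_id, level) '
    'VALUES (?, ?, ?)',
    (url_id(conn, 'http://example.com/a'), url_id(conn, 'http://example.com/'), 1),
)

# Resolve the IDs back to strings with joins.
row = conn.execute('''
    SELECT u.url, p.url, q.level
    FROM queued_urls AS q
    JOIN url_strings AS u ON u.id = q.url_string_id
    LEFT JOIN url_strings AS p ON p.id = q.parent_url_string_id
''').fetchone()
print(count, row)
```

Storing each URL once and referring to it by integer ID keeps the queue rows small when the same URL appears as the target of one item and the parent or root of many others.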
class wpull.database.sqlmodel.WARCVisit(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

Standalone table for --cdx-dedup feature.

classmethod add_visits(session, visits)
classmethod get_revisit_id(session, url, payload_digest)
payload_digest
url
warc_id
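
The lookup suggested by get_revisit_id(session, url, payload_digest) can be sketched as follows. This is an assumption-laden illustration using sqlite3: the table name, the index, the shape of the visits tuples, and returning None when nothing matches are not taken from the source, and the real methods take a SQLAlchemy session.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE warc_visits (
        url TEXT NOT NULL,
        warc_id TEXT NOT NULL,
        payload_digest TEXT NOT NULL
    );
    CREATE INDEX warc_visits_lookup ON warc_visits (url, payload_digest);
''')


def add_visits(conn, visits):
    """Record (url, warc_id, payload_digest) tuples; the tuple shape is assumed."""
    conn.executemany(
        'INSERT INTO warc_visits (url, warc_id, payload_digest) VALUES (?, ?, ?)',
        list(visits),
    )


def get_revisit_id(conn, url, payload_digest):
    """Return the WARC record ID of an earlier identical visit, or None."""
    row = conn.execute(
        'SELECT warc_id FROM warc_visits '
        'WHERE url = ? AND payload_digest = ? LIMIT 1',
        (url, payload_digest),
    ).fetchone()
    return row[0] if row else None


# Placeholder values for illustration only.
add_visits(conn, [('http://example.com/', 'record-1', 'digest-aaa')])

same = get_revisit_id(conn, 'http://example.com/', 'digest-aaa')
changed = get_revisit_id(conn, 'http://example.com/', 'digest-bbb')
print(same, changed)
```

A match on both URL and payload digest means the same content was already recorded, so a caller could refer to the earlier record instead of storing the payload again.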
class wpull.database.sqlmodel.Hostname(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

hostname
id
class wpull.database.sqlmodel.QueuedFile(**kwargs)

Bases: sqlalchemy.ext.declarative.api.Base

id
queued_url
queued_url_id
status