API Reference
DataSluice — one Python interface for open-data discovery, extraction, format normalization, and pipeline integration.
AdapterError
Bases: DataSluiceError
Raised when an adapter cannot fulfil a request.
AdapterNotFoundError
Bases: AdapterError
Raised when no adapter is registered for a portal type.
AuthenticationError
Bases: DataSluiceError
Raised when authentication credentials are missing or invalid.
ChecksumMismatchError
Bases: DownloadError
Raised when a downloaded file's checksum does not match.
Source code in src/datasluice/exceptions.py
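A checksum mismatch is usually detected by hashing the downloaded bytes and comparing the digest with one published by the portal. The sketch below illustrates that check only; `verify_checksum`, the SHA-256 default, and the locally defined exception classes are assumptions standing in for DataSluice's own code, which this page does not show.

```python
import hashlib


class DownloadError(Exception):
    """Local stand-in for datasluice's DownloadError (assumption)."""


class ChecksumMismatchError(DownloadError):
    """Local stand-in mirroring the documented base class."""


def verify_checksum(data: bytes, expected_hex: str, algorithm: str = "sha256") -> None:
    """Raise ChecksumMismatchError if the digest of `data` differs from `expected_hex`."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # Hex digests are compared case-insensitively.
    if actual.lower() != expected_hex.strip().lower():
        raise ChecksumMismatchError(
            f"expected {expected_hex!r}, got {actual!r} ({algorithm})"
        )
```

Because the stand-in derives from `DownloadError`, a caller that handles download failures in general also catches checksum failures.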
ConfigError
Bases: DataSluiceError
Raised when configuration is invalid or incomplete.
DataSluice
Unified client for open-data portals.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `portal_url` | `str` | Base URL of the open-data portal. | required |
| `portal_type` | `str \| None` | Optional explicit portal type (e.g. …). | `None` |
| `auth` | `BaseAuth \| None` | Optional authentication strategy. | `None` |
| `settings` | `Settings \| None` | Optional pre-loaded settings. Loaded from the environment when omitted. | `None` |
| `transport` | `HttpClient \| None` | Optional pre-configured HTTP client. | `None` |
Example

```python
from datasluice import DataSluice

ds = DataSluice("https://catalog.data.gov")
results = ds.search("climate change")
for dataset in results:
    print(dataset.title)
```
Source code in src/datasluice/client.py
downloader
property
Lazily-initialised downloader.
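Lazy initialisation means the downloader is built on first access rather than in the constructor. One common way to get that behaviour in Python is `functools.cached_property`; the sketch below shows the pattern with hypothetical `Client` and `Downloader` classes and is not a copy of how `DataSluice` implements the property.

```python
from functools import cached_property


class Downloader:
    """Hypothetical stand-in for the real downloader object."""

    instances_created = 0

    def __init__(self) -> None:
        Downloader.instances_created += 1


class Client:
    """Hypothetical client showing one way to defer construction."""

    @cached_property
    def downloader(self) -> Downloader:
        # Runs once, on first access; the result is then cached on the instance.
        return Downloader()
```

A client that only searches never pays for building a downloader, and repeated accesses return the same object.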
download(resource, dest=None, **kwargs)
download_all(dataset, dest)
get_dataset(dataset_id)
get_organization(organization_id)
list_resources(dataset_id)
read(resource)
Download and parse a resource into a list of record dicts.
Source code in src/datasluice/client.py
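"A list of record dicts" means one dictionary per row, keyed by column name. The sketch below shows that shape for CSV input using only the standard library; `parse_csv_records` is a hypothetical helper, and the formats and parsing rules that `read` actually supports are not documented on this page.

```python
import csv
import io


def parse_csv_records(raw: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Parse CSV bytes into one dict per row, keyed by the header line."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

Values stay as strings in this sketch; any type conversion is left to the caller.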
search(query=None, **kwargs)
Search for datasets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str \| Query \| None` | Search text or a `Query`. | `None` |
| `**kwargs` | `Any` | Additional :class: … | `{}` |
Returns:

| Type | Description |
|---|---|
| `SearchResult` | A `SearchResult` (one page of results). |
Source code in src/datasluice/client.py
DataSluiceError
Dataset
dataclass
A dataset is a logical grouping of one or more resources.
Attributes:

| Name | Type | Description |
|---|---|---|
| `id` | `str` | Portal-native dataset identifier. |
| `title` | `str \| None` | Human-readable dataset title. |
| `name` | `str \| None` | Machine-friendly slug or name. |
| `description` | `str \| None` | Longer free-text description (may contain Markdown/HTML). |
| `resources` | `list[Resource]` | List of downloadable resources within this dataset. |
| `organization` | `Organization \| None` | Publishing organization, if known. |
| `license` | `License \| None` | Default license for resources in this dataset. |
| `tags` | `list[str]` | Free-form tags or keywords. |
| `themes` | `list[str]` | Categorization themes or groups. |
| `language` | `list[str]` | ISO language code(s) for the data. |
| `created` | `str \| None` | ISO-8601 creation timestamp. |
| `modified` | `str \| None` | ISO-8601 last-modified timestamp. |
| `metadata_modified` | `str \| None` | ISO-8601 timestamp of last metadata change. |
| `url` | `str \| None` | Canonical URL to the dataset on the portal. |
| `extra` | `dict[str, Any]` | Portal-native fields not captured above. |
Source code in src/datasluice/domain/dataset.py
DownloadError
Bases: DataSluiceError
Raised when a resource download fails.
FormatError
Bases: DataSluiceError
Raised when a resource cannot be parsed in the expected format.
License
dataclass
A license under which an open-data resource or dataset is published.
Attributes:

| Name | Type | Description |
|---|---|---|
| `id` | `str` | Canonical license identifier (e.g. …). |
| `title` | `str \| None` | Human-readable license name. |
| `url` | `str \| None` | URL to the full license text. |
Source code in src/datasluice/domain/license.py
NotFoundError
Bases: PortalError
Raised when a requested dataset or resource does not exist.
Organization
dataclass
An organization or publisher of open-data datasets.
Attributes:

| Name | Type | Description |
|---|---|---|
| `id` | `str` | Portal-native organization identifier. |
| `name` | `str \| None` | Display name of the organization. |
| `title` | `str \| None` | Alternative human-readable title. |
| `description` | `str \| None` | Longer description, if available. |
| `url` | `str \| None` | URL to the organization's page on the portal. |
| `logo_url` | `str \| None` | URL to the organization's logo image. |
| `created` | `str \| None` | ISO-8601 creation timestamp, if available. |
| `extra` | `dict[str, Any]` | Portal-native fields not captured above. |
Source code in src/datasluice/domain/organization.py
PortalDetectionError
Bases: DataSluiceError
Raised when the portal type cannot be auto-detected.
PortalError
Bases: DataSluiceError
Raised when a portal returns an error or is unreachable.
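The `Bases:` entries in this reference form a small tree rooted at `DataSluiceError`. The sketch below rebuilds that tree with local stand-in classes to show why handlers should test the most specific class first. That `DataSluiceError` derives from `Exception` is an assumption, and `describe` is a hypothetical helper.

```python
class DataSluiceError(Exception):
    """Root of the hierarchy (deriving from Exception is an assumption)."""


# Stand-ins mirroring the documented "Bases:" relationships.
class AdapterError(DataSluiceError): ...
class AdapterNotFoundError(AdapterError): ...
class AuthenticationError(DataSluiceError): ...
class ConfigError(DataSluiceError): ...
class DownloadError(DataSluiceError): ...
class ChecksumMismatchError(DownloadError): ...
class FormatError(DataSluiceError): ...
class PortalDetectionError(DataSluiceError): ...
class PortalError(DataSluiceError): ...
class NotFoundError(PortalError): ...
class RateLimitError(PortalError): ...


def describe(exc: DataSluiceError) -> str:
    """Map an error to a coarse category, checking specific classes first."""
    if isinstance(exc, NotFoundError):
        return "missing"
    if isinstance(exc, RateLimitError):
        return "retry-later"
    if isinstance(exc, PortalError):
        return "portal"
    return "other"
```

Testing `PortalError` before its subclasses would swallow `NotFoundError` and `RateLimitError`, so the order of the checks matters.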
Query
dataclass
Portal-agnostic search parameters.
Attributes:

| Name | Type | Description |
|---|---|---|
| `text` | `str \| None` | Free-text search query. |
| `tags` | `list[str]` | Filter by one or more tags. |
| `organizations` | `list[str]` | Filter by organization name(s). |
| `groups` | `list[str]` | Filter by group or theme name(s). |
| `res_format` | `str \| None` | Filter by resource format (e.g. …). |
| `license_id` | `str \| None` | Filter by license identifier. |
| `sort` | `str \| None` | Sort field and direction (e.g. …). |
| `limit` | `int` | Maximum number of results to return. |
| `offset` | `int` | Number of results to skip (for pagination). |
Source code in src/datasluice/domain/query.py
RateLimitError
Bases: PortalError
Raised when the portal rate-limits requests.
Source code in src/datasluice/exceptions.py
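Callers usually respond to rate limiting by waiting and retrying. The sketch below wraps any zero-argument call in exponential backoff. `with_backoff`, its parameters, and the local `RateLimitError` stand-in are hypothetical, and whether the real exception carries a retry hint from the portal is not documented here.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for datasluice's RateLimitError (assumption)."""


def with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with doubling delays."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise  # Out of attempts: surface the original error.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.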
Resource
dataclass
A single downloadable resource (file) within a dataset.
Attributes:

| Name | Type | Description |
|---|---|---|
| `id` | `str` | Portal-native resource identifier. |
| `name` | `str \| None` | Human-readable resource name or title. |
| `url` | `str \| None` | Direct download URL. |
| `format` | `str \| None` | Canonical file format (e.g. …). |
| `media_type` | `str \| None` | IANA media type if known (e.g. …). |
| `description` | `str \| None` | Optional longer description. |
| `size` | `int \| None` | File size in bytes, if known. |
| `license` | `License \| None` | License under which this resource is published. |
| `created` | `str \| None` | ISO-8601 creation timestamp, if available. |
| `modified` | `str \| None` | ISO-8601 last-modified timestamp, if available. |
| `extra` | `dict[str, Any]` | Portal-native fields not captured above. |
Source code in src/datasluice/domain/resource.py
normalize_format(raw)
classmethod
Normalise a raw format string or media type to canonical form.
SearchResult
dataclass
A paginated page of search results.
Attributes:

| Name | Type | Description |
|---|---|---|
| `datasets` | `list[Dataset]` | Datasets returned in this page. |
| `total` | `int` | Total number of matching datasets across all pages. |
| `page` | `int` | Current page number (1-based). |
| `page_size` | `int` | Number of results per page. |
| `has_next` | `bool` | Whether additional pages are available. |
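The pagination fields are related by simple arithmetic: with 1-based pages, another page exists while `page * page_size` is still less than `total`. The sketch below spells that out. `has_next_page` and `page_count` are hypothetical helpers and are not part of the documented `SearchResult` interface.

```python
import math


def has_next_page(total: int, page: int, page_size: int) -> bool:
    """True when results remain beyond the given 1-based page."""
    return page * page_size < total


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to hold `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total / page_size)
```

For 60 matches at 25 per page, pages 1 and 2 have a successor and page 3 is the last.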