Accessing your datasets
Learn the different mechanisms to access the datasets
There are two ways to access your datasets:
- Direct download URL — Quick access to individual dataset files
- S3 Bucket access — Full access to explore the entire bucket and historical datasets
Looking for information about dataset types and what's available? Learn more about our Jobs and Technographics datasets.
Direct download URL
Use direct download URLs when you need quick access to a specific dataset file without setting up programmatic access. This is the most common and straightforward way to access your datasets.
In the TheirStack App, the datasets section lists every dataset available to you, each with its own direct download button.

This is the simplest way to get a specific dataset file. Just click the download button, and you'll get a direct link to download the file.
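If you would rather fetch a file from a script than from the browser, you can pass the direct link to a small download helper. The sketch below uses only the Python standard library. It assumes the link is a plain URL that needs no extra authentication headers, and `download_from_url` is a hypothetical helper name, not part of any TheirStack SDK:

```python
import shutil
import urllib.request
from pathlib import Path


def download_from_url(url: str, destination: str, timeout: float = 60.0) -> Path:
    """Stream the file at `url` to `destination` without loading it all into memory."""
    dest = Path(destination)
    # Create the target folder if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # copyfileobj copies in fixed-size chunks, so large files stay out of memory
        shutil.copyfileobj(response, out)
    return dest


# Example (hypothetical placeholder URL; paste the link copied from the app):
# download_from_url("https://example.com/your-dataset-link", "dataset.csv")
```

Streaming in chunks matters here because dataset files can be large; reading the whole response into memory first would work but scales poorly.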
S3 bucket access
If you want to explore the entire bucket of datasets available and access the historical archive of published datasets, you can gain direct access to our S3 bucket. This gives you more flexibility to browse, list, and download multiple files programmatically.
The process has three steps:
- Obtaining the temporary credentials
- Connecting to the S3 bucket
- Downloading the files
Obtaining temporary S3 credentials
To get access to the datasets bucket, send a POST request to the /v1/datasets/credentials endpoint, authenticated with your API token:
```bash
curl -X POST "https://api.theirstack.com/v1/datasets/credentials" \
  -H "accept: application/json" \
  -H "Authorization: Bearer <your_token>"
```

This endpoint returns a JSON object with the following fields:
- access_key_id: The access key ID for the temporary credentials
- secret_access_key: The secret access key for the temporary credentials
- session_token: The session token for the temporary credentials
- expiration: The expiration date and time of the temporary credentials (in ISO 8601 format)
- storage: Storage configuration object containing:
  - bucket_name: The S3 bucket name where the datasets are stored
  - endpoint_url: The S3-compatible endpoint URL to use for accessing the bucket
  - prefixes: One or more prefixes you must include when accessing objects
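To catch a malformed or incomplete response early, you can load the fields listed above into typed objects before using them. This is a minimal sketch that assumes the response shape described on this page; `StorageConfig`, `DatasetCredentials`, and `from_response` are hypothetical names, not part of any TheirStack SDK:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class StorageConfig:
    bucket_name: str
    endpoint_url: str
    prefixes: Tuple[str, ...]


@dataclass(frozen=True)
class DatasetCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime
    storage: StorageConfig

    @classmethod
    def from_response(cls, data: Mapping[str, Any]) -> "DatasetCredentials":
        """Build typed credentials from the JSON body; raises KeyError if a field is missing."""
        storage = data["storage"]
        prefixes = tuple(storage["prefixes"])
        if not prefixes:
            raise ValueError("response did not include any allowed prefixes")
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so rewrite it as an explicit UTC offset first
        expiration = datetime.fromisoformat(data["expiration"].replace("Z", "+00:00"))
        return cls(
            access_key_id=data["access_key_id"],
            secret_access_key=data["secret_access_key"],
            session_token=data["session_token"],
            expiration=expiration,
            storage=StorageConfig(
                bucket_name=storage["bucket_name"],
                endpoint_url=storage["endpoint_url"],
                prefixes=prefixes,
            ),
        )
```

With the Python `requests` library, for example, you would call `DatasetCredentials.from_response(response.json())` right after the request succeeds.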
Example response:

```json
{
  "access_key_id": "a3f8b2c9d1e4f5a6b7c8d9e0f1a2b3c4",
  "secret_access_key": "7e9a2b4c6d8e0f1a3b5c7d9e1f3a5b7c9d1e3f5a",
  "session_token": "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5LmV5SmlkV05yWlhRa",
  "expiration": "2025-01-15T14:30:00Z",
  "storage": {
    "bucket_name": "datasets",
    "endpoint_url": "https://example-datasets-url.com",
    "prefixes": ["jobs/daily"]
  }
}
```

Note: These credentials are temporary and expire at the time given in the expiration field. Once they expire, request new credentials.
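Because the credentials expire, a long-running job can check the expiration field before each batch of S3 calls and request fresh credentials shortly before the deadline. A minimal sketch; `credentials_expired` is a hypothetical helper, and the five-minute safety margin is an illustrative assumption, not a documented requirement:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def credentials_expired(
    expiration: str,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if the credentials have expired or will within `margin`."""
    # Normalize a trailing "Z" so fromisoformat() also works before Python 3.11
    expires_at = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    if expires_at.tzinfo is None:
        # Assumption: treat a timestamp without an offset as UTC
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now >= expires_at - margin
```

When it returns True, call POST /v1/datasets/credentials again and build a new S3 client from the fresh values; an existing client keeps signing requests with the old keys.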
Connecting to the S3 bucket
Once you have the temporary credentials, you can connect to the S3 bucket using the AWS S3-compatible API. The response includes the endpoint_url, bucket_name, and the allowed prefixes in the storage object, which you should use when configuring your client and building object keys.
Here's how to set up a client using the values from the API response:
```python
import boto3
import requests

# Get credentials from the API
response = requests.post(
    "https://api.theirstack.com/v1/datasets/credentials",
    headers={"Authorization": "Bearer <your_token>"},
)
response.raise_for_status()
credentials = response.json()

# Use the endpoint_url and temporary keys from the response
client = boto3.client(
    "s3",
    endpoint_url=credentials["storage"]["endpoint_url"],
    aws_access_key_id=credentials["access_key_id"],
    aws_secret_access_key=credentials["secret_access_key"],
    aws_session_token=credentials["session_token"],
)
```

Listing files
The credentials response includes one or more allowed prefixes. Always include one of those prefixes in every S3 key you access (for example, customer-1234/). First, list objects for each allowed prefix, then download the keys you need:
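```python
# list_objects_v2 returns at most 1,000 keys per call, so use a paginator
# to walk through every page of results under each allowed prefix
paginator = client.get_paginator("list_objects_v2")
for prefix in credentials["storage"]["prefixes"]:
    for page in paginator.paginate(
        Bucket=credentials["storage"]["bucket_name"],
        Prefix=prefix,
    ):
        for obj in page.get("Contents", []):
            print(f"Key: {obj['Key']}, Size: {obj['Size']}")
```

Downloading files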
After you identify the object you want to download, include one of the allowed prefixes in the Key:
```python
prefix = credentials["storage"]["prefixes"][0]  # choose one allowed prefix
# Join with exactly one "/", whether or not the prefix ends with a slash
key_to_download = f"{prefix.rstrip('/')}/path/to/your/file"
client.download_file(
    Bucket=credentials["storage"]["bucket_name"],
    Key=key_to_download,
    Filename="local_file.csv",  # update to your desired local path
)
```
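To download every listed file while keeping the folder structure below the prefix, you can map each key to a local path. This is a minimal sketch that assumes the allowed prefixes behave like folders (so jobs/daily covers keys starting with jobs/daily/); `local_path_for_key` is a hypothetical helper:

```python
from pathlib import Path, PurePosixPath


def local_path_for_key(key: str, prefix: str, dest_dir: str) -> Path:
    """Map an S3 key under `prefix` to a file path inside `dest_dir`, keeping subfolders."""
    # Assumption: treat the prefix as a folder, so "jobs/daily" covers "jobs/daily/..."
    base = prefix.rstrip("/") + "/"
    if not key.startswith(base):
        raise ValueError(f"{key!r} is not under prefix {prefix!r}")
    relative = PurePosixPath(key[len(base):])
    # Reject empty remainders (folder markers), absolute paths, and ".." segments
    if not relative.parts or relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"refusing to map unsafe key {key!r}")
    return Path(dest_dir).joinpath(*relative.parts)
```

Inside the listing loop, you could compute `dest = local_path_for_key(obj["Key"], prefix, "datasets")`, create the folder with `dest.parent.mkdir(parents=True, exist_ok=True)`, and pass `str(dest)` as the `Filename` argument of `client.download_file`.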
