Accessing your datasets

Learn the different ways to access your datasets.

There are two ways to access your datasets:

  • Direct download URL — Quick access to individual dataset files
  • S3 Bucket access — Full access to explore the entire bucket and historical datasets

Looking for information about dataset types and what's available? Learn more about our Jobs and Technographics datasets.

Direct download URL

Use direct download URLs when you need quick access to a specific dataset file without setting up programmatic access. This is the most common and straightforward way to access your datasets.

The Datasets section of the TheirStack App lists every dataset available to you, each with its own download button.

[Screenshot: the Datasets page in the TheirStack App]

Click a dataset's download button to get a direct link to that file.
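If you'd rather script the download than use the browser, the direct link can be fetched like any other HTTPS URL. The sketch below assumes the link is a plain URL that needs no extra authentication headers (check this against the link you receive) and streams it to disk in chunks, since dataset files can be large. The function name download_from_url is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_from_url(url: str, destination: str, chunk_size: int = 1024 * 1024) -> Path:
    """Stream the file at `url` to `destination` without loading it all into memory."""
    dest = Path(destination)
    # Create the destination folder if it doesn't exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in fixed-size chunks so large dataset files don't exhaust memory
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

For example, download_from_url("<direct_download_url>", "datasets/jobs.csv") saves the file under datasets/ and returns its local path.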

S3 bucket access

To browse every dataset in the bucket, including the historical archive of previously published datasets, you can get direct access to our S3 bucket. This lets you list and download multiple files programmatically.

The process has three steps:

  1. Obtaining the temporary credentials
  2. Connecting to the S3 bucket
  3. Downloading the files

Obtaining temporary S3 credentials

To get access to the datasets bucket, send a POST request to the /v1/datasets/credentials endpoint:

curl -X POST "https://api.theirstack.com/v1/datasets/credentials" \
  -H "accept: application/json" \
  -H "Authorization: Bearer <your_token>"

The endpoint returns a JSON object with the following fields:

  • access_key_id — The access key ID for the temporary credentials
  • secret_access_key — The secret access key for the temporary credentials
  • session_token — The session token for the temporary credentials
  • expiration — The expiration date and time of the temporary credentials (in ISO 8601 format)
  • storage — Storage configuration object containing:
    • bucket_name — The S3 bucket name where the datasets are stored
    • endpoint_url — The S3-compatible endpoint URL to use for accessing the bucket
    • prefixes — One or more prefixes you must include when accessing objects

Example response:

{
    "access_key_id": "a3f8b2c9d1e4f5a6b7c8d9e0f1a2b3c4",
    "secret_access_key": "7e9a2b4c6d8e0f1a3b5c7d9e1f3a5b7c9d1e3f5a",
    "session_token": "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5LmV5SmlkV05yWlhRa",
    "expiration": "2025-01-15T14:30:00Z",
    "storage": {
        "bucket_name": "datasets",
        "endpoint_url": "https://example-datasets-url.com",
        "prefixes": ["jobs/daily"]
    }
}

Note: These credentials are temporary and will expire at the time specified in the expiration field. You'll need to request new credentials once they expire.
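Because the credentials expire, a long-running job should check the expiration field and request new credentials before that time. A minimal sketch, assuming the response shape shown above: the hypothetical DatasetCredentials class parses the response and reports whether the credentials expire within a safety margin (five minutes here, an arbitrary choice).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DatasetCredentials:
    """Typed view of the /v1/datasets/credentials response (hypothetical helper)."""

    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime
    bucket_name: str
    endpoint_url: str
    prefixes: tuple[str, ...]

    @classmethod
    def from_response(cls, data: dict) -> DatasetCredentials:
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so rewrite it as an explicit UTC offset first
        expiration = datetime.fromisoformat(data["expiration"].replace("Z", "+00:00"))
        if expiration.tzinfo is None:
            # Assumption: a timestamp without an offset is in UTC
            expiration = expiration.replace(tzinfo=timezone.utc)
        storage = data["storage"]
        return cls(
            access_key_id=data["access_key_id"],
            secret_access_key=data["secret_access_key"],
            session_token=data["session_token"],
            expiration=expiration,
            bucket_name=storage["bucket_name"],
            endpoint_url=storage["endpoint_url"],
            prefixes=tuple(storage["prefixes"]),
        )

    def expires_soon(
        self, margin: timedelta = timedelta(minutes=5), now: datetime | None = None
    ) -> bool:
        """Return True if the credentials expire within `margin` of `now` (UTC)."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expiration - margin
```

A script can call expires_soon() before each batch of downloads and, when it returns True, request fresh credentials and rebuild its S3 client.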

Connecting to the S3 bucket

Once you have the temporary credentials, you can connect to the bucket with any S3-compatible client, such as boto3. Configure the client with the endpoint_url from the storage object, and use its bucket_name and prefixes when listing and downloading objects.

Here's how to set up a client using the values from the API response:

import boto3
import requests

# Get credentials from the API
response = requests.post(
    "https://api.theirstack.com/v1/datasets/credentials",
    headers={"Authorization": "Bearer <your_token>"}
)
response.raise_for_status()
credentials = response.json()

# Configure the client with the endpoint URL and temporary credentials from the response
client = boto3.client(
    "s3",
    endpoint_url=credentials["storage"]["endpoint_url"],
    aws_access_key_id=credentials["access_key_id"],
    aws_secret_access_key=credentials["secret_access_key"],
    aws_session_token=credentials["session_token"],
)

Listing files

The credentials response includes one or more allowed prefixes, and every S3 key you access must start with one of them (for example, jobs/daily in the sample response above). First, list the objects under each allowed prefix, then download the keys you need:

for prefix in credentials["storage"]["prefixes"]:
    listing = client.list_objects_v2(
        Bucket=credentials["storage"]["bucket_name"],
        Prefix=prefix,
    )

    for obj in listing.get("Contents", []):
        print(f"Key: {obj['Key']}, Size: {obj['Size']}")
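A single list_objects_v2 call returns at most 1,000 keys, so a loop that calls it once per prefix only sees the first page of a large prefix. boto3's built-in paginator follows the continuation token for you. A sketch, assuming the S3-compatible endpoint supports standard ListObjectsV2 pagination; iter_objects is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_objects(client, bucket: str, prefixes: Iterable[str]) -> Iterator[dict]:
    """Yield every object under each allowed prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for prefix in prefixes:
        # The paginator keeps requesting pages (up to 1,000 keys each)
        # until the listing for this prefix is exhausted
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            # Pages with no matching keys have no "Contents" entry
            yield from page.get("Contents", [])
```

Used with the client created earlier, for obj in iter_objects(client, credentials["storage"]["bucket_name"], credentials["storage"]["prefixes"]) visits every object, however many pages the listing spans.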

Downloading files

After you identify the object you want to download, include one of the allowed prefixes in the Key:

prefix = credentials["storage"]["prefixes"][0]  # choose one allowed prefix
# Join with a single "/" whether or not the prefix already ends in one
key_to_download = f"{prefix.rstrip('/')}/path/to/your/file"

client.download_file(
    Bucket=credentials["storage"]["bucket_name"],
    Key=key_to_download,
    Filename="local_file.csv",  # update to your desired local path
)
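To fetch a batch of files, for example every key found while listing, the sketch below mirrors the bucket layout under a local directory. It refuses keys that don't start with an allowed prefix (or that would resolve outside the target directory) and skips "folder" placeholder keys ending in /. The names download_keys and is_allowed_key are hypothetical; the only boto3 call used is client.download_file.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def is_allowed_key(key: str, prefixes: Iterable[str]) -> bool:
    """True if the key starts with one of the prefixes returned by the API."""
    return any(key.startswith(prefix) for prefix in prefixes)


def download_keys(
    client, bucket: str, keys: Iterable[str], prefixes: Iterable[str], dest_dir: str
) -> list[Path]:
    """Download each key to dest_dir/<key> and return the local paths written."""
    prefixes = list(prefixes)
    root = Path(dest_dir).resolve()
    written: list[Path] = []
    for key in keys:
        if not is_allowed_key(key, prefixes):
            raise ValueError(f"{key!r} is outside the allowed prefixes {prefixes}")
        if key.endswith("/"):
            # "Folder" placeholder objects have no file content to save
            continue
        target = Path(dest_dir) / key
        # Guard against keys such as "../x" escaping the destination directory
        if not target.resolve().is_relative_to(root):
            raise ValueError(f"{key!r} would be written outside {dest_dir}")
        target.parent.mkdir(parents=True, exist_ok=True)
        client.download_file(Bucket=bucket, Key=key, Filename=str(target))
        written.append(target)
    return written
```

A call might look like download_keys(client, credentials["storage"]["bucket_name"], keys, credentials["storage"]["prefixes"], "datasets/"), where keys is the list of object keys you collected while listing. Path.is_relative_to requires Python 3.9 or later.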
