Accessing your datasets

Learn the different ways to access your datasets.

There are two ways to access your datasets:

  • Direct download URL — Quick access to individual dataset files
  • S3 Bucket access — Full access to explore the entire bucket and historical datasets

Looking for information about dataset types and what's available? Learn more about our Jobs and Technographics datasets.

Direct download URL

Use direct download URLs when you need quick access to a specific dataset file without setting up programmatic access. This is the most common and straightforward way to access your datasets.

In the TheirStack App, the datasets section lists every dataset available to you, with a direct download button next to each one.

Click the download button to get a direct link to the file.

S3 bucket access

If you want to explore the entire bucket of datasets available and access the historical archive of published datasets, you can gain direct access to our S3 bucket. This gives you more flexibility to browse, list, and download multiple files programmatically.

This process consists of three steps:

  1. Obtaining the temporary credentials
  2. Connecting to the S3 bucket
  3. Downloading the files

Obtaining temporary S3 credentials

To get access to the datasets bucket, send a POST request to the /v1/datasets/credentials endpoint, authenticated with your API token.

curl -X POST "https://api.theirstack.com/v1/datasets/credentials" \
  -H "accept: application/json" \
  -H "Authorization: Bearer <your_token>"

This endpoint will return a JSON object with the following fields:

  • access_key_id — The access key ID for the temporary credentials
  • secret_access_key — The secret access key for the temporary credentials
  • session_token — The session token for the temporary credentials
  • expiration — The expiration date and time of the temporary credentials (in ISO 8601 format)
  • storage — Storage configuration object containing:
    • bucket_name — The S3 bucket name where the datasets are stored
    • endpoint_url — The S3-compatible endpoint URL to use for accessing the bucket
    • prefixes — One or more prefixes you must include when accessing objects

Example response:

{
    "access_key_id": "a3f8b2c9d1e4f5a6b7c8d9e0f1a2b3c4",
    "secret_access_key": "7e9a2b4c6d8e0f1a3b5c7d9e1f3a5b7c9d1e3f5a",
    "session_token": "ZXlKaGJHY2lPaUpTVXpJMU5pSXNJblI1Y0NJNklrcFhWQ0o5LmV5SmlkV05yWlhRa",
    "expiration": "2025-01-15T14:30:00Z",
    "storage": {
        "bucket_name": "datasets",
        "endpoint_url": "https://example-datasets-url.com",
        "prefixes": ["jobs/daily"]
    }
}

Note: These credentials are temporary and will expire at the time specified in the expiration field. You'll need to request new credentials once they expire.
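
Because the credentials expire, a long-running job may want to check the expiration field before each batch of requests and fetch new credentials when needed. The sketch below is one way to do that check, assuming expiration is always a timestamp in the format shown in the example response. The helper name credentials_expired and the 60-second safety margin are illustrative choices, not part of the TheirStack API.

```python
from datetime import datetime, timedelta, timezone


def credentials_expired(credentials, now=None, margin_seconds=60):
    """Return True if the credentials have expired or will within the margin.

    Hypothetical helper: assumes `credentials` is the parsed JSON response and
    that `expiration` is an ISO 8601 timestamp such as "2025-01-15T14:30:00Z".
    """
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    expires_at = datetime.fromisoformat(credentials["expiration"].replace("Z", "+00:00"))
    if expires_at.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=margin_seconds) >= expires_at
```

When the helper returns True, repeat the POST /v1/datasets/credentials request and build a new client from the fresh response.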

Connecting to the S3 bucket

Once you have the temporary credentials, you can connect to the bucket with any S3-compatible client. The storage object in the response contains the endpoint_url, the bucket_name, and the allowed prefixes; use these values when configuring your client and building object keys.

Here's how to set up a client using the values from the API response:

import boto3
import requests

# Get credentials from the API
response = requests.post(
    "https://api.theirstack.com/v1/datasets/credentials",
    headers={"Authorization": "Bearer <your_token>"}
)
response.raise_for_status()
credentials = response.json()

# Use the endpoint_url and bucket_name from the response
client = boto3.client(
    "s3",
    endpoint_url=credentials["storage"]["endpoint_url"],
    aws_access_key_id=credentials["access_key_id"],
    aws_secret_access_key=credentials["secret_access_key"],
    aws_session_token=credentials["session_token"],
)

Listing files

The credentials response includes one or more allowed prefixes, and every S3 key you access must start with one of them (for example, customer-1234/). List the objects under each allowed prefix first, then download the keys you need:

for prefix in credentials["storage"]["prefixes"]:
    response = client.list_objects_v2(
        Bucket=credentials["storage"]["bucket_name"],
        Prefix=prefix,
    )

    for obj in response.get('Contents', []):
        print(f"Key: {obj['Key']}, Size: {obj['Size']}")
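
A single list_objects_v2 call returns at most 1,000 keys, so a prefix that holds a long historical archive may need several requests. A minimal sketch using boto3's built-in paginator follows, assuming the S3-compatible endpoint handles continuation tokens the way AWS S3 does. iter_dataset_objects is a hypothetical helper name; it expects the client and credentials objects created in the previous steps.

```python
def iter_dataset_objects(client, credentials):
    """Yield (key, size) for every object under each allowed prefix.

    Hypothetical helper: `client` is the boto3 S3 client and `credentials`
    the parsed API response from the earlier steps.
    """
    storage = credentials["storage"]
    # The paginator follows continuation tokens across result pages.
    paginator = client.get_paginator("list_objects_v2")
    for prefix in storage["prefixes"]:
        pages = paginator.paginate(Bucket=storage["bucket_name"], Prefix=prefix)
        for page in pages:
            # "Contents" is absent from a page with no matching objects.
            for obj in page.get("Contents", []):
                yield obj["Key"], obj["Size"]
```

Calling `for key, size in iter_dataset_objects(client, credentials): ...` then visits every object, however many pages the listing spans.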

Downloading files

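After you identify the object you want to download, pass its full key, which must start with one of the allowed prefixes, as the Key. Keys returned by the listing step already include the prefix and can be used unchanged: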

prefix = credentials["storage"]["prefixes"][0]  # choose one allowed prefix
# Join with exactly one "/" whether or not the prefix ends with a slash
key_to_download = f"{prefix.rstrip('/')}/path/to/your/file"

client.download_file(
    Bucket=credentials["storage"]["bucket_name"],
    Key=key_to_download,
    Filename="local_file.csv",  # update to your desired local path
)
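
Since keys returned by the listing step already contain an allowed prefix, they can be passed to download_file as they are. The sketch below wraps that in a hypothetical helper, download_dataset_object, which rejects keys outside the allowed prefixes and saves the file under its base name in a local directory. The helper name and the dest_dir parameter are illustrative, and the prefix check is a local convenience rather than something the API requires.

```python
import os


def download_dataset_object(client, credentials, key, dest_dir="."):
    """Download one object and return the local path it was saved to.

    Hypothetical helper: `client` is the boto3 S3 client and `credentials`
    the parsed API response from the earlier steps.
    """
    storage = credentials["storage"]
    # Fail early on keys that do not start with an allowed prefix.
    if not any(key.startswith(prefix) for prefix in storage["prefixes"]):
        raise ValueError(f"{key!r} is outside the allowed prefixes")
    os.makedirs(dest_dir, exist_ok=True)
    # Use the last path segment of the key as the local file name.
    local_path = os.path.join(dest_dir, key.rsplit("/", 1)[-1])
    client.download_file(Bucket=storage["bucket_name"], Key=key, Filename=local_path)
    return local_path
```

Note that two keys with the same base name would overwrite each other in dest_dir; mirroring the full key path locally avoids that if it matters for your archive.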
