How to get access to datasets

Learn how to access TheirStack's S3 datasets bucket using temporary credentials. Complete guide with code examples for Python, Node.js, Java, and ClickHouse to download company and job data efficiently.

Our datasets are stored in an S3 bucket, which allows us to manage the data more efficiently. Instead of a single large file, we split larger datasets—such as job listings—into separate files by day. This guide will show you how to access the S3 bucket and download the datasets.

1. Get the temporary credentials

To get access to the datasets bucket, send a GET request to /v1/datasets/temp-access-credentials, authenticating with your API token.

curl -X GET "https://api.theirstack.com/v1/datasets/temp-access-credentials" \
-H "accept: application/json" \
-H "Authorization: Bearer <your_token>"

This endpoint will return a JSON object with the following fields:

  • access_key_id: The access key id for the temporary credentials
  • secret_access_key: The secret access key for the temporary credentials
  • session_token: The session token for the temporary credentials
  • expiration: The expiration date of the temporary credentials
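The same request and the handling of its response can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names (fetch_credentials, to_boto3_kwargs, is_expired) are hypothetical, and it assumes that expiration is an ISO 8601 timestamp, which this guide does not specify.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

CREDENTIALS_URL = "https://api.theirstack.com/v1/datasets/temp-access-credentials"


def fetch_credentials(api_token: str) -> dict:
    """Call the endpoint from step 1 and return the parsed JSON response."""
    request = urllib.request.Request(
        CREDENTIALS_URL,
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_boto3_kwargs(creds: dict) -> dict:
    """Map the response fields onto the keyword arguments of boto3.client()."""
    return {
        "aws_access_key_id": creds["access_key_id"],
        "aws_secret_access_key": creds["secret_access_key"],
        "aws_session_token": creds["session_token"],
    }


def is_expired(creds: dict, now: Optional[datetime] = None) -> bool:
    """Return True once the credentials have passed their expiration time.

    Assumption: `expiration` is an ISO 8601 string; adjust the parsing
    if the API returns a different format.
    """
    expires_at = datetime.fromisoformat(creds["expiration"].replace("Z", "+00:00"))
    if expires_at.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return (now or datetime.now(timezone.utc)) >= expires_at
```

With boto3 installed, the result can be passed straight to the client constructor, for example boto3.client("s3", **to_boto3_kwargs(creds)). Because the credentials are temporary, a long-running job would likely call is_expired before each batch of downloads and fetch a new set when it returns True.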

2. Use the temporary credentials to access the datasets bucket

Python

import boto3

# Create an S3 client with the temporary credentials from step 1.
s3 = boto3.client(
    "s3",
    aws_access_key_id="ACCESS_KEY_ID",
    aws_secret_access_key="SECRET_ACCESS_KEY",
    aws_session_token="SESSION_TOKEN",
    endpoint_url="ENDPOINT_URL",
)

# Download a single object from the bucket to a local file.
s3.download_file(
    Bucket="BUCKET_NAME",
    Key="KEY",
    Filename="FILENAME",
)
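Since larger datasets are split into one file per day, a single download_file call is often not enough. The sketch below lists every object under a key prefix and downloads each one. It assumes that the temporary credentials allow listing the bucket and that the files of one dataset share a common prefix; neither is confirmed by this guide. The function name download_prefix is hypothetical, and s3 is a boto3 S3 client like the one created above.

```python
import os


def download_prefix(s3, bucket: str, prefix: str, dest_dir: str) -> list:
    """Download every object under `prefix` into `dest_dir`; return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    # list_objects_v2 returns at most 1000 keys per call, so use a paginator.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip folder placeholder objects
            # Files are stored flat under dest_dir, named after the last key segment.
            local_path = os.path.join(dest_dir, os.path.basename(key))
            s3.download_file(Bucket=bucket, Key=key, Filename=local_path)
            downloaded.append(local_path)
    return downloaded
```

A call would look like download_prefix(s3, "BUCKET_NAME", "PREFIX", "data"). Keys in different sub-prefixes that end in the same file name would overwrite each other locally, so keep the full key in the local path if that can happen in your case.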

Node.js

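const fs = require('fs')
const AWS = require('aws-sdk')

// Create an S3 client with the temporary credentials from step 1.
const s3 = new AWS.S3({
    accessKeyId: 'ACCESS_KEY_ID',
    secretAccessKey: 'SECRET_ACCESS_KEY',
    sessionToken: 'SESSION_TOKEN',
    endpoint: 'ENDPOINT_URL',
})

// Stream the object from the bucket into a local file.
s3.getObject({ Bucket: 'BUCKET_NAME', Key: 'KEY' })
    .createReadStream()
    .on('error', (err) => console.error('Download failed:', err))
    .pipe(fs.createWriteStream('FILENAME'))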

Java

import java.net.URI;
import java.nio.file.Paths;

import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class S3Downloader {
    public static void main(String[] args) {
        // Temporary credentials from step 1, including the session token.
        AwsSessionCredentials credentials = AwsSessionCredentials.create(
            "ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "SESSION_TOKEN");

        // The SDK also needs a region, set with .region(...) or the AWS_REGION environment variable.
        try (S3Client s3 = S3Client.builder()
                .endpointOverride(URI.create("ENDPOINT_URL"))
                .credentialsProvider(StaticCredentialsProvider.create(credentials))
                .build()) {

            GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                .bucket("BUCKET_NAME")
                .key("KEY")
                .build();

            // Write the object to a local file; the file must not already exist.
            s3.getObject(getObjectRequest, Paths.get("FILENAME"));
        }
    }
}

ClickHouse

SELECT *
FROM s3(
    'ENDPOINT_URL/BUCKET_NAME/KEY',
    'ACCESS_KEY_ID',
    'SECRET_ACCESS_KEY',
    'SESSION_TOKEN',
    'FORMAT'
)
