How to get access to datasets

Our datasets are stored in an S3 bucket. Larger datasets, such as job listings, are split into one file per day rather than shipped as a single large file. This guide shows how to get credentials for the S3 bucket and download the datasets.

1. Get the temporary credentials

To get access to the datasets bucket, send a GET request to /v1/datasets/temp-access-credentials.

curl -X GET "https://api.theirstack.com/v1/datasets/temp-access-credentials" \
-H "accept: application/json" \
-H "Authorization: Bearer <your_token>"

This endpoint will return a JSON object with the following fields:

access_key_id: The access key id for the temporary credentials
secret_access_key: The secret access key for the temporary credentials
session_token: The session token for the temporary credentials
expiration: The expiration date of the temporary credentials
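
As a rough sketch of how this step could be scripted, the snippet below builds the same request as the curl command and maps the four response fields onto the keyword arguments that boto3.client accepts. The function names (build_credentials_request, fetch_credentials, to_boto3_kwargs) are hypothetical helpers, not part of the API, and the sketch assumes the response body is a flat JSON object with exactly the field names listed above.

```python
import json
import urllib.request

CREDENTIALS_URL = "https://api.theirstack.com/v1/datasets/temp-access-credentials"


def build_credentials_request(api_token: str) -> urllib.request.Request:
    """Build the same GET request as the curl command above."""
    return urllib.request.Request(
        CREDENTIALS_URL,
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="GET",
    )


def fetch_credentials(api_token: str) -> dict:
    """Call the endpoint and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_credentials_request(api_token)) as resp:
        return json.load(resp)


def to_boto3_kwargs(credentials: dict) -> dict:
    """Map the response fields onto boto3.client keyword arguments.

    Assumes the field names documented above; "expiration" is not passed on.
    """
    return {
        "aws_access_key_id": credentials["access_key_id"],
        "aws_secret_access_key": credentials["secret_access_key"],
        "aws_session_token": credentials["session_token"],
    }
```

With these helpers, a client could be created as boto3.client("s3", endpoint_url="ENDPOINT_URL", **to_boto3_kwargs(fetch_credentials("<your_token>"))). Because the credentials are temporary, a long-running job would need to request new ones once the expiration time has passed.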

2. Use the temporary credentials to access the datasets bucket

Python

import boto3

# Pass the endpoint and the temporary credentials in a single client call
s3 = boto3.client(
    "s3",
    endpoint_url="ENDPOINT_URL",
    aws_access_key_id="ACCESS_KEY_ID",
    aws_secret_access_key="SECRET_ACCESS_KEY",
    aws_session_token="SESSION_TOKEN",
)

s3.download_file(
    Bucket="BUCKET_NAME",
    Key="KEY",
    Filename="FILENAME",
)
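
Since larger datasets are split into one file per day, it may be convenient to download every file under a common key prefix instead of one key at a time. The sketch below is one way to do that with the list_objects_v2 paginator. It assumes the temporary credentials permit listing the bucket, which this guide does not confirm, and iter_keys and download_all are hypothetical helper names. The key layout inside the bucket is not documented here, so the prefix is left to the caller.

```python
from typing import Any, Iterator, List


def iter_keys(s3: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects
        for obj in page.get("Contents", []):
            yield obj["Key"]


def download_all(s3: Any, bucket: str, prefix: str = "") -> List[str]:
    """Download each object under the prefix into the current directory."""
    downloaded = []
    for key in iter_keys(s3, bucket, prefix):
        # Use the last path segment of the key as the local file name
        filename = key.rsplit("/", 1)[-1]
        if not filename:
            # Skip "folder" placeholder keys that end with a slash
            continue
        s3.download_file(Bucket=bucket, Key=key, Filename=filename)
        downloaded.append(filename)
    return downloaded
```

Here s3 is the boto3 client created above, for example download_all(s3, "BUCKET_NAME", "PREFIX"). Files whose keys share a final path segment would overwrite each other locally, so a real script may want to keep more of the key in the file name.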

Node.js

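const fs = require('fs')
const AWS = require('aws-sdk')

const s3 = new AWS.S3({
    accessKeyId: 'ACCESS_KEY_ID',
    secretAccessKey: 'SECRET_ACCESS_KEY',
    sessionToken: 'SESSION_TOKEN',
    endpoint: 'ENDPOINT_URL',
})

// The v2 SDK has no downloadFile method, so stream the object to a local file
s3.getObject({ Bucket: 'BUCKET_NAME', Key: 'KEY' })
    .createReadStream()
    .pipe(fs.createWriteStream('FILENAME'))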

Java

import java.net.URI;
import java.nio.file.Paths;

import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class S3Downloader {
    public static void main(String[] args) {
        // Temporary credentials need the session token as well as the key pair.
        // The SDK also needs a region, set on the builder or in your environment.
        S3Client s3 = S3Client.builder()
            .endpointOverride(URI.create("ENDPOINT_URL"))
            .credentialsProvider(StaticCredentialsProvider.create(
                AwsSessionCredentials.create("ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "SESSION_TOKEN")
            ))
            .build();

        GetObjectRequest getObjectRequest = GetObjectRequest.builder()
            .bucket("BUCKET_NAME")
            .key("KEY")
            .build();

        // Write the object straight to a local file
        s3.getObject(getObjectRequest, Paths.get("FILENAME"));

        s3.close();
    }
}

ClickHouse

SELECT *
FROM s3(
    'ENDPOINT_URL/BUCKET_NAME/KEY',
    'ACCESS_KEY_ID',
    'SECRET_ACCESS_KEY',
    'SESSION_TOKEN',
    'FORMAT'
)
