How to get access to datasets
Learn how to access TheirStack's S3 datasets bucket using temporary credentials. Complete guide with code examples for Python, Node.js, Java, and ClickHouse to download company and job data efficiently.
Our datasets are stored in an S3 bucket, which allows us to manage the data more efficiently. Instead of publishing each dataset as a single large file, we split larger datasets, such as job listings, into separate files by day. This guide shows how to get temporary credentials for the bucket and use them to download the datasets.
1. Get the temporary credentials
To access the datasets bucket, send a GET request to /v1/datasets/temp-access-credentials, authenticated with your API token:
curl -X GET "https://api.theirstack.com/v1/datasets/temp-access-credentials" \
-H "accept: application/json" \
-H "Authorization: Bearer <your_token>"
This endpoint will return a JSON object with the following fields:
access_key_id: The access key ID for the temporary credentials
secret_access_key: The secret access key for the temporary credentials
session_token: The session token for the temporary credentials
expiration: The expiration date of the temporary credentials
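The request and the field mapping can also be done from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names (`fetch_temp_credentials`, `to_boto3_kwargs`, `is_expired`) are hypothetical, and the expiry check assumes that `expiration` is an ISO 8601 timestamp, which this guide does not confirm.

```python
import json
import urllib.request
from datetime import datetime, timezone

CREDENTIALS_URL = "https://api.theirstack.com/v1/datasets/temp-access-credentials"


def fetch_temp_credentials(api_token: str) -> dict:
    """Send the same GET request as the curl command and return the JSON body."""
    request = urllib.request.Request(
        CREDENTIALS_URL,
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_boto3_kwargs(credentials: dict) -> dict:
    """Rename the response fields to the keyword arguments boto3 expects."""
    return {
        "aws_access_key_id": credentials["access_key_id"],
        "aws_secret_access_key": credentials["secret_access_key"],
        "aws_session_token": credentials["session_token"],
    }


def is_expired(credentials: dict, now: datetime = None) -> bool:
    """Return True once the credentials have passed their expiration time.

    ASSUMPTION: `expiration` is an ISO 8601 string; a value without a
    timezone is treated as UTC.
    """
    expires_at = datetime.fromisoformat(credentials["expiration"].replace("Z", "+00:00"))
    if expires_at.tzinfo is None:
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current >= expires_at
```

With these helpers, `to_boto3_kwargs(fetch_temp_credentials("<your_token>"))` yields the three credential arguments used in the Python example of step 2. Because the credentials are temporary, a long-running job would need to request new ones once `is_expired` returns True.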
2. Use the temporary credentials to access the datasets bucket
Python
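import boto3

# Create a session with the temporary credentials from step 1
session = boto3.session.Session(
    aws_access_key_id="ACCESS_KEY_ID",
    aws_secret_access_key="SECRET_ACCESS_KEY",
    aws_session_token="SESSION_TOKEN",
)

# Point the S3 client at the datasets endpoint
s3 = session.client(
    service_name="s3",
    endpoint_url="ENDPOINT_URL",
)

# Download a single object from the bucket to a local file
s3.download_file(
    Bucket="BUCKET_NAME",
    Key="KEY",
    Filename="FILENAME",
)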
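Because larger datasets are split into one file per day, downloading a dataset usually means fetching many keys rather than one. The sketch below lists every object under a key prefix and downloads each one. It assumes the temporary credentials allow listing the bucket, which this guide does not state, and `download_prefix` is a hypothetical helper name. The function takes an already configured boto3 S3 client such as the `s3` object above.

```python
from pathlib import Path


def download_prefix(s3, bucket: str, prefix: str, dest_dir: str) -> list:
    """Download every object under `prefix` into `dest_dir`.

    `s3` is a boto3 S3 client. Returns the local paths that were written.
    """
    dest = Path(dest_dir)
    downloaded = []
    # list_objects_v2 returns at most 1,000 keys per call; the paginator
    # follows the continuation tokens for us.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip folder placeholder objects
            target = dest / key
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(Bucket=bucket, Key=key, Filename=str(target))
            downloaded.append(str(target))
    return downloaded
```

A call such as `download_prefix(s3, "BUCKET_NAME", "PREFIX", "data")` would mirror the keys under `PREFIX` into a local `data` directory, where `PREFIX` is a placeholder for whatever key layout the dataset actually uses.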
Node.js
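const fs = require('fs')
const AWS = require('aws-sdk')

// Configure the client with the temporary credentials from step 1
const s3 = new AWS.S3({
  accessKeyId: 'ACCESS_KEY_ID',
  secretAccessKey: 'SECRET_ACCESS_KEY',
  sessionToken: 'SESSION_TOKEN',
  endpoint: 'ENDPOINT_URL',
})

// Stream the object from the bucket into a local file
s3.getObject({ Bucket: 'BUCKET_NAME', Key: 'KEY' })
  .createReadStream()
  .on('error', (err) => console.error(err))
  .pipe(fs.createWriteStream('FILENAME'))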
Java
import java.net.URI;
import java.nio.file.Paths;
import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

public class S3Downloader {
    public static void main(String[] args) {
        // The SDK also needs a region, either from its default provider chain or via .region(...)
        // try-with-resources closes the client once the download finishes
        try (S3Client s3 = S3Client.builder()
                .endpointOverride(URI.create("ENDPOINT_URL"))
                .credentialsProvider(StaticCredentialsProvider.create(
                        // Temporary credentials require the session token as well
                        AwsSessionCredentials.create("ACCESS_KEY_ID", "SECRET_ACCESS_KEY", "SESSION_TOKEN")))
                .build()) {
            GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                    .bucket("BUCKET_NAME")
                    .key("KEY")
                    .build();
            // Write the object to the local file FILENAME
            s3.getObject(getObjectRequest, Paths.get("FILENAME"));
        }
    }
}
ClickHouse
SELECT *
FROM s3(
'ENDPOINT_URL/BUCKET_NAME/KEY',
'ACCESS_KEY_ID',
'SECRET_ACCESS_KEY',
'SESSION_TOKEN',
'FORMAT'
)