How to backfill a job board with TheirStack
Step-by-step guide to backfilling your job board using TheirStack webhooks, API, or datasets — with field mapping, deduplication tips, and best practices for job board operators.
New to backfilling? Read How to backfill a job board first for an overview of strategies, sourcing methods, and quality standards.
Why TheirStack for backfilling
TheirStack's job data platform was built with job board operators in mind:
- Original job link to company website: When a job originates from a company's career page, we include the URL (final_url) so you can redirect users to the correct source. Filter for career-page-only jobs with final_url_exists.
- Standardized job descriptions: Descriptions are normalized to Markdown across all sources, so your front end renders them consistently.
- Company enrichment: Most jobs include company logo, domain, industry, headcount, revenue, type, location, and technologies used, which is everything you need for a company profile page. Media assets like logos are hosted on our infrastructure with stable URLs.
- Real-time data: New jobs are added every minute. Webhook delivery means your board updates in near real-time.
- 20+ filters: Use job_title_or, industry_id_or, technology_slug_or, country_code_or, and more to get only the jobs that match your niche.
- 323k data sources: Career pages, ATS platforms, and job boards worldwide.
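To make the filter and field-mapping ideas concrete, here is a minimal Python sketch. The filter names (job_title_or, country_code_or, technology_slug_or, final_url_exists) and final_url come from the list above. Everything else is an assumption for illustration: the boolean value for final_url_exists, the record keys id, job_title, description, and url, the nested company object, and the listing field names on the board side. Check the API reference for the real shapes before relying on it.

```python
from typing import Any, Iterable, Optional


def build_filters(
    titles: Iterable[str],
    countries: Iterable[str],
    technologies: Optional[Iterable[str]] = None,
    career_page_only: bool = True,
) -> dict[str, Any]:
    """Assemble a filter dict from the filter names mentioned in this guide."""
    filters: dict[str, Any] = {
        "job_title_or": list(titles),
        "country_code_or": list(countries),
    }
    if technologies:
        filters["technology_slug_or"] = list(technologies)
    if career_page_only:
        # Assumption: this filter takes a boolean value.
        filters["final_url_exists"] = True
    return filters


def to_listing(job: dict[str, Any]) -> dict[str, Any]:
    """Map one job record to a hypothetical job board listing.

    Assumption: the record has id, job_title, description, url, and a
    nested company object. Only final_url is named in this guide.
    """
    company = job.get("company") or {}
    return {
        "external_id": job.get("id"),
        "title": job.get("job_title"),
        "description_md": job.get("description"),
        # Prefer the company career-page link, fall back to the source URL.
        "apply_url": job.get("final_url") or job.get("url"),
        "company_name": company.get("name"),
        "company_logo": company.get("logo"),
        "company_domain": company.get("domain"),
    }
```

Keeping the mapping in one function means the same code can serve webhooks, API polling, and dataset loads, so a listing looks identical regardless of how it arrived.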
How to get the data into your job board
TheirStack offers three ways to ingest job data: webhooks for real-time streaming, the API for on-demand pulls, and datasets for high-volume bulk loads. As a rule of thumb, use webhooks if you need fewer than 1M jobs/month and datasets if you need more. You can also combine them — for example, seed your board with a dataset and keep it current with webhooks. See How to choose the best way to access TheirStack data for a detailed comparison.
Option 1: Webhooks (Recommended for <1M jobs/month)
Webhooks push new jobs to your endpoint as soon as they match your criteria. No polling, no cron jobs. Set up a webhook to listen for the new.job event, apply your filters, and start receiving jobs automatically.
See How to set up a webhook for the full walkthrough.
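As a rough illustration of the receiving side, the sketch below accepts webhook POSTs with Python's standard library and ignores repeat deliveries. It assumes each request body is a single JSON job object with an id key, which this guide does not confirm. The in-memory SEEN_IDS and LISTINGS stores and the port are placeholders for your own database and deployment.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder in-memory stores; a real board would use its database.
SEEN_IDS: set = set()
LISTINGS: list = []


def ingest_job(job: dict) -> bool:
    """Store a job once. Returns False for a repeat or a record without an id."""
    job_id = job.get("id")  # Assumption: every job carries a stable "id".
    if job_id is None or job_id in SEEN_IDS:
        return False
    SEEN_IDS.add(job_id)
    LISTINGS.append(job)
    return True


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            # Assumption: the body is one JSON object describing one job.
            job = json.loads(self.rfile.read(length))
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        if isinstance(job, dict):
            ingest_job(job)
        # Acknowledge quickly; do heavy processing outside the request.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the receiver. Blocks until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. Deduplicating on a stable identifier keeps the handler idempotent, which matters if the same job is ever delivered more than once.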
Option 2: API polling
We strongly recommend webhooks over API polling. Webhooks are simpler to implement and give you real-time updates. Use the API only if your architecture requires a pull-based approach.
Use the Jobs API to fetch jobs on a schedule (e.g., every 15 minutes). The same filters and fields are available.
See Fetch jobs periodically for implementation details.
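If you do need the pull-based route, the loop below sketches one way to page through results on a schedule while skipping jobs you have already stored. The HTTP call itself is deliberately left out: fetch_page is a hypothetical callable you would write around the Jobs API, taking a page number and a page size and returning a list of job dicts. The id key and the "short page means last page" stopping rule are also assumptions.

```python
import time
from typing import Callable

# Hypothetical signature: (page, limit) -> list of job dicts.
FetchPage = Callable[[int, int], list]


def poll_once(
    fetch_page: FetchPage,
    seen_ids: set,
    limit: int = 100,
    max_pages: int = 50,
) -> list:
    """Fetch pages until a short page appears; return only unseen jobs."""
    new_jobs = []
    for page in range(max_pages):
        batch = fetch_page(page, limit)
        for job in batch:
            job_id = job.get("id")  # Assumption: jobs carry a stable "id".
            if job_id is not None and job_id not in seen_ids:
                seen_ids.add(job_id)
                new_jobs.append(job)
        if len(batch) < limit:
            # Assumption: a page shorter than the limit is the last one.
            break
    return new_jobs


def run_forever(
    fetch_page: FetchPage,
    on_new_jobs: Callable[[list], None],
    interval_seconds: int = 15 * 60,
) -> None:
    """Poll on a fixed interval (15 minutes by default). Blocks forever."""
    seen_ids: set = set()
    while True:
        on_new_jobs(poll_once(fetch_page, seen_ids))
        time.sleep(interval_seconds)
```

In production the seen_ids set would live in your database rather than in memory, so a restart does not re-import the whole result set.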
Option 3: Datasets (Recommended for >1M jobs/month)
If you need high-volume inventory — for example, launching with 50,000+ listings or maintaining a broad, multi-country board — datasets are the most cost-effective option. Instead of pulling jobs one API call at a time, you receive the full file via S3 in CSV or Parquet format.
Datasets are a good fit when:
- You want a large initial seed — Download a historical snapshot to populate your board on day one, then layer webhooks on top for ongoing updates.
- You operate at high volume — If you ingest more than 1M records/month, datasets have flat-rate pricing that is more efficient than per-record API or webhook credits.
- You load data into a warehouse first — If your pipeline goes S3 → warehouse → job board (e.g., via dbt or Airflow), datasets slot in naturally.
A common pattern is to combine datasets with webhooks: use a dataset for the initial bulk load and historical backfill, then subscribe to new.job webhooks to keep your board current going forward.
See Datasets for delivery formats, update frequencies, and the jobs data dictionary for the full field reference.
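The seed-then-stream pattern can be sketched as follows, using SQLite as a stand-in for your board's database. It assumes the dataset file has already been downloaded from S3 as CSV and that it has id, job_title, and final_url columns. Those column names and the listings table are illustrative, so consult the jobs data dictionary for the real schema.

```python
import csv
import sqlite3
from typing import Iterable, Mapping


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny listings table keyed by job id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "id TEXT PRIMARY KEY, title TEXT, apply_url TEXT)"
    )
    return conn


def upsert_job(conn: sqlite3.Connection, job: Mapping) -> None:
    """Insert a job, or overwrite the existing row with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO listings (id, title, apply_url) VALUES (?, ?, ?)",
        # Assumption: id, job_title, and final_url are the relevant columns.
        (str(job["id"]), job.get("job_title"), job.get("final_url")),
    )


def seed_from_csv(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Bulk-load a CSV dataset (an open file or any iterable of lines)."""
    count = 0
    for row in csv.DictReader(lines):
        upsert_job(conn, row)
        count += 1
    conn.commit()
    return count
```

For the initial load you would pass an open file, for example seed_from_csv(conn, open("jobs.csv", newline="")), where jobs.csv is a placeholder name. Each later new.job webhook payload then goes through the same upsert_job call, followed by conn.commit(), so jobs that appear in both the dataset and the webhook stream end up as a single row.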
Further reading
- How to Backfill a Job Board: Strategy & Tips
- How to set up a webhook
- Job Search
- Datasets
- Sources
- How to Choose the Best Way to Access TheirStack Data
