---
title: How to backfill a job board with TheirStack
description: Step-by-step guide to backfilling your job board using TheirStack webhooks, API, or datasets — with field mapping, deduplication tips, and best practices for job board operators.
url: https://theirstack.com/en/docs/guides/backfill-job-board
---

New to backfilling? Read [How to backfill a job board](/en/blog/backfill-job-board) first for an overview of strategies, sourcing methods, and quality standards.

## Why TheirStack for backfilling

TheirStack's [job data](/en/docs/data/job) platform was built with job board operators in mind:

-   **Original job link to company website** — When a job originates from a company's career page, we include the URL (`final_url`) so you can redirect users to the correct source. Filter for career-page-only jobs with `final_url_exists`.
-   **Standardized job descriptions** — Descriptions are normalized to Markdown across all sources, so your front end renders them consistently.
-   **Company enrichment** — Most jobs include company `logo`, `domain`, `industry`, `headcount`, `revenue`, `type`, `location`, and technologies used — everything you need for a company profile page. Media assets like logos are hosted on our infrastructure with stable URLs.
-   **Real-time data** — New jobs are added every minute. Webhook delivery means your board updates in near real-time.
-   **20+ filters** — Use `job_title_or`, `industry_id_or`, `technology_slug_or`, `country_code_or`, and more to get only the jobs that match your niche.
-   **336k data sources** — Career pages, ATS platforms, and job boards worldwide. [Learn more](/en/docs/data/job/sources)
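
As a rough illustration of how these filters combine, the sketch below assembles a filter object for a niche board. The filter names come from the list above. The value types (lists of strings, a boolean for `final_url_exists`), the helper name `build_niche_filters`, and the sample values are assumptions for illustration, so check the API reference for the exact shapes.

```python
import json

def build_niche_filters(titles, country_codes, technology_slugs, career_page_only=True):
    """Assemble a filter dict using the filter names mentioned above.

    Value types are assumed: lists of strings for the *_or filters and a
    boolean for final_url_exists.
    """
    filters = {
        "job_title_or": list(titles),
        "country_code_or": list(country_codes),
        "technology_slug_or": list(technology_slugs),
    }
    if career_page_only:
        filters["final_url_exists"] = True
    # Omit filters the caller left empty.
    return {key: value for key, value in filters.items() if value not in ([], None)}

# Hypothetical niche: data engineering roles in North America.
filters = build_niche_filters(["data engineer"], ["US", "CA"], ["snowflake"])
print(json.dumps(filters, indent=2))
```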

## How to get the data into your job board

TheirStack offers three ways to ingest job data: **webhooks** for both real-time streaming and historical backfilling, the **API** for on-demand pulls, and **datasets** for high-volume bulk loads. As a rule of thumb, use **webhooks if you need fewer than 1M jobs/month** and **datasets if you need more**. See [How to choose the best way to access TheirStack data](/en/docs/guides/how-to-choose-best-way-to-access-theirstack-data) for a detailed comparison.

### Option 1: Webhooks (Recommended for <1M jobs/month)

Webhooks push jobs to your endpoint as soon as they match your criteria. No polling, no cron jobs. [Set up a webhook](/en/docs/webhooks/how-to-set-up-a-webhook) to listen for the `new.job` event, apply your filters, and start receiving jobs automatically.

Webhooks handle both ongoing updates and historical backfilling:

-   **Ongoing updates**: Once your webhook is active, new jobs matching your filters are delivered in near real-time as they are indexed.
-   **Historical backfill**: To seed your board with existing jobs, set a wider date range in your webhook filters (e.g., `posted_at_max_age_days: 90`), enable **"Send all matching jobs"**, and allow a few minutes for delivery to finish. Once the backfill is complete, either narrow the date range back to a shorter window or cancel the webhook and recreate it with your ongoing filters.

See [How to set up a webhook](/en/docs/webhooks/how-to-set-up-a-webhook) for the full walkthrough.
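
On the receiving side, it helps to make ingestion idempotent in case the same job arrives more than once, for instance after you recreate a webhook following a backfill. The sketch below is a minimal receiver built on Python's standard library. It assumes each request body is a single JSON object with a unique `id` field. That payload shape, the in-memory `JOBS` store, and the port are assumptions for illustration, not the documented webhook format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store keyed by job id; a real board would use its database.
JOBS = {}

def ingest_job(job, store):
    """Insert or update a job; return True if it was not seen before."""
    job_id = job["id"]  # assumed name of the unique job identifier
    is_new = job_id not in store
    store[job_id] = job  # overwrite so repeated deliveries stay harmless
    return is_new

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            job = json.loads(self.rfile.read(length))
            ingest_job(job, JOBS)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a payload without the assumed "id" field.
            self.send_response(400)
            self.end_headers()
            return
        # Acknowledge quickly; do slow work (rendering, indexing) elsewhere.
        self.send_response(200)
        self.end_headers()

def serve(port=8000):
    """Block forever, handling webhook deliveries on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping `ingest_job` separate from the HTTP handler makes the deduplication logic easy to test and to reuse for other ingestion paths.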

### Option 2: API polling

We strongly recommend [webhooks](/en/docs/webhooks) over API polling. Webhooks are simpler to implement and give you real-time updates. Use the API only if your architecture requires a pull-based approach.

Use the [Jobs API](/en/docs/api-reference/jobs/search_jobs_v1) to fetch jobs on a schedule (e.g., every 15 minutes). The same filters and fields are available.

See [Fetch jobs periodically](/en/docs/guides/fetch-jobs-periodically) for implementation details.
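
If you do poll, the core loop is usually "request a page, collect results, stop on a short page". The sketch below shows that loop with the HTTP call injected as a function so it can be tested offline. The endpoint URL, the bearer-token header, the `page` and `limit` parameters, and the `data` response key are all assumptions here. Confirm them against the Jobs API reference before relying on this.

```python
import json
import urllib.request

API_URL = "https://api.theirstack.com/v1/jobs/search"  # assumed endpoint

def post_search(body, api_key):
    """Send one search request and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

def fetch_all(filters, search, page_size=100, max_pages=50):
    """Page through results until a short page (or max_pages) is reached."""
    jobs = []
    for page in range(max_pages):
        # "page" and "limit" are assumed pagination parameter names.
        body = {**filters, "page": page, "limit": page_size}
        batch = search(body).get("data", [])  # "data" is an assumed response key
        jobs.extend(batch)
        if len(batch) < page_size:
            break
    return jobs
```

A scheduler (cron, for example) could then run something like `fetch_all({"posted_at_max_age_days": 1}, lambda body: post_search(body, api_key))` every 15 minutes and pass the results to the same ingestion code you would use for webhooks.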

### Option 3: Datasets (Recommended for >1M jobs/month)

If you need high-volume inventory (for example, launching with 50,000+ listings or maintaining a broad, multi-country board), [datasets](/en/docs/datasets) are the most cost-effective option. Instead of pulling jobs one API call at a time, you receive the full dataset as a file delivered via S3 in CSV or Parquet format.

Datasets are a good fit when:

-   **You want a large initial seed beyond what webhooks can deliver** — Download a historical snapshot to populate your board on day one. For smaller backfills, webhooks can handle this too (see Option 1 above).
-   **You operate at high volume** — If you ingest more than 1M records/month, datasets have flat-rate pricing that is more efficient than per-record API or webhook [credits](/en/docs/pricing/credits).
-   **You load data into a warehouse first** — If your pipeline goes S3 → warehouse → job board (e.g., via dbt or Airflow), datasets slot in naturally.

A common pattern is to combine datasets with webhooks: use a dataset for a large initial bulk load, then subscribe to `new.job` webhooks to keep your board current going forward.
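
One way to make that combination safe is to write both paths through the same keyed upsert, so a job that appears in the dataset and later arrives via webhook is stored only once. The sketch below loads a CSV into SQLite and then applies a later update for the same job. The column names (`id`, `job_title`, `final_url`) and the table layout are assumptions for illustration; see the jobs data dictionary for the real fields.

```python
import csv
import io
import sqlite3

COLUMNS = ("id", "job_title", "final_url")  # assumed subset of dataset columns

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(id TEXT PRIMARY KEY, job_title TEXT, final_url TEXT)"
    )

def upsert_jobs(conn, rows):
    """Insert rows, replacing any existing row that has the same id."""
    conn.executemany(
        "INSERT OR REPLACE INTO jobs (id, job_title, final_url) "
        "VALUES (:id, :job_title, :final_url)",
        rows,
    )
    conn.commit()

def load_csv(conn, fileobj):
    """Bulk-load a dataset CSV, keeping only the columns the table stores."""
    reader = csv.DictReader(fileobj)
    upsert_jobs(conn, ({name: row[name] for name in COLUMNS} for row in reader))

# Demo with an in-memory database and a two-row CSV standing in for the S3 file.
conn = sqlite3.connect(":memory:")
init_db(conn)
sample = "id,job_title,final_url\n1,Data Engineer,https://example.com/a\n2,Analyst,\n"
load_csv(conn, io.StringIO(sample))

# Later, a webhook delivers an updated version of job 1.
upsert_jobs(conn, [{"id": "1", "job_title": "Senior Data Engineer",
                    "final_url": "https://example.com/a"}])
print(conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])
```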

See [Datasets](/en/docs/datasets) for delivery formats, update frequencies, and the [jobs data dictionary](/en/docs/datasets/options/job) for the full field reference.

## Further reading

-   [/blog/backfill-job-board](/blog/backfill-job-board)
-   [/docs/webhooks/how-to-set-up-a-webhook](/docs/webhooks/how-to-set-up-a-webhook)
-   [/docs/api-reference/jobs/search\_jobs\_v1](/docs/api-reference/jobs/search_jobs_v1)
-   [/docs/datasets](/docs/datasets)
-   [/docs/data/job/sources](/docs/data/job/sources)
-   [/docs/guides/how-to-choose-best-way-to-access-theirstack-data](/docs/guides/how-to-choose-best-way-to-access-theirstack-data)