---
title: How to Scrape Job Data for a List of Company Domains
description: Learn how to get job posting data for hundreds of company domains at once — without building scrapers. Query TheirStack's pre-indexed database of 203M+ jobs from 336k+ sources by domain via the app or API.
url: https://theirstack.com/en/docs/guides/how-to-scrape-job-data-for-a-list-of-company-domains
---

You have a list of company domains and you need their job postings — titles, descriptions, locations, dates, salary data. The traditional approach is to build and maintain scrapers for each company's career page, LinkedIn, Indeed, Greenhouse, Lever, and dozens of other sources. That works for 5 domains. It breaks at 500.

This guide shows how to get structured [job data](/en/docs/data/job) for any list of domains with a single API request (paginated for larger result sets) or through the [TheirStack app](/en/docs/app). No scrapers, no HTML parsing, no proxy rotation. TheirStack indexes jobs from 336k+ sources across 195 countries into a single queryable database.

## Why scraping job data by domain is hard

If you've tried building job scrapers, you already know these pain points:

-   **Fragmented sources** — A company's jobs are spread across their career page, LinkedIn, Indeed, Greenhouse, Lever, Workday, and more. No single source has everything.
-   **Anti-bot protections** — CAPTCHAs, rate limits, IP blocking, and browser fingerprinting make automated access unreliable.
-   **HTML parsing varies per ATS** — Each applicant tracking system renders job listings differently. Custom parsers break when UIs change.
-   **Deduplication** — The same job posted on 3 boards means 3 records in your data. Expect 30–50% duplicates without dedup logic.
-   **Maintenance** — Scrapers break regularly. URLs change, DOM structures shift, new anti-bot measures appear.
-   **Scale** — 50 domains is manageable. 5,000 domains across all sources takes days of compute time and constant babysitting.

## The alternative: query a pre-indexed job database

Instead of scraping each source yourself, you can query a database where the scraping is already done.

TheirStack continuously indexes job postings from 336k+ sources into a structured, deduplicated database. When you query by domain:

-   **Response in under 2 seconds**, not minutes or hours
-   **Automatic deduplication** across all sources
-   **Structured JSON** with title, description, location, salary, date, company info — no HTML parsing
-   **1 credit per job returned** — you only pay for results, not requests
-   **30+ filters** beyond domain: job title, location, technology, date range, salary, remote status, and more
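
Because you pay per job returned rather than per request, the cost of a pull depends only on how many jobs match, while the number of requests depends on the page size. The API returns up to 500 jobs per request, so a query matching N jobs needs ceil(N / 500) paginated requests and costs N credits if you fetch them all. The sketch below works this out for a hypothetical total of 1,730 jobs; the function name and return shape are illustrative, not part of the API.

```python
def plan_pull(total_jobs: int, limit: int = 500) -> dict:
    """Estimate requests, offsets and credits for fetching every matching job.

    Assumes 1 credit per job returned and at most `limit` jobs per request.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    offsets = list(range(0, total_jobs, limit))  # one offset per page
    return {
        "requests": len(offsets),  # equals ceil(total_jobs / limit)
        "offsets": offsets,
        "credits": total_jobs,  # every returned job costs 1 credit
    }


# Hypothetical query that matches 1,730 jobs
print(plan_pull(1730))
# {'requests': 4, 'offsets': [0, 500, 1000, 1500], 'credits': 1730}
```

Since you pay for results rather than requests, a smaller `limit` changes how many requests you make but not the credit total.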

## How to get job data for a list of company domains

1.  **Prepare your domain list**
    
    Clean your domains to root format — `stripe.com`, not `https://www.stripe.com/careers`. The API matches on the root domain, so subdomains and paths are unnecessary.
    
    Example list:
    
    ```
    stripe.com
    notion.so
    linear.app
    vercel.com
    figma.com
    ```
    
    Pricing is based on results returned, not domains submitted. If a domain has no jobs, it costs nothing.
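    
    If your list comes from a CRM or spreadsheet export, it may contain full URLs, `www.` prefixes, or career-page paths. One way to clean it up is a small Python helper built on the standard library's `urllib.parse` (the function names are illustrative). This is a sketch under simple assumptions: it strips the scheme, a leading `www.`, any port, and any path, and it removes duplicates. It does not reduce deeper subdomains (for example `jobs.eu.example.com`) to their registrable domain; doing that reliably needs the Public Suffix List.
    
    ```python
    from __future__ import annotations
    
    from urllib.parse import urlparse
    
    
    def normalize_domain(raw: str) -> str | None:
        """Reduce a URL or hostname to a bare domain such as 'stripe.com'.
    
        Assumption: dropping the scheme, 'www.', port and path is enough.
        Deeper subdomains (e.g. 'jobs.eu.example.com') are left as-is.
        """
        value = raw.strip().lower()
        if not value:
            return None
        # urlparse() only recognises the host part after "//", so add it if missing
        if "://" not in value:
            value = "//" + value
        host = urlparse(value).hostname  # hostname excludes the port and path
        if not host:
            return None
        if host.startswith("www."):
            host = host[len("www."):]
        return host
    
    
    def normalize_domains(raw_values: list[str]) -> list[str]:
        """Normalize every entry, dropping blanks and duplicates but keeping order."""
        seen: set[str] = set()
        result: list[str] = []
        for raw in raw_values:
            domain = normalize_domain(raw)
            if domain and domain not in seen:
                seen.add(domain)
                result.append(domain)
        return result
    
    
    print(normalize_domains([
        "https://www.stripe.com/careers",
        "notion.so",
        "Linear.app/",
        "stripe.com",
    ]))
    # ['stripe.com', 'notion.so', 'linear.app']
    ```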
    
2.  **Option A: Use the [TheirStack app](https://app.theirstack.com) (no code)**
    
    If you prefer a visual interface:
    
    1.  Open a new [job search](https://app.theirstack.com/search/jobs/new)
    2.  Click **Add filter** and select **Company domain**
    3.  Paste your list of domains (one per line)
    4.  Add a **Date posted** filter (e.g. last 30 days) to control volume
    5.  Click **Search** to see results
    6.  Click **Export** to download as CSV or Excel
    
3.  **Option B: Use the [Jobs API](/en/docs/api-reference/jobs/search_jobs_v1)**
    
    Send a POST request to `/v1/jobs/search` with `company_domain_or` set to your list of domains.
    
    **curl:**
    
    ```bash
    curl --request POST \
      --url "https://api.theirstack.com/v1/jobs/search" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer <your_api_key>" \
      -d '{
        "company_domain_or": [
          "stripe.com",
          "notion.so",
          "linear.app",
          "vercel.com",
          "figma.com"
        ],
        "posted_at_max_age_days": 30,
        "limit": 100,
        "offset": 0
      }'
    ```
    
    **Python:**
    
    ```python
    import requests
    
    response = requests.post(
        "https://api.theirstack.com/v1/jobs/search",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <your_api_key>",
        },
        json={
            "company_domain_or": [
                "stripe.com",
                "notion.so",
                "linear.app",
                "vercel.com",
                "figma.com",
            ],
            "posted_at_max_age_days": 30,
            "limit": 100,
            "offset": 0,
        },
    )
    
    response.raise_for_status()  # raise HTTPError on 4xx/5xx (e.g. an invalid API key)
    data = response.json()
    print(f"Total jobs found: {data['metadata']['total']}")
    
    for job in data["data"]:
        print(f"{job['company_name']} — {job['name']} ({job['url']})")
    ```
    
    Each job in the response includes structured fields: `name`, `company_name`, `company_domain`, `url`, `location`, `posted_at`, `description`, `salary_string`, `remote`, and more.
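    
    If you're using the API but want a flat file for a spreadsheet, you can write the returned jobs to CSV yourself. The sketch below uses Python's standard `csv` module and only field names listed above; it reads each field with `.get()`, so a job missing a field produces an empty cell instead of a `KeyError`. The column selection and the `jobs_to_csv` name are arbitrary choices, and `sample_jobs` is a hypothetical placeholder standing in for `data["data"]`.
    
    ```python
    from __future__ import annotations
    
    import csv
    import io
    
    # Columns to export; `description` is left out because it is usually long.
    COLUMNS = ["company_name", "company_domain", "name", "location",
               "posted_at", "salary_string", "remote", "url"]
    
    
    def jobs_to_csv(jobs: list[dict], out) -> int:
        """Write a header plus one CSV row per job to file-like `out`; return row count."""
        writer = csv.DictWriter(out, fieldnames=COLUMNS)
        writer.writeheader()
        for job in jobs:
            # Missing fields become empty cells (csv also writes None as "")
            writer.writerow({col: job.get(col, "") for col in COLUMNS})
        return len(jobs)
    
    
    # Hypothetical placeholder record; in practice pass data["data"]
    sample_jobs = [
        {"company_name": "Example Inc", "company_domain": "example.com",
         "name": "Backend Engineer", "url": "https://example.com/jobs/1"},
    ]
    buffer = io.StringIO()
    jobs_to_csv(sample_jobs, buffer)
    print(buffer.getvalue())
    ```
    
    To write to disk instead, open the file the way the `csv` documentation recommends, e.g. `open("jobs.csv", "w", newline="", encoding="utf-8")`, and pass it as `out`.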
    
4.  **Paginate through all results**
    
    The API returns up to 500 jobs per request. For larger result sets, increment the `offset`:
    
    ```python
    import requests
    
    all_jobs = []
    offset = 0
    limit = 500
    
    while True:
        response = requests.post(
            "https://api.theirstack.com/v1/jobs/search",
            headers={
                "Content-Type": "application/json",
                "Authorization": "Bearer <your_api_key>",
            },
            json={
                "company_domain_or": [
                    "stripe.com",
                    "notion.so",
                    "linear.app",
                    "vercel.com",
                    "figma.com",
                ],
                "posted_at_max_age_days": 30,
                "limit": limit,
                "offset": offset,
            },
        )
    
        response.raise_for_status()  # stop on HTTP errors instead of failing on data["data"]
        data = response.json()
        jobs = data["data"]
        all_jobs.extend(jobs)
    
        if len(jobs) < limit:
            break
    
        offset += limit
    
    print(f"Fetched {len(all_jobs)} total jobs")
    ```
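    
    After the loop finishes, it can be useful to see how many jobs each domain contributed and which domains returned none. A domain with zero matches may simply have no postings in the date window, or it may be formatted differently from the form the API matches on. The helper below is a sketch (its name is illustrative) that assumes each job's `company_domain` field, listed in the response fields above, uses the same form as the domain you submitted; if it doesn't, normalize both sides before comparing.
    
    ```python
    from __future__ import annotations
    
    from collections import Counter
    
    
    def jobs_per_domain(requested: list[str], jobs: list[dict]) -> dict[str, int]:
        """Count returned jobs for each requested domain, reporting 0 for misses."""
        # Assumes each job's company_domain matches the form you submitted
        counts = Counter(job.get("company_domain") for job in jobs)
        return {domain: counts[domain] for domain in requested}
    
    
    # Hypothetical placeholder results standing in for all_jobs
    sample_jobs = [
        {"company_domain": "a.example"},
        {"company_domain": "a.example"},
        {"company_domain": "b.example"},
    ]
    summary = jobs_per_domain(["a.example", "b.example", "c.example"], sample_jobs)
    print(summary)  # {'a.example': 2, 'b.example': 1, 'c.example': 0}
    print([d for d, n in summary.items() if n == 0])  # ['c.example']
    ```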
    
5.  **Add filters to narrow results (optional)**
    
    Combine `company_domain_or` with other filters to get exactly the data you need:
    
    ```bash
    curl --request POST \
      --url "https://api.theirstack.com/v1/jobs/search" \
      --header "Accept: application/json" \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer <your_api_key>" \
      -d '{
        "company_domain_or": [
          "stripe.com",
          "notion.so",
          "linear.app"
        ],
        "job_title_or": ["Software Engineer", "Data Engineer"],
        "job_country_code_or": ["US", "GB"],
        "job_technology_slug_or": ["python", "typescript"],
        "posted_at_max_age_days": 15,
        "limit": 100,
        "offset": 0
      }'
    ```
    
    Available filters include `job_title_or`, `job_country_code_or`, `job_technology_slug_or`, `posted_at_max_age_days`, `remote`, and [many more](/en/docs/api-reference/jobs/search_jobs_v1).
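    
    For lists of hundreds or thousands of domains, you may prefer to split them into smaller batches, for example to keep request bodies small or to track progress per batch. This guide doesn't state a maximum number of domains per request, so the batch size of 100 below is an arbitrary assumption, not a documented limit. The sketch only builds the request bodies, reusing the filter names shown above (the helper names are illustrative); send each one with `requests.post` and paginate it as in step 4.
    
    ```python
    from __future__ import annotations
    
    from collections.abc import Iterator
    
    
    def chunks(items: list[str], size: int) -> Iterator[list[str]]:
        """Yield consecutive slices of `items` with at most `size` elements."""
        if size <= 0:
            raise ValueError("size must be positive")
        for start in range(0, len(items), size):
            yield items[start:start + size]
    
    
    def build_payloads(domains: list[str], batch_size: int = 100, **filters) -> list[dict]:
        """Build one /v1/jobs/search request body per batch of domains.
    
        batch_size=100 is an arbitrary assumption, not a documented API limit.
        Keyword arguments are added as extra filters and override the defaults.
        """
        payloads = []
        for batch in chunks(domains, batch_size):
            payloads.append({
                "company_domain_or": batch,
                "posted_at_max_age_days": 30,  # default window, overridable
                "limit": 500,
                "offset": 0,
                **filters,
            })
        return payloads
    
    
    # Hypothetical list of 250 placeholder domains
    domains = [f"company{i}.example" for i in range(250)]
    payloads = build_payloads(
        domains,
        job_country_code_or=["US", "GB"],
        posted_at_max_age_days=15,
    )
    print(len(payloads), [len(p["company_domain_or"]) for p in payloads])
    # 3 [100, 100, 50]
    ```
    
    On Python 3.12+, `itertools.batched` can replace `chunks` (it yields tuples rather than lists).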
    

## Further reading

-   [Search jobs API reference](/docs/api-reference/jobs/search_jobs_v1)
-   [Monitoring open jobs from current and past customers](/docs/guides/monitoring-open-jobs-from-current-and-past-customers)
-   [Adding a technology filter to your search](/docs/guides/adding-technology-filter-to-search)
-   [How to monitor job postings automatically](/docs/guides/how-to-monitor-job-postings-automatically)
-   [Fetching jobs periodically](/docs/guides/fetch-jobs-periodically)
-   [Free count](/docs/api-reference/features/free-count)
-   [Job data sources](/docs/data/job/sources)