How to Scrape Job Data for a List of Company Domains

You have a list of company domains and you need their job postings — titles, descriptions, locations, dates, salary data. The traditional approach is to build and maintain scrapers for each company's career page, LinkedIn, Indeed, Greenhouse, Lever, and dozens of other sources. That works for 5 domains. It breaks at 500.

This guide shows how to get structured job data for any list of domains in a single API call or via the TheirStack app — no scrapers, no HTML parsing, no proxy rotation. TheirStack indexes jobs from 333k+ sources across 195 countries into a single queryable database.

Why scraping job data by domain is hard

If you've tried building job scrapers, you already know these pain points:

Fragmented sources — A company's jobs are spread across their career page, LinkedIn, Indeed, Greenhouse, Lever, Workday, and more. No single source has everything.
Anti-bot protections — CAPTCHAs, rate limits, IP blocking, and browser fingerprinting make automated access unreliable.
HTML parsing varies per ATS — Each applicant tracking system renders job listings differently. Custom parsers break when UIs change.
Deduplication — The same job posted on 3 boards means 3 records in your data. Expect 30–50% duplicates without dedup logic.
Maintenance — Scrapers break regularly. URLs change, DOM structures shift, new anti-bot measures appear.
Scale — 50 domains is manageable. 5,000 domains across all sources takes days of compute time and constant babysitting.

The alternative: query a pre-indexed job database

Instead of scraping each source yourself, you can query a database where the scraping is already done.

TheirStack continuously indexes job postings from 333k+ sources into a structured, deduplicated database. When you query by domain:

Response in under 2 seconds, not minutes or hours
Automatic deduplication across all sources
Structured JSON with title, description, location, salary, date, company info — no HTML parsing
1 credit per job returned — you only pay for results, not requests
30+ filters beyond domain: job title, location, technology, date range, salary, remote status, and more

How to get job data for a list of company domains

Prepare your domain list

Clean your domains to root format — stripe.com, not https://www.stripe.com/careers. The API matches on the root domain, so subdomains and paths are unnecessary.

Example list:

stripe.com
notion.so
linear.app
vercel.com
figma.com
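
If your input is a mix of full URLs and bare hostnames, the cleanup step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: to_root_domain and clean_domains are hypothetical helper names, not part of TheirStack, and the logic only strips the scheme, path, port, and a leading "www.". It does not collapse other subdomains (for example careers.example.com), which would need a public-suffix list.

```python
from urllib.parse import urlparse


def to_root_domain(raw: str) -> str:
    """Reduce a URL or hostname to a lowercase host without a leading 'www.'."""
    value = raw.strip().lower()
    # urlparse only recognizes the host when a scheme or '//' prefix is present.
    if "://" not in value:
        value = "//" + value
    host = urlparse(value).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def clean_domains(raw_values):
    """Normalize each entry and drop blanks and duplicates, keeping input order."""
    seen = set()
    cleaned = []
    for raw in raw_values:
        domain = to_root_domain(raw)
        if domain and domain not in seen:
            seen.add(domain)
            cleaned.append(domain)
    return cleaned


domains = clean_domains([
    "https://www.stripe.com/careers",
    "Notion.so ",
    "stripe.com",
    "",
])
print(domains)
```

The resulting list can be passed directly as the company_domain_or value in the API requests that follow.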

Pricing is based on results returned, not domains submitted. If a domain has no jobs, it costs nothing.

Option A: Use the TheirStack app (no code)

If you prefer a visual interface:

1. Open a new job search
2. Click Add filter and select Company domain
3. Paste your list of domains (one per line)
4. Add a Date posted filter (e.g. last 30 days) to control volume
5. Click Search to see results
6. Click Export to download as CSV or Excel

Option B: Use the Jobs API

Send a POST request to /v1/jobs/search with company_domain_or set to your list of domains.

curl:

curl --request POST \
  --url "https://api.theirstack.com/v1/jobs/search" \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <your_api_key>" \
  -d '{
    "company_domain_or": [
      "stripe.com",
      "notion.so",
      "linear.app",
      "vercel.com",
      "figma.com"
    ],
    "posted_at_max_age_days": 30,
    "limit": 100,
    "offset": 0
  }'

Python:

import requests

# Search for jobs posted in the last 30 days at any of the listed domains
response = requests.post(
    "https://api.theirstack.com/v1/jobs/search",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <your_api_key>",
    },
    json={
        "company_domain_or": [
            "stripe.com",
            "notion.so",
            "linear.app",
            "vercel.com",
            "figma.com",
        ],
        "posted_at_max_age_days": 30,
        "limit": 100,
        "offset": 0,
    },
    timeout=30,
)
response.raise_for_status()  # fail fast on HTTP errors (bad key, invalid body)

data = response.json()
print(f"Total jobs found: {data['metadata']['total']}")

# Each entry in data["data"] is one job posting
for job in data["data"]:
    print(f"{job['company_name']} - {job['name']} ({job['url']})")

Each job in the response includes structured fields: name, company_name, company_domain, url, location, posted_at, description, salary_string, remote, and more.
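
Because the query covers a whole list of domains, it can help to see how the returned jobs are spread across that list, including which domains came back empty. The sketch below is illustrative only: count_jobs_by_domain is a hypothetical helper, the sample records are invented, and it assumes that each job's company_domain value matches the root-format domain you submitted.

```python
from collections import Counter


def count_jobs_by_domain(jobs, requested_domains):
    """Return {domain: number of jobs}, including domains with zero results."""
    counts = Counter(job.get("company_domain") for job in jobs)
    return {domain: counts.get(domain, 0) for domain in requested_domains}


# Invented sample records standing in for data["data"] from the response.
sample_jobs = [
    {"name": "Backend Engineer", "company_domain": "stripe.com"},
    {"name": "Data Engineer", "company_domain": "stripe.com"},
    {"name": "Product Designer", "company_domain": "figma.com"},
]

summary = count_jobs_by_domain(
    sample_jobs, ["stripe.com", "notion.so", "figma.com"]
)
for domain, count in summary.items():
    print(f"{domain}: {count} jobs")
```

Domains that show a count of zero returned no jobs for the chosen filters, so they consumed no credits.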

Paginate through all results

The API returns up to 500 jobs per request. For larger result sets, increase offset by limit after each request and stop once a page comes back with fewer than limit jobs:

import requests

API_URL = "https://api.theirstack.com/v1/jobs/search"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your_api_key>",
}
DOMAINS = [
    "stripe.com",
    "notion.so",
    "linear.app",
    "vercel.com",
    "figma.com",
]

all_jobs = []
offset = 0
limit = 500  # maximum page size per request

while True:
    response = requests.post(
        API_URL,
        headers=HEADERS,
        json={
            "company_domain_or": DOMAINS,
            "posted_at_max_age_days": 30,
            "limit": limit,
            "offset": offset,
        },
        timeout=30,
    )
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error body

    jobs = response.json()["data"]
    all_jobs.extend(jobs)

    # A page shorter than the limit is the last one
    if len(jobs) < limit:
        break

    offset += limit

print(f"Fetched {len(all_jobs)} total jobs")
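
Once all_jobs is populated, you may want to save it in a form similar to the app's CSV export. The following is a minimal sketch rather than an official client feature: jobs_to_csv and EXPORT_FIELDS are hypothetical names, the columns are a subset of the response fields named in this guide, and the sample record is invented for illustration.

```python
import csv
import io

# Hypothetical column selection, taken from the response fields named in this guide.
EXPORT_FIELDS = [
    "name",
    "company_name",
    "company_domain",
    "url",
    "location",
    "posted_at",
    "salary_string",
    "remote",
]


def jobs_to_csv(jobs, fields=EXPORT_FIELDS):
    """Serialize job dicts to CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for job in jobs:
        # Keep only the chosen columns so extra API fields do not raise errors.
        writer.writerow({field: job.get(field, "") for field in fields})
    return buffer.getvalue()


# Invented sample record standing in for one entry of all_jobs.
sample_jobs = [
    {
        "name": "Data Engineer",
        "company_name": "Example Co",
        "company_domain": "example.com",
        "url": "https://example.com/jobs/1",
        "remote": True,
    }
]

csv_text = jobs_to_csv(sample_jobs)
print(csv_text)
```

To write a file instead of printing, open it with newline="" and encoding="utf-8" and write csv_text to it. Long fields such as description are left out here to keep the file compact; add them to EXPORT_FIELDS if you need them.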

Add filters to narrow results (optional)

Combine company_domain_or with other filters to get exactly the data you need:

curl --request POST \
  --url "https://api.theirstack.com/v1/jobs/search" \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer <your_api_key>" \
  -d '{
    "company_domain_or": [
      "stripe.com",
      "notion.so",
      "linear.app"
    ],
    "job_title_or": ["Software Engineer", "Data Engineer"],
    "job_country_code_or": ["US", "GB"],
    "job_technology_slug_or": ["python", "typescript"],
    "posted_at_max_age_days": 15,
    "limit": 100,
    "offset": 0
  }'

Available filters include job_title_or, job_country_code_or, job_technology_slug_or, posted_at_max_age_days, remote, and many more.
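
When filters vary between runs, it can be convenient to assemble the request body in one place and leave out any filter that is not set. The sketch below assumes only the field names shown in the examples above; build_search_payload and its parameter names are hypothetical, and omitting unused filters mirrors the basic request, which sends none of them.

```python
def build_search_payload(
    domains,
    titles=None,
    country_codes=None,
    technology_slugs=None,
    max_age_days=30,
    limit=100,
    offset=0,
):
    """Build a /v1/jobs/search request body, skipping filters that are not set."""
    payload = {
        "company_domain_or": list(domains),
        "posted_at_max_age_days": max_age_days,
        "limit": limit,
        "offset": offset,
    }
    optional_filters = {
        "job_title_or": titles,
        "job_country_code_or": country_codes,
        "job_technology_slug_or": technology_slugs,
    }
    for key, values in optional_filters.items():
        if values:  # None or an empty list means the filter is not applied
            payload[key] = list(values)
    return payload


payload = build_search_payload(
    ["stripe.com", "notion.so"],
    titles=["Software Engineer", "Data Engineer"],
    country_codes=["US", "GB"],
    max_age_days=15,
)
print(payload)
```

The returned dictionary can be passed as the json argument of requests.post in the Python examples above.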

Further reading

Job Search

Monitoring open jobs from current and past customers

Adding a technology or job filter to your company search

How to monitor job postings automatically

How to fetch jobs periodically using the Jobs API

Free count

Sources
