---
title: How to Scrape Job Data for a List of Company Domains
description: Learn how to get job posting data for hundreds of company domains at once, without building scrapers. Query TheirStack's pre-indexed database of 203M+ jobs from 336k+ sources by domain via the app or API.
url: https://theirstack.com/en/docs/guides/how-to-scrape-job-data-for-a-list-of-company-domains
---

You have a list of company domains and you need their job postings: titles, descriptions, locations, dates, salary data. The traditional approach is to build and maintain scrapers for each company's career page, LinkedIn, Indeed, Greenhouse, Lever, and dozens of other sources. That works for 5 domains. It breaks at 500.

This guide shows how to get structured [job data](/en/docs/data/job) for any list of domains, either by sending them all in one API request or through the [TheirStack app](/en/docs/app). No scrapers, no HTML parsing, no proxy rotation. TheirStack indexes jobs from 336k+ sources across 195 countries into a single queryable database.

## Why scraping job data by domain is hard

If you've tried building job scrapers, you already know these pain points:

- **Fragmented sources**: a company's jobs are spread across its career page, LinkedIn, Indeed, Greenhouse, Lever, Workday, and more. No single source has everything.
- **Anti-bot protections**: CAPTCHAs, rate limits, IP blocking, and browser fingerprinting make automated access unreliable.
- **HTML parsing varies per ATS**: each applicant tracking system renders job listings differently, and custom parsers break when UIs change.
- **Deduplication**: the same job posted on 3 boards means 3 records in your data. Expect 30–50% duplicates without dedup logic.
- **Maintenance**: scrapers break regularly. URLs change, DOM structures shift, and new anti-bot measures appear.
- **Scale**: 50 domains is manageable. 5,000 domains across all sources takes days of compute time and constant babysitting.
## The alternative: query a pre-indexed job database

Instead of scraping each source yourself, you can query a database where the scraping is already done. TheirStack continuously indexes job postings from 336k+ sources into a structured, deduplicated database. When you query by domain, you get:

- **Response in under 2 seconds**, not minutes or hours
- **Automatic deduplication** across all sources
- **Structured JSON** with title, description, location, salary, date, and company info; no HTML parsing
- **1 credit per job returned**: you pay for results, not requests
- **30+ filters** beyond domain: job title, location, technology, date range, salary, remote status, and more

## How to get job data for a list of company domains

1. **Prepare your domain list**

   Clean your domains to root format: `stripe.com`, not `https://www.stripe.com/careers`. The API matches on the root domain, so subdomains and paths are unnecessary.

   Example list:

   ```
   stripe.com
   notion.so
   linear.app
   vercel.com
   figma.com
   ```

   Pricing is based on results returned, not domains submitted. If a domain has no jobs, it costs nothing.

2. **Option A: Use the [TheirStack app](https://app.theirstack.com) (no code)**

   If you prefer a visual interface:

   1. Open a new [job search](https://app.theirstack.com/search/jobs/new)
   2. Click **Add filter** and select **Company domain**
   3. Paste your list of domains (one per line)
   4. Add a **Date posted** filter (e.g. last 30 days) to control volume
   5. Click **Search** to see results
   6. Click **Export** to download as CSV or Excel

   [Open job search](https://app.theirstack.com/search/jobs/new)

3. **Option B: Use the [Jobs API](/en/docs/api-reference/jobs/search_jobs_v1)**

   Send a POST request to `/v1/jobs/search` with `company_domain_or` set to your list of domains.

   **curl:**

   ```bash
   # Add your TheirStack API key after "Bearer " in the Authorization header.
   curl --request POST \
     --url "https://api.theirstack.com/v1/jobs/search" \
     --header "Accept: application/json" \
     --header "Content-Type: application/json" \
     --header "Authorization: Bearer " \
     -d '{
       "company_domain_or": [
         "stripe.com",
         "notion.so",
         "linear.app",
         "vercel.com",
         "figma.com"
       ],
       "posted_at_max_age_days": 30,
       "limit": 100,
       "offset": 0
     }'
   ```

   **Python:**

   ```python
   import requests

   response = requests.post(
       "https://api.theirstack.com/v1/jobs/search",
       headers={
           "Content-Type": "application/json",
           # Add your TheirStack API key after "Bearer ".
           "Authorization": "Bearer ",
       },
       json={
           "company_domain_or": [
               "stripe.com",
               "notion.so",
               "linear.app",
               "vercel.com",
               "figma.com",
           ],
           "posted_at_max_age_days": 30,  # only jobs posted in the last 30 days
           "limit": 100,
           "offset": 0,
       },
       timeout=30,
   )
   # Raise on non-2xx responses (e.g. an invalid API key) instead of failing later with a KeyError.
   response.raise_for_status()
   data = response.json()

   print(f"Total jobs found: {data['metadata']['total']}")
   for job in data["data"]:
       print(f"{job['company_name']} — {job['name']} ({job['url']})")
   ```

   Each job in the response includes structured fields: `name`, `company_name`, `company_domain`, `url`, `location`, `posted_at`, `description`, `salary_string`, `remote`, and more.

4. **Paginate through all results**

   The API returns up to 500 jobs per request. For larger result sets, increase `offset` by `limit` on each request until a page comes back with fewer than `limit` jobs:

   ```python
   import requests

   all_jobs = []
   offset = 0
   limit = 500  # maximum page size

   while True:
       response = requests.post(
           "https://api.theirstack.com/v1/jobs/search",
           headers={
               "Content-Type": "application/json",
               # Add your TheirStack API key after "Bearer ".
               "Authorization": "Bearer ",
           },
           json={
               "company_domain_or": [
                   "stripe.com",
                   "notion.so",
                   "linear.app",
                   "vercel.com",
                   "figma.com",
               ],
               "posted_at_max_age_days": 30,
               "limit": limit,
               "offset": offset,
           },
           timeout=30,
       )
       response.raise_for_status()
       jobs = response.json()["data"]
       all_jobs.extend(jobs)

       # A page shorter than `limit` (including an empty page) means there are no more results.
       if len(jobs) < limit:
           break
       offset += limit

   print(f"Fetched {len(all_jobs)} total jobs")
   ```

5. **Add filters to narrow results (optional)**

   Combine `company_domain_or` with other filters to get exactly the data you need:

   ```bash
   # Add your TheirStack API key after "Bearer " in the Authorization header.
   curl --request POST \
     --url "https://api.theirstack.com/v1/jobs/search" \
     --header "Accept: application/json" \
     --header "Content-Type: application/json" \
     --header "Authorization: Bearer " \
     -d '{
       "company_domain_or": [
         "stripe.com",
         "notion.so",
         "linear.app"
       ],
       "job_title_or": ["Software Engineer", "Data Engineer"],
       "job_country_code_or": ["US", "GB"],
       "job_technology_slug_or": ["python", "typescript"],
       "posted_at_max_age_days": 15,
       "limit": 100,
       "offset": 0
     }'
   ```

   Available filters include `job_title_or`, `job_country_code_or`, `job_technology_slug_or`, `posted_at_max_age_days`, `remote`, and [many more](/en/docs/api-reference/jobs/search_jobs_v1).

## Further reading

- [Search jobs API reference](/docs/api-reference/jobs/search_jobs_v1)
- [Monitoring open jobs from current and past customers](/docs/guides/monitoring-open-jobs-from-current-and-past-customers)
- [Adding a technology filter to search](/docs/guides/adding-technology-filter-to-search)
- [How to monitor job postings automatically](/docs/guides/how-to-monitor-job-postings-automatically)
- [Fetch jobs periodically](/docs/guides/fetch-jobs-periodically)
- [Free count](/docs/api-reference/features/free-count)
- [Job data sources](/docs/data/job/sources)
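## Cleaning a raw domain list

Step 1 asks for root-format domains, but lists exported from a CRM or spreadsheet often mix in schemes, `www.` prefixes, paths, ports, uppercase letters, and repeats. Below is a minimal Python sketch (standard library only) for tidying such a list before passing it as `company_domain_or`. The helper names `normalize_domain` and `clean_domain_list` are hypothetical, not part of any TheirStack SDK. The sketch assumes that removing the scheme, path, port, trailing dot, and a leading `www.` is enough for your data. It does not collapse other subdomains such as `jobs.stripe.com` to `stripe.com`; that needs Public Suffix List data (for example via the third-party `tldextract` package).

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> Optional[str]:
    """Hypothetical helper: reduce 'https://www.Stripe.com/careers' to 'stripe.com'.

    Assumption: stripping scheme, path, query, port, a trailing dot, and a
    leading 'www.' is sufficient. Other subdomains are kept unchanged.
    """
    value = raw.strip().lower()
    if not value:
        return None
    # urlsplit only fills .hostname when the input has a network location
    # (scheme + '//'), so prefix one for bare entries like 'stripe.com/careers'.
    if "://" not in value:
        value = "http://" + value
    host = urlsplit(value).hostname  # drops path, query, port, and credentials
    if not host:
        return None
    host = host.rstrip(".")
    if host.startswith("www."):
        host = host[len("www."):]
    return host or None


def clean_domain_list(raw_entries: Iterable[str]) -> List[str]:
    """Hypothetical helper: normalize entries, drop blanks, de-duplicate in input order."""
    seen = set()
    cleaned = []
    for entry in raw_entries:
        domain = normalize_domain(entry)
        if domain and domain not in seen:
            seen.add(domain)
            cleaned.append(domain)
    return cleaned


raw = [
    "https://www.stripe.com/careers",
    "notion.so",
    "Linear.app/",
    "http://vercel.com:443",
    "stripe.com",  # duplicate after normalization
    "",
]
print(clean_domain_list(raw))
# ['stripe.com', 'notion.so', 'linear.app', 'vercel.com']
```

The cleaned list can then be used as the `company_domain_or` value in the requests from steps 3 to 5. De-duplicating before the request keeps the payload small; since pricing is per job returned, it does not change what you are charged.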