Job Data

Data workflow

Learn how we transform raw job postings into high-quality, normalized data through our 5-step workflow covering acquisition, extraction, deduplication, enrichment, and quality assurance

How it works

When you use our API, App, or Datasets, you're accessing data directly from our database, ensuring you always have the most up-to-date information available. Instead of making live calls to sources for each request, we proactively handle critical steps like data extraction, normalization, company enrichment, and quality assurance. This process guarantees both high-quality data and fast response times.
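For example, retrieving jobs is a single HTTP call against our database rather than a live crawl. Here is a minimal sketch in Python; the endpoint path and filter field names are illustrative assumptions, so check the API Reference for the exact request schema:

```python
API_URL = "https://api.theirstack.com/v1/jobs/search"  # assumed endpoint path

def build_search_payload(max_age_days: int = 7, limit: int = 25) -> dict:
    """Build a search body; field names are illustrative, see the API Reference."""
    return {
        "posted_at_max_age_days": max_age_days,  # only jobs first seen recently
        "limit": limit,                          # page size
    }

# Sending the request requires an API key:
#   import requests
#   resp = requests.post(API_URL, json=build_search_payload(),
#                        headers={"Authorization": f"Bearer {API_KEY}"})
```

Because the heavy lifting (extraction, normalization, enrichment, QA) has already happened at ingestion time, the response is served straight from the database.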

To guarantee the highest possible quality of data, we follow this process:

Content acquisition

We continuously crawl different data sources, such as job boards, ATSs, and company websites. The frequency of our scrapers varies, with some running as often as every 10 minutes and others hourly, to ensure our data is always up-to-date.

More details:

  • Jobs from 321k different websites
  • Data freshness and scraping frequency
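The tiered re-crawl cadence described above can be pictured as a simple scheduler. This is a toy sketch with made-up source names and intervals, not our actual crawler:

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    interval_minutes: int  # re-crawl cadence for this source

# Hypothetical tiers: busy sources are re-crawled often, the long tail less so
SOURCES = [
    Source("high-volume-ats-feed", 10),
    Source("popular-job-board", 60),
    Source("long-tail-career-page", 24 * 60),
]

def is_due(source: Source, minutes_since_last_crawl: int) -> bool:
    """A source is due once its re-crawl interval has elapsed."""
    return minutes_since_last_crawl >= source.interval_minutes

# After an hour, the first two tiers are due again
due = [s.name for s in SOURCES if is_due(s, 60)]
```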

Extraction & Normalization

For every job we collect, we extract key entities including the job posting, company, and, when available, the hiring manager.

We also identify and extract technologies mentioned in the job title, description, and URL, then index them to enable filtering by tech stack.
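In spirit, this step is a dictionary lookup over tokenized job text. A toy sketch follows; the real taxonomy and matching logic are far larger than this hypothetical four-entry set:

```python
import re

# Illustrative technology dictionary (not our actual taxonomy)
TECHNOLOGIES = {"python", "react", "kubernetes", "postgresql"}

def extract_technologies(text: str) -> set:
    """Return known technologies mentioned in a job title, description, or URL."""
    tokens = set(re.findall(r"[a-z0-9+#.]+", text.lower()))
    return TECHNOLOGIES & tokens

extract_technologies("Senior Python Engineer (React, Kubernetes)")
# → {"python", "react", "kubernetes"}
```

The extracted technologies are then indexed per job, which is what makes tech-stack filtering fast at query time.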

To ensure data quality and consistency across the platform, we normalize all extracted fields.

Additionally, media assets such as company logos are downloaded and stored in our own infrastructure, allowing you to access them reliably via stable URLs for use in your own applications.

Job deduplication

Most companies use an Applicant Tracking System (ATS) to manage their hiring process. ATSs help streamline the candidate lifecycle, power career pages listing open roles, and sync job postings with major job boards like LinkedIn and Indeed.

When a company posts a job through an ATS, it's common for that listing to appear on multiple job boards simultaneously. As a result, a single position may end up with 3–5 different references across various platforms. Additionally, job boards often scrape and repost listings from each other, further increasing duplication.

Job posting deduplication is a crucial step to avoid having the same job appear multiple times in our database. If you use our data as a sales signal, you won't trigger the same signal multiple times. If you use our data to build a job board, you won't list the same job more than once.

We apply both algorithmic techniques and manual checks to eliminate duplicates effectively.
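One common algorithmic technique, shown here as a simplified stand-in rather than our exact implementation, is to fingerprint each posting on normalized fields and drop repeats:

```python
import hashlib
import re

def fingerprint(company: str, title: str, location: str) -> str:
    """Hash a posting on normalized company + title + location.
    A real pipeline would combine more signals than these three fields."""
    parts = [re.sub(r"\s+", " ", p.strip().lower()) for p in (company, title, location)]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

postings = [
    ("Acme Inc", "Data Engineer", "Berlin"),    # from the ATS
    ("acme inc", "Data  Engineer ", "berlin"),  # same role, reposted on a board
    ("Acme Inc", "Backend Engineer", "Berlin"), # a different role
]

seen, unique = set(), []
for p in postings:
    fp = fingerprint(*p)
    if fp not in seen:
        seen.add(fp)
        unique.append(p)
# two unique postings survive
```

Exact-match fingerprints catch ATS-to-board syndication well; fuzzier cases (reworded titles, scraped reposts) are where the manual checks come in.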

Data Enrichment

Our job posting collection gives you a solid foundation, but we don't stop there. We enhance every record through a comprehensive enrichment process that adds valuable context and verified additional details.

This transformation turns basic job listings into actionable intelligence that helps you make better decisions and build more powerful applications.

Quality Assurance

Data analysts monitor and verify the data daily to ensure it is of the highest quality.

FAQ

Do you track expired jobs?

Not yet. Today, our focus is on collecting and ingesting every job posting we find.

If your use case needs "active-only" jobs, our current recommendation is:

  • Filter to jobs posted in the last 1-2 weeks.
  • Use a shorter cutoff (around 1 week since first seen) if you want to minimize publishing jobs that may already be closed.
  • Use a longer cutoff (up to 1 month since first seen) if you want to maximize inventory, accepting that some jobs may already be closed.

Tracking expirations reliably requires additional follow-up crawling requests and more proxy traffic, which increases infrastructure cost. That extra cost would need to be passed to the customers who need this capability.

If expiration tracking is important for your use case, email us at hi@theirstack.com and we'll take your request into account when prioritizing the feature.
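Applied in code, such a cutoff is a simple date comparison. A sketch, assuming you track each job's first-seen date (the field name here is illustrative):

```python
from datetime import date, timedelta

def likely_active(first_seen: date, today: date, cutoff_days: int = 14) -> bool:
    """Treat a job as likely still open if first seen within the cutoff window."""
    return today - first_seen <= timedelta(days=cutoff_days)

today = date(2024, 6, 30)
likely_active(date(2024, 6, 20), today)                 # within 2 weeks → True
likely_active(date(2024, 5, 1), today, cutoff_days=30)  # ~2 months old → False
```

Tune `cutoff_days` per the trade-off above: smaller for fewer stale listings, larger for more inventory.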
