Data workflow

Steps we follow to source job data

How it works

When you use our API, App, or Datasets, you're reading data directly from our database rather than triggering live calls to the original sources for each request. That database is continuously refreshed, so you always get the most up-to-date information we have collected. We handle the critical steps ahead of time: data extraction, normalization, company enrichment, and quality assurance. Because the data is already processed and stored when you request it, you get both high-quality data and fast response times.

To guarantee the highest possible quality of data, we follow this process:

Content acquisition

We continuously crawl a range of data sources, such as job boards, applicant tracking systems (ATSs), and company websites. Crawl frequency varies by source: some scrapers run as often as every 10 minutes, others hourly, so that our data stays up to date.
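To illustrate how per-source crawl frequencies can coexist, here is a minimal sketch of a scheduler that decides which sources are due for a new crawl. It assumes each source carries its own refresh interval and last-run timestamp; the `Source` model, `due_sources` function, and source names are hypothetical and not part of our actual infrastructure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Source:
    """A crawl target with its own refresh interval (hypothetical model)."""
    name: str
    interval: timedelta
    last_run: Optional[datetime] = None


def due_sources(sources: List[Source], now: datetime) -> List[str]:
    """Return the names of sources whose interval has elapsed since their last run."""
    due = []
    for source in sources:
        # A source that has never been crawled is always due.
        if source.last_run is None or now - source.last_run >= source.interval:
            due.append(source.name)
    return due


now = datetime(2024, 1, 1, 12, 0)
sources = [
    Source("fast-job-board", timedelta(minutes=10), now - timedelta(minutes=12)),
    Source("company-career-page", timedelta(hours=1), now - timedelta(minutes=30)),
    Source("new-ats-feed", timedelta(hours=1)),
]
print(due_sources(sources, now))  # ['fast-job-board', 'new-ats-feed']
```

In this sketch, a 10-minute source crawled 12 minutes ago is due, while an hourly source crawled 30 minutes ago is not.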


Extraction & Normalization

For every job we collect, we extract key entities including the job posting, company, and, when available, the hiring manager.

We also identify and extract technologies mentioned in the job title, description, and URL, then index them to enable filtering by tech stack.
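A simplified version of technology extraction and tech-stack indexing might look like the sketch below. It assumes a small, hand-curated vocabulary and plain token matching across the title, description, and URL, then builds an inverted index from each technology to the job IDs that mention it. The vocabulary, field names, and functions are hypothetical; a production extractor would need a much larger vocabulary and handling for multi-word or symbol-heavy names such as "C++".

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical vocabulary: lowercase token -> canonical technology name.
TECH_TERMS = {
    "python": "Python",
    "react": "React",
    "kubernetes": "Kubernetes",
    "postgresql": "PostgreSQL",
}


def extract_technologies(title: str, description: str, url: str) -> Set[str]:
    """Find known technologies mentioned in a job's title, description, or URL."""
    text = " ".join([title, description, url]).lower()
    # Split on any non-alphanumeric character so URL paths like /react-dev are tokenized too.
    tokens = set(re.findall(r"[a-z0-9]+", text))
    return {TECH_TERMS[token] for token in tokens if token in TECH_TERMS}


def build_index(jobs: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Build an inverted index: technology -> set of job IDs mentioning it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for job_id, job in jobs.items():
        for tech in extract_technologies(job["title"], job["description"], job["url"]):
            index[tech].add(job_id)
    return index


jobs = {
    "job-1": {"title": "Senior Python Engineer", "description": "We run PostgreSQL.",
              "url": "https://example.com/jobs/1"},
    "job-2": {"title": "Frontend Developer", "description": "Build UIs.",
              "url": "https://example.com/jobs/react-dev"},
}
index = build_index(jobs)
print(sorted(index["Python"]))  # ['job-1']
print(sorted(index["React"]))  # ['job-2']
```

Filtering by tech stack then becomes a set lookup (or an intersection of sets for several technologies) rather than a full-text scan.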

To ensure data quality and consistency across the platform, we normalize all extracted fields.
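As an illustration of what field normalization means in practice, the sketch below cleans up a raw job record: it collapses stray whitespace in the title, trims the company name, and maps free-text employment types onto a fixed set of canonical values. The field names and the `EMPLOYMENT_TYPES` mapping are hypothetical examples, not our actual schema.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from raw employment-type strings to canonical values.
EMPLOYMENT_TYPES = {
    "full time": "full_time",
    "full-time": "full_time",
    "fulltime": "full_time",
    "part time": "part_time",
    "part-time": "part_time",
    "contract": "contract",
    "contractor": "contract",
}


def normalize_job(raw: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return a normalized copy of a raw job record."""
    # Collapse runs of whitespace (including newlines) and trim the title.
    title = re.sub(r"\s+", " ", raw.get("title", "")).strip()
    employment = raw.get("employment_type", "").strip().lower()
    return {
        "title": title,
        # Unknown values become None instead of leaking inconsistent strings.
        "employment_type": EMPLOYMENT_TYPES.get(employment),
        "company": raw.get("company", "").strip(),
    }


print(normalize_job({"title": "  Data   Engineer\n", "employment_type": "Full-Time",
                     "company": " Acme Inc "}))
# {'title': 'Data Engineer', 'employment_type': 'full_time', 'company': 'Acme Inc'}
```

Mapping to a closed set of values is what makes filters such as "full-time only" behave consistently across sources that spell the same thing differently.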

Additionally, media assets such as company logos are downloaded and stored in our own infrastructure, allowing you to access them reliably via stable URLs for use in your own applications.
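One common way to give stored assets stable URLs is content addressing: the storage key is derived from a hash of the file's bytes, so the same logo always resolves to the same URL no matter how many sources it was downloaded from. The sketch below assumes that approach and a hypothetical base URL (`assets.example.com`); it illustrates the idea and is not a description of how our storage is actually keyed.

```python
import hashlib

# Hypothetical asset host; not a real endpoint.
ASSET_BASE_URL = "https://assets.example.com/logos"


def stable_logo_url(image_bytes: bytes, extension: str = "png") -> str:
    """Derive a deterministic URL for an image from a SHA-256 hash of its content."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{ASSET_BASE_URL}/{digest}.{extension}"


logo = b"\x89PNG...fake image bytes for illustration"
print(stable_logo_url(logo))
```

Because the URL depends only on the content, re-downloading an unchanged logo yields the same URL, while an updated logo gets a new one.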

Job deduplication

Most companies use an Applicant Tracking System (ATS) to manage their hiring process. ATSs help streamline the candidate lifecycle, power career pages listing open roles, and sync job postings with major job boards like LinkedIn and Indeed.

When a company posts a job through an ATS, it's common for that listing to appear on multiple job boards simultaneously. As a result, a single position may end up with 3–5 different references across various platforms. Additionally, job boards often scrape and repost listings from each other, further increasing duplication.

Job posting deduplication is a crucial step to avoid storing the same job multiple times in our database. If you use our data as a sales signal, you won't trigger the same signal multiple times. If you use our data to build a job board, you won't list the same job multiple times.

We apply both algorithmic techniques and manual checks to eliminate duplicates effectively.
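To show the basic idea behind algorithmic deduplication, here is a minimal sketch that groups postings by a fingerprint built from normalized company, title, and location, keeping the first posting as canonical and recording every source URL as a reference. This exact-match fingerprinting is a deliberately simple stand-in, not our actual method: real duplicates often differ in ways (reworded titles, different location formats) that require fuzzier matching and the manual checks described above. The types, functions, and URLs are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobPosting:
    company: str
    title: str
    location: str
    source_url: str


@dataclass
class CanonicalJob:
    posting: JobPosting
    references: List[str] = field(default_factory=list)


def _norm(text: str) -> str:
    # Lowercase and keep only alphanumeric tokens, so "Acme, Inc." == "ACME INC".
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def fingerprint(job: JobPosting) -> str:
    """Hash the normalized identifying fields into a fixed-length key."""
    key = "|".join(_norm(part) for part in (job.company, job.title, job.location))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def deduplicate(postings: List[JobPosting]) -> List[CanonicalJob]:
    """Collapse postings with the same fingerprint into one canonical job."""
    canonical: Dict[str, CanonicalJob] = {}
    for posting in postings:
        key = fingerprint(posting)
        if key not in canonical:
            canonical[key] = CanonicalJob(posting)  # first occurrence wins
        canonical[key].references.append(posting.source_url)
    return list(canonical.values())


postings = [
    JobPosting("Acme, Inc.", "Backend Engineer", "Berlin, Germany", "https://ats.example.com/acme/123"),
    JobPosting("Acme Inc", "Backend  Engineer", "Berlin Germany", "https://board-a.example.com/jobs/987"),
    JobPosting("ACME INC", "Backend Engineer", "Berlin, Germany", "https://board-b.example.com/view/555"),
    JobPosting("Acme Inc", "Data Analyst", "Berlin, Germany", "https://ats.example.com/acme/124"),
]
for job in deduplicate(postings):
    print(job.posting.title, len(job.references))
# Backend Engineer 3
# Data Analyst 1
```

Keeping the references on the canonical record preserves where a job was seen without exposing it as several separate jobs.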

Data Enrichment

Our job posting collection gives you a solid foundation, but we don't stop there. We enhance every record through a comprehensive enrichment process that adds valuable context and additional verified details.
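At its simplest, company enrichment means attaching extra company information to each job by joining on a stable key. The sketch below assumes the key is the company's website domain and that company profiles already live in a lookup table; the profile fields, the `COMPANY_PROFILES` table, and the functions are hypothetical and only illustrate the join.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical company profiles keyed by website domain.
COMPANY_PROFILES: Dict[str, Dict[str, object]] = {
    "acme.com": {"industry": "Manufacturing", "employee_count": 250},
}


def company_domain(website: str) -> str:
    """Extract a bare domain from a full URL (assumes a scheme such as https://)."""
    host = urlparse(website).netloc.lower()
    return host.removeprefix("www.")


def enrich(job: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of the job with its domain and matching company profile attached."""
    domain = company_domain(job["company_website"])
    profile: Optional[Dict[str, object]] = COMPANY_PROFILES.get(domain)
    # The original job dict is not modified; unknown companies get company_profile=None.
    return {**job, "company_domain": domain, "company_profile": profile}


print(enrich({"title": "Backend Engineer", "company_website": "https://www.acme.com/careers"}))
```

Nesting the profile under its own key avoids accidentally overwriting job fields that happen to share a name with company fields.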

This transformation turns basic job listings into actionable intelligence that helps you make better decisions and build more powerful applications.

Quality Assurance

A team of data analysts monitors and verifies the data daily to keep its quality as high as possible.
