Data workflow

Steps we follow to source job data

How it works

When you use our API, App, or Datasets, you read directly from our database rather than triggering live calls to the underlying sources on each request. We handle the critical steps ahead of time: data extraction, normalization, company enrichment, and quality assurance. This keeps the data current and of high quality while keeping response times fast.

To guarantee the highest possible quality of data, we follow this process:

Content acquisition

We continuously crawl different data sources, such as job boards, ATSs, and company websites. Scraper frequency varies by source: some run as often as every 10 minutes and others hourly, so that our data stays up to date.

Extraction & normalization

For every job we collect, we extract key entities including the job posting, company, and, when available, the hiring manager.

We also identify and extract technologies mentioned in the job title, description, and URL, then index them to enable filtering by tech stack.
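As an illustration of this step, here is a minimal sketch of keyword-based technology matching across the three fields. It is not our production implementation: the `TECHNOLOGIES` list and the `extract_technologies` function are hypothetical names chosen for this example, and a real vocabulary would be much larger.

```python
import re

# Hypothetical vocabulary for illustration; the actual list of tracked
# technologies is not described in this guide.
TECHNOLOGIES = ["Python", "React", "PostgreSQL", "Kubernetes"]


def extract_technologies(title: str, description: str, url: str) -> list[str]:
    """Return the technologies mentioned in any of the three fields, sorted by name."""
    text = " ".join([title, description, url]).lower()
    found = []
    for tech in TECHNOLOGIES:
        # Require non-alphanumeric boundaries so "react" does not match "reactive".
        pattern = r"(?<![a-z0-9])" + re.escape(tech.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.append(tech)
    return sorted(found)


matches = extract_technologies(
    "Senior Python Engineer",
    "We use PostgreSQL and reactive streams.",
    "https://example.com/jobs/python-engineer",
)
```

Simple boundary matching like this is easy to index but can miss aliases (for example "Postgres" for "PostgreSQL"), which a fuller approach would need to handle.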

To ensure data quality and consistency across the platform, we normalize all extracted fields. For example:

  • Job descriptions are converted into standardized Markdown format.
  • Locations are structured into city, state, country, and other geographic components.
  • Job titles are cleaned to remove special characters and extra whitespace.
  • Company industries are harmonized across different data sources.
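To make two of these rules concrete, the sketch below cleans a job title and splits a location string into components. It is a simplified illustration under stated assumptions: the `Location` type, the function names, the allow-list of title characters, and the "City, State, Country" input format are all hypothetical and do not describe our actual normalization rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    city: Optional[str]
    state: Optional[str]
    country: Optional[str]


def clean_title(raw: str) -> str:
    """Remove special characters and collapse extra whitespace."""
    # Keep word characters, whitespace, and a few symbols common in titles
    # (an assumed allow-list); everything else becomes a space.
    kept = re.sub(r"[^\w\s/&+#.,()-]", " ", raw)
    return re.sub(r"\s+", " ", kept).strip()


def parse_location(raw: str) -> Location:
    """Split a comma-separated location; parts that are missing stay None."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if len(parts) >= 3:
        return Location(parts[0], parts[1], parts[-1])
    if len(parts) == 2:
        # Assumption: two parts mean "City, Country".
        return Location(parts[0], None, parts[1])
    if len(parts) == 1:
        # Assumption: a single part is treated as the country.
        return Location(None, None, parts[0])
    return Location(None, None, None)


title = clean_title("  ★ Senior   Data Engineer (Remote) ★ ")
location = parse_location("Austin, TX, United States")
```

Real-world location strings are far less regular than this, so a string split is only a starting point; it is shown here to illustrate the target structure.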

Additionally, media assets such as company logos are downloaded and stored in our own infrastructure, allowing you to access them reliably via stable URLs for use in your own applications.

Job deduplication

Most companies use an Applicant Tracking System (ATS) to manage their hiring process. ATSs help streamline the candidate lifecycle, power career pages listing open roles, and sync job postings with major job boards like LinkedIn and Indeed.

When a company posts a job through an ATS, it's common for that listing to appear on multiple job boards simultaneously. As a result, a single position may end up with 3–5 different references across various platforms. Additionally, job boards often scrape and repost listings from each other, further increasing duplication.

Job posting deduplication is a crucial step that keeps the same job from appearing multiple times in our database. If you use our data as a sales signal, you won't trigger the same signal multiple times. If you use our data to build a job board, you won't list the same job multiple times.

We apply both algorithmic techniques and manual checks to eliminate duplicates effectively.
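One common algorithmic technique is fingerprinting: reduce each posting to a normalized key and keep only the first posting seen for each key. The sketch below shows that idea only. It is not a description of our actual deduplication logic, and the field names (`company_domain`, `title`, `location`, `source`) are hypothetical.

```python
import hashlib
import re


def fingerprint(job: dict) -> str:
    """Build a stable key from fields that should match across reposts."""

    def norm(value: str) -> str:
        # Lowercase and collapse punctuation/whitespace so cosmetic
        # differences between sources do not change the key.
        return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()

    key = "|".join(
        norm(job.get(field) or "")
        for field in ("company_domain", "title", "location")
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(jobs: list[dict]) -> list[dict]:
    """Keep the first posting seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for job in jobs:
        fp = fingerprint(job)
        if fp not in seen:
            seen.add(fp)
            unique.append(job)
    return unique


postings = [
    {"company_domain": "acme.example", "title": "Data Engineer", "location": "Berlin", "source": "ats"},
    {"company_domain": "ACME.example", "title": "Data  Engineer ", "location": "berlin", "source": "job_board"},
    {"company_domain": "acme.example", "title": "Data Analyst", "location": "Berlin", "source": "ats"},
]
unique_postings = deduplicate(postings)
```

Exact-key matching like this misses reposts whose titles or locations were reworded, which is one reason a combination of algorithmic techniques and manual checks is useful.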

Company enrichment

Each time we collect a job posting, we also extract all available company information. However, in most cases, this initial data is quite limited—usually just the company's name, logo, and domain. While useful as a starting point, this basic information isn't enough to deliver meaningful insights or enable advanced filtering.

To unlock the full potential of our platform, we enrich these company records with a broader and deeper profile—a process we call company enrichment. This allows you to search, filter, and segment companies based on valuable attributes, helping you identify high-quality prospects faster and with greater precision.

Our enrichment process adds the following data points to each company:

  • Industry – Understand the market in which the company operates.
  • Company Size (Headcount) – Target organizations based on their scale, from startups to large enterprises.
  • Estimated Revenue – Gauge the financial size of a company to prioritize outreach.
  • Funding Details – See how much capital the company has raised and from which investors.
  • Headquarters Location – Identify geographic focus areas for your go-to-market strategy.
  • LinkedIn URL – Access their LinkedIn profile for further context and contact discovery.
  • Website URL – Navigate directly to the company's official site.
  • Company Description – Gain a quick overview of the company's mission, products, or services.

To ensure accuracy and broad coverage, we aggregate and validate this data from multiple trusted providers. While we strive for comprehensive enrichment, please note that coverage may vary depending on the availability of external data—some fields may not be available for every company.
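The aggregation part of this step can be pictured as a field-by-field merge in which the first provider that supplies a value wins and unavailable fields stay empty. The sketch below is an illustration under assumptions, not our actual pipeline: the field names, the `merge_company` function, and the priority-order rule are hypothetical, and it does not cover validation.

```python
# Hypothetical field names mirroring the data points listed above.
ENRICHMENT_FIELDS = [
    "industry",
    "headcount",
    "estimated_revenue",
    "funding",
    "headquarters",
    "linkedin_url",
    "website_url",
    "description",
]


def merge_company(base: dict, provider_records: list[dict]) -> dict:
    """Fill missing fields from provider records, in priority order.

    Fields that no provider supplies are set to None, reflecting that
    coverage varies by company.
    """
    merged = dict(base)
    for field in ENRICHMENT_FIELDS:
        if merged.get(field) not in (None, ""):
            continue  # Already known from the job posting itself.
        merged[field] = None
        for record in provider_records:
            value = record.get(field)
            if value not in (None, ""):
                merged[field] = value
                break
    return merged


company = merge_company(
    {"name": "Acme", "domain": "acme.example"},
    [
        {"industry": "Software", "headcount": None},
        {"industry": "IT Services", "headcount": 120},
    ],
)
```

Because missing fields come back as `None` rather than being omitted, consumers of a record shaped like this should treat every enrichment field as optional.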

Quality assurance

Multiple data analysts monitor and verify the data every day to keep its quality as high as possible.
