Data workflow

Steps we follow to source job data

How it works

To ensure the highest possible data quality, we follow this process:

Content acquisition.

Every hour, we crawl a variety of data sources, such as job boards, applicant tracking systems (ATSs), and company websites.
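The crawl loop above can be sketched roughly as follows. The source registry, URLs, and the `fetch` callable are all hypothetical placeholders, not the real crawler:

```python
# A minimal sketch of one crawl pass, assuming a hypothetical fetch()
# that returns a list of raw postings for a given URL. The source
# names and URLs below are illustrative only.

SOURCES = {
    "job_board": ["https://example-board.com/jobs"],
    "ats": ["https://example-ats.com/api/postings"],
    "company_site": ["https://example.com/careers"],
}

def crawl_once(fetch) -> list[dict]:
    """Run a single pass over every registered source URL."""
    results = []
    for kind, urls in SOURCES.items():
        for url in urls:
            for posting in fetch(url):
                # Tag each raw posting with where it came from, so later
                # stages can trace provenance and deduplicate.
                results.append({"source_kind": kind, "source_url": url, **posting})
    return results
```

In a real pipeline this pass would run on an hourly schedule; the sketch only shows the shape of a single iteration.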

Extraction & Normalization.

  • From each posting, we extract the following entities: the job, the company, and the hiring manager (when available).
  • We normalize the data to ensure consistency and accuracy. For example, job descriptions are converted to Markdown, and locations are standardized into city, state, and country fields.
  • Media files such as company logos are downloaded and stored in our own storage, so you can reference them from your applications via stable URLs.
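The location-normalization step can be sketched like this. The function name and the city/state/country field names are illustrative, not the actual schema:

```python
# A minimal sketch of location normalization, assuming raw location
# strings shaped like "City, State, Country". This is a hypothetical
# helper, not the production normalizer.

def normalize_location(raw: str) -> dict:
    """Split a raw location string into city, state, and country parts."""
    parts = [p.strip() for p in raw.split(",")]
    # Pad missing components with None so the output shape is constant.
    city, state, country = (parts + [None, None, None])[:3]
    return {"city": city, "state": state, "country": country}
```

The fixed output shape is the point: downstream consumers can rely on every record exposing the same fields even when the source only provided a city.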

Deduplication.

We deduplicate the data so that each job appears only once, since it's common for a company to post the same job on multiple job boards.
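A simple form of this deduplication keeps the first record seen for each identity key. The choice of key here (company, title, location) is an assumption; real matching likely also involves fuzzier comparison:

```python
# A minimal deduplication sketch: keep one record per
# (company, title, location) key, case-insensitively.
# The key choice is illustrative, not the production logic.

def deduplicate(jobs: list[dict]) -> list[dict]:
    """Return jobs with duplicate postings removed, keeping the first seen."""
    seen = set()
    unique = []
    for job in jobs:
        key = (job["company"].lower(), job["title"].lower(), job["location"].lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```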

Enrichment.

We enrich each job posting with company information such as industry, size, and location.
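Conceptually, enrichment is a join between a posting and a company profile. The lookup table, field names, and matching-by-name below are all hypothetical simplifications:

```python
# A minimal enrichment sketch: attach a company profile, looked up by
# company name, to each job posting. The profile store and its fields
# are illustrative placeholders.

COMPANY_PROFILES = {
    "acme": {"industry": "Manufacturing", "size": "1-50", "location": "US"},
}

def enrich(job: dict) -> dict:
    """Return a copy of the job with company profile data attached."""
    profile = COMPANY_PROFILES.get(job["company"].lower(), {})
    # Unknown companies get an empty profile rather than failing.
    return {**job, "company_profile": profile}
```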

Quality Assurance.

Data analysts monitor and verify the data daily to ensure it meets our quality standards.
