Company domain enrichment improvements

Co-Founder at TheirStack
When building our company dataset, we combine information from multiple sources. One of the most important pieces of company information we provide is the domain, and we made 2 major improvements to it:
- Better identification of URLs from social sites, link shorteners, etc. - A significant number of companies set their LinkedIn URL to a link on `bit.ly`, `linktr.ee`, `sites.google.com`, `facebook.com`, `instagram.com`... When extracting the domain from those URLs, we'd say that Bit.ly, Linktree, Google, Facebook or Instagram (among others) were associated with tens of thousands of companies, which is obviously not the case. This is now fixed. For example, `bit.ly` was originally associated with thousands of companies because many of them had set their LinkedIn URL to a `bit.ly` link. Now it's only associated with the correct company, as shown in this bit.ly domain search.
- Better company linkage to companies with similar names - Previously, in many cases we'd match small companies with names similar to those of larger companies to the larger company. For example, we'd say that the domain of a company named `AIR+` was `airbnb.com`, which is also not true. Now, the name similarity has to be much higher for us to consider two companies to be the same.
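To illustrate the first improvement, here is a minimal sketch of domain extraction that refuses to attribute a shared platform's host to a company. It is not our production code: the `SHARED_HOSTS` set and the `company_domain` function are hypothetical names, and the set contains only the five hosts mentioned above.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical blocklist: only the hosts named in this post ("among others").
SHARED_HOSTS = {
    "bit.ly",
    "linktr.ee",
    "sites.google.com",
    "facebook.com",
    "instagram.com",
}


def company_domain(url: str) -> Optional[str]:
    """Return the host of `url`, or None if it belongs to a shared platform."""
    url = url.strip()
    # urlparse only recognises the host when the URL carries a scheme.
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        return None
    # Reject the shared host itself and any of its subdomains.
    if any(host == h or host.endswith("." + h) for h in SHARED_HOSTS):
        return None
    return host
```

With this sketch, `company_domain("https://bit.ly/3abcd")` returns `None`, while `company_domain("https://www.example.com/about")` returns `"example.com"`. A real pipeline would also need an exception for the platform's actual owner, so that `bit.ly` still maps to the one company it belongs to.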
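The stricter name matching can be sketched in a similar way. This post does not say which similarity metric or threshold we use, so the example below assumes Python's standard `difflib.SequenceMatcher` and a hypothetical threshold of 0.9, purely to show why a loose threshold links `AIR+` to Airbnb and a strict one does not.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized company names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_company(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two names as the same company only above `threshold` (assumed value)."""
    return name_similarity(a, b) >= threshold


# "air+" and "airbnb" share only the prefix "air": 2 * 3 / (4 + 6) = 0.6
print(name_similarity("AIR+", "Airbnb"))              # 0.6
print(same_company("AIR+", "Airbnb", threshold=0.5))  # True: too permissive
print(same_company("AIR+", "Airbnb"))                 # False with the stricter default
```

Raising the threshold trades recall for precision: fewer small companies inherit a large company's domain, at the cost of missing some legitimate name variants.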