Startup in Focus: Diffbot

By on 23/05/2015

One criticism leveled at e-commerce today is the gap between the information available on shopping sites and the lack of an API that can understand and extract that data effectively. As a result, search engine algorithms have trouble providing relevant information to users, while unscrupulous marketers keep finding ways to “game” the system and manipulate search results. In light of this, Diffbot has announced a brand new API that uses robots capable of understanding and extracting relevant data from e-commerce sites, with capabilities that would put existing search engine algorithms to shame.

Diffbot uses computer vision, machine learning, and artificial intelligence to extract and process data from web pages, with the aim of making the entire web machine-readable. Its API uses computer vision to translate sites into a product database, from which software developers can extract a wide variety of data, including product images, shipping costs, discount prices, SKU codes, and MSRP. The API cannot be manipulated or “gamed,” as it can identify and structure data regardless of the site’s design, layout, or language.
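To give developers a feel for what working with such a product database might look like, here is a minimal Python sketch that normalizes one extracted product record. The field names (`discountPrice`, `shippingCost`, `images`, and so on), the `Product` class, and the sample payload are all illustrative assumptions, not Diffbot's documented schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Product:
    """Hypothetical normalized product record (not Diffbot's actual schema)."""
    title: str
    sku: Optional[str] = None
    msrp: Optional[float] = None
    discount_price: Optional[float] = None
    shipping_cost: Optional[float] = None
    image_urls: List[str] = field(default_factory=list)


def parse_price(value: Any) -> Optional[float]:
    """Convert a price such as "$1,299.00" or 19.5 to a float; return None if absent or unparseable."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    # Keep only digits and the decimal point, dropping currency symbols and commas.
    cleaned = "".join(ch for ch in str(value) if ch.isdigit() or ch == ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_product(record: Dict[str, Any]) -> Product:
    """Map one raw record (assumed key names) onto the Product dataclass."""
    return Product(
        title=record.get("title", ""),
        sku=record.get("sku"),
        msrp=parse_price(record.get("msrp")),
        discount_price=parse_price(record.get("discountPrice")),
        shipping_cost=parse_price(record.get("shippingCost")),
        image_urls=[img["url"] for img in record.get("images", []) if "url" in img],
    )


# Made-up sample payload, for illustration only.
sample = {
    "title": "Example Laptop",
    "sku": "LAP-001",
    "msrp": "$1,299.00",
    "discountPrice": "999.99",
    "images": [{"url": "https://example.com/img/1.jpg"}, {"caption": "no url here"}],
}

product = to_product(sample)
print(product)
```

Missing fields such as the shipping cost simply come back as `None`, so the same code can handle sites that expose different subsets of product data.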

Diffbot also has an AI-driven spider technology that can analyze an entire site using the same ability to distinguish between different types of information, which means it can skip non-product pages and extract data only from the page types that matter.

According to Diffbot CEO Mike Tung, the company has only taken the first steps toward pushing the capabilities of automated page extraction, and its main goal is to make the entire web machine-readable. He also noted that Diffbot will continue to roll out APIs for other page types until it reaches that ultimate goal.

The company is currently backed by Sky Dayton, the founder of Earthlink, who also sits on the board.