Powering AI Through Crowdsourcing #4: Building A Smarter Search Engine

Oct 18

By Adam Karides

Search engines have become incredibly adept at presenting the exact information a user desires when keywords are entered into a search bar. They revolutionized how information is accessed, and the algorithms behind them are constantly tweaked and sharpened to generate more pinpointed results. Google, built around its PageRank algorithm, dominates the global market, but artificial intelligence (AI) might open the playing field for the first time since Google claimed the majority of the market share in 2007. As the shortcomings of search engines are increasingly exposed, AI is being explored as a mechanism to eliminate these flaws. While traditional search engines are purely formulaic, machine learning systems run similar algorithms but incorporate linguistic nuance to deliver even more precise results. However, since traditional models have almost two decades’ worth of data in their systems, AI-based search engines have some catching up to do. To bring this type of machine learning up to speed, a team of researchers is fusing artificial intelligence with crowdsourcing to build a smart search engine at scale.

Search engines are reliable tools for finding valuable information online at the click of a button, but the natural language processing (NLP) programs that currently drive search results lack a nuanced understanding of language. For example, they are not integrated with resources such as WordNet that provide a more holistic comprehension of words by grouping them with synonyms and other relevant semantic contexts. Additionally, they still struggle to effectively annotate other types of content such as images and videos. To improve these systems, a team at the University of Texas at Austin is exploring ways to build more intelligent search engines by injecting them with neural networks, the computational models that digest and learn from human-supplied information to perform machine learning. But how will these machines enhance the results of the billions of daily search queries? By crowdsourcing the labeled data that these complex algorithms learn from, the researchers are striving to create a technology that better grasps the keywords punched into it.
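
To make the idea of semantic grouping concrete, here is a minimal sketch of how a search pipeline might use WordNet (via NLTK) to expand a query with synonyms before matching documents. This is an illustrative assumption, not a description of the team's actual system; the function name and parameters are hypothetical.

```python
# A minimal sketch of WordNet-based query expansion (illustrative only).
# Assumes NLTK is installed and the WordNet corpus has been downloaded
# with nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def expand_query(keyword: str, max_terms: int = 6) -> set[str]:
    """Return the keyword plus a handful of WordNet synonyms."""
    expanded = {keyword}
    for synset in wn.synsets(keyword):
        for lemma in synset.lemma_names():
            expanded.add(lemma.replace("_", " "))
            if len(expanded) >= max_terms:
                return expanded
    return expanded

# Example: a query for "car" might also match documents about "auto" or "automobile".
print(expand_query("car"))
```

A search engine that expands queries this way can retrieve documents that never contain the literal keyword but discuss the same concept, which is the kind of linguistic nuance the paragraph above describes.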

This model is a stark contrast to how search engines currently function. While gauging the relevance of content based on an algorithmic assessment of the quality of the links it contains is useful, the research team is looking to include more human input in its ranking criteria. For example, news articles labeled with additional context, such as where the events took place or who was involved, can yield more accurate search results. However, such a process requires manual labor at scale; no single person could conceivably rewire how content is indexed online. As a result, the researchers turned to crowdsourcing to fortify and accelerate the development of a smarter search engine.

As part of this project, the team has leveraged the collective power of the crowd not only to label content on a deeper level, but also to feed these classifications into its neural network. In particular, individuals have been tasked with reading medical journals and news articles and distilling their key details, such as names, places, and events, to enrich their searchability beyond the link-based ranking that Google’s PageRank has perfected. The search engine then has a more robust taxonomy with which to link content more precisely to given search terms. This human-in-the-loop (HITL) approach has already proven able to accurately predict the names of items mentioned in articles and to extract information from texts that have not even been indexed yet.
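
As a rough illustration of how crowd labels for names, places, and events could be turned into training data for a neural network, the sketch below converts span annotations into token-level BIO tags, a common format for entity taggers. The record layout and tag names are assumptions for illustration, not the team's actual annotation scheme.

```python
# Hedged sketch: turning crowdsourced span annotations into BIO tags
# that a neural sequence tagger could learn from. The data format is
# hypothetical, not the researchers' real pipeline.
from dataclasses import dataclass

@dataclass
class CrowdAnnotation:
    """One worker's labeled span inside an article."""
    start: int   # index of the first labeled token
    end: int     # index one past the last labeled token
    label: str   # e.g. "PERSON", "PLACE", "EVENT"

def to_bio_tags(tokens: list[str], annotations: list[CrowdAnnotation]) -> list[str]:
    """Convert span annotations into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for ann in annotations:
        tags[ann.start] = f"B-{ann.label}"
        for i in range(ann.start + 1, ann.end):
            tags[i] = f"I-{ann.label}"
    return tags

tokens = ["Floods", "hit", "Austin", "on", "Monday"]
anns = [CrowdAnnotation(0, 1, "EVENT"), CrowdAnnotation(2, 3, "PLACE")]
print(list(zip(tokens, to_bio_tags(tokens, anns))))
```

Once articles carry labels like these, a search for a person or place can be matched against the annotated entities rather than against raw link structure alone.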

Developing a neural network was not the challenge; training it at scale so it could compete with, and eventually supplant, Google Search was the significant obstacle for the team. Fortunately, crowdsourcing was the solution the team needed. According to Matthew Lease, the project’s leader and an assistant professor at the university’s School of Information, crowdsourcing enables the team to “really ramp up the scale of label data” gathered during this experiment.
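
Scaling up labeled data with a crowd usually means collecting several judgments per item and reconciling them into one label. The sketch below shows a simple majority-vote aggregation, a standard crowdsourcing technique offered here as an assumption about how such a pipeline might work, not as the team's documented method.

```python
# Hedged sketch: aggregating redundant crowd judgments into a single
# label per item via majority vote (a common, but assumed, technique).
from collections import Counter

def majority_vote(judgments: dict[str, list[str]]) -> dict[str, str]:
    """Map each item id to the label most workers agreed on."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in judgments.items()}

# Hypothetical item ids and labels for illustration.
judgments = {
    "article-17": ["PLACE", "PLACE", "EVENT"],
    "article-42": ["PERSON", "PERSON", "PERSON"],
}
print(majority_vote(judgments))
```

Redundant judgments plus a simple aggregation rule are what let a small research team gather training labels at a scale no single annotator could match.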

This real-world application, which marries two of technology’s hottest trends, is another example of how crowdsourcing human input can augment the development of AI. But what makes this case unique is the determination to disrupt such a stable and lucrative business. Google Search and its competitors are already such fixtures in our digital society that it is difficult to imagine an alternative technology. However, the potential of machine learning innovation at scale should not be overlooked.