Semantic Search Sourcing Success Beyond Boolean Search
Q3, 2009
Authors:
Shally Steckerl, EVP, Arbita
Bryan Starbuck, Founder & CEO, TalentSpring, Inc.
Semantic Search – Sourcing Success Beyond Boolean Search
ARBITA
Table of Contents
The Future of Candidate Sourcing
Ankle Deep in the Deep Web, but Inching Closer to Semantic Search
Semantic Search: Why Recruiters Should Care
Semantic Search for Recruiting
Understanding Semantic Search Fundamentals
  Literal versus Equivalent Match Searches
  Soft Keywords: The Hidden Power of Semantic Search
  Search Term Expansion Sets
Three Different Semantic Approaches
  Lexicon- and Ontological- Based Search
  Statistical Analysis and Pattern Matching
  Contextual Search
Broad vs. Narrow Match Semantic Search
Targeting Semantic Search
Example Semantic Search Technologies
  Full High End Semantic Search Solutions for Recruiters
  Free Semantic Search Tools for Recruiters
Semantic Search RFP Check List
About the Authors
APPENDIX I – Alternate Search Engines
TABLE I – Semantic Search Engine Types
TABLE II – Semantic Search Engine Types
2 Copyright 2009, TalentSpring, Inc. (www.talentspring.com) and Arbita, Inc. (www.arbita.net).
The Future of Candidate Sourcing
Today, the recruiting industry stands at the forefront of a technological revolution as profound as the initial adoption of candidate sourcing via the Internet. It is a revolution that will change the way recruiters spend their days, the way organizations allocate their resources, and the way candidates find jobs. This new revolution comes in the form of new, advanced tools that leverage Semantic Search technology. Semantic Search will shift electronic sourcing from a focus on ‘keywords’ to the ‘actual’ meaning found within resumes and job descriptions.
These new tools will cut in half the time recruiters spend on sourcing, vastly improve candidate match quality, and simplify the use of candidate information found on social networks, job boards and corporate Applicant Tracking Systems (ATS).
Ankle Deep in the Deep Web, but Inching Closer to Semantic Search
Most people agree that the biggest problem with online recruiting today is too much available information. Without a good search engine, you simply get lost in it. Unfortunately, today’s search engines are still inefficient, delivering mismatched information and requiring knowledge of complex search strings to use effectively. In an ideal world a search engine would function like a human, understanding the underlying meaning of the user’s search and matching the search results accordingly. Many expert communities describe Semantic Search applications as the technology most likely to deliver this kind of result, but few take the time to explain it in plain language. This whitepaper explains Semantic Search for candidate sourcing.
Semantic Search: Why Recruiters Should Care
Semantics is the field of study that focuses on ‘meaning’. Linguistic Semantics seeks to understand the meaning behind language, symbols, words, phrases, sentences, and larger blocks of text. Semantic Search engines apply grammatical analysis, logical interpretation, and linguistic morphology to interpret the meaning of the user’s unstructured, free-form search and find relevant results. In simple terms, a perfect Semantic Search engine would instantly take into consideration the meaning behind your question and deliver the result you were actually looking for.
Today, most people think of online search in terms of the capabilities of major search engines like Google™, Yahoo!™, or Microsoft’s Bing™. These big search engines use Boolean-based keyword search technology and often require complex syntax and field search commands to find specific occurrences of information (keywords) within documents. Results are based solely on whether those keywords are present. The major search engines are constantly experimenting with new ways to simplify search queries for users. However, these simplifying efforts don’t really attempt to understand the true ‘meaning’ of what is being searched for.
In contrast, Semantic Search technologies seek to simplify search by understanding the actual concept being sought. Semantic Search engines discover the true relationship between the question being asked and the content being delivered. Consequently, the user’s ‘experience’ of the search is shifted from sifting through documents that contain a specific keyword to reading documents that express the concept originally being sought.
One of the biggest challenges for search engines is understanding the ‘context’ of a search. It is context that determines whether the word ‘well’ refers to a source of water, as in, “Draw water from the well,” or to a person’s health, as in, “Is she not feeling well?” As a human, if you read “stair well” you automatically know what it means. Computers, on the other hand, have to calculate hundreds of variations and probabilities to arrive at a best guess.
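To make the ‘well’ example concrete, here is a minimal Python sketch of one classic disambiguation technique, the simplified Lesk algorithm: pick the sense whose dictionary definition shares the most words with the surrounding sentence. It is not how any particular vendor works; the three sense definitions and the stopword list are hypothetical, hand-written assumptions.

```python
# Simplified Lesk word-sense disambiguation (a sketch; the sense glosses and
# stopword list below are hypothetical, hand-written examples).
SENSES = {
    "well/water_source": "a deep hole dug to draw water oil or gas from the ground",
    "well/healthy": "in good health not sick feeling fine",
    "well/shaft": "a vertical open space in a building that holds a stair or an elevator",
}

STOPWORDS = {"a", "an", "the", "to", "from", "or", "in", "is", "she", "not", "of", "that"}


def tokens(text: str) -> set[str]:
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = (w.strip('.,?!"').lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(sentence: str) -> str:
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence) - {"well"}
    # max() keeps the first sense listed when overlaps tie.
    return max(SENSES, key=lambda sense: len(context & tokens(SENSES[sense])))
```

With these glosses, “Draw water from the well” resolves to the water-source sense, “Is she not feeling well?” to the health sense, and “stair well” to the building-shaft sense. Real systems rely on far larger sense inventories and statistical context models.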
Semantic Search engines make sense of sentence context by being pre-configured (trained) to understand who the user is and what the likely context of the search term is. To illustrate, imagine two people searching for a Marketing Manager position on the Web. One person is a recruiter; the other is a job candidate. With a regular search engine both people would get the same results. However, a Semantic Search engine that knew the user was a recruiter would return only candidate resumes and ignore job listings. Likewise, the job candidate would see only job listings.
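The recruiter/candidate example amounts to filtering results by document type according to who is asking. A minimal Python sketch, assuming each indexed document is already labeled as a resume or a job listing; the labels, the `Document` class, and the role names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    doc_type: str  # hypothetical label: "resume" or "job_listing"


# Hypothetical mapping from the searcher's role to the document type they want back.
WANTED_TYPE = {"recruiter": "resume", "job_seeker": "job_listing"}


def search(query: str, role: str, index: list[Document]) -> list[Document]:
    """Match the query against titles, then keep only documents suited to the user's role."""
    hits = [d for d in index if query.lower() in d.title.lower()]
    return [d for d in hits if d.doc_type == WANTED_TYPE[role]]


INDEX = [
    Document("Marketing Manager with 8 years in consumer goods", "resume"),
    Document("Marketing Manager opening, Seattle office", "job_listing"),
]
```

Both users type the same query, “marketing manager”; only the role determines whether the resume or the job listing comes back.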
“Semantic Search technology has not yet reached the level of full comprehension. However, a number of technology vendors have taken Semantic Search far beyond the capabilities of Boolean search to make online recruiting simpler and faster.” Shally Steckerl – Arbita
Semantic Search for Recruiting
When we talk about effective Semantic Search for recruiting today, we are really talking about two capabilities. First, we are talking about the unique capability of Semantic Search engines to understand the concept being searched for. Second, we are talking about the ability of Semantic Search engines to rank and filter search results using ‘smart’ ranking systems. Combined, these two capabilities create intelligent search tools capable of replicating the most tedious and time-consuming aspects of candidate sourcing. Recruiters have no time to waste when a computer should be smart enough to derive context, subtext, and meaning for us.
Today, intelligent Semantic Search tools are becoming available to recruiters. The best of them outperform complex syntax and Boolean search by automatically matching the content of a candidate’s resume to a job description and ranking the results according to what is most important to the recruiter.
Benefits of Semantic Search over Boolean Search for Recruiters:
• Semantic Search is far easier to learn than complex syntax and field search commands because it doesn’t require significant technical skills to get good results (i.e., there’s no need to use commands like intitle, inurl, site, and filetype).
• Semantic Search can save recruiters significant time by automatically identifying which terms in the job description to search on.
• Semantic Search provides recruiters with more accurate resume matches by pre-filtering results for such things as candidate qualifications (skills, experience, education, etc.) and work history characteristics (job hopping, job similarity, etc.).
• Semantic Search increases match quality by taking into account all the needs of the job requisition and the candidate resume (e.g., it can detect job seekers who no longer hold the job title matching the requisition).
• Semantic Search can identify high-quality candidates whose resumes don’t use the rigid terms of complex search strings and would otherwise remain hidden.
From a practical perspective, Semantic Search means that recruiters don’t have to acquire special skills building search strings to find candidates. In fact, applications that do Semantic Search well don’t even require the recruiter to interact with keywords at all. The Semantic Search engine ‘reads’ the job description, understands the key attributes being sought, and then automatically builds an expanded content list to search for resumes. The result is much better match quality than with regular search methods.
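The first step described above, reading the job description and picking out the attributes being sought, can be sketched as a vocabulary lookup. This minimal Python example assumes a small, hypothetical vocabulary of titles, skills, and degrees; a production engine would rely on far larger, trained term lists and would also recognize variants the vocabulary does not list.

```python
import re

# Hypothetical vocabulary a recruiting-trained engine might ship with.
VOCABULARY = {
    "title": ["software engineer", "marketing manager", "project manager"],
    "skill": ["image processing", "device drivers", "opengl"],
    "education": ["masters in computer science", "bachelors"],
}


def extract_attributes(job_description: str) -> dict[str, list[str]]:
    """Return the vocabulary terms found in the job description, grouped by type."""
    text = job_description.lower()
    found: dict[str, list[str]] = {}
    for kind, terms in VOCABULARY.items():
        # \b word boundaries avoid matching a term inside a longer word.
        hits = [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]
        if hits:
            found[kind] = hits
    return found
```

The extracted terms are what the engine would then expand and search on, without the recruiter ever typing a keyword.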
Definition of Semantic Search for Recruiting: Semantic Search for recruiting refers to finding the resumes or profiles that best match the needs of a job description. It requires going beyond search that simply understands sentence structure, factoring in the needs of the employer when finding resumes.
Understanding Semantic Search Fundamentals
There are three interrelated aspects of Semantic Search engines that warrant additional discussion:
1. Literal vs. Equivalent Match Searches
2. Search Term Expansion Sets
3. Sentence Structure Understanding
Literal versus Equivalent Match Searches
For the layperson, the most noticeable difference between Boolean and Semantic Search engines is the flexibility in matching the search keywords used. With Boolean search, an exact match to the search terms (keyword(s)) is required. With Semantic Search, matches can include equivalent words as well. While Literal versus Equivalent might sound simple, it is worth looking at an example. Imagine a recruiter has the option of using a Boolean Search engine or a Semantic Search engine to fill the following position:
Job Title: Software Engineer
Level: Team Supervisor
Company: Hewlett-Packard
Product Line: Scanners
Requirements:
1. Background in image processing algorithms
2. Experience writing hardware device drivers
3. 5 years of experience
4. Masters in Computer Science
With the Boolean search engine s/he might search on: “Software Engineer” AND “image processing” AND “device drivers”. Only resumes that literally contained all of the search terms would be included in the results.
With a Semantic Search engine trained for recruiting, the user would see all candidates with equivalent term matches (e.g., computer programmer, image algorithms, image bicubic, device DDK, etc.).
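The difference between literal and equivalent matching fits in a few lines of Python. This is a minimal sketch: the equivalence table is a hypothetical, hand-written stand-in for the much larger sets a recruiting-trained engine would use, and matching is plain substring search.

```python
# Hypothetical equivalence table; a real engine would learn or curate far larger sets.
EQUIVALENTS = {
    "software engineer": {"computer programmer", "software developer"},
    "image processing": {"image algorithms", "bicubic"},
    "device drivers": {"ddk", "driver development"},
}


def literal_match(resume: str, terms: list[str]) -> bool:
    """Boolean AND: every term must appear verbatim."""
    text = resume.lower()
    return all(term in text for term in terms)


def equivalent_match(resume: str, terms: list[str]) -> bool:
    """Each term may be satisfied by the term itself or any listed equivalent."""
    text = resume.lower()
    return all(
        any(variant in text for variant in {term} | EQUIVALENTS.get(term, set()))
        for term in terms
    )
```

For a resume that reads “Computer programmer, 6 years writing image algorithms and Windows DDK drivers,” the literal AND query finds none of the three terms, while the equivalent match accepts the resume.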
Soft Keywords: The Hidden Power of Semantic Search
Another big advantage of Semantic Search engines is that they tolerate greater linguistic variation between job descriptions and resumes by ‘weighting’ the value of individual keywords. These weighted or ‘soft’ search terms enable the search engine to find the most relevant content.
The following example shows how ‘soft’ keywords can increase the flexibility of a job search. The job description is for a Systems/Mechanical Engineer responsible for the design, analysis and development of optimal surfaces used specifically in gears, joints, and actuators.
Semantic Search finds unexpected matches between the job description and candidate resumes by searching for ‘soft’ keywords in the job description and candidate resumes. Boolean Search's exact match logic doesn’t allow for this kind of matching flexibility.
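One simple way to model hard and soft keywords is to require the hard ones and let each soft one add a weight. The sketch below assumes the keyword split and the weights; they are hypothetical values chosen for the Systems/Mechanical Engineer example, not figures from any product.

```python
# Hypothetical split: hard keywords must appear; soft keywords add partial credit.
HARD = ["mechanical engineer"]
SOFT = {"gears": 0.3, "actuators": 0.3, "joints": 0.2, "surface optimization": 0.2}


def soft_score(resume: str) -> float:
    """Return 0.0 unless every hard keyword matches; otherwise sum the soft weights found."""
    text = resume.lower()
    if not all(k in text for k in HARD):
        return 0.0
    return sum(w for k, w in SOFT.items() if k in text)
```

A resume that never mentions “joints” or “surface optimization” still earns partial credit instead of being excluded, which is the flexibility exact-match Boolean logic lacks.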
Search Term Expansion Sets
The main advantage of Semantic Search engines is the ability to find keywords and phrases that expand from the original keyword(s) being searched for. Semantic Search engines do this by building expansion sets, or lists of linguistically-equivalent meanings. This capability enables the Semantic Search engine to find ‘hidden’ matches to the user’s intended search, which regular search engines would normally filter out.
The advantage of using expansion sets can be seen in the following illustration. With Boolean Search Engines, the number of potential matches is limited to only those resumes that match the specific keywords being searched on.
With Semantic Search engines, each original keyword is expanded to include many semantically equivalent keywords that significantly increase the match opportunity. The result is that far more matches can be found with Semantic Search than with regular Boolean Search.
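Conceptually, an expansion set turns each original keyword into a group of alternatives. A minimal Python sketch that writes the expansion out as one Boolean OR-group per keyword; the expansion lists are hypothetical examples, and a Semantic Search engine would apply them internally rather than show the recruiter a query string.

```python
# Hypothetical expansion sets for two original keywords.
EXPANSIONS = {
    "programmer": ["developer", "software engineer", "coder"],
    "opengl": ["opengl es", "3d graphics", "glsl"],
}


def expand_query(keywords: list[str]) -> str:
    """Build a Boolean query in which each keyword becomes an OR-group of its expansion set."""
    groups = []
    for kw in keywords:
        variants = [kw] + EXPANSIONS.get(kw, [])
        groups.append("(" + " OR ".join(f'"{v}"' for v in variants) + ")")
    return " AND ".join(groups)
```

Where the literal query needs one exact phrase per keyword, each OR-group here accepts any of four alternatives, which is why expanded searches surface more candidates.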
The following table shows an expansion set for a Nokia programmer developing games with OpenGL. You can easily see how the Semantic Search engine’s expansion set offers far more match possibilities than a regular search engine would.
Semantic Search engines use Term Expansion to find larger keyword match sets, and deliver more accurate results, by including linguistically-equivalent search terms in their search sets and utilizing advanced filters and ranking algorithms to calibrate the results. For recruiters, Semantic Search term expansion enables them to find excellent candidates without requiring them to be subject matter or Boolean Search experts.
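Expansion widens the net; ranking and filtering then decide what the recruiter sees first. A minimal sketch, assuming a hypothetical scoring rule in which a verbatim match of the original keyword is worth 1.0, a match found only through the expansion set is worth 0.6, and results scoring below a threshold are filtered out.

```python
# Hypothetical expansion sets and weights (not taken from any product).
EXPANSIONS = {
    "programmer": ["developer", "software engineer"],
    "opengl": ["3d graphics", "glsl"],
}
EXACT_WEIGHT, EXPANDED_WEIGHT = 1.0, 0.6


def score(resume: str, keywords: list[str]) -> float:
    """Exact keyword hits count fully; hits only via the expansion set count less."""
    text = resume.lower()
    total = 0.0
    for kw in keywords:
        if kw in text:
            total += EXACT_WEIGHT
        elif any(v in text for v in EXPANSIONS.get(kw, [])):
            total += EXPANDED_WEIGHT
    return total


def rank(resumes: list[str], keywords: list[str], min_score: float = 1.0) -> list[str]:
    """Drop weak matches, then sort the remaining resumes best-first."""
    scored = [(score(r, keywords), r) for r in resumes]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [r for s, r in ordered if s >= min_score]
```

Discounting expanded matches keeps resumes that use the recruiter’s exact terms at the top while still admitting equivalent candidates below them.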
Three Different Semantic Approaches
There are a number of approaches to Semantic Search technology worth highlighting, each with its own way of extracting meaning and subtext. While not perfect, these tools are certainly good additions to your research toolkit. The three primary approaches are Lexicon- and Ontology-Based Search, Statistical Analysis and Pattern Matching, and Contextual Search.
Lexicon- and Ontological- Based Search
In the field of information technology, a lexicon is a specific vocabulary, or list of words, related to a particular domain, discipline or topic, and an ontology is a description of the concepts and relationships that can exist within a domain. Search engines that use this approach attempt to map the terms of a search onto the concepts and relationships in the domain ontology.
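A toy Python model of the lexicon/ontology approach: a lexicon maps surface terms to concepts, and an ontology records “is-a” relationships between concepts, so a search for a broad concept also matches its subtypes. All terms and concept names here are hypothetical.

```python
# Hypothetical lexicon: surface terms -> concept IDs.
LEXICON = {
    "java developer": "JavaDeveloper",
    "j2ee engineer": "JavaDeveloper",
    "python developer": "PythonDeveloper",
    "programmer": "SoftwareDeveloper",
}

# Hypothetical ontology: concept -> parent concept ("is-a" relationships).
IS_A = {
    "JavaDeveloper": "SoftwareDeveloper",
    "PythonDeveloper": "SoftwareDeveloper",
    "SoftwareDeveloper": "Engineer",
}


def ancestors(concept: str) -> list[str]:
    """Return the concept followed by each of its parents up the is-a chain."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def matches(query_term: str, resume_term: str) -> bool:
    """A resume term matches if its concept is the query concept or a subtype of it."""
    q = LEXICON.get(query_term.lower())
    r = LEXICON.get(resume_term.lower())
    return q is not None and r is not None and q in ancestors(r)
```

A search for “programmer” matches a resume listing “J2EE engineer” (a Java developer is a kind of software developer), but a search for “Java developer” does not match “Python developer”, because they are sibling concepts.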
Statistical Analysis and Pattern Matching
A true Semantic Search system must encapsulate knowledge of language to emulate an understanding of meaning. Because of this requirement, search engines that rely on statistical analysis of link rankings, symbols, words, and click behavior are not considered truly Semantic Search engines. However, these engines can approximate an understanding of meaning by providing close matches, particularly when the data is fairly homogeneous.
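To illustrate the limitation described above, here is a purely statistical matcher: cosine similarity between word-count vectors. It is a minimal sketch with no stemming, term weighting, or synonym knowledge, which is precisely why it has no notion of meaning.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (no stemming or stopword removal)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts' term-frequency vectors (0.0 means no shared words)."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0
```

“java programmer” and “python developer” score 0.0 even though both describe software developers; the statistics see only shared words. Such engines come close only when documents use consistent vocabulary, which is why homogeneous data helps them.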
Contextual Search
Contextual Search tries to understand the meaning of a search by inferring it from the context around the location of the data. This is usually done by: analyzing and ranking links pointing to a particular document; specializing in only one category of information (Vertical Search); extracting summaries from the results; and/or giving the user an interface with which to filter or disambiguate the search results (Faceted Search).
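Faceted Search, the last technique in that list, is easy to sketch: count the values of each structured field in the result set, offer them as filters, and narrow the results to the values the user picks. The records and field names below are hypothetical.

```python
from collections import Counter

# Hypothetical search results with structured facet fields.
RESULTS = [
    {"name": "A", "industry": "construction", "location": "Seattle"},
    {"name": "B", "industry": "software", "location": "Seattle"},
    {"name": "C", "industry": "software", "location": "Austin"},
]


def facet_counts(results: list[dict], field: str) -> Counter:
    """Count results per facet value so an interface can offer them as filters."""
    return Counter(r[field] for r in results)


def apply_facets(results: list[dict], **selected: str) -> list[dict]:
    """Keep only results whose fields equal every selected facet value."""
    return [r for r in results if all(r.get(f) == v for f, v in selected.items())]
```

Choosing industry “software” and location “Seattle” narrows the three results to candidate B; the user disambiguates rather than the engine guessing.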
Broad vs. Narrow Match Semantic Search
It is easy to get confused about which search applications use Semantic Search. Hype abounds about Semantic Search powering everything from broad-match search engines like Google and Yahoo! to smart applications. The reality, however, is that Semantic Search technology is best applied to specific search applications that deal with large volumes of contextual data. A Semantic Search engine needs to be ‘trained’ to recognize expansion lists, fuzzy match rules, and soft vs. hard keyword sets.
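Of the things an engine must be trained to recognize, fuzzy match rules are the simplest to demonstrate. This sketch uses Python’s standard `difflib.get_close_matches` to map misspelled or variant skill names onto a canonical list; the list and the 0.8 similarity cutoff are hypothetical choices, not settings from any product.

```python
import difflib
from typing import Optional

# Hypothetical canonical skill names a recruiting engine might normalize to.
CANONICAL = ["javascript", "actionscript", "photoshop", "postgresql"]


def normalize(term: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest canonical skill, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(term.lower(), CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

“Javscript” normalizes to “javascript”, while “excel” returns None because nothing in the list is close enough; raising or lowering the cutoff trades recall for precision.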
There are really two kinds of Semantic Search engines: broad search and narrow search. Narrow search engines from companies like TalentSpring are designed for specific search problems like candidate sourcing. Broad search engines from companies like Powerset or Autonomy are designed to find any kind of document across an organization’s electronic documentation platform. A Semantic Search engine that has been specifically designed to focus on recruiting is going to give you the most precise candidate search results. A Semantic Search engine that has been designed for broad matching will have the largest volume of results.
The effectiveness of a Semantic Search engine for candidate sourcing depends on the depth of its linguistic expansion set for candidate sourcing keywords: job titles, skill-sets, experience types, and education levels.
Targeting Semantic Search
While Semantic Search engines are highly effective at finding deep relational matches between job descriptions and resumes, they still need to be focused, or targeted, on the intended subject. For example, there is a big difference between the jobs, pay, and responsibilities of a Construction Project Manager and a Project Manager who works at Microsoft. However, to a search engine the two job descriptions look very similar, and if the recruiter didn’t specify that one was for the Construction industry and the other for the IT industry, the Semantic Search engine would likely return matches for both jobs. The more targeting the recruiter does before the search, the more accurate the results will be. Good Semantic Search engines provide a simple way to calibrate or tune search parameters.
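Targeting can be as simple as requiring at least one industry context term alongside an ambiguous job title. A minimal sketch, assuming hypothetical context-term lists for the Construction and IT industries; a real engine would learn these associations rather than hard-code them.

```python
# Hypothetical industry context terms used to target an otherwise ambiguous title.
INDUSTRY_TERMS = {
    "construction": {"site", "contractor", "osha", "blueprints", "subcontractors"},
    "it": {"software", "agile", "scrum", "sdlc", "deployment"},
}


def targeted_matches(resumes: list[str], title: str, industry: str) -> list[str]:
    """Keep resumes containing the title AND at least one context term for the target industry."""
    context = INDUSTRY_TERMS[industry]
    hits = []
    for resume in resumes:
        text = resume.lower()
        words = set(text.replace(",", " ").split())
        if title.lower() in text and words & context:
            hits.append(resume)
    return hits
```

Both resumes in the usage below contain “Project Manager”; only the industry the recruiter specifies decides which one is returned.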
Example Semantic Search Technologies
While there is a dizzying array of semantic search technology providers and approaches, if you wish to implement a semantic search solution for your recruiting effort, there are only a few options available at this time. The following sections outline both high-end semantic search solutions and free tools for recruiters:
Full High End Semantic Search Solutions for Recruiters
• TalentSpring.com takes structured information like resumes and people profiles, and matches them to employment requirements using both ontological categorization and semantic analysis.
• Trovix.com (now owned by Monster) primarily uses a sophisticated lexicon to match skills in resumes to requirements in job descriptions. It also learns from users’ behavior to extract and rank search criteria not included in the original search parameters.
Free Semantic Search Tools for Recruiters
• Semantic Technology Adopted by Major Search Engines:
  o Ask.com applies linguistic processing to answer questions posed in natural language, such as “What is the capital of Russia?” It uses link popularity to measure relevance and ranks results higher when they come from pages considered to be expert or authoritative sources on the topic of a search.
  o Bing.com uses a little of everything: a lexicon for automatically suggested keywords, statistical analysis of links and words from authoritative sources, page ranking methods, and categorization. Bing includes Zoomix.com, a self-learning matching technology based on learning user behavior, and aspects of Powerset.com, whose primary discovery engine asks the user to disambiguate results by clicking on relevant articles from Wikipedia.
  o Exalead.com has a traditional keyword search engine but also a best-in-class image search created by the categorization of image size, color and content, focusing on defining content where link analysis won’t work.
  o Google.com applies a number of techniques like page rank and link analysis, statistical analysis of relationships between keywords, and now with Google Squared it even suggests other topics using a lexicon. The related: command identifies other websites that have statistically similar content. Google also learns from user behavior and ranks results based on a user's previous click-through history.
  o Yahoo.com infers meaning from tags in the HTML and XML code. Together with Zemanta, AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase and Zigtag, Yahoo created a semantic tagging format called “Common Tag” to assist users in having a common ontology for adding meaning to content via HTML tags. Tagged content becomes easier for machines to understand.
• Deepdyve.com applies pattern matching to identify complex data found in the Deep Web (i.e., Web content within databases and other dynamic data sources not typically indexed by search engines). The outcome is highly relevant results displayed in ways that can be easily organized and visualized.
• Factbites.com lies somewhere between link analysis and document summarization, taking excerpts of results and making them into meaningful sentences.
• Hakia.com attempts to anticipate questions that “could be” asked about a document found in its database, then ranks search results along an index that measures sentences depending on how closely they match the concept related to the search query. Hakia employs a lexicon of relationships between concepts and measurements of relevancy based on credibility and age of content.
• Lexxe.com utilizes linguistics (natural language processing) and categorization, and works by eliminating irrelevant content, then providing visual keyword drilldowns to help derive meaning from a query.
• Sensebot.net is a summarization engine that extracts key phrases and sentences from top results, making it less necessary for a user to drill down and click on individual links.
• Twitter.com offers real-time search focusing on shallow but very recent content.
• Twingly.com is a faceted social search engine focusing only on blog and microblog content.
• Vertical people search engines such as wink.com, spock.com, and zoominfo.com index a multitude of websites and deep web content but focus on only one domain or topic. For example, Wink.com focuses on people from social networks, while Spock.com and Zoominfo.com collate biographical information about people.
• Yedda.com answers questions by combining natural language processing with user behavior learning.
Semantic Search RFP Check List
With a number of technologies becoming available, here is a list of features you should be looking for when selecting a Semantic Search technology:
Recruiter-Specific Semantic Search engine. Focused-match search engines will deliver far better results than general document search engines. You will not have to “train” the engine on how to identify good candidates.
Resume List Depth: Search engines that come pre-populated with job titles, skills, certifications, education and experience levels will perform far better than engines that must first be trained by your company or the vendor. Look for training sets (the number of job profiles used to train the system) of more than 10 million profiles.
User Selectable Source: The ability for the user to define which resume sources to pull from (e.g., specific job boards, social networks, or the organization’s ATS). This is an important feature for controlling where your candidates come from.
ATS Interoperability: The ability for the search engine to search your existing ATS database in addition to external resume sources.
OFCCP Compliance: The ability for the semantic search engine to support your existing OFCCP process (if used by your organization).
Geographic Sourcing: The ability to specify the recruiting geography (local, regional, national, etc.).
Industry Sourcing: The ability to specify which industry you are recruiting from.
Marketing Module: Does the vendor provide tools that let you send recruiting ads or emails to potential candidates, either selectively or in bulk (mass-send)?
About the Authors:
Shally Steckerl
Because of his passion for the Internet as a recruitment tool and his continually innovative methods, Shally Steckerl has developed a reputation as an authority in Internet search and a pioneer in recruitment research. Shally is also an author, an internationally requested speaker, founder of JobMachine.net, EVP of Arbita, a frequent contributor to industry forums, and a global recruiting consultant for companies like Microsoft Corporation, Google, Coca-Cola Enterprises, Cisco Systems and Motorola. Since 1996, Shally has developed techniques that dramatically increase recruitment productivity and allow companies to exploit the Internet. At Microsoft, he managed the research arm of their global centralized sourcing and research team. At Google, Shally built a central sourcing organization. At Coca-Cola, he was responsible for supporting all corporate hiring managers and functional channels throughout North America, while at Cisco Systems, he was a senior member of the team defining Cisco’s online recruiting strategy. Shally provides priceless insights into how forward-thinking companies are using innovative Internet recruiting techniques and intelligent technologies to gain competitive recruiting advantages.
Bryan Starbuck
Bryan Starbuck is the CEO of TalentSpring, Inc., a provider of Semantic Search technology products for the recruiting industry. As an engineering manager, Mr. Starbuck has a track record of working closely with Microsoft’s Recruiting department on acquiring exceptional talent. He created TalentSpring after seeing the potential of semantic matching algorithms for finding candidates who comprehensively match the needs of a job description. Prior to starting TalentSpring, Bryan was an engineering manager at Microsoft Corp. and has a track record of shipping semantic matching related products, including work with Microsoft Research. Mr. Starbuck has over 38 patents and a computer science degree from UCSD.
APPENDIX I – Alternate Search Engines
The application of Semantic Search technology is far-reaching and will become increasingly common in the years to come. Already, specialized semantic search engines for specific applications are employed on major websites such as Amazon.com and eBay to find, organize and deliver fantastic user results.
For reference, the following are examples of additional semantic search engines that are not directly applicable to recruiting:
• Amazon.com compares user behavior to provide “similar items.”
• eBay.com’s search engine utilizes categorization, keyword search, and user behavior to catalog a vast amount of goods sold on their website.
• ExpertSystem.net gets the closest to really understanding meaning and sentiment from both structured and unstructured data, but is available only as an enterprise search application.
• Evri.com connects contextually relevant documents to each other.
• Freebase.com is a user-generated information database collected by and for the community.
• Kosmix.com employs categorization and content aggregation to create a directory. Kosmix tries to derive meaning by looking at the extent to which the contents of a link point to similar content.
• MyRoar.com uses natural language processing to answer questions with a focus on financial information.
• Swoogle (swoogle.umbc.edu) searches only the semantic web, which contains highly structured data, and focuses on documents with purposely written semantic content.
TABLE I – Semantic Search Engine Types
The following table summarizes the Semantic Search types that common search engines fall under:
TABLE II – Semantic Search Engine Types
Search engines make a tradeoff between the user effort required to operate them and how structured the data being searched is. This table illustrates the positioning of different Semantic Search engines.