Alberta Innovates Technology Futures R&D final report

Preamble

My buddy’s brother is the executive chef at a high-end restaurant near where I live. He’s passionate about his work and his industry. His role has enabled him to experiment in innovative ways. For example, he recently turned an old refrigerator into a smoker. He smokes salmon, bacon, and whatever he feels like. One day I’m sure he’ll move on to bigger and better things. When he does, he’ll take his smoker, his menus, recipes, and whatever else he created in the time he was employed. If he were a programmer, he’d be straight up robbed of all that stuff.

Programmers get screwed all the time. We typically agree to work for a salary and not compete in an industry for two years (or whatever). When we leave a position, we have nothing to show for our time, because we can’t take our work with us. When we go on to a new position, we find ourselves recreating some variation of the same old boring wheel that our previous employer had us build. That’s stupid, and it doesn’t help anyone, employer or programmer.

Programmers are writers. Writers get royalties, but programmers don’t. Don’t sweat it though, we’ve got something better: open source software. That follows us wherever we go. No one owns it. No one can take it from you. You can deploy it anywhere and not get robbed by old-fashioned, short-sighted employers who think software is a CD-ROM sold in a box.

If you’re the keyboard-banging-Shakespeare-monkey-type programmer, nothing past this point is going to make any sense. If you’re the type of programmer who creates worlds with words and uses computer language as a medium of self-expression, don’t be a sucker - develop open source until your reputation allows you to impose your own licence agreements. Sign up for GitHub and get started now.

In 2013 I was awarded a very nice grant to conduct research into semantic search. As part of my obligations to the funding agency, I was required to file two progress reports over the two-year funding period. What follows is the final report. I post it for anyone who’s interested, though it’s especially relevant to programmers. I may have got screwed a little, but at the end of it all I kind of feel like Jacob in his dealings with his uncle Laban.

Behold the power of open source!

Introduction

It is with some consternation that I submit this document, my final report, covering my research into semantic search under the auspices of HireGround Software Solutions. Sadly, this research was cut short when I was illegally terminated from my position prior to my return from provincially-protected parental leave. As such, the bulk of this report will review the activities that took place during my tenure, prior to my termination. There is also discussion of the work I conducted independently while on leave and the avenue I would have pursued had I not been forced into this unjust and senseless situation.

Motivation

HireGround develops and supports human resource software designed to expedite candidate selection. The acquisition and processing of candidate resumes can be an expensive and resource-intensive challenge for organizations, especially when resumes are received in large volumes. Positions for which there is a large talent pool from which to draw can attract the interest of hundreds, or even thousands, of candidates. Determining the best people to fill these positions is an arduous task. Even when the number of applicants is relatively small (say fewer than 20), hiring managers are tasked with sifting through data that is inconsistent in format and quality. The nature of the material that drives their decisions - resumes written in natural language - provides a fruitful environment in which to apply semantic search techniques.

Year One

Two challenges were addressed in the first year of funding:

  1. Applicant matching: processing large volumes of resumes in an efficient and consistent manner so that the most qualified applicants, as measured against a given job description, were immediately identified.
  2. Named Entity Recognition: analyzing resumes so that a candidate’s personal information and qualifications could be identified and structured. Had my research been allowed to continue, this would have provided the context with which existing semantic search techniques could be improved and adapted for new purposes.

Applicant Matching

The applicant matching technique I developed was my first contribution to HireGround’s flagship product: the StartDate Application Tracking System (ATS). StartDate provides the means with which hiring managers track and process job applicants. As previously stated, some positions attract a lot of interest. My applicant matching technique simply compares incoming resumes to the job description provided. The resumes that most closely match that description are the ones that float to the top of the list. This technique is akin to what hiring managers do manually: identify the key requirements of the role to be filled and match them to the skills stated on an applicant’s resume. The closer the match, the more likely an applicant is qualified to fill the role.

Here, manual processing is mechanized. The need for a manual, human-driven search is eliminated in favour of speed, consistency, and economy.
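
The report does not say how the comparison between resume and job description was actually computed. As a rough illustration only, here is a minimal sketch that assumes a bag-of-words approach: TF-IDF weighting with cosine similarity. The function names (tokenize, tfidf_vectors, cosine, rank_resumes) are hypothetical and are not taken from StartDate.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency (an assumed weighting scheme).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_description, resumes):
    """Return (resume_index, score) pairs, closest match first."""
    vectors = tfidf_vectors([job_description] + list(resumes))
    job_vector, resume_vectors = vectors[0], vectors[1:]
    scores = [(i, cosine(job_vector, v)) for i, v in enumerate(resume_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling rank_resumes with a job description and a list of resume texts returns the resumes ordered so that the closest matches float to the top. A resume that shares no vocabulary with the description scores zero.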

Named Entity Recognition

Though effective, the applicant matching technique described above is crude. To improve on it, my co-worker, Dr. Yanfen Hao (NSERC), and I developed the RESume Named Entity Recognizer (Resner). Dr. Hao implemented a language model tailor-made for extracting applicant information from resumes. I was tasked with preparing the product for market and ensuring the extracted data was structured in accordance with the HR Open Standard.

As with human understanding of a given text, the machine’s treatment of natural language must be decided within context. Resumes are an interesting literary genre (of sorts), because each of the sections therein provides a unique context that demands its own understanding. Prior to my parental leave, Resner had been integrated into the StartDate product and was hosted at resner.ca. Sadly, at the time of writing, Resner is no longer available at that domain.

Year Two

The first four months, prior to my parental leave, were spent focused on Resner. One of those months was spent preparing HireGround’s developers to maintain the software while I was on leave. My intention, upon returning to my position, was to implement a novel search technique designed to exploit the context Resner provides. Conceptually, the implementation was worked out, but it was never formalized or proven in software while I was at HireGround (more on this later).

Parental Leave

I include a summary of my activities while on leave to underscore my dismay at the senselessness of my dismissal. During that time I was able to develop some basic tools necessary to the application of Natural Language Processing (NLP) and semantic search techniques, though these tools are not exclusive to the field of human resources. They were cobbled together exclusively from open source products freely available to everyone. They are delivered as services provided by artificial agents, because though I am passionate about NLP, my background is in Multi-Agent Systems (MAS).

docto.io

Applying NLP techniques to typeset documents (PDFs, DOCs, etc.) first requires extracting plain text. This is also true of images (GIFs, JPEGs, etc.), which require additional treatment via Optical Character Recognition (OCR). docto.io performs these extractions alongside general purpose file conversions. It even does this for images of paper documents stored in PDFs (for example).

By itself, docto.io is nothing special in terms of research and development. It is, however, a vital tool for those conducting research into NLP and semantic search. The best part is that it has no proprietary components. Everything of which it is comprised is entirely open source, including its OCR capabilities.

whatidid.info

whatidid.info is a domain I maintain to showcase product prototypes. At the time of writing, the software deployed there allows you to search disparate documents indexed from a variety of sources: typeset documents, web pages, and images of text. By itself it is a powerful tool and a potentially viable cloud-based commercial product. Like docto.io, its constituent components are exclusively open source. Currently, it allows you to search a handful of documents obtained from the Alberta Hansard Office and a couple of business cards photographed and processed through my phone. Administrative access allows you to add any document you want from any source. whatidid.info is dynamic and changing, but the functionality described will be operational for the next couple of months so people can try it out.

The software behind whatidid.info is driven by an artificial agent. It is one of two agents in a two-member MAS. That is, all documents submitted by whatidid.info administrators are first processed by the agent behind docto.io. There is no limit to the number of agents that could be included in such an MAS. The technique behind this is documented in my master’s thesis and has huge ramifications for administrative and bureaucratic roles, many of which could be fully automated.
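
To make the two-agent arrangement concrete, here is a toy sketch of the hand-off: one agent turns a submitted document into plain text, and a second agent indexes that text and answers queries. Everything in it is an assumption made for illustration. The class names (ExtractionAgent, SearchAgent), the extractor registry, and the simple inverted index are hypothetical and do not describe the actual software behind docto.io or whatidid.info. Real PDF and OCR extraction is left out and would be plugged in as extractor functions.

```python
import re
from collections import defaultdict


class ExtractionAgent:
    """Stand-in for the docto.io role: turn a submitted document into plain text."""

    def __init__(self):
        # Hypothetical registry mapping a document kind to an extractor function.
        # Plain text passes through unchanged; PDF or OCR extractors would be
        # registered here in a fuller version.
        self.extractors = {"text": lambda payload: payload}

    def register(self, kind, extractor):
        self.extractors[kind] = extractor

    def process(self, kind, payload):
        if kind not in self.extractors:
            raise ValueError(f"no extractor registered for {kind!r}")
        return self.extractors[kind](payload)


class SearchAgent:
    """Stand-in for the whatidid.info role: index extracted text and search it."""

    def __init__(self, extraction_agent):
        self.extraction_agent = extraction_agent
        self.index = defaultdict(set)  # term -> ids of documents containing it

    def submit(self, doc_id, kind, payload):
        # Every submission goes through the extraction agent first.
        text = self.extraction_agent.process(kind, payload)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term in the query."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        results = None
        for term in terms:
            matches = self.index.get(term, set())
            results = matches if results is None else results & matches
        return set(results)
```

Adding a third agent in this sketch would mean giving it a reference to whichever agent it depends on, in the same way SearchAgent holds a reference to ExtractionAgent.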

The software hosted at whatidid.info is also significant because it provides the framework through which I would have implemented the new semantic search technique I had planned for the time remaining in my AITF funding period. Provided adequate business interest, I may still do this. One viable possibility is enabling people to search engineering log files in the oil and gas industry.

Gratitude

First, I would like to thank my boss, Marilynn Kalman of HireGround Software Solutions. It was a big mistake to terminate me while I was on parental leave. Not just because of the legal consequences, but because in all sincerity I wanted to make your company a success. I hope you do succeed and leave a lasting legacy that will provide for your staff for years to come. I thank you for this amazing, life-changing opportunity, which, if nothing else, showed me I can develop my own products. I don’t know what the future will bring, but it is my hope that I will never have to be an employee again (unless they pay me three times what you did).

Second, I thank the good folk over at Alberta Innovates Technology Futures. You made this all possible and I hope you agree that my time as an r&D Associate was a roaring success. (Why the wonky capitalization?) I have created tools that may benefit others given the same opportunity. I am slowly venturing into Calgary’s high-tech start-up community in search of contacts, talent, and mentors. I suspect I will cross paths with the agents of your organization and hope to engage you in the future, in whatever capacity.

Sincerely,

Daniel Bidulock, M.Div., M.Sc.