In a fascinating 2016 study, researchers used machine learning algorithms to predict the intelligence of Facebook users based on their profile photos. They developed and trained their algorithms on the photos of over 1,000 users who also completed a cognitive ability test.
So what features of the photos did the algorithms pick up on? It turns out that more intelligent people tend to:
Or, as the researchers put it, intelligent people seem to understand that a profile picture is most effective when it shows a single person, in focus, against an uncluttered background.
Although these patterns emerged, the algorithm's predictions were fairly inaccurate. If 0% represents a prediction that provides no information at all, and 100% represents a prediction that agrees exactly with a standard cognitive ability test, then the algorithm would score about 7%.
Still, that level of prediction is enough to get people thinking. The past 10 years have seen an explosion in the amount of information that people share about themselves online. There are photos, likes, tweets, comments, posts and shares. There are data on where we go, what we buy, who we know and what we do. There is, potentially, a great deal about people that we can deduce from all of this data, and that opens up some compelling applications in recruitment and talent management.
What can social media tell us about prospective employees?
One possibility is that we will be able to predict who will fit in and succeed in a role without any applications, interviews or checks. The information we need will already be there, in the online trail that people leave on LinkedIn, Facebook and Twitter.
There are, however, some built-in limitations to what’s possible in digital trace analysis. We’ve already seen that profile photos can’t tell you much about someone’s cognitive ability or intelligence. Studies of Facebook likes have found similar results. Although higher ability people tend to “like” The Colbert Report and lower ability people tend to “like” Harley Davidson motorcycles, the accuracy of these predictive formulas is still low.
The shortcomings of digital trace analysis for predicting abilities and aptitudes stem from the nature of behaviour on social networks. You don’t need to be highly intelligent to hit the “like” button. By contrast, you do need to be highly intelligent to solve the puzzles in a cognitive ability test. Differences like these put a fundamental limit on how much ability and aptitude information you can get from digital trace analysis (although I may yet be proven wrong on this point).
Better success has been achieved, and more is possible, in predicting preferences and personality traits. A research summary published this year found that personality predictions based on digital trace analysis provide about 12% of the information you’d get from purpose-built personality assessments. That’s still a big gap, but the prospects for improvement here are clearer. People completing a personality assessment typically answer in terms of their behavioural preferences, and those same preferences are probably reflected in what people “like” and “share” online.
Extraversion – how outgoing and social people are – is one personality trait that is relatively well predicted by online trace data. In one study, highly extraverted people “liked” topics such as beer pong, dancing, cheerleading and theatre, while less extraverted people “liked” topics such as anime, programming, fanfiction and Minecraft. Studies of job performance have shown that outgoing and extraverted people can be more successful in sales roles. Potentially, candidates for sales positions could be prioritised based on their online behaviour, although prediction algorithms would need to improve on what’s currently available for this to be practical.
The highest levels of predictive validity are attained in the areas of job and occupational choice. It’s this kind of data that LinkedIn uses to recommend vacancies to users. One of LinkedIn’s advantages is that the data supplied by users – previous roles, qualifications and skills – is naturally linked to job and occupational choice. In addition, user behaviour – clicking on job ads, updating employer details and so on – provides a rich data source for developing and validating predictive models.
In rough order of predictive accuracy, social media can tell us:
The low predictive validity of ability data suggests to me that there will continue to be a place in the recruitment and selection process for tasks specifically designed to test whether people have particular abilities or aptitudes. It’s in this area that Revelian is continuing to invest, as it seems, at this stage, to be the domain most resistant to passive online trait measurement.
What’s the flip side?
Finally, there is the ethical question. Do we want to live in a world where the things that we have chosen to share online are used to make predictions about the things that we have chosen not to share? Would we be happy with recruiters (or insurance companies, banks or law enforcement) making decisions about us based on what we “like”, “follow” or “share” online?
In an exploration of the privacy issues that can arise, researchers have found that online trace data can be used to predict information such as our relationship status, sexuality, political leanings and drug use.
Some of my thoughts on the ethics of digital trace analysis:
1. It’s important to respect the purpose for which online information was provided in the first place, and the context in which it was provided.
The fact that a person has made some of their data public does not constitute their consent for that data to be used for any purpose. These issues are likely to intensify as new generations grow up having been online their whole lives. Another factor is that some information about people online has been posted by others without their consent. Using data like this to make important decisions about a person raises serious ethical concerns.
2. Transparency is essential.
If you use online trace data in the recruitment process, or contact someone because of a prediction based on their online trace data, then you need to be open about how that data is being used.
3. Avoid the black box.
Some machine learning algorithms are so complex that the reasons for certain decisions, or the features that led to them, cannot be explained. If you are using predictive analytics to make important decisions about people, then you need to be able to describe and defend those decisions. If your model doesn’t allow you to do this, then you may need a different, more interpretable model.