Enterprise search and discovery algorithms are often perceived as objective and neutral, helping us overcome our own biases, even if they don’t always produce what we want or need. The Cognitive Computing narrative is one where machines read vast amounts of text to compensate for human cognitive bias and potential organizational dogma. The mantra is not to produce the ‘right answer’, but the ‘best available’ one. But can search algorithms themselves be truly objective and unbiased?
Search Engine Bias
The various phenomena that involve the manipulation of search engine results are typically referred to as search engine bias. Like bias generally, however, it is not easy to define and can be hard to detect. What is the difference between bias and a point of view? Consider the incompatible statements: “It is a truism that every author is biased in favour of the claim he is making” and “Bias and prejudice are forms of error”.
There is avoidable bias (such as promoting a narrow partisan view when a broader non-partisan view ought to be taken), technical bias (such as that related to sampling) and unavoidable bias (such as in news reporting). This is not to criticise news reporting, but to guard against any view that reporting can be absolutely neutral. It is proposed that many aspects of search engine ranking are an unavoidable bias; the danger (just as with news reporting) would be to view it as a neutral rendering of data. It may be better to talk in terms of predispositions.
Search Engine Optimization
Search ranking involves automated and human interventions according to design parameter choices (sometimes weightings are called ‘bias values’). Some content will be promoted and other content marginalized. Search Engine Optimization (SEO) is an iterative process to maintain or improve search result quality, which may see some content rise and other content fall as changes are made. Some scholars have sought to measure the bias of web search engines by their deviation from a relative ‘norm’ established by their peers. In previous articles and research papers I have discussed the positive elements of using search algorithms designed specifically to stimulate the unexpected, insightful and valuable: nudging search engines into the role of creative assistant, rather than just time saver. This article looks at the predispositions (biases) that may be inherent in search algorithms.
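One simple way to operationalise the peer-deviation idea is to compare how much one engine’s top results overlap with those of its peers for the same query. The sketch below is illustrative only (the scholars’ actual methods are more sophisticated); the engine names and result identifiers are invented.

```python
# Illustrative sketch: measuring one engine's "bias" as deviation of its
# top-k results from a peer-group norm, using simple rank-list overlap.
# All result identifiers below are invented for illustration.

def overlap_at_k(results_a, results_b, k=5):
    """Fraction of top-k results shared between two ranked lists."""
    return len(set(results_a[:k]) & set(results_b[:k])) / k

def deviation_from_norm(engine, peers, k=5):
    """1 minus the mean top-k overlap with each peer engine."""
    overlaps = [overlap_at_k(engine, p, k) for p in peers]
    return 1 - sum(overlaps) / len(overlaps)

engine_x = ["a", "b", "c", "d", "e"]
peer_1   = ["a", "b", "c", "f", "g"]
peer_2   = ["a", "c", "h", "b", "i"]

print(round(deviation_from_norm(engine_x, [peer_1, peer_2], k=5), 2))  # → 0.4
```

A deviation of zero would mean the engine mirrors its peers exactly; higher values indicate it departs from the peer-group ‘norm’ — though of course the peer group itself may share a collective bias.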
How we come to know things
Internet search engines are ubiquitous; they have become an epistemology, ‘how we come to know things’, which raises ethical issues. This has prompted further scrutiny of the extent to which search algorithms and human interventions are truly unbiased. Some people argue that a search algorithm can never be neutral: behind every algorithm is effectively a person, organization or society that created it, which is likely to display biases of some form, so any rendering by search engines is value-laden, not value-free.
Algorithms themselves often incorporate query rules and Knowledge Organization Systems (KOS) such as taxonomies or ontologies. These KOS are ‘one version’ of reality, and whilst they can enhance information discovery, these schemas may also reinforce dogma and potentially blind us to new discoveries. Whilst some indicate new cognitive computing techniques allow us to evaluate without bias, it may be a falsehood to say automated systems lack any bias.
Another aspect utilized in algorithms is explicit social voting (within sub-cultures and societies), creating a form of ‘standardization’. The more an item is clicked through in search results in the context of a specific query, the more popular it is perceived to be. Some information may therefore be ‘censored’ through its obscurity, where relevance is determined not by usefulness but by popularity, which may reinforce existing power structures and stereotypes. Items at the top of any search results list may exhibit ‘presentation bias’. Once items attain a high search rank (85-95% of people never click through to page 2 of search results), a self-fulfilling prophecy may come into effect (the Matthew Effect): the rich get richer and the poor get poorer.
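The rich-get-richer dynamic can be sketched in a few lines. In the toy simulation below (invented numbers; not from any cited study), users click the top result most of the time regardless of its true usefulness, and each click boosts that item’s ranking score, locking in its position.

```python
# Illustrative sketch of the Matthew Effect in click-based ranking.
# Three documents start with equal scores; presentation bias means the
# currently top-ranked item receives ~70% of clicks, and every click
# feeds back into the score used for ranking.
import random

random.seed(42)
scores = {"doc_a": 1.0, "doc_b": 1.0, "doc_c": 1.0}  # start equal

for _ in range(1000):
    ranked = sorted(scores, key=scores.get, reverse=True)
    # presentation bias: the top item gets ~70% of clicks
    clicked = ranked[0] if random.random() < 0.7 else random.choice(ranked[1:])
    scores[clicked] += 1  # popularity signal feeds back into the ranking

top = max(scores, key=scores.get)
print(top, scores[top])  # one document accumulates the bulk of the clicks
```

After 1,000 simulated searches, one item ends up with roughly 70% of all clicks — not because it was more useful, but because it happened to be shown first.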
Some algorithms also make use of user context (such as location and previous searches), a form of ‘personalization’. Some scholars feel that personalization has mitigated, or will mitigate, search result ranking bias by producing tailored results. At the same time, results tailored uniquely to each person may place the searcher in an over-personalised filter bubble.
Technical Sample Bias
In addition to these ‘standardized’ and ‘personalized’ aspects of algorithms, there is technical bias related to the sample in the search index corpus. If the text within the search engine corpus is itself skewed, you have a classic case of sampling bias. This may explain why the ‘Bing Predicts’ big data algorithm following the United Kingdom’s June 23rd 2016 referendum on European Union (EU) membership predicted a 55% vote to remain. Social media trends may not reflect everyone’s opinions; the corpus may be prejudiced. Like any model, it can be true until it is not. The significant failure of the Google Flu Trends algorithm is another example, with some stating that ‘algorithm accountability’ may emerge as one of the biggest problems of our time.
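The mechanics of sampling bias are easy to demonstrate. In the toy example below (all numbers invented; not the actual referendum or Bing data), the full population leans one way, but the sampled sub-group — say, active social media users — leans the other, so any prediction built on that corpus inherits the skew.

```python
# Illustrative sketch of sampling bias: estimating an opinion share from
# a skewed corpus. The population split and the sub-group split are
# invented numbers for illustration only.
import random

random.seed(0)
population = ["leave"] * 52 + ["remain"] * 48          # true split: 52/48
social_media_users = ["leave"] * 35 + ["remain"] * 65  # skewed sub-group

def predicted_remain_share(corpus, n=1000):
    """Estimate the 'remain' share by sampling n items from the corpus."""
    sample = [random.choice(corpus) for _ in range(n)]
    return sample.count("remain") / n

print(predicted_remain_share(population))          # close to the true 0.48
print(predicted_remain_share(social_media_users))  # close to 0.65 — wrong winner
```

Sampling more heavily from the skewed corpus does not fix the problem; a bigger biased sample just gives a more confident wrong answer.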
In addition to automated rules and signals, search algorithms also undergo constant evaluation and tweaking by people in the background, with ratings generated by people judging how ‘good’ results are. It is therefore unlikely that search results are ever completely untouched by human intervention.
Power to Influence Elections
Taking a more sinister turn, studies have shown that manipulation of search result ranking in Google could potentially affect people’s attitudes towards health risks, without people being aware they were being fed biased information. Some scholars provide evidence that manipulation of search engine algorithms could even influence the outcome of national elections. Evidence appears to exist of search engines biasing results towards both the left and the right during elections, although (arguably) big data may make it easier to find evidence to support whatever point of view you wish to take.
Bias in Enterprise Search
Recent research involving three separate enterprise search technologies/deployments points to algorithmic bias also existing behind an organization’s firewall, within enterprise search and discovery technology. For example, enterprise search technology from at least one software vendor had default ‘factory shipped’ search ranking configuration parameters that gave preference (ranking boosts) to the vendor’s own document formats over those of its competitors.
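A ‘factory shipped’ format preference might look something like the sketch below. This is hypothetical: the research does not name the vendor, and the boost values, format names and configuration style here are invented purely to illustrate the mechanism.

```python
# Hypothetical sketch of 'factory shipped' ranking parameters that quietly
# prefer a vendor's own document formats. All boost values and format
# names are invented for illustration.

DEFAULT_FORMAT_BOOSTS = {
    "vendor_native_format": 2.0,  # shipped default favours the vendor
    "pdf": 1.0,
    "competitor_format": 0.8,     # competitor content quietly demoted
}

def ranked_score(base_relevance, doc_format):
    """Multiply the text-relevance score by the per-format boost."""
    return base_relevance * DEFAULT_FORMAT_BOOSTS.get(doc_format, 1.0)

# Two documents with identical text relevance end up ranked differently:
print(ranked_score(1.0, "vendor_native_format"))  # → 2.0
print(ranked_score(1.0, "competitor_format"))     # → 0.8
```

Because such boosts sit in a default configuration file rather than in the core algorithm, they can easily survive a deployment unnoticed unless someone audits the shipped parameters.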
Other examples in the enterprise include a bias in some ‘factory shipped’ enterprise search algorithms towards their country of origin. For example, in one search engine that automatically geo-references search results to display on a map, any document containing the phrase ‘west coast’ was assumed to be about California. In another deployment that had indexed third party information, algorithms were designed to favour small information providers rather than large ones, simply for performance reasons; a case perhaps of an enterprise search algorithm making arbitrary ‘editorial’ choices.
It is commonplace in enterprise search deployments for engineers, with the best intentions, to override automatically generated organic search results using promoted results (often termed ‘best bets’), and to tweak results through user-defined ‘gold standard’ test sets and search log mining in the hunt for better search result quality. Some search engine practitioners state that engineers will have no idea what relevant results are, so involving users/customers to rate results is essential. Some organizations that have performed these kinds of search evaluation and tuning with test sets of documents have commented at enterprise search conferences that what one expert user feels is the optimal set of results for a search term can differ significantly from another expert’s view.
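Tuning against a ‘gold standard’ test set is typically scored with a relevance metric such as normalised discounted cumulative gain (nDCG), which is one common choice rather than anything mandated by the research discussed here. The judgments and document identifiers below are invented; note that a second expert with different judgments would produce a different score for the very same ranking — which is exactly the disagreement problem described above.

```python
# Illustrative sketch: scoring an engine's ranking against one expert's
# 'gold standard' relevance judgments using DCG/nDCG. The judgments and
# document ids are invented for illustration.
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of relevance grades."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(ranked_docs, judgments):
    """DCG of the engine's ranking divided by the DCG of the ideal ranking."""
    gains = [judgments.get(doc, 0) for doc in ranked_docs]
    ideal = sorted(judgments.values(), reverse=True)[:len(ranked_docs)]
    return dcg(gains) / dcg(ideal)

# One expert's judgments for a query (3 = highly relevant, 0 = not relevant):
judgments = {"doc_a": 3, "doc_b": 2, "doc_c": 0, "doc_d": 1}
engine_ranking = ["doc_c", "doc_a", "doc_b", "doc_d"]

print(round(ndcg(engine_ranking, judgments), 3))  # → 0.698
```

An nDCG of 1.0 would mean the engine matched this expert’s ideal ordering exactly; here the irrelevant document in first place drags the score down.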
Filtering of results is also commonplace within enterprise search deployments and SharePoint search, to remove or hide results deemed undesirable, inappropriate or not useful, using negative filters of ‘dirty words’ – for example, not showing results where the word ‘conference’ is mentioned. It would be an interesting question (dilemma?) if management in an organization ever asked their enterprise search team (using the latest machine learning techniques) to ‘hide’ search results for any content felt to portray the company in a bad light – such as comments made by staff on internal enterprise social blogs about its HR policies. Some may feel this is acceptable information governance practice; others may feel it is unethical.
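Mechanically, such a negative filter is trivial, which is part of what makes the ethical question pressing: the same few lines serve equally well for removing noise or for quietly suppressing unwelcome content. The word list and documents below are invented for illustration.

```python
# Illustrative sketch of a 'dirty word' negative filter that silently
# drops matching results. The word list and documents are invented.

DIRTY_WORDS = {"conference"}  # terms whose matches are suppressed

def filter_results(results):
    """Drop any result whose text mentions a dirty word."""
    return [r for r in results
            if not (DIRTY_WORDS & set(r["text"].lower().split()))]

results = [
    {"id": 1, "text": "Q3 sales strategy review"},
    {"id": 2, "text": "Notes from the annual sales conference"},
]
print([r["id"] for r in filter_results(results)])  # → [1]
```

The searcher sees no trace that result 2 ever existed – obscurity as censorship, governed only by whoever maintains the word list.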
For a variety of reasons (such as complexity and trade secrets) it may never be possible to fully understand what enterprise search algorithms are doing and the intent behind them, although some well-documented ranking models exist (such as Okapi BM25). Due to this opacity, a significant amount of trust is placed in the hands of those who design and deploy search algorithms. Adopting a position of unconditional faith in algorithms may pose many risks. Increasing awareness of what biases already exist (through accident or design), or could exist in the future, might be a prudent step to take.
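Okapi BM25 is a useful contrast precisely because it is fully published: every parameter is inspectable. The sketch below implements the standard BM25 formula over a tiny invented corpus; real engines layer many proprietary signals on top of a core like this.

```python
# Illustrative sketch of the published Okapi BM25 ranking function,
# a transparent scoring model in contrast to opaque proprietary
# algorithms. The corpus below is invented.
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a query using Okapi BM25."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average doc length
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)       # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(term)                           # term frequency
        denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / denom
    return score

corpus = [
    "the west coast sales report".split(),
    "annual report on coastal weather".split(),
    "west coast field trip notes".split(),
]
scores = [bm25_score(["west", "coast"], d, corpus) for d in corpus]
print([round(s, 2) for s in scores])  # documents 1 and 3 outrank document 2
```

Even here, ‘neutral’ parameter choices (k1 and b) are design decisions that shift which documents rise to the top – a small-scale version of the predispositions this article describes.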
As we are all predisposed to certain views, it seems likely that search engines will be as well.
Paul H. Cleverley
Researcher, Robert Gordon University