Following my recent blog post on big data in counter-terrorism, I was asked to provide formal input to the Joint Committee assessing the UK ‘Draft Investigatory Powers Bill’. My full submission is available here; I have adapted it below.
To understand the utility of bulk communications data in intelligence and CT, you first need to reconsider the needle-in-a-haystack analogy typically used when discussing intelligence agencies' use of bulk datasets, which implies that every additional piece of data only makes the target harder to find.
Instead, think about how you use the Google search engine, and how much Google – like the ability of intelligence agencies to process big data – has changed over the past 15 years.
Initially, Google only allowed relatively simple search terms. Many businesses had little or no internet presence, while Google’s ‘web-crawling’ technology did not necessarily access all those that did. In short, it lacked a comprehensive dataset to query, and as a result, it was difficult to use with confidence.
These data inconsistencies meant you could not be certain that Google had access to the data you were looking for, or that the results it returned were relevant to your initial query. Like the intelligence analyst Mr Binney described in his evidence to the Committee, you were confronted with too much irrelevant data. Even after clicking through multiple pages of results, you might not find what you were looking for; an alternative, more targeted method (a local phone book, say) was often more effective.
In 2016, however, ‘big data’ is a reality. The internet is growing rapidly and plays a central role in everyday life. As a result, the Google search engine has access to a comprehensive and growing dataset. It is in the business of ‘bulk collection’.
This does not mean that, as an individual user, you are overwhelmed by data. Instead, the increase in data volume has been accompanied by the ability to ask complex and nuanced questions. This reduces the number of results your search returns while increasing their relevance. In most instances, you get the answer you’re looking for on the first page, if not in the top result.
Similarly, UK intelligence agencies have access to more communications data than ever before, but by using focused queries and data filters, analysts need to retrieve and analyse only a small fraction of the overall dataset. As with Google, more data improves the quality of the results. Intelligence analysts get the data they need comparatively quickly and efficiently.
Mr Binney’s evidence also suggested that the UK intelligence agencies could choose between bulk data collection and targeted technical surveillance, the former being ‘99% useless and putting lives at risk’ and the latter ‘operationally effective and reducing the privacy burden’. This is not the case.
Returning to the analogy, people typically use Google to discover new information, or to retrieve information they have forgotten or misplaced. At the same time, they regularly use websites or apps for a ‘targeted’ service – you visit Facebook or the BBC website because they give you information you already know you want.
Similarly, bulk communications data and focused data collection on ‘targets of interest’ serve different but complementary purposes. Intelligence agencies cannot focus exclusively on known targets; they also need to discover new targets and ‘re-acquire’ targets they have lost access to. Like Google and ‘favourite’ websites, bulk data and targeted collection answer different questions in mutually reinforcing ways. It is not an either/or question.
The suggestion that UK intelligence agencies work outwards from known targets instead of using bulk collection therefore rests on one of two incorrect assumptions: either that all the individuals intelligence agencies need to cover have already been identified, or that those not yet identified (or who have since dropped out of sight) can all be discovered through analysis of known targets. Unfortunately, the world of intelligence is not that static or predictable.
It is not a question of choosing between different intelligence collection strategies. Rather, the question is how the UK can best balance these sources and approaches from a resourcing and prioritisation perspective. What works best for the intelligence problems we face now, and will face in the future? These questions do not have simple answers, hence the range of powers and proposals contained in the draft IP Bill.
Although I am unable to give specific examples of how and when intelligence agencies use different types of data, I hope this high-level overview shows that – contrary to Mr Binney’s evidence – bulk communications data does and should play a critical role in the work of UK intelligence agencies.
David Wells worked for UK and Australian intelligence agencies between 2005 and 2014, specialising in counter-terrorism.