Forget needles and haystacks – it’s more complicated than that

If the opposing sides in the mass surveillance debate agree on anything, it is the analogy that best describes the job of intelligence agencies. From Edward Snowden to UK Home Secretary Theresa May, the case for and against bulk communications data (particularly in counter-terrorism) rests on the best way to find ‘a needle in a haystack’.

As someone with recent experience using bulk communications data to identify and disrupt terrorists, I’ve always been struck by the gap between this analogy and the reality of my former day job.

Firstly, the similarities. Terrorists are usually extremely difficult to find. And the global nature of the terrorist threat means that you are searching for them across a large area.

But the analogy suggests a laborious, repetitive and unskilled process; an individual manually wading through a stack of hay to find a tiny needle. Using traditional methods – as a 2014 conceptual art performance demonstrated – this is extremely time-consuming.

Yet in 2016, there are smarter, more efficient ways of completing this task. Faced with one or a number of haystacks containing one or a number of needles, I could use my smartphone to hire a metal detector. Or outsource the job via Airtasker.

As I’ve argued previously, not only is the process radically different, the nature of the search is also entirely different. A metal detector might find other metallic objects within the haystacks, but once the needle is located, your job is complete. There is no nuance, little uncertainty about the items you recover, and no ongoing requirement to continue to search through the stack.

Why does an analogy matter in the context of the mass surveillance debate? Because as with most analogies, the aim is to make a complex issue easily understandable. And in mis-characterising the work of intelligence agencies, the analogy skews the debate in favour of those arguing that ‘mass surveillance’ is not just invasive but ineffective.

A good example of how this works is this piece by Coleen Rowley, a former FBI agent and whistleblower. She concludes, “if you’re looking for a needle in a haystack, how does it help to add hay?”

If that were the task in front of intelligence agencies, the logic would be difficult to argue with. And following similar logic to Rowley, many use this argument to suggest that bulk data collection makes terrorist activity more difficult to spot and prevent.

Yet it isn’t clear to me what the needle or the haystack are in this context. Is the needle:

  • the identity of a terrorist or potential terrorist;
  • the communications activity of the terrorist or potential terrorist;
  • or actionable intelligence on their terrorist activity (who/what/when/where/how)?

In reality, the answer should be all of the above. But unfortunately, unlike needles, intelligence targets aren’t static. They rarely operate or exist in isolation. And they use multiple communications devices and services, each of which route and secure data in different ways and different locations.

Conceptually, this is critical. 99% of the time, the answer to your intelligence question will not sit in one haystack/data source. Good intelligence and good counter-terrorism require the use of all available intelligence sources to locate and fit together the different pieces of the puzzle. And most importantly, doing so before something goes bang.

Turning to ‘the haystack’ then, this appears to refer to all data currently collected by 5 EYES intelligence agencies. Critics (somehow) claim that it already contains all potential needles, and that the additional data the intelligence agencies are seeking either doesn’t contain needles or is unlikely to contain any in future. On this view, more data will only make their search more difficult.

This argument is largely based on retrospective analysis of a number of major terrorist attacks, including 9/11, Paris, and the failed ‘underpants bomber’ plot. In that much-abused phrase, the attackers were ‘known to authorities’. Actionable leads weren’t followed up, dots weren’t joined and mistakes were made. Why would any intelligence agency want to add more data when they’re apparently unable to cope with existing volumes?

Here’s where the logic gets fuzzy. As every analyst knows, correlation does not imply causality.

Yes, terrorist attacks occurred on the watch of intelligence agencies with bulk data collection powers. But does that mean that those powers were the cause of their failures? And how has bulk data contributed to their successes?

This argument also confuses data collection with data analysis. And assumes a direct correlation between data volume and the analytical resources it necessitates.

Different sources or datasets will be relevant for different requirements and at different times. Far from adding more and more data sources ad infinitum, intelligence agencies should (and do) continually evaluate their usefulness to deliver maximum value with the resources at their disposal.

But in intelligence, collecting the right data does not guarantee success. It also takes good analysis, asking the right questions, timely information sharing, and collaborative national and international partnerships. And sometimes, a bit of luck.

This may not translate into a simple, easy-to-relate-to analogy. But given how critical and complex the debate about the future of big data and surveillance is, should we really expect it to?

David Wells worked for UK and Australian intelligence agencies between 2005 and 2014, specialising in counter-terrorism.

6 thoughts on “Forget needles and haystacks – it’s more complicated than that”

    Margot ONeill said:
    March 3, 2016 at 8:22 am

    Love your posts Dave gotta say! Cheers Margot

    Naradaian said:
    March 3, 2016 at 8:55 am

    Thanks for your update, my first. Regarding terrorists hard to find, seems logical on the surface but the Belgians reportedly had photos & all points alert for the main Paris suspect long before Paris.
    Then he flies to Paris!

    In Ulster a fortnight ago… 207 arrests… 200 on intel payroll one way or another

    I’d need much more convincing. The whole Gladio thing is much discussed with the red brigade

    As I’m new I wish to raise these questions in pursuit of all our wellbeing, but tis a steep curve.
    Salaam Shalom Mir

    AlastairC said:
    March 3, 2016 at 12:27 pm

    Hi David,

    Thanks for posting this, I’ve used the haystack analogy myself (in the ‘adding more hay’ sense).

    I appreciate that the security services will need to assess different sources, but the problem is at the legislative level, what should be available?

    It is very difficult to make an informed choice without knowing:
    – How many cases (primarily terrorist in this context) have been solved due to bulk data that would not have been solved with more targeted methods.
    – How many cases haven’t been solved because of the connect-the-dots problem (the ‘known to authorities’ ones).

    My gut feeling (because that’s all we who are on the outside can go on) is that a more targeted approach such as that suggested by Binney would be a better balance of investigative power and personal security/privacy: http://www.theregister.co.uk/2016/01/06/gchq_mass_spying_will_cost_lives_in_britain/

    Any thoughts/experience on that?

      AlastairC said:
      March 3, 2016 at 10:05 pm

      Sorry, just seen your previous posts on the topic, you have covered Binney’s approach extensively.

      The google analogy is useful in your IP Bill submission, I see the advantage of having a large dataset that can be queried, and how that informs other methods & vice versa.

      To continue the analogy, Google allows you to opt out of its index with a robots.txt. There is still the question of whether the Government should be allowed to suck up everything for this index; I don’t think that has been answered democratically.

      I’d also consider the cost. Google has over 2 million servers (guesstimated), and this index would need to cover more. I know it is mostly meta-data, but that still stacks up. I’m also very uncomfortable with ISPs being forced to store all browsing meta-data given their security record.

      The final question for me is oversight. There are people sitting in front of a query engine that can look up just about anything on anyone.

      The “Judicial Lock” in the IP Bill is simply about process, I would want that to cover the justification for investigations as well, with the ability to block it. We know about ‘loveint’, Government considering organisations like Amnesty International as targets, and the immigration officer who put his wife on the no-fly list.
      Without transparency (impossible) or oversight, that power will be abused.

      You did show how bulk data could be useful, but without the ability to answer the ‘how many cases’ types questions above, we can’t see the justification.

        David Wells responded:
        March 3, 2016 at 10:37 pm

        Hi Alastair – thanks for your comments. I think one of the most challenging aspects of this debate is separating the foreign and domestic elements.

        There are legitimate concerns about what information a Government should have access to in the latter context, particularly because there are many more alternative investigative options. ICRs are a case in point; I can see that they could be useful from an investigative perspective, but does that benefit really justify access to an entire population’s data? For me, that’s a difficult sell.

        Internationally however, your options are much more limited, as some of the broad Home Office examples make clear, particularly when HUMINT operating conditions are hostile and there is either a hostile Government or no functioning Government. That’s where the real value lies.

        Unfortunately, intelligence is rarely as clear cut as X was prevented because of Y. Or, this many operations were enabled by X. That’s one of the points that I think is really important. It’s impossible to say how an operation would have progressed if a particular phone number hadn’t been identified by bulk analysis for example, because that sets in train multiple streams of investigation. Similarly, you’d struggle to provide stats for HUMINT or any other intelligence source.

        The Government has tried to do this with their new justification for bulk data, but it is incredibly challenging!

        More broadly, I think based on my own personal experience, there would be support for greater transparency and oversight within the community, because it would remove some of the unwarranted suspicion that hangs over the UK intel agencies. That said, I’m not sure they could ever be as transparent as some people wish, given the equities and sensitivities in play.

    […] On the problems with locating terrorists […]
