Like everyone, I was shocked by the news of the Westgate Shopping Mall shootings in Nairobi, Kenya.
The real shock is how they determined who to shoot, singling out individuals and asking whether they could name the Prophet Muhammad’s mother:
Reports from separate floors of the building in the first hours of the assault told how the attackers, speaking rough Swahili and English, shouted at Muslims to identify themselves. Many people came forward. They were ordered to speak in Arabic, or to recite a verse from the Koran, or to name the Prophet Mohammed’s mother. Those who passed this test were allowed to flee. Those that did not were executed, including children.
It almost seems perfunctory to relate this to banking, but it did sit firmly in my mind as I chaired a meeting around Big Data last night.
I hate the term Big Data, as mentioned before, and feel it needs a context, so here is one.
I don’t know the name of the Prophet’s mother but, within two seconds, I can Google the answer: Aminah bint Wahb.
I don’t know a verse of the Koran but can find one in seconds online: Assalamu alaikum wa rahmatullahi wa barakatuh (May the peace, mercy, and blessings of Allah be with you).
And there is the context of Big Data: if you don’t know the question, how can you find the answer?
The discussion about Big Data last night was in the context of Fraud and Anti-Money Laundering (AML), and was a wide-ranging conversation.
Big Data for fraud and AML is all about cost avoidance whilst, on the other hand, much of the Big Data conversation is about marketing and sales for revenue uptick.
Both are valid uses of Big Data analytics, but this market is nothing new.
Teradata was doing all this stuff in the 1990s with propensity modelling and data mining. Wal*Mart was their biggest customer in the world back then, with a 27 terabyte database.
The change today is that the world produces 27 terabytes every few seconds thanks to social media.
This is well illustrated by Maria Conner’s recent blog entry:
In 2012, every day 2.5 quintillion bytes of data (1 followed by 18 zeros) are created, with 90% of the world’s data created in the last two years alone. As a society, we’re producing and capturing more data each day than was seen by everyone since the beginning of the earth.
This vast amount of digital data would fill a stack of DVDs reaching from the Earth to the moon and back. To put things in perspective, the entire works of William Shakespeare (in text form) represent about 5 MB of data. So, you could store about 1,000 copies of Shakespeare on a single DVD. The text in all the books in the Library of Congress would fit comfortably on a stack of DVDs the height of a single-story house.
The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s according to Martin Hilbert and Priscila López.
Unstructured data accounts for 80% of the data in the world, and we know much of that comes from social media, so that gets special attention.
How much data is generated through social media tools?
- People send more than 144.8 billion email messages a day.
- People and brands on Twitter send more than 340 million tweets a day.
- People on Facebook share more than 684,000 bits of content a day.
- People upload 72 hours (259,200 seconds) of new video to YouTube a minute.
- Consumers spend $272,000 on Web shopping a day.
- Google receives over 2 million search queries a minute.
- Apple receives around 47,000 app downloads a minute.
- Brands receive more than 34,000 Facebook ‘likes’ a minute.
- Tumblr blog owners publish 27,000 new posts a minute.
- Instagram photographers share 3,600 new photos a minute.
- Flickr photographers upload 3,125 new photos a minute.
- People perform over 2,000 Foursquare check-ins a minute.
- Individuals and organizations launch 571 new websites a minute.
- WordPress bloggers publish close to 350 new blog posts a minute.
- The Mobile Web receives 217 new participants a minute.
The most up-to-date numbers are available from the sites themselves.
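As a rough sanity check on the figures quoted above, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 10^12 bytes) and a single-layer DVD of 4,700 MB; neither assumption comes from the quoted post, and the function names are just illustrative.

```python
# Back-of-the-envelope check of the quoted data volumes.
# Assumptions (mine, not from the quoted post): decimal units,
# and a single-layer DVD that holds 4,700 MB.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
BYTES_PER_DAY = 2.5e18  # "2.5 quintillion bytes" created every day


def tb_per_second(bytes_per_day: float = BYTES_PER_DAY) -> float:
    """Average terabytes created per second at the quoted daily rate."""
    return bytes_per_day / SECONDS_PER_DAY / BYTES_PER_TB


def seconds_to_produce(terabytes: float, bytes_per_day: float = BYTES_PER_DAY) -> float:
    """Seconds the world needs to create the given number of terabytes."""
    return terabytes / tb_per_second(bytes_per_day)


def copies_per_dvd(work_mb: int = 5, dvd_mb: int = 4700) -> int:
    """Whole copies of a work of `work_mb` megabytes that fit on one DVD."""
    return dvd_mb // work_mb


def hours_to_seconds(hours: int) -> int:
    """Convert hours to seconds, e.g. for the YouTube upload figure."""
    return hours * 60 * 60


if __name__ == "__main__":
    print(f"World output: {tb_per_second():.1f} TB per second")
    print(f"Time to create 27 TB: {seconds_to_produce(27):.2f} seconds")
    print(f"Shakespeare copies per DVD: {copies_per_dvd()}")
    print(f"72 hours of video: {hours_to_seconds(72)} seconds")
```

On those assumptions the quoted rate works out at roughly 29 terabytes a second, so a Wal*Mart-sized 27 terabyte database is now created in about a second, and a DVD holds about 940 copies of Shakespeare, which is close enough to the "about 1,000" quoted.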
Well, the ‘so what?’ test is that twenty years ago we could not produce, search, analyse and track so much data because it was too costly.
Teradata used to refer to their systems as BFOBs (Big F-Off Boxes), and it would take a $20 million-plus investment to get one up and running effectively. Today, you can do that analysis in the cloud for peanuts.
This means that we couldn’t analyse and leverage the data in the past, but we can today. The question then is how do you do it?
Bring all the data into one big enterprise bucket, and then apply Hadoop to it?
Possibly, but that does not work in many banks as they have everything still structured in siloed boxes, some of which are segregated by law. For example, integrating the insurance data with the banking data in a bancassurance group is still claimed to be a big no-no.
That does not wash today, however, and I suspect that regulations are used as an excuse for inertia rather than being a real block. After all, Tesco Bank claim this will be their major opportunity:
“In our move from retailing products to bank retailing, it amazes me that the current incumbents reward the new customer rather than the existing one. That encourages promiscuity and commoditisation. If you can reward the existing customer more than the new one, by learning more about them, then you can price your products better. For example, our Clubcard (their major loyalty program) data allows us to price our products 15% more accurately than the Royal Bank of Scotland for any particular risk type by customer segment. This means we can be the best at risk-adjusted pricing.”
In other words, integrating retail data with financial data is not a big leap of thinking. It needs to be on a permissions basis however, as I would not appreciate you offering me baby products if I didn’t know my partner was pregnant or, worse, my daughter.
Target started sending coupons for baby items to customers according to their pregnancy scores, resulting in an angry man going into a Target store in Minneapolis and demanding to talk to the manager.
“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”
The manager didn’t have any idea what the man was talking about and apologised. He called a few days later to apologise again, but this time the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
Data analytics is the new battleground and the first step is to get the data sorted for the purpose of the question you are trying to answer.
Then there is another interesting aside: it’s not just the internal data.
As we talked last night, many of the attendees felt the hardest part would be organising the data internally, with Forrester saying that companies only use around 12% of the internal data available to them.
But what about all the external data? When people leave digital footprints built over years in Facebook, LinkedIn, Twitter, Tumblr, Flickr and more, it becomes far easier than ever before to track individuals’ histories and identify them.
That’s what criminals are finding, as referenced in the recent report by Sophos, who cracked open a criminal gang using malware in Russia thanks to the gang’s social media footprints. So shouldn’t we be using these footprints to find the criminals who launder or defraud?
It’s obviously not a simple thing, however, as building data banks that hold all the data about an individual, both in the public domain and internally, would be a massive task … but today’s technologies allow you to tackle such massive tasks. As mentioned, you can do what Wal*Mart were doing twenty years ago for a few pennies today.
I guess the conclusion is that if data is the battleground, then you need to arm yourselves with as much weaponry as possible and, for those who invest the most in their warfare, the rewards will be increased market share and decreased cost.
That’s as long as you know the question to be asked, of course.
“It’s not just about looking for needles in haystacks, but removing some of the hay.” Martha Bennett, Forrester