Another report (this time from Tower Group) highlights the likely increase in data volumes that enterprises will need to handle over the next couple of years: a 900% increase in financial market data by 2012.
Of course, an increasing volume of data flowing around the enterprise is hardly new (in the mid 1990s, I remember being told in hushed tones at one of the Baby Bells that their databases were several terabytes in size). Each time enterprises have been faced with such an increase, decisions have had to be made about what needs doing: is the focus on integration, storage or analysis, and can existing technologies cope, or do we need (or want) new ones?
In the case of the Tower Group report, the focus is on financial services, and in particular on the increase in market data handling requirements caused by new regulations (the EU’s MiFID and the US’s Reg NMS, both intended to create fairer markets). Specifically, there is a regulatory requirement to store the data in a form that can be used as evidence if there is ever a compliance issue. As these regulations relate to best price execution, this in itself will be onerous: every published quote available at the moment each trade is executed must be stored, along with all other relevant data associated with the trade.
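To make the storage burden concrete, here is a minimal sketch in Python of what a best-execution evidence record might have to capture. The class and field names are invented for illustration (nothing here comes from the report or the regulations); the point is simply that each trade fans out into one stored row per quote visible at execution time.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical structures for illustration only.
@dataclass(frozen=True)
class Quote:
    venue: str        # e.g. an exchange or trading venue identifier
    bid: float
    ask: float
    size: int
    timestamp_ns: int

@dataclass(frozen=True)
class TradeRecord:
    trade_id: str
    symbol: str
    price: float
    quantity: int
    executed_ns: int
    # Every quote visible at execution time is retained as evidence
    # of best execution: one trade multiplies into many stored rows.
    market_snapshot: List[Quote] = field(default_factory=list)

quotes = [Quote("VENUE_A", 99.98, 100.01, 500, 1),
          Quote("VENUE_B", 99.97, 100.02, 300, 1)]
trade = TradeRecord("T-1", "XYZ", 100.01, 200, 2, market_snapshot=quotes)
print(len(trade.market_snapshot))  # rows stored for this single trade
```

With dozens of venues each publishing quotes continuously, the snapshot attached to every trade is what drives the projected explosion in stored volume.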
Similar scenarios (where there is a flood of data, or a potential one) exist in many industries: from online games with millions of players generating massive amounts of data, to security monitoring devices for government agencies, to RFID tags. However, it is all too easy to react by announcing that the data deluge is coming and that some new technological surfboard is needed.
For a start, in many cases the data may have little potential value and storage may not even be needed. But of course this will not always be the case. To quote from the Bob’s guide coverage of the press release:
“Regulatory compliance will pave the way for firms to completely automate the trading process,” added Price. “Once all the players have done so, the winner in the hunt for liquidity will be whoever can process the data the fastest.”
To put it another way, in the case of market data there may be a business justification for doing more than storing it for forensic reasons, namely integration or analysis of the data streams. In my other examples there will also be opportunities, again in integration or analysis.
The second hurdle is whether new technology is needed.
Simply saying that there will be an increase in data, and a need to do something with it, is not enough to justify new technology: old technologies may continue to scale. Going back to my Baby Bell experience, this was around the time when object databases were being touted as the relational database killer, on the grounds that only object databases could possibly handle databases of that size.
In the case of the “data deluge”, the real-time processing of streams (when the value of the analysis is high enough) is certainly one area where a new approach seems needed, and is an area targeted by the likes of StreamBase and Progress’s Apama products. For analysing the increasingly massive data warehouses, and for integration, I think the jury is still out, although the increasing complexity within the data may throw a spanner in the works.
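To illustrate the style of processing these products embody, here is a minimal sketch in plain Python (not StreamBase or Apama code; the function and the tick values are invented for illustration) of a continuous query that computes a moving average over an unbounded tick stream while holding only a fixed-size window in memory, rather than storing the stream and querying it afterwards:

```python
from collections import deque

def moving_average(stream, window=3):
    """Continuous query over a (potentially unbounded) stream:
    emit the average of the last `window` prices as each new tick
    arrives. Only `window` values are ever held in memory."""
    buf = deque(maxlen=window)  # old ticks fall off automatically
    for price in stream:
        buf.append(price)
        yield sum(buf) / len(buf)

ticks = [100.0, 101.0, 102.0, 103.0]
print(list(moving_average(ticks)))  # [100.0, 100.5, 101.0, 102.0]
```

The inversion is the key design point: instead of loading data into a database and running queries over it, the query runs continuously and the data flows through it, which is what makes the approach viable at market-data rates.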