To say big data is a “huge” matter for Wall Street would be redundant.
Researchers at Northeastern University in Boston estimate that by 2020 the world will have produced 44 zettabytes, or approximately 44 trillion gigabytes, of electronic data.
IDC, meanwhile, predicts that the market for big data and big data analytics will exceed $203 billion in the same year, up from the $130.1 billion the research house currently estimates.
For Wall Street specifically, industry analysis firm Opimas more modestly expects the buy and sell sides to invest $7 billion in 2020 to manage the plethora of data sets.
Big data, or alternative data, holds the same promise that the frontier held for settlers in the 18th and 19th centuries: homestead the right plot of land, and you could reap untold rewards.
However, how do quantitative traders know that their new data sets are not the electronic equivalent of desert or swampland? It is not an easy task.
Quants have already integrated the first generation of alternative data, such as credit card transactions, shipping data, satellite imagery, Twitter sentiment, and weather data, into their models.
It is the second generation of alternative data sources, such as non-Twitter social media, proprietary sensor data from the Internet of Things, and other transactional data, that quants will find challenging.
The mere existence of big data should have quants generating new hypotheses like it is going out of style. Whether they can source the necessary data and prove that their new models can detect alpha signals is something else entirely. If an alternative data set is not large enough, or lacks the history needed for backtesting, the quant is out of luck.
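Concretely, "detecting an alpha signal" often starts with something as simple as an information coefficient test: does the data set's signal rank tomorrow's winners ahead of tomorrow's losers more often than chance would allow? Below is a minimal Python sketch of that first pass. Everything in it is an illustrative assumption, not a figure from any real data set: the 500-day history, the 50-asset universe, and the faint 0.02 signal strength are all synthetic.

```python
# Minimal sketch of a first-pass alpha check on an alternative data set.
# The signal and returns below are synthetic; real work would use
# point-in-time data and a full backtest, not a toy like this.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n_days, n_assets = 500, 50  # a short history; longer is better for backtesting

# Hypothetical daily alternative-data signal (e.g. scored sensor or card data)
signal = pd.DataFrame(rng.standard_normal((n_days, n_assets)))

# Next-day returns: mostly noise, plus a faint trace of yesterday's signal
returns = 0.02 * signal.shift(1) + rng.standard_normal((n_days, n_assets))

# Daily information coefficient: rank correlation between yesterday's
# signal and today's cross-section of returns.
ics = pd.Series(
    spearmanr(signal.iloc[t - 1], returns.iloc[t])[0]
    for t in range(1, n_days)
)

# A t-statistic on the mean IC is a rough test that the signal is not noise.
t_stat = ics.mean() / (ics.std(ddof=1) / np.sqrt(len(ics)))
print(f"mean IC: {ics.mean():.4f}, t-stat: {t_stat:.2f}")
```

The sketch also shows why history matters: the t-statistic shrinks roughly with the square root of the number of days observed, so a data set with only a few months of history may never let the quant distinguish a genuine signal from noise.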
Thanks to capitalism, companies are popping up to help quants prospect for viable alternative data sources. Neudata curates a portfolio of alternative data sources, while ExtractAlpha helps firms source data and then model it to verify that the data set provides alpha signals.
Nevertheless, given the amount of available alternative data and the novelty of the trading strategies based on it, quants will find themselves “pulling” data sets from new sources rather than having data providers and aggregators “pushing” data sets to their clients.
As new data sets mature and prove their alpha detection capabilities, they eventually will become conventional tools.
The good news for quantitative model makers who want to monopolize their models’ returns for as long as possible is that there is more data out there than could be exploited in several lifetimes. It is just a matter of discovering it.