The mining of data sources is an essential component of modern trading strategies — and topic tags are key, helping machines mine unique datasets for market-moving signals.
The rise of algorithms is giving Wall Street a makeover as traders and investors look to high-powered machines and unique datasets for market-moving signals.
All this makes the systematic analysis of news stories an appealing idea. Businesses were quick to embrace news and social media analytics as a data-driven tool for brand management and targeted advertising.
Today, that same content is processed by quantitative hedge funds with greater precision and speed to uncover predictive signals that can inform better trading decisions.
Textual news is usually processed with techniques from natural language processing (NLP), a computer science field that has been around for decades.
One example of an NLP task is sentiment analysis, where each news story can be classified by its underlying tone to decipher potential impact on a stock's price.
For example, a news article about better-than-expected quarterly earnings might be scored as having positive sentiment and lead to a pop in the stock price, whereas a news article about an analyst downgrade could be scored as negative and result in a correction.
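As a rough illustration of this kind of scoring, a minimal lexicon-based sketch might look like the following. The word lists and the `score_sentiment` and `label` functions are hypothetical, made up for this example; real systems typically rely on trained models rather than hand-picked words.

```python
import re

# Hypothetical, hand-picked word lists for illustration only.
POSITIVE = {"beat", "beats", "better", "upgrade", "strong", "record"}
NEGATIVE = {"miss", "misses", "worse", "downgrade", "weak", "cut"}


def score_sentiment(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score: int) -> str:
    """Map a raw score to a coarse tone label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up headlines mirroring the two cases described above.
earnings = "Quarterly earnings better than expected on strong sales"
downgrade = "Analyst downgrade follows weak guidance"

print(label(score_sentiment(earnings)))   # positive
print(label(score_sentiment(downgrade)))  # negative
```

Whether a positive or negative score actually moves the price is an empirical question; the sketch only covers the classification step.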
Accounting for context
With the greater availability of open-source NLP toolkits and services, it can be tempting to build a system by hooking up an off-the-shelf algorithm to an aggregated live newsfeed.
Yet, problems arise when machines attempt to interpret specialized language. A word like “magnificent” would normally be declared as positive by most algorithms even if it appears in the context of Magnificent Hotel Investments, according to Ivailo Dimov, Quantitative Researcher and Data Scientist at Bloomberg.
Without specialized training in finance-oriented domain knowledge, most general-purpose NLP algorithms miss these subtleties, which can lead to skewed sentiment scores and badly damaged trading performance. “If you can’t discern whether the text applies to a company or business situation, it can result in noisy and erroneous data,” says Dimov.
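The “magnificent” problem can be reproduced in a few lines. The sketch below is only one possible mitigation, assuming a list of known company names is available: it masks those names before scoring. The lexicon, the `KNOWN_COMPANIES` list, and all function names are hypothetical and are not a description of how any production system handles this.

```python
import re

# Hypothetical general-purpose lexicon and company list, for illustration only.
POSITIVE = {"magnificent", "strong", "great"}
NEGATIVE = {"weak", "poor", "terrible"}
KNOWN_COMPANIES = ["Magnificent Hotel Investments"]


def naive_score(text: str) -> int:
    """Count positive minus negative words with no awareness of context."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mask_companies(text: str) -> str:
    """Replace known company names with a neutral placeholder token."""
    for name in KNOWN_COMPANIES:
        text = re.sub(re.escape(name), "COMPANY", text, flags=re.IGNORECASE)
    return text


def context_aware_score(text: str) -> int:
    """Score the text after company names have been masked out."""
    return naive_score(mask_companies(text))


# Made-up headline for the example.
headline = "Magnificent Hotel Investments reports weak bookings"
print(naive_score(headline))          # 0: "magnificent" cancels out "weak"
print(context_aware_score(headline))  # -1: only "weak" is counted
```

Masking by exact name match is brittle (abbreviations, tickers, and misspellings slip through), which is part of why domain-specific training matters.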
In addition, stories are tagged with a rich set of topic tags that further categorize their content and themes, such as technology <TEC>, analyst changes <ANACHANGE>, or downgrades <ANACUT>. “With topic tags we can gather more relevant information about sentiment than the raw text itself,” says Dimov.
Improving code grouping through component analysis
Since Bloomberg collects and internalizes data from a wealth of sources, it has over time developed a robust solution for generating topic tags with greater accuracy. In most cases, a given news story carries more tags than strictly necessary, in an attempt to capture all relevant information while avoiding potential errors. Meanwhile, the entire topic taxonomy contains tens of thousands of unique tags, with a heavily skewed, long-tailed distribution.
This presents nontrivial challenges when one tries to use topic tags to further enhance sentiment-driven strategies. Proper dimension reduction is needed to associate tags of similar meaning so that they can be treated holistically as a group. However, traditional techniques such as Latent Semantic Analysis base the analysis solely on term co-occurrences, which turn out to be very noisy because the data is high-dimensional and sparse. As a result, they tend to group topic tags together even when the tags don’t exhibit a clear logical relationship.
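For readers unfamiliar with the traditional technique, here is a minimal sketch of Latent Semantic Analysis applied to tags, using a truncated SVD from NumPy. The tag names are the examples mentioned earlier, but the tag-to-story assignments are made up, and the tiny, dense matrix does not exhibit the noise problems of a real taxonomy with tens of thousands of tags. This is plain LSA, not the π-component analysis approach.

```python
import numpy as np

# Toy tag-by-story incidence matrix (rows: tags, columns: stories).
# A 1 means the story carries the tag. Assignments are invented for illustration.
tags = ["TEC", "ANACHANGE", "ANACUT"]
X = np.array([
    [1, 1, 0, 0, 1],   # TEC
    [0, 0, 1, 1, 1],   # ANACHANGE
    [0, 0, 1, 1, 0],   # ANACUT
], dtype=float)

# LSA: factor the matrix with an SVD and keep only the top-k components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
tag_vectors = U[:, :k] * s[:k]   # each row embeds one tag in k dimensions


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Tags that co-occur on the same stories end up close together.
sim_change_cut = cosine(tag_vectors[1], tag_vectors[2])
sim_tec_cut = cosine(tag_vectors[0], tag_vectors[2])
print(f"ANACHANGE vs ANACUT: {sim_change_cut:.2f}")
print(f"TEC vs ANACUT:       {sim_tec_cut:.2f}")
```

Because the similarity is driven purely by co-occurrence, two rare tags that happen to share a handful of stories can look closely related, which is the weakness described above.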
Dimov and his colleague Daniel Lam, Senior Quantitative Researcher at Bloomberg, developed a novel mathematical approach called π-component analysis to better understand and group the codes in a maximally cost-effective, parsimonious way.
When combined with sentiment analysis, groups of topic tags identified by π-component analysis systematically show a stronger impact of sentiment on the prices of certain stocks – evidence that structured news sources play a valuable role in the search for alpha.
Access the full white paper series to learn more about sentiment impact on stock prices with topic codes.