There's one astonishing thing that most non-quants don't know about their data: almost all fundamental and analyst ratings datasets in the equity space are not point-in-time.
What is point-in-time (PiT), you may ask? Let me first explain what regular data is, which we will refer to as non-PiT. Non-PiT data is data that gets overwritten backwards in time whenever quarterly or annual reports are published or restated.
For example, if a company restates an annual report from two years ago, the data provider will simply go back two years in time and overwrite that annual report, as if the restated figures had been known from the beginning. See a problem with this?
Here's another example. Let's say a company finishes Q1 on March 31st but doesn't actually publish its quarterly results until 6 months later, in September. The data provider will go back 6 months and record the quarterly data against the quarter-end date, as if it had been known the moment the quarter ended. A backtest that uses this data in April is essentially trading on future information, which can greatly overstate its results.
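To make the publication-lag problem concrete, here is a minimal sketch in Python with pandas. The tickerless EPS figures, the dates, and the helper name `latest_known` are all made up for illustration; the only point is the difference between joining on the quarter-end date and joining on the date the numbers actually became public.

```python
import pandas as pd

# Hypothetical quarterly EPS records: when each quarter ended
# versus when the results were actually published.
reports = pd.DataFrame({
    "period_end": pd.to_datetime(["2023-03-31", "2023-06-30"]),
    "published": pd.to_datetime(["2023-09-15", "2023-11-10"]),
    "eps": [1.20, 1.35],
})

# Dates on which the backtest makes a decision (also hypothetical).
trade_dates = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-04-03", "2023-07-03", "2023-10-02", "2023-12-01"]
    )
})


def latest_known(trade_dates, reports, key):
    """Attach to each trade date the latest report whose `key` date is on or before it."""
    return pd.merge_asof(
        trade_dates.sort_values("date"),
        reports.sort_values(key),
        left_on="date",
        right_on=key,
        direction="backward",
    )


# Non-PiT style: data is treated as known from the quarter-end date.
naive = latest_known(trade_dates, reports, "period_end")

# PiT style: data only becomes usable once it has been published.
pit = latest_known(trade_dates, reports, "published")

print(naive[["date", "eps"]])
print(pit[["date", "eps"]])
```

In the naive join, the April trade date already "sees" the Q1 EPS, even though in this made-up example it was not published until September. In the PiT join, the April and July rows have no EPS at all, which is what an investor would actually have had at the time.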
The issue is just as prevalent with sell-side estimates data. For those of you who don't know, sell-side analysts publish buy and sell ratings on companies, as well as forecasts of financial statement items such as revenue and earnings. Revisions to these forecasts are often a rich source of alpha. However, one study that compared two snapshots of the same sell-side analyst dataset, taken two years apart, found that nearly 20% of these estimates had been altered. The authors go on to write that alterations included "recommendation levels, additions and deletions of records, and removal of analyst names." They state "the changes appear non-random across brokerage firms, analysts, and tickers, and have a significant impact on the overall distribution of recommendations across stocks and within individual stocks and brokerage firms."
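If you happen to have two downloads of the same ratings dataset taken at different times, you can run a rough version of this check yourself. The sketch below is not the study's methodology; the column names, the record key, and every row are assumptions invented for illustration. It simply counts the kinds of alterations the authors describe: changed recommendation levels, added and deleted records, and removed analyst names.

```python
import pandas as pd

# Assumed key that identifies "the same" recommendation in both snapshots.
key = ["broker", "ticker", "rec_date"]

# Hypothetical earlier download of the dataset.
snap_old = pd.DataFrame({
    "broker": ["A", "A", "B", "C"],
    "ticker": ["XYZ", "ABC", "XYZ", "ABC"],
    "rec_date": ["2004-01-05", "2004-02-10", "2004-03-01", "2004-03-15"],
    "analyst": ["Smith", "Jones", "Lee", "Patel"],
    "rating": ["buy", "hold", "sell", "buy"],
})

# Hypothetical later download covering the same historical period.
snap_new = pd.DataFrame({
    "broker": ["A", "A", "B", "D"],
    "ticker": ["XYZ", "ABC", "XYZ", "XYZ"],
    "rec_date": ["2004-01-05", "2004-02-10", "2004-03-01", "2004-04-01"],
    "analyst": ["Smith", None, "Lee", "Kim"],
    "rating": ["hold", "hold", "sell", "buy"],
})


def diff_snapshots(old, new, key):
    """Count records that were deleted, added, or altered between two snapshots."""
    merged = old.merge(
        new, on=key, how="outer", suffixes=("_old", "_new"), indicator=True
    )
    both = merged[merged["_merge"] == "both"]
    rating_changed = both["rating_old"] != both["rating_new"]
    name_removed = both["analyst_old"].notna() & both["analyst_new"].isna()
    return {
        "deleted": int((merged["_merge"] == "left_only").sum()),
        "added": int((merged["_merge"] == "right_only").sum()),
        "rating_changed": int(rating_changed.sum()),
        "analyst_removed": int(name_removed.sum()),
    }


summary = diff_snapshots(snap_old, snap_new, key)
print(summary)
```

With real data, the hard part is choosing the key: if the vendor also changes dates or tickers after the fact, a simple key like this one will misclassify an altered record as one deletion plus one addition.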
Building accurate alpha signals is simply not possible with non-PiT data.
So what is PiT data? As you may guess, it is data that tracks all of these changes along with the date each one became known: from when a company's press release is issued, to when the official publication comes out, through to every later restatement. It also tracks every single analyst estimate, so you know exactly when an analyst published and updated their forecasts.
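One common way to think about this is as an append-only history in which every version of a number carries the date it became known, and nothing is ever overwritten. The sketch below assumes a made-up table layout (`known_at`, `event`, `revenue`) and a hypothetical helper `as_of`; real PiT vendors each have their own schema, so treat this as an illustration of the idea rather than any provider's format.

```python
import pandas as pd

# Hypothetical append-only history for one company's fiscal-2021 revenue.
# Each row is a new version of the same figure, stamped with the date it became known.
history = pd.DataFrame({
    "ticker": ["XYZ", "XYZ", "XYZ"],
    "fiscal_year": [2021, 2021, 2021],
    "known_at": pd.to_datetime(["2022-02-01", "2022-03-15", "2024-03-10"]),
    "event": ["press_release", "annual_report", "restatement"],
    "revenue": [100.0, 101.0, 92.0],
})


def as_of(history, when):
    """Return the latest version of each figure that was known on or before `when`."""
    known = history[history["known_at"] <= pd.Timestamp(when)]
    return (
        known.sort_values("known_at")
        .groupby(["ticker", "fiscal_year"], as_index=False)
        .last()
    )


# What a backtest would have seen at three different moments in time.
after_press_release = as_of(history, "2022-02-15")
after_annual_report = as_of(history, "2023-01-01")
after_restatement = as_of(history, "2024-06-01")

print(after_press_release[["fiscal_year", "event", "revenue"]])
print(after_annual_report[["fiscal_year", "event", "revenue"]])
print(after_restatement[["fiscal_year", "event", "revenue"]])
```

A non-PiT dataset would only keep the last row, so a backtest running through 2022 and 2023 would see the restated 92.0 instead of the 100.0 and 101.0 that were actually available at the time.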
I can't overstate how critical PiT data is for backtests. Non-PiT data is a major source of look-ahead bias in your models. It's one thing for fundamental data, but with sell-side estimates data, analysts are incentivized to change their ratings after the fact to boost their track records.
I was surprised to see that many pseudo long-duration equity quant firms use non-PiT data. I'm talking massive funds with large institutional clients. They either have no idea the data they're using is tainted, they think it's not a big deal, or, what is far too common, they simply don't care and would rather kick the can down the road.
Either way, this is something every investor should be aware of, quants and non-quants alike. High-quality PiT data is paramount to developing robust and accurate backtests. After all, the backtest is an indispensable tool for quants, and asset managers have a fiduciary duty to their clients to protect their capital. Presenting an overstated backtest because your dataset carries a large degree of look-ahead bias is cheating both yourself and your investors.
Citation:
Ljungqvist, Alexander and Malloy, Christopher J. and Marston, Felicia C., Rewriting History (April 16, 2008). AFA 2007 Chicago Meetings Paper, Available at SSRN: https://ssrn.com/abstract=889322 or http://dx.doi.org/10.2139/ssrn.889322