New research reveals that equity factor styles have existed and persisted since the mid-19th century. This is based on a proprietary US stock database that has been built by a team of Robeco researchers, led by Guido Baltussen, in collaboration with the Erasmus University. This underlines that factor premiums do not depend on specific market regimes, which is good news for long-term quant investors.
- Authors constructed a novel US stock database, from 1866 to 1926
- Low risk, Momentum and Value premiums are significant in this era
- Analysis showcases the potential of machine learning techniques
Over the last few decades, asset pricing literature has uncovered numerous equity factors, such as low risk, momentum and value, that explain cross-sectional differences in stock returns. The empirical evidence presented in support of these findings has largely relied on the Center for Research in Security Prices (CRSP) database, which houses US stock data – including returns – dating all the way back to 1926.
This sample period has been so intensively analyzed that many experts have warned that studies on factors could potentially be plagued by data dredging or p-hacking effects.1 In other words, many of the factors that seem important in-sample could lose explanatory power, or even fail to hold up out-of-sample. This issue can be addressed with a truly independent and sufficiently large dataset that can be used for out-of-sample testing.
Constructing a novel database
To that end, Guido Baltussen, Bart van Vliet and Pim van Vliet (from our Quantitative Investing team), in collaboration with the Erasmus University, have constructed a novel US stock database for the period 1866 to 1926, containing stock prices, dividend yields and market capitalization values. This huge effort, spanning several years, entailed the hand-collection of market capitalization data, double-checking of all inputs, as well as data cleaning and adjustments for stock delistings and stock splits using digitized financial journals. The team then merged this information with data from an external data provider – Global Financial Data – for the same period.
At 61 years, this ‘pre-CRSP’ sample period is of similar length to the one used in existing CRSP-based studies, and covers an economically important period that is independent of prevailing datasets. This era was characterized by strong economic growth and rapid industrial development, laying the foundations for the preeminence of the US economy. Meanwhile, the US stock market played a pivotal role in economic growth and the financing of key innovations during this phase.
The novel database provides new ground for independent tests that can help us better understand return drivers and stock prices. The authors used the data to examine the cross-section of US stock returns over the pre-CRSP period.2 Their research focused on well-documented stock characteristics, namely beta, momentum (12-1 month price momentum), short-term reversal (1-month), size and value (dividend yield).
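To make the two return-based characteristics concrete, here is a minimal sketch of how such signals are commonly computed from a stock's monthly return history. The function names and the exact window (11 months of returns, skipping the most recent month) are assumptions chosen for illustration; the article only states "12-1 month price momentum" and "1-month" reversal.

```python
from math import prod


def momentum_12_1(monthly_returns):
    """12-1 momentum: cumulative return over the 11 months that end one
    month before the latest observation (the latest month is skipped).

    Assumes the list is ordered oldest to newest.
    """
    if len(monthly_returns) < 12:
        raise ValueError("need at least 12 monthly returns")
    window = monthly_returns[-12:-1]  # drops the most recent month
    return prod(1.0 + r for r in window) - 1.0


def short_term_reversal(monthly_returns):
    """1-month reversal signal: the most recent monthly return.

    A reversal strategy would favour stocks with a low value here.
    """
    if not monthly_returns:
        raise ValueError("need at least one monthly return")
    return monthly_returns[-1]


# Hypothetical history: eleven months of +1%, then a +10% month.
history = [0.01] * 11 + [0.10]
signal_mom = momentum_12_1(history)       # ignores the final +10% month
signal_rev = short_term_reversal(history) # picks up only the final month
```

Skipping the latest month keeps the momentum signal from being contaminated by the short-term reversal effect, which is why the two are measured over separate windows.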
Evidence of equity factor premiums pre-1926
The analysis started with Fama-MacBeth regressions3 and univariate portfolio sorts on the dataset. The authors found that market beta was not priced and the capital asset pricing model (CAPM) largely failed to explain asset prices, as low-beta stocks generated positive alpha and high-beta stocks delivered negative alpha. Furthermore, momentum and value exhibited significant premiums and return spreads. By contrast, size failed to do so on both counts, while short-term reversal displayed a significant premium but yielded an insignificant return spread.
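As an illustration of the first step, the following is a minimal sketch of a Fama-MacBeth procedure on simulated data: a cross-sectional regression of returns on characteristics in every month, followed by averaging the monthly slopes. The array layout, the simulated premiums and the use of plain (not autocorrelation-adjusted) standard errors are assumptions made for brevity, not details taken from the paper.

```python
import numpy as np


def fama_macbeth(returns, characteristics):
    """Two-pass Fama-MacBeth estimate.

    returns:          (T, N) stock returns in month t
    characteristics:  (T, N, K) characteristics known before month t
    Returns the time-series mean of the monthly slopes (intercept first)
    and their t-statistics based on simple standard errors.
    """
    T, N, K = characteristics.shape
    slopes = np.empty((T, K + 1))
    for t in range(T):
        # Cross-sectional OLS for month t, with an intercept column.
        X = np.column_stack([np.ones(N), characteristics[t]])
        slopes[t] = np.linalg.lstsq(X, returns[t], rcond=None)[0]
    mean = slopes.mean(axis=0)
    std_err = slopes.std(axis=0, ddof=1) / np.sqrt(T)
    return mean, mean / std_err


# Simulated example: the first characteristic is priced, the second is not.
rng = np.random.default_rng(0)
T, N, K = 120, 50, 2
chars = rng.standard_normal((T, N, K))
true_premiums = np.array([0.5, 0.0])
rets = 1.0 + chars @ true_premiums + rng.standard_normal((T, N))

premiums, t_stats = fama_macbeth(rets, chars)
print("average slopes:", premiums)
print("t-statistics:  ", t_stats)
```

A characteristic is said to carry a significant premium when its average slope is reliably different from zero, which is the sense in which the text describes momentum and value as priced and size as not.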
The authors then built market-neutral and size-neutral factor portfolios, by double-sorting on size and a specific factor characteristic. They observed economically substantial and statistically significant premiums and CAPM alphas for low-risk (beta), momentum and value (dividend yield), while the size premium was again insignificant for both measures. In terms of short-term reversal, they saw significant premiums but insignificant CAPM alphas. The main results are summarized in Figures 1 and 2.
Figure 1 | Return spread (%), for the periods 1866 to 1926 and 1927 to 2019
Source: Robeco Quantitative Research. The figure shows the average annualized returns for the size, value, momentum, short-term reversal and beta factors for the pre-CRSP and CRSP samples. Factors are constructed from top-bottom portfolios from 2×3 size-characteristic-based portfolios. The pre-CRSP sample starts in January 1866 and ends December 1926. The CRSP sample runs between January 1927 and December 2019. Performance is measured on a monthly frequency.
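The top-bottom construction from 2×3 portfolios can be sketched for a single month as follows. The breakpoints (median size, 30th and 70th percentiles of the characteristic) and the equal weighting within each portfolio are assumptions borrowed from common practice; the article does not specify them.

```python
import numpy as np


def double_sort_spread(size, characteristic, next_return):
    """Top-minus-bottom return from a 2x3 size/characteristic sort.

    Stocks are split into small/big at the median size and into
    bottom/middle/top at the 30th and 70th percentiles of the
    characteristic. The spread averages the small and big top
    portfolios and subtracts the average of the two bottom portfolios,
    which makes it roughly size-neutral.
    """
    size = np.asarray(size, dtype=float)
    characteristic = np.asarray(characteristic, dtype=float)
    next_return = np.asarray(next_return, dtype=float)

    big = size > np.median(size)
    low_cut, high_cut = np.percentile(characteristic, [30, 70])
    top = characteristic >= high_cut
    bottom = characteristic <= low_cut

    def cell_mean(mask):
        if not mask.any():
            raise ValueError("empty portfolio: too few stocks for a 2x3 sort")
        return next_return[mask].mean()  # equal-weighted (assumption)

    top_leg = 0.5 * (cell_mean(top & ~big) + cell_mean(top & big))
    bottom_leg = 0.5 * (cell_mean(bottom & ~big) + cell_mean(bottom & big))
    return float(top_leg - bottom_leg)


# Hypothetical month with 12 stocks; returns rise with the characteristic.
size = np.arange(1, 13)
characteristic = np.tile([1, 2, 3, 4, 5, 6], 2)
next_return = 0.01 * characteristic
spread = double_sort_spread(size, characteristic, next_return)
```

Repeating this every month yields a time series of factor returns, whose annualized average corresponds to the kind of return spread shown in Figure 1.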
Figure 2 | CAPM alpha (%), for the periods 1866 to 1926 and 1927 to 2019
Source: Robeco Quantitative Research. The figure shows the average annualized CAPM alphas for the size, value, momentum, short-term reversal and beta factors for the pre-CRSP and CRSP samples. Factors are constructed from top-bottom portfolios from 2×3 size-characteristic-based portfolios. The pre-CRSP sample starts in January 1866 and ends December 1926. The CRSP sample runs between January 1927 and December 2019. Performance is measured on a monthly frequency.
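For readers less familiar with the measure, a CAPM alpha is the intercept from regressing a factor's returns on the market's excess returns. The sketch below uses a single-variable least-squares fit in plain Python; the function name, the sample numbers and the annualization by multiplying the monthly intercept by 12 are illustrative assumptions.

```python
def capm_alpha(factor_returns, market_excess_returns, periods_per_year=12):
    """Annualized CAPM alpha and market beta of a long-short factor.

    Fits factor_t = alpha + beta * market_t + error_t by ordinary least
    squares. The factor is a long-short portfolio, so its return is
    already an excess return.
    """
    n = len(factor_returns)
    if n != len(market_excess_returns) or n < 3:
        raise ValueError("need two equally long series with 3+ observations")
    mean_f = sum(factor_returns) / n
    mean_m = sum(market_excess_returns) / n
    cov = sum((f - mean_f) * (m - mean_m)
              for f, m in zip(factor_returns, market_excess_returns))
    var = sum((m - mean_m) ** 2 for m in market_excess_returns)
    beta = cov / var
    alpha_per_period = mean_f - beta * mean_m
    return periods_per_year * alpha_per_period, beta


# Hypothetical low-beta factor: negative market exposure, positive intercept.
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
factor = [0.002 - 0.5 * m for m in market]
alpha_annual, beta = capm_alpha(factor, market)
```

A factor can therefore show a modest raw return spread but a large alpha when its market beta is negative, which is typical of a long low-beta, short high-beta portfolio.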
Overall, there was no material out-of-sample decay in factor premiums, as they were broadly similar in the pre-1926 and post-1926 eras. The authors also confirmed that these results were generally robust over time, to different testing choices, and across industries and exchanges. Moreover, factor spanning tests revealed that low-risk, momentum, short-term reversal and value are non-redundant asset pricing factors, while size is subsumed by other factors. This indicates that low-risk, momentum and value are durable asset pricing factors.
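A spanning test asks whether one factor's average return is explained by its exposure to the other factors: the factor is regressed on the others, and a significant intercept means it is not redundant. The following is a minimal sketch on simulated data; the function name, the simulated series and the use of classical OLS standard errors are assumptions for illustration.

```python
import numpy as np


def spanning_test(target, others):
    """Regress one factor on the other factors.

    Returns the intercept and its t-statistic. An intercept that is
    indistinguishable from zero suggests the target factor is spanned
    (subsumed) by the others.
    """
    y = np.asarray(target, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(others, dtype=float)])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance
    return float(coef[0]), float(coef[0] / np.sqrt(cov[0, 0]))


# Simulated monthly factor returns (hypothetical numbers).
rng = np.random.default_rng(1)
others = 0.03 * rng.standard_normal((600, 2))
# A factor that is pure exposure to the first factor plus noise.
redundant = 0.8 * others[:, 0] + 0.01 * rng.standard_normal(600)
# A factor that earns 0.5% per month beyond its exposures.
distinct = 0.005 + 0.2 * others[:, 1] + 0.01 * rng.standard_normal(600)

alpha_red, t_red = spanning_test(redundant, others)
alpha_dis, t_dis = spanning_test(distinct, others)
print("redundant factor: alpha=%.4f, t=%.2f" % (alpha_red, t_red))
print("distinct factor:  alpha=%.4f, t=%.2f" % (alpha_dis, t_dis))
```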
Machine learning techniques offer valuable insight on stock returns
The authors also conducted an out-of-sample test of machine learning (ML) methods that had previously been applied successfully in the asset pricing literature.4 For example, some researchers have argued that cross-sectional regressions and portfolio sorts can miss important dynamics and interactions between variables, such as return volatility and price momentum. These researchers found that ML models (random forests and neural networks that allow for nonlinear predictor interactions) could predict cross-sectional differences in stock returns over the period 1957 to 2016.
However, this sample period coincides with the CRSP era, and ML models ultimately also require out-of-sample testing in independent samples, just like traditional factor tests. The authors therefore applied what they viewed as the most promising ML techniques (random forest and neural network models) to the new 61-year sample period. They found that the ML methods also worked in the pre-CRSP era, as both models delivered significant CAPM alphas. As such, the research suggests that ML tools offer valuable information for understanding the cross-section of stock returns.
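To illustrate why a model that allows for nonlinear predictor interactions can pick up what a linear regression misses, the sketch below fits a very small neural network and a linear model to simulated data in which the return depends only on the product of two characteristics. Everything here is an assumption made for illustration: the network size, the training settings and the simulated data are not the authors' specification, and the printed errors depend on the random seed.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.05, epochs=1000, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient
    descent on mean squared error. Returns a predict function and the
    training loss recorded at each epoch."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    W1 = 0.5 * rng.standard_normal((k, hidden))
    b1 = np.zeros(hidden)
    W2 = 0.5 * rng.standard_normal(hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden activations, (n, hidden)
        err = H @ W2 + b2 - y              # prediction errors, (n,)
        losses.append(float(np.mean(err ** 2)))
        g_pred = 2.0 * err / n             # d(loss)/d(prediction)
        g_Z = np.outer(g_pred, W2) * (1.0 - H ** 2)  # back through tanh
        W2 -= lr * (H.T @ g_pred)
        b2 -= lr * g_pred.sum()
        W1 -= lr * (X.T @ g_Z)
        b1 -= lr * g_Z.sum(axis=0)

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, losses


# Simulated data: the "return" is driven purely by an interaction term.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(400)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

predict, losses = train_mlp(X_train, y_train)

# Linear benchmark fitted on the same training sample.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef = np.linalg.lstsq(A_train, y_train, rcond=None)[0]
linear_pred = np.column_stack([np.ones(len(X_test)), X_test]) @ coef

mse_linear = float(np.mean((linear_pred - y_test) ** 2))
mse_mlp = float(np.mean((predict(X_test) - y_test) ** 2))
print("out-of-sample MSE, linear:", mse_linear)
print("out-of-sample MSE, network:", mse_mlp)
```

Evaluating both models on a held-out sample mirrors, on a toy scale, the logic of testing ML methods in a period that played no role in their development.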
In conclusion, this deep historical research underlines that factor premiums do not depend heavily on specific market regimes or market structures. Instead, they are probably an ‘eternal’ feature of financial markets.
1 See: Harvey, C. R., July 2017, “Presidential address: the scientific outlook in financial economics”, Journal of Finance.
2 See: Baltussen, G., Van Vliet, B. P., and Van Vliet, P., November 2021, “The cross-section of stock returns before 1926 (and beyond)”, working paper.
3 See: Fama, E. F., and MacBeth, J. D., June 1973, “Risk, return, and equilibrium: empirical tests”, Journal of Political Economy.
4 See: Gu, S., Kelly, B., and Xiu, D., February 2020, “Empirical asset pricing via machine learning”, The Review of Financial Studies.