Discriminatory Power of Consensus Ratings

Posted by Credit Benchmark Research on November 12, 2025

This report and analysis was produced by Collin Boler of Princeton University in collaboration with Matthew Noll and Ryan Hoffman of Credit Benchmark.

An analysis of Credit Consensus Ratings prior to default.

This analysis evaluates how Credit Benchmark’s Credit Consensus Ratings (CCR) compare with S&P Global Ratings in identifying default risk ahead of S&P’s default declarations. Using the Gini Ratio, a robust statistical measure of a rating system’s ability to discriminate between defaulting and non-defaulting entities, the study assesses CCR’s discriminatory power. A Gini Ratio approaching 1.0 signifies stronger performance, while 0 indicates no discriminatory power. To ensure a valid comparison, the analysis is restricted to entities covered by both Credit Benchmark and S&P, and the results highlight CCR’s effectiveness in signaling credit deterioration ahead of default events.

Key Takeaways

Between July 2015 and June 2025, Credit Benchmark’s average one-year Gini Ratio was 0.88, compared with S&P’s 0.91, highlighting the strong discriminatory power of Credit Benchmark’s consensus ratings.
Over the same ten-year period, Credit Benchmark’s average three-year Gini Ratio was 0.83 versus S&P’s 0.85, its average five-year Gini Ratio was 0.81 versus S&P’s 0.82, and its average seven-year Gini Ratio was 0.77 versus S&P’s 0.80.

Implications

Credit Benchmark’s consensus ratings are highly effective at signaling default risk, consistently rank-ordering the credit risk of entities on par with S&P over the last ten years. With an average one-year Gini ratio of 0.88, consensus ratings offer excellent discriminatory power while covering a universe five times that of the largest rating agencies. In practice, this means decision-makers can rely on Credit Benchmark not just as an alternative to traditional rating agencies, but as a scalable, independent lens on default risk. This is especially valuable for private, unrated, or thinly covered entities, where it can enhance portfolio surveillance and enable forward-looking risk management.

Input and Data

For this study, the data examined are confined to entities that were rated by S&P and held a Credit Benchmark CCR as of June month-ends over the ten years ending June 2025. The overlapping subsets of Credit Benchmark’s and S&P’s coverage are large enough to permit a fair comparison of the performance of Credit Benchmark’s ratings against S&P’s.

Total Universe

S&P Global Ratings’ coverage universe consisted of 16,139 entities at the time of publication, of which 7,364 were actively rated, that is, not marked ‘not rated’ (NR)[1].

Of the 7,364 actively rated entities, Credit Benchmark’s active CCR coverage overlapped with 4,247 entities (57.7% of S&P). The number of overlapping entities fluctuates from year to year. For example, in June 2020, of the 7,658 entities actively rated by S&P, 3,821 entities (49.9%) overlapped with Credit Benchmark’s CCR coverage.

The figure below shows the year-over-year evolution of S&P ratings and the counts of Credit Benchmark rated entities within the S&P rated groups.

S&P Total Rated Entities and CB's Overlapping Coverage

Defaulted Universe

Within the universe covered by both Credit Benchmark and S&P, 278 S&P-recorded default events occurred over the ten-year horizon, involving 233 unique entities. The figure below shows the number of S&P defaults within this shared coverage for each 12-month period from July to June, 2015 to 2025.

Defaults Occurring in 12-month Periods Ending June 30

The figure below illustrates a snapshot of the relative sizes of the overall rated universes of Credit Benchmark and S&P, along with the relative size of their overlapping coverage. Although both S&P and Credit Benchmark recognize many more defaults than are used in this study, the default events used for this paper’s Gini Ratio analysis fall within the mutual coverage area (i.e., the area where the circles overlap).

Relative Coverage Universe and Coverage Overlap

Processing Components

This study uses a static pool approach to shape the monthly ratings data (containing both Credit Benchmark and S&P ratings) into a series of annual static pools. Each pool contains only entities that had a valid, non-defaulted rating from both S&P and Credit Benchmark on the pool date, which in this study is June month-end of each year. Each entity in a pool cohort is then assigned a binary outcome over the horizon length[2]: either healthy or default.

Static Pool Methodology Overview

An entity’s inclusion in a static pool means the entity has a non-defaulted rating from both S&P and Credit Benchmark as of June month-end of that pool year.
Default occurrences follow S&P’s definition of default; each default event must be recognized in an S&P annual default study and have an associated default month.
Static pools are frozen. No adjustments are made if ratings were dropped or withdrawn subsequent to the start of the static pool period. Each of the static pools is effectively a ‘buy-and-hold’ portfolio.
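As a minimal sketch of the pool construction above, the Python below builds one static pool. It assumes a hypothetical `Snapshot` record for each entity at a June month-end and a mapping from entity ID to S&P default months (stored as the first day of each month); the schema, names, and rating labels are illustrative assumptions, not Credit Benchmark’s data model. The horizon is taken to run from July of the pool year through June of the final horizon year.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record: one entity's ratings at a June month-end."""
    entity_id: str
    cb_rating: Optional[str]  # Credit Benchmark CCR; None if no CCR that month
    sp_rating: Optional[str]  # S&P rating; None if unrated or NR
    in_default: bool          # already in default at the snapshot date


def build_static_pool(
    snapshots: Iterable[Snapshot],
    default_months: Dict[str, List[date]],
    pool_year: int,
    horizon_years: int,
) -> Dict[str, int]:
    """Return {entity_id: 1 if it defaults within the horizon, else 0}.

    Eligibility: a non-defaulted rating from both sources at June month-end
    of pool_year. Horizon: July of pool_year to June of pool_year + horizon.
    The pool is frozen: later rating withdrawals do not remove an entity.
    """
    start = date(pool_year, 7, 1)
    end = date(pool_year + horizon_years, 6, 30)
    pool = {}
    for snap in snapshots:
        if snap.cb_rating is None or snap.sp_rating is None or snap.in_default:
            continue  # outside the shared, non-defaulted universe
        months = default_months.get(snap.entity_id, [])
        pool[snap.entity_id] = int(any(start <= m <= end for m in months))
    return pool


snaps = [
    Snapshot("A", "BBB", "BBB+", False),
    Snapshot("B", "B-", "CCC+", False),
    Snapshot("C", "BB", None, False),  # no S&P rating: excluded
]
defaults = {"B": [date(2020, 1, 1)]}
print(build_static_pool(snaps, defaults, 2019, 1))  # {'A': 0, 'B': 1}
```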

The table below shows a snippet of the static pool for the June 2019 cohort (one-year horizon: July 2019 to June 2020).

The table below shows the counts of shared-coverage entities entering each static pool for the one-year horizon groups. It includes columns for Exits and New Entries. Exits are entities in the current static pool that are not listed as rated in the next year’s (June) static pool; New Entries are entities listed as rated in the next year’s static pool that do not have a rating in the current static pool.
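The Exits and New Entries columns reduce to set differences between consecutive pools. A minimal sketch, assuming each pool is represented only by its set of entity IDs (the helper name and the example IDs are hypothetical):

```python
from typing import Iterable, Tuple


def exits_and_new_entries(current_ids: Iterable[str], next_ids: Iterable[str]) -> Tuple[int, int]:
    """Count exits and new entries between consecutive June static pools.

    Exits: IDs in the current pool that are absent from next year's pool.
    New entries: IDs in next year's pool that are absent from the current pool.
    """
    current, following = set(current_ids), set(next_ids)
    return len(current - following), len(following - current)


# Hypothetical pools: B leaves the shared universe; D and E join it.
pool_2019 = {"A", "B", "C"}
pool_2020 = {"A", "C", "D", "E"}
exits, new = exits_and_new_entries(pool_2019, pool_2020)
# Next pool size = current size - exits + new entries
assert len(pool_2020) == len(pool_2019) - exits + new
print(exits, new)  # 1 2
```

In this sketch an entity that defaults during the year also counts as an exit, since a defaulted entity cannot satisfy the non-defaulted eligibility rule for the next June pool.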

In addition, the charts below show how the rating distributions differ between entities that remained healthy and those that defaulted within one year, for both Credit Benchmark and S&P, within the shared universe.

CB Healthy Distribution

CB Defaulted Distribution

S&P Healthy Distribution

S&P Defaulted Distribution

Performance Testing

The primary indicator used to test discrimination is the Gini ratio, which measures how well ratings rank-order entities by their subsequent default outcomes over a set time horizon. The horizons analyzed are one, three, five, and seven years. The Gini ratio measures only relative ordering; it does not test whether the level of each PD is accurate.

Testing Discrimination with Gini Ratios

The first step of the Gini calculation is to sort all entities in descending order of predicted PD, with the riskiest first. Each entity is then assigned an actual default flag based on whether it defaulted within the static pool’s horizon period[3].

Deriving the Gini Ratio from the Lorenz Curve

The Lorenz curve in this context is a graphical indication of how defaults concentrate among differently rated entities. In the given horizon period, the cumulative proportion of the total non-defaulters and the cumulative proportion of defaults are plotted for each distinct rating (C-AAA) as the x and y coordinates respectively.

The green line represents the ideal curve, a theoretical perfect rank ordering given an aggregate default rate r, in which the lowest-rated r percent of entities all default and the remaining 100 − r percent do not. The red 45-degree line from (0,0) to (1,1), known as the Random Curve, represents no discriminatory power, because defaults would be uniformly distributed across ratings. The farther the Lorenz curve bows above this line, the more defaults are concentrated among entities with higher PDs, which indicates better discrimination.
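One way to sketch this curve in Python, assuming an S&P-style 21-notch scale (the labels and function names are illustrative assumptions): following the description above, x is the cumulative share of non-defaulters and y the cumulative share of defaulters, with one point per rating notch from riskiest to safest. With these axes, 2A − 1 (A being the trapezoidal area under the curve) equals the accuracy ratio, also written 2·AUC − 1, and entities sharing a notch are handled by the straight segment between notch points.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# 21-notch S&P-style scale ordered riskiest first (labels assumed for illustration).
SCALE = ["C", "CC", "CCC-", "CCC", "CCC+", "B-", "B", "B+", "BB-", "BB", "BB+",
         "BBB-", "BBB", "BBB+", "A-", "A", "A+", "AA-", "AA", "AA+", "AAA"]


def lorenz_points(ratings: Sequence[str], defaulted: Sequence[int]) -> List[Tuple[float, float]]:
    """Cumulative (share of non-defaulters, share of defaulters), one point per notch.

    Assumes at least one defaulter and one non-defaulter in the pool.
    """
    bad = Counter(r for r, d in zip(ratings, defaulted) if d)
    good = Counter(r for r, d in zip(ratings, defaulted) if not d)
    n_bad, n_good = sum(bad.values()), sum(good.values())
    points, cum_bad, cum_good = [(0.0, 0.0)], 0, 0
    for notch in SCALE:  # riskiest notch first
        if bad[notch] == 0 and good[notch] == 0:
            continue  # no entities at this notch
        cum_bad += bad[notch]
        cum_good += good[notch]
        points.append((cum_good / n_good, cum_bad / n_bad))
    return points


def gini_from_points(points: List[Tuple[float, float]]) -> float:
    """Gini = 2 * (trapezoidal area under the curve) - 1."""
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 2 * area - 1


# Toy pool (not study data): two defaulters among eight entities.
ratings = ["CCC", "B", "B", "BB", "BBB", "A", "AA", "AAA"]
defaulted = [1, 1, 0, 0, 0, 0, 0, 0]
print(round(gini_from_points(lorenz_points(ratings, defaulted)), 4))  # 0.9167
```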

Gini Ratio Results:

The table below shows the Gini Ratios by cohort year and horizon length, based on PDs as of June of each year from 2015 to 2024.

Credit Benchmark CCR vs S&P Ratings by Year (2015-2024)

The table below shows the weighted average Gini for each horizon length, where each year’s Gini ratio is weighted by the number of entities in that year’s pool before averaging. For horizons longer than one year, a default can appear in more than one static pool: it is attributed to the entity in every pool whose horizon contains the default. For example, if an entity defaults in January 2023, the default is attributed to the 2022 1Y pool, the 2020, 2021, and 2022 3Y pools, the 2018, 2019, and 2020 5Y pools, and the 2016, 2017, and 2018 7Y pools, provided the entity had a valid, non-defaulted June rating in each of those years.
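The attribution rule can be sketched as a function that lists the pool years whose horizon contains a given default month. It assumes each horizon runs from July of the pool year through June of the pool year plus the horizon length, matching the July-to-June periods used above, and that only pools with a complete horizon by June 2025 are included; the function name and default arguments are hypothetical.

```python
from datetime import date
from typing import List


def qualifying_pools(default_month: date, horizon_years: int,
                     first_pool: int = 2015,
                     data_end: date = date(2025, 6, 30)) -> List[int]:
    """Pool years whose horizon (July of the pool year through June of the
    pool year + horizon) contains default_month.

    Assumption: only pools whose full horizon ends by data_end are kept.
    Eligibility (a non-defaulted June rating from both sources in that
    pool year) must still be checked separately.
    """
    years = []
    for year in range(first_pool, data_end.year + 1):
        start = date(year, 7, 1)
        end = date(year + horizon_years, 6, 30)
        if end <= data_end and start <= default_month <= end:
            years.append(year)
    return years


for h in (1, 3, 5, 7):
    print(h, qualifying_pools(date(2023, 1, 1), h))
```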

Credit Benchmark CCR vs S&P Ratings Weighted Averages (2015-2024)

The graph below shows the overall Lorenz curves of the S&P and CB ratings looking at the one-year horizon. Each point corresponds to one of the twenty-one distinct rating notches, from C (most risky) to AAA (least risky).

S&P’s 0.909 Gini and Credit Benchmark’s 0.882 Gini indicate similarly high discriminatory power.

Credit Benchmark vs S&P Ratings on Shared Universe - 1Y Horizon (2015-2025)

The graph below shows the overall Lorenz curves of the S&P and CB ratings looking at the three-year horizon. S&P’s 0.850 Gini[4] indicates discriminatory power similar to Credit Benchmark’s 0.830 Gini for this time horizon.

Credit Benchmark vs S&P Ratings on Shared Universe - 3Y Horizon (2015-2023)

The graph below shows the overall Lorenz curves of the S&P and CB ratings looking at the five-year horizon. S&P’s 0.822 Gini indicates similar discriminatory power to Credit Benchmark’s 0.807 Gini for this time horizon.

Credit Benchmark vs S&P Ratings on Shared Universe - 5Y Horizon (2015-2021)

The graph below shows the overall Lorenz curves of the S&P and CB ratings looking at the seven-year horizon. S&P’s 0.799 Gini indicates similar discriminatory power to Credit Benchmark’s 0.774 Gini.

Credit Benchmark vs S&P Ratings in Shared Universe - 7Y Horizon (2015-2019)

Further Entity Profiling

Results by Public Status

Results by Industry

Results by Region

Assumptions and Limitations

Sample Representativeness: The analysis is confined to the subset of entities rated by both Credit Benchmark and S&P in June of 2015-2024, representing approximately 5% of the 98,603 entities with a currently available monthly Credit Consensus Rating (CCR) from Credit Benchmark[5]. This overlap may not fully reflect the diversity of industries or credit profiles in the broader Credit Benchmark dataset and may skew results toward larger, publicly listed entities that have S&P ratings.

Selection Bias in S&P Coverage: Similarly, entities with S&P ratings may differ systematically from those without, particularly lower-rated or private firms, which can introduce survivorship and self-selection biases into the comparison data.

Static Pool Methodology Constraints: Because ratings are frozen at the start of each horizon, the static pool approach ignores rating migrations within the period, which may understate the impact of upgrades or downgrades on default experience. Moreover, only the PDs as of June of each year are evaluated; a dynamic pool approach would be needed to test the PDs from every month of the year. However, the static pool approach, rather than the dynamic pool approach, is considered the industry-standard methodology among many rating agencies for preparing data for a Gini ratio calculation.

Data Quality: The data are drawn from Credit Benchmark’s internal monthly CCR ratings and from Credit Benchmark’s dataset of S&P monthly default data. There are likely some discrepancies between the S&P defaults listed in Credit Benchmark’s S&P default dataset and those in S&P’s annual default studies. As stated in S&P’s Default, Transition, and Recovery: 2024 Annual Global Corporate Default And Rating Transition Study, “Issuers sometimes default after S&P Global Ratings withdraws its rating – [S&P Global] make[s] [its] best effort to capture these defaults in [their] database. Historically [from 1981-2024], 14.8% of defaults are of entities that were no longer rated at the time of default.” In our case, the total S&P default count was 16.0% lower than the defaults S&P lists for 2016-2024 in its 2024 default study, and 17.4% lower on average per year. This gap may reflect instances where S&P captures defaults of entities that were no longer rated at the time of default.
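The two shortfall figures answer different questions: the 16.0% compares total counts over all years, while the 17.4% averages each year’s percentage gap with equal weight, so years with few defaults count as much as years with many. A minimal sketch with hypothetical counts (not the study’s data) shows how the two measures can diverge:

```python
from typing import Sequence, Tuple


def shortfall_pct(ours: Sequence[int], reference: Sequence[int]) -> Tuple[float, float]:
    """Return (pooled shortfall %, unweighted mean of yearly shortfall %)."""
    pooled = 100 * (1 - sum(ours) / sum(reference))
    yearly = [100 * (1 - o / r) for o, r in zip(ours, reference)]
    return pooled, sum(yearly) / len(yearly)


# Hypothetical yearly default counts (NOT the study's data).
ours = [90, 20, 10]
reference = [100, 25, 15]
pooled, mean_yearly = shortfall_pct(ours, reference)
print(f"pooled {pooled:.1f}%, mean of yearly {mean_yearly:.1f}%")  # pooled 14.3%, mean of yearly 21.1%
```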

Conclusion

The findings in this study show that Credit Benchmark’s monthly Credit Consensus Ratings exhibit strong discriminatory power over the last 10 years. Moreover, the discriminatory power of Credit Benchmark’s probabilities of default is on par with that of S&P ratings.

Appendix

Gini Ratio:

The Gini coefficient is a measure of rank ordering that quantifies the discriminatory power of a model, one measure of its predictiveness.

The Lorenz curve in this context is a graphical indication of how defaults concentrate among entities with different probabilities of default. A 45-degree line from (0,0) to (1,1) represents no discriminatory power, because defaults would be uniformly distributed. The farther the curve bows above this line, the more defaults are concentrated among entities with higher PDs, which indicates better discrimination[6].

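$$\text{Gini Ratio} = 2A - 1$$

where A is the area under the Lorenz curve.

The X-axis represents the cumulative share of entities, ranked in descending order of predicted PD. For each rank k, 1 through N (where N is the total number of entities in the static pool):

$$x_k = \frac{k}{N}$$

At k = 1 only the riskiest entity has been covered; at k = N every entity, down to the safest, has been covered.

The Y-axis represents the cumulative share of actual defaults. For each rank k, 1 through N:

$$y_k = \frac{1}{D}\sum_{i=1}^{k} d_i$$

where d_i = 1 if the i-th ranked entity defaulted within the horizon and 0 otherwise, and $D = \sum_{i=1}^{N} d_i$ is the total number of defaults among all N entities.

This can also be calculated geometrically by summing the trapezoids between consecutive points, with x_0 = y_0 = 0:

$$\text{Gini Ratio} = 2\sum_{k=1}^{N} \frac{(x_k - x_{k-1})(y_k + y_{k-1})}{2} - 1 = \sum_{k=1}^{N} (x_k - x_{k-1})(y_k + y_{k-1}) - 1$$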
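As a sketch of the appendix formulas, the Python below builds the curve with x_k = k/N and y_k as defined above, sums the trapezoids, and returns 2A − 1. Because x here counts all entities, a perfect ranking with default rate r = D/N reaches 2A − 1 = 1 − r rather than 1, so the function also returns the value divided by (1 − r), the accuracy ratio; the two are close when default rates are low. The tie handling (grouping equal PDs into one step), the function name, and the toy data are assumptions.

```python
from typing import Sequence, Tuple


def cap_gini(pds: Sequence[float], defaulted: Sequence[int]) -> Tuple[float, float]:
    """Gini from the appendix curve: x_k = k/N, y_k = (cumulative defaults)/D.

    Returns (2A - 1, accuracy ratio), where the accuracy ratio divides by
    (1 - r), r = D/N, the value 2A - 1 reaches for a perfect ranking.
    Entities with equal PD are grouped into one step (assumption: the text
    does not say how ties are handled). Assumes 0 < D < N.
    """
    n, total = len(pds), sum(defaulted)
    order = sorted(range(n), key=lambda i: -pds[i])  # riskiest first
    points, cum = [(0.0, 0.0)], 0
    for pos, i in enumerate(order):
        cum += defaulted[i]
        is_last_in_tie = pos == n - 1 or pds[order[pos + 1]] != pds[i]
        if is_last_in_tie:
            points.append(((pos + 1) / n, cum / total))
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    raw = 2 * area - 1
    return raw, raw / (1 - total / n)


# Toy pool (not study data): eight entities, two defaults.
pds = [0.30, 0.05, 0.05, 0.02, 0.005, 0.001, 0.0005, 0.0001]
defaulted = [1, 1, 0, 0, 0, 0, 0, 0]
raw, ar = cap_gini(pds, defaulted)
print(round(raw, 4), round(ar, 4))  # 0.6875 0.9167
```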

Weighted Average Gini Ratio:

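The weighted Ginis in the aggregate results follow this formula:

$$\overline{G}_h = \frac{\sum_{i} N_i \, G_{i,h}}{\sum_{i} N_i}$$

where i indexes the annual static pools (cohort years) for horizon h, N_i is the number of entities in pool i, and G_{i,h} is that pool’s Gini ratio. The weighted average differed only slightly from the simple average Gini ratio but was chosen because, unlike a simple average, it gives larger pools proportionally more influence.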
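A minimal sketch of the weighting, using hypothetical pool sizes and Gini ratios (not the study’s results) to compare the weighted and simple averages:

```python
from typing import Sequence


def weighted_average_gini(pool_sizes: Sequence[int], pool_ginis: Sequence[float]) -> float:
    """Entity-count-weighted mean of per-pool Gini ratios for one horizon."""
    total = sum(pool_sizes)
    return sum(n * g for n, g in zip(pool_sizes, pool_ginis)) / total


# Hypothetical pools (NOT the study's data): sizes N_i and Gini ratios G_i.
sizes = [3000, 4000, 5000]
ginis = [0.90, 0.86, 0.84]
weighted = weighted_average_gini(sizes, ginis)
simple = sum(ginis) / len(ginis)
print(f"weighted {weighted:.4f}, simple {simple:.4f}")  # weighted 0.8617, simple 0.8667
```

Here the largest pool has the lowest Gini, so the weighted average sits below the simple one; with equal pool sizes the two coincide.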


[1] As of August 2025

[2] This study uses one, three, five, and seven-year horizons.

[3] See the Processing Components section for more on the Static Pool methodology used to organize the data.

[4] Statistical significance tests on the differences between CB and S&P Gini coefficients are possible using bootstrapped simulations; these are in development for future research updates.

[5] 1 Month CCR Ratings, as of August 2025

[6] The traditional Gini ratio calculation is 1 − 2 × (area under the Lorenz curve), which applies to a Lorenz curve that bows southeast (below the random line). To match S&P’s methodology, in which PDs are sorted from riskiest to safest, the formula was reconfigured for a Lorenz curve that bows northwest (above the random line).