Project 01 of 08

Medicare Part D Prescriber Risk Analytics

A reproducible risk scoring framework for every active Medicare Part D prescriber. The system combines statistical peer benchmarking with isolation forest machine learning across six engineered features, and validates the resulting risk score against an honest temporal holdout: providers newly added to the HHS OIG List of Excluded Individuals and Entities in 2023.

1.38M: Part D prescribers analyzed, 2023

81,914: LEIE exclusion records (full list)

69,034: providers flagged, top 5 percent

97.6%: temporal coverage, share of 2022-flagged providers present in the 2023 universe

Methodology

The analytical universe consists of 1,380,665 active Medicare Part D prescribers from the 2023 Public Use File. Six features are engineered from the underlying CMS columns: cost per claim, claims per beneficiary, cost per beneficiary, days per beneficiary, opioid share, and brand share. Each feature is standardized and assigned a statistical risk weight, and the six weights sum to one.
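A minimal sketch of the feature step, assuming raw columns named as in the CMS "Part D Prescribers by Provider" file (`Tot_Clms`, `Tot_Benes`, `Tot_Drug_Cst`, `Tot_Day_Suply`, `Opioid_Tot_Clms`, `Brnd_Tot_Clms`); those names, the share definitions (opioid or brand claims over total claims), and the reading of the statistical component as a weighted sum of z-scores are all assumptions to check against the data dictionary and the technical report. The functions `engineer_features` and `statistical_score` are hypothetical.

```python
import numpy as np

# Assumed raw column names (CMS "by Provider" style); verify for the 2023 release.
RAW = ["Tot_Clms", "Tot_Benes", "Tot_Drug_Cst", "Tot_Day_Suply",
       "Opioid_Tot_Clms", "Brnd_Tot_Clms"]
FEATURES = ["cost_per_claim", "claims_per_bene", "cost_per_bene",
            "days_per_bene", "opioid_share", "brand_share"]


def engineer_features(cols):
    """Return an (n, 6) matrix of the ratio features, in FEATURES order."""
    clms, benes, cost, days, opioid, brand = (
        np.asarray(cols[c], dtype=float) for c in RAW)
    # Zero denominators produce NaN or inf; silence the warnings and map them to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        X = np.column_stack([cost / clms, clms / benes, cost / benes,
                             days / benes, opioid / clms, brand / clms])
    return np.nan_to_num(X, nan=0.0, posinf=0.0, neginf=0.0)


def statistical_score(X, weights):
    """Weighted sum of per-feature z-scores; the weights must sum to one."""
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("statistical risk weights must sum to one")
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # a constant feature contributes 0 instead of dividing by 0
    Z = (X - X.mean(axis=0)) / sd
    return Z @ w
```

Equal weights (`[1 / 6] * 6`) are a reasonable placeholder until the published weights are plugged in; the sum-to-one check simply enforces the constraint stated above.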

The combined risk score blends the statistical component (weight 0.60) with an isolation forest machine learning component (weight 0.40). The isolation forest runs with one hundred trees, contamination set to 0.02, and a fixed random seed for reproducibility. Providers are placed into five risk tiers based on the percentile rank of the combined score.
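The blend can be sketched with scikit-learn's `IsolationForest`. The text does not say how the two components are put on a common scale, so this sketch assumes both are converted to percentile ranks before the 0.60/0.40 weighting; the seed value 42 and the helper names `pct_rank` and `combined_score` are likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def pct_rank(x):
    """Percentile rank in (0, 1]; ties are broken by position (a simplification)."""
    n = len(x)
    ranks = np.empty(n, dtype=np.int64)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
    return ranks / n


def combined_score(X, stat_score, seed=42):
    """Blend the statistical score (0.60) with an isolation forest score (0.40).

    The seed value and the percentile-rank rescaling are assumptions.
    """
    iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=seed)
    iso.fit(X)
    # score_samples is lower for more anomalous rows, so negate it: higher = riskier.
    ml_score = -iso.score_samples(X)
    # Put both components on a common 0-1 scale before taking the weighted sum.
    return 0.60 * pct_rank(stat_score) + 0.40 * pct_rank(ml_score)
```

One design detail worth knowing: in scikit-learn, `contamination` only sets the threshold used by `predict` and `decision_function`; `score_samples` does not depend on it, so in this rank-based blend it does not change the combined score.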

Temporal validation uses an honest holdout: the model is trained on data through 2022 only and tested against LEIE exclusions filed in 2023. This avoids the in-sample inflation common in fraud detection benchmarks.
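A sketch of how such a holdout could be scored, assuming providers are matched to 2023 exclusions by NPI and that performance is summarized as the exclusion rate in the top 5 percent relative to the base rate (lift) plus recall. The metric choice and the function `holdout_lift` are hypothetical; the source does not state which metrics the report uses.

```python
import numpy as np


def holdout_lift(npis_2022, scores_2022, excluded_2023_npis, top_frac=0.05):
    """Compare the 2023 exclusion rate among top-scored 2022 providers with the base rate.

    Scores come from a model fit on 2022 data only; labels are NPIs newly
    excluded in 2023, so no label information reaches training.
    """
    npis = np.asarray(npis_2022)
    scores = np.asarray(scores_2022, dtype=float)
    k = int(np.ceil(top_frac * len(scores)))
    top = np.argsort(-scores, kind="stable")[:k]  # indices of the k highest scores
    excluded = np.isin(npis, list(excluded_2023_npis))
    base_rate = excluded.mean()
    flagged_rate = excluded[top].mean()
    recall = excluded[top].sum() / max(excluded.sum(), 1)
    lift = flagged_rate / base_rate if base_rate > 0 else float("nan")
    return {"flagged": k, "flagged_rate": flagged_rate,
            "base_rate": base_rate, "recall": recall, "lift": lift}
```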

Key findings

The flagged top five percent of prescribers (69,034 providers) represents an analytical priority set for downstream review. The tier distribution stratifies risk into Critical (1,381), High (12,426), Medium (55,227), Moderate (207,100), and Routine (1,104,531); the Critical, High, and Medium tiers together make up the 69,034 flagged providers.
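The published tier counts are consistent with percentile-rank cutoffs at 0.80, 0.95, 0.99, and 0.999, where the percentile rank is the ascending rank divided by the universe size and a provider enters a tier at or above its cutoff. These cutoffs are inferred from the counts rather than stated in the source, so treat the sketch below (with hypothetical names `assign_tiers` and `tier_counts`) as an arithmetic check, not the project's code.

```python
import numpy as np

TIERS = ["Routine", "Moderate", "Medium", "High", "Critical"]
# Assumed percentile-rank cutoffs, inferred from the published tier counts.
CUTOFFS = [0.80, 0.95, 0.99, 0.999]


def percentile_rank(score):
    """Ascending rank divided by n, so the highest score maps to 1.0 (ties by position)."""
    n = len(score)
    ranks = np.empty(n, dtype=np.int64)
    ranks[np.argsort(score, kind="stable")] = np.arange(1, n + 1)
    return ranks / n


def assign_tiers(score):
    """Tier index per provider: 0 = Routine (below 0.80) up to 4 = Critical (0.999 and up)."""
    return np.digitize(percentile_rank(score), CUTOFFS)


def tier_counts(score):
    counts = np.bincount(assign_tiers(score), minlength=len(TIERS))
    return {name: int(c) for name, c in zip(TIERS, counts)}


# Any strictly ordered score over 1,380,665 providers yields the published counts:
# Routine 1,104,531 / Moderate 207,100 / Medium 55,227 / High 12,426 / Critical 1,381
counts = tier_counts(np.arange(1_380_665, dtype=float))
```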

Temporal coverage is high. Of the providers flagged in the 2022 universe, 97.61 percent still appear in the 2023 universe. Among these providers, mean cost per beneficiary is 8.59 times the population mean, and the median lift is 16.08 times.
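The coverage and lift figures reduce to simple set and ratio arithmetic. A sketch, assuming coverage is the share of 2022-flagged NPIs present in the 2023 universe and reading "median lift" as the flagged median divided by the population median; that reading and the helper names are assumptions.

```python
import numpy as np


def temporal_coverage(flagged_2022_npis, npis_2023):
    """Share of 2022-flagged NPIs that still appear in the 2023 universe."""
    return np.isin(np.asarray(flagged_2022_npis), npis_2023).mean()


def cost_lift(cost_per_bene, flagged_mask):
    """(mean lift, median lift) of a flagged group's cost per beneficiary.

    Assumption: median lift = median(flagged) / median(population).
    """
    x = np.asarray(cost_per_bene, dtype=float)
    m = np.asarray(flagged_mask, dtype=bool)
    return x[m].mean() / x.mean(), np.median(x[m]) / np.median(x)
```

Because cost per beneficiary is right-skewed, the population median sits well below the population mean, which is one way a median lift can exceed the mean lift.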

Permutation importance identifies opioid share (29.80 percent) and brand share (17.74 percent) as the most discriminating features, followed by days per beneficiary, claims per beneficiary, cost per beneficiary, and cost per claim.
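Permutation importance for an unsupervised score needs a reference quantity. This sketch assumes it is the mean absolute change in the risk score when one feature column is shuffled, normalized so the six importances sum to 100 percent; the source does not say which target metric was used, and `permutation_importance_pct` is a hypothetical helper.

```python
import numpy as np


def permutation_importance_pct(score_fn, X, n_repeats=5, seed=0):
    """Per-feature share of total importance, in percent.

    Assumption: importance = mean |score(shuffled) - score(original)|.
    """
    rng = np.random.default_rng(seed)
    base = score_fn(X)
    raw = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and the score
            raw[j] += np.abs(score_fn(Xp) - base).mean()
    raw /= n_repeats
    return 100 * raw / raw.sum()
```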

The bootstrap ninety-five percent confidence interval for the dollar excess of the flagged sample is 92,766 to 94,688 million dollars (roughly 92.8 to 94.7 billion dollars), based on two hundred resample iterations.
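A percentile bootstrap over providers can be sketched as follows. How the per-provider excess is defined is not stated here, so `excess` is a hypothetical input holding one dollar amount per flagged provider; resampling providers with replacement and re-summing gives the sampling distribution of the total.

```python
import numpy as np


def bootstrap_ci_sum(excess, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the total excess."""
    rng = np.random.default_rng(seed)
    x = np.asarray(excess, dtype=float)
    totals = np.empty(n_boot)
    for b in range(n_boot):
        # Resample providers with replacement, then re-sum their excess.
        totals[b] = rng.choice(x, size=len(x), replace=True).sum()
    lo, hi = np.quantile(totals, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Looping over resamples keeps memory at one resample at a time, which matters when each resample draws tens of thousands of providers.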

Selected figures


Figure 1. Risk tier distribution across the 2023 Part D prescriber universe, with Critical, High, Medium, Moderate, and Routine tiers defined by combined risk score percentile.


Figure 2. Per-state count of flagged providers in the top five percent of combined risk score. California, New York, Florida, and Texas lead by absolute count.

Figure 9. Executive dashboard summarizing universe size, tier distribution, temporal validation, and bootstrap confidence intervals.

Verification

Every numerical claim in this project traces to a persisted result file in the public rebuild repository. The full claims traceability table maps each number to its source file, the notebook section that produced it, and the exact computational basis. Reviewers can reproduce the flag for any provider from the same CMS Public Use Files and downloadable LEIE database used in the analysis.
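A traceability check of this kind can be automated. The sketch below assumes a hypothetical registry in which each published number points at a JSON result file and a key inside it; the file layout, key names, and `verify_claims` are illustrative, not the repository's actual structure.

```python
import json
import math
from pathlib import Path

# Hypothetical registry entry: published value -> persisted file and key.
CLAIMS = [
    {"claim": "Providers flagged, top 5 percent", "value": 69034,
     "file": "results/tiers.json", "key": "flagged_top5"},
]


def verify_claims(claims, root="."):
    """Return the claims whose persisted value does not match the published one."""
    mismatches = []
    for c in claims:
        data = json.loads((Path(root) / c["file"]).read_text())
        stored = data.get(c["key"])
        # Exact match by default; a per-claim rel_tol allows for rounded figures.
        if stored is None or not math.isclose(stored, c["value"], rel_tol=c.get("rel_tol", 0.0)):
            mismatches.append(c["claim"])
    return mismatches
```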

Downloads

Technical Report (PDF)

Citation

Hassan, S. (2026). Medicare Part D Prescriber Risk and Compliance Analytics: A Reproducible Framework with Temporal Holdout Validation. Working paper. SSRN identifier forthcoming.
