Clinical Decision Support: Lessons Learned from the MID Project
The experiences of two of five conveners in the Medicare Imaging Demonstration indicate that the challenges of implementing decision support for radiology go well beyond the technical.
Beginning in 2017, physicians treating Medicare patients must consult clinical decision support (CDS) before ordering advanced imaging for Medicare beneficiaries, as mandated by Congress in 2014. Four years earlier, Congress had required CMS to conduct a demonstration project to assess whether CDS could ensure that patients receive the imaging study required for their care, explicitly excluding pre-authorization, the cumbersome method of choice at the time.
Several months ago—and quietly into the breach—CMS delivered a report on the results of that demonstration to Congress. The report’s author and three of eight participating organizations presented the very timely “Clinical Decision Support: Lessons from the Medicare Imaging Demonstration (MID),” on December 2, 2014 at the annual meeting of the Radiological Society of North America (RSNA) in Chicago.
At the outset, moderator Ramin Khorasani, MD, PhD, underscored the stakes not just of meeting Congress’s 2017 CDS implementation deadline but, given the MID results, of getting it implemented correctly: “Beginning January 2017, if ordering physicians are not exposed to certified CDS when they order the exam, the radiology practices that perform the exam will not be paid,” he emphasizes. “If you can’t get this in in time, it will be quite the show, let’s say.”
The MID project was mandated by the Medicare Improvements for Patients and Providers Act of 2008; designed by CMS and the Lewin Group to present CDS as an alternative to pre-authorization; and undertaken by five conveners who enrolled between 8,000 and 8,500 physicians between October 2011 and September 2013. The conveners were the Brigham & Women’s Hospital consortium (which included Geisinger Health System, University of Pennsylvania Health System and Weill Cornell Medical College), Henry Ford Health System, Maine Medical Center, National Imaging Associates and University of Wisconsin Health System.
“MID assessed the impact of CDS based on very specific pre-selected professional society guidelines for 12 targeted high-cost outpatient imaging procedures for Medicare fee-for-service patients—but specified that physicians could ignore the guidelines and refer patients to any imaging provider they wished,” Khorasani explains. There were no restrictions on services, and physicians received comparative data on their use of the specified imaging exams in relation to their peers.
The setup
Katherine Kahn, MD, is professor of medicine at the University of California, Los Angeles, and senior scientist with the RAND Corporation’s Santa Monica, Calif., office, which performed the analysis of the national demonstration data. She served as the principal investigator and project director for the evaluation of the MID.
“I can tell you that when I assumed responsibility for the evaluation of the MID, I knew it would be an interesting and important project,” she begins. “I had no idea it would take on the significance that I think it will based on the new PAMA legislation.”
The goals of the MID evaluation (EMID) were to quantify the rates of appropriate, uncertain and inappropriate orders and to determine whether exposing physicians to guidelines at the time of the order was associated with more appropriate ordering and with changes in utilization.
Kahn notes that within the MID, the participating practices utilized CDS systems that fell into two broad categories of decision-support software (DSS) design: one required users to select the patient characteristics and the clinical indications for an order; the other prompted users through a series of structured queries to document clinical information relevant to an order.
An implementation team at CMS selected the targeted procedures using three criteria: advanced imaging exams with high volume; availability of clinical appropriateness criteria (AC) developed or endorsed by a medical specialty society; and variation in utilization rates between rural and urban areas. The 12 procedures included MR and CT of the brain and lumbar spine; CT of the sinus, thorax, abdomen, pelvis and combined abdomen/pelvis; MR of the shoulder and knee; and SPECT MPI. ACR Appropriateness Criteria were the exclusive AC used for six of the procedures; AC from six other sources were associated with the remaining procedures.
Clinicians used radiology order entry (ROE) to specify the requested advanced imaging exam and the reason for the order. When the reason for the order matched specifications in national specialty guidelines, the DSS linked the reason to the specialty society guideline and assigned an appropriateness rating based on evidence and expert opinion from that society. When the reason for the order did not match specifications from the guidelines, the order was not assigned an appropriateness rating and was counted as “not rated.”
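To make that matching step concrete, here is a minimal Python sketch of the logic Kahn describes; the guideline entries, names and three-level ratings shown are illustrative assumptions, not content from the MID systems or the actual AC.

```python
# Minimal sketch of the MID rating step: an order receives an
# appropriateness rating only when its reason maps to a guideline entry.
from dataclasses import dataclass

# Guideline table keyed by (procedure, reason); entries are invented
# for illustration and do not reproduce actual AC content.
GUIDELINES = {
    ("MR lumbar spine", "low back pain with suspected compression fracture"): "appropriate",
    ("MR lumbar spine", "uncomplicated low back pain under 6 weeks"): "inappropriate",
    ("CT head", "new headache with focal neurologic deficit"): "appropriate",
}

@dataclass
class Order:
    procedure: str  # requested advanced imaging exam
    reason: str     # clinical indication entered through ROE

def rate_order(order: Order) -> str:
    """Return the guideline rating, or 'not rated' when nothing matches."""
    return GUIDELINES.get((order.procedure, order.reason), "not rated")

# Reasons with no counterpart in the guidelines fall through unrated,
# which is what happened to almost two-thirds of MID orders.
print(rate_order(Order("MR lumbar spine", "uncomplicated low back pain under 6 weeks")))
print(rate_order(Order("CT sinus", "chronic congestion")))  # not rated
```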
During the first six months, clinicians did not know that DSS software was running in the background; they used ROE but received no feedback about the appropriateness of their orders, Kahn explains. “For the remaining 18 months, within moments of the clinician placing an order, they received on their screen an appropriateness rating or a message that their order was not rated. Some systems also recommended that the clinician provide additional clinical information to facilitate their order.”
The evaluation team compared appropriateness ratings during the six-month preliminary period with the subsequent 18-month intervention period. At the same time, the evaluation team was looking to see whether utilization changed.
Unanticipated results
The ability of the MID to determine whether exposing clinicians to CDS had an impact on appropriateness was seriously hampered by the fact that the majority of orders across both the baseline and the intervention periods—almost two-thirds—were not rated, Kahn reports (Figure 1). The CDS systems could not match an order to an appropriateness rating for more than half of orders for nine of the 12 MID procedures. Depending on the procedure, CMS anticipated the systems would be able to assign ratings between 53% and 89% of the time, but for most procedures, the percentage of orders that successfully mapped to a guideline was 20 to 40 percentage points lower than expected.
Not only did this trend not improve from baseline to intervention period, it dipped for all but two procedures (CT pelvis and CT abdomen). The largest decrease in the percentage of rated orders was for orders for CT and MRI of the lumbar spine, which decreased by 8 and 15 percentage points, respectively.
Even when the systems were able to provide decision support, many physicians interviewed in focus groups by RAND reported sharp dissatisfaction with the guidelines. “DSS systems were supposed to make available online concurrently with the order placement copies of the guidelines,” Kahn says. “But consistently, clinicians participating in MID said these were typically PDFs that were available online. Sometimes they were 10, 20, 30 pages in 12-point font, and in the context of their 15-minute clinical appointment with the patient, they didn’t think it was feasible to read these.”
Looking strictly at those orders that were assigned a rating, RAND found an average increase in ordering appropriateness between baseline and intervention of 8 percentage points (Figure 2), with baseline rates of appropriateness between 62% and 82% (depending on the convener), increasing to between 75% and 84% during the intervention.
Overall, convener leadership and physicians interviewed after the MID ended approved of MID’s attempt to measure and improve appropriateness, but they found that the implementation was not an effective means of improving ordering behavior, Kahn reports. “There was a whole slew of concerns, ranging from the amount of extra time involved in using DSS, to disagreement with the appropriateness rating system, to disagreement with the guidelines,” she says.
Useful insights
The MID did provide some insight into who was ordering what types of advanced imaging, which Kahn called the epidemiology of advanced imaging. During the MID, 5,128 different ordering clinicians placed about 140,000 orders on behalf of close to 80,000 unique beneficiaries, making it the largest evaluation of clinical decision support in the United States to date. Of the MID clinicians, 29% were PCPs, one-third were medical specialists, 15% were surgical specialists, 13% were non-physician generalists, such as NPs or PAs, 4% were non-physician specialists, and about 6% were difficult to categorize.
More than 2,700 physicians—about 50%—ordered fewer than 10 images over the 24-month MID project, providing scant opportunity to change the way clinicians practice through CDS exposure, Kahn notes. As the number of orders per physician rises, the number of physicians placing that many orders falls: just a couple of dozen physicians placed 250 advanced imaging orders (of the MID type described) over the two-year window.
Within the primary care physician category, 37% placed advanced imaging orders for five or more different body parts and 23% placed advanced imaging orders for only one body part. Of medical specialists and surgical specialists, just 9% and 3%, respectively, placed advanced imaging orders for five different body parts, providing insight into how CDS could be targeted.
Kahn urged attendees to think about how this knowledge will inform the nature of the clinical decision support that is made available to different categories of ordering physicians. “We might think about how the generalist physician places orders across many different parts of the body, while the specialty physicians tend to be focused on one or two body parts and image types,” she suggests.
RAND found no change in the rate of utilization for MID procedures over the two years before and the two-year period when MID was in place. “In summary, over time and despite a lot of opportunity with the MID, there was a lot of dissatisfaction by the clinicians; there was a lot of frustration about the lack of orders that received a rating,” Kahn says.
“There actually was some hope that amongst the rated orders there was an eight percentage point increase in the rate of appropriateness, suggesting that if we could do a better job linking the way in which clinicians place orders with better specified appropriateness guidelines, we might have a match here,” she concludes.
Henry Ford Hospital: Workflow and consequences
In describing his experiences implementing CDS for advanced imaging, Safwan Halabi, MD, radiologist and principal investigator at Henry Ford Health System, Detroit, Mich., emphasizes that the MID was an outpatient imaging demonstration, although Henry Ford turned its solution on for every ordering physician except in the ED.
Henry Ford Health System is a comprehensive, integrated, nonprofit managed-care healthcare organization that includes one of the largest multispecialty group practices in the country, with 1,200 physicians and researchers in 40 specialties. The system includes 30 medical centers and four hospitals, but the MID was implemented primarily at the 802-bed Henry Ford Hospital and the 191-bed West Bloomfield Hospital.
At the outset of the MID project, Henry Ford had a homegrown electronic health record and radiology order entry system. The ACR provided Henry Ford with a web service for CDS that was integrated with its homegrown ROE system. Henry Ford Health System exited the MID prematurely because of the need to prepare for transition to a new electronic health record, implemented between 2013 and 2014.
When physicians met a certain threshold—such as 30 scored orders—they would get a feedback report that compared their practice to their peers. “Everyone who had ordered SPECT MPI, for instance, would be compared to others who had ordered SPECT MPI,” Halabi explains. “We also would compare not only to their peers, but with other conveners.”
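A rough Python sketch of that feedback mechanism follows; the threshold constant, function name and data layout are all assumptions made for illustration, not details of the Henry Ford system.

```python
# Hypothetical sketch of the peer-feedback step: once a physician crosses
# a threshold of scored orders for an exam, generate a report comparing
# their appropriateness rate to that of peers ordering the same exam.
from collections import defaultdict

FEEDBACK_THRESHOLD = 30  # scored orders before a feedback report is produced

def peer_reports(scored_orders):
    """scored_orders: iterable of (physician_id, exam, rating) tuples."""
    per_doc = defaultdict(list)   # (physician, exam) -> that physician's ratings
    per_exam = defaultdict(list)  # exam -> all ratings, for the peer baseline
    for doc, exam, rating in scored_orders:
        per_doc[(doc, exam)].append(rating)
        per_exam[exam].append(rating)
    reports = {}
    for (doc, exam), ratings in per_doc.items():
        if len(ratings) < FEEDBACK_THRESHOLD:
            continue  # below the threshold, no report yet
        own = ratings.count("appropriate") / len(ratings)
        # peer baseline includes the physician's own orders, for simplicity
        peer = per_exam[exam].count("appropriate") / len(per_exam[exam])
        reports[(doc, exam)] = {"own_rate": own, "peer_rate": peer}
    return reports
```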
When they used the ROE system, clinicians had to select a clinical scenario, including signs and symptoms. For example, when ordering a CT of the lumbar spine without IV contrast for low back pain, clinicians had to drill down to the kind of low back pain (low-velocity trauma, osteoporosis, and/or age greater than 70 years). They also were required to select an ICD-9 code and the responsible staff physician from a dropdown menu.
During the intervention period, a score was displayed to the clinicians and an alternate was suggested for uncertain and inappropriate studies, so the clinician had the opportunity to change the exam to a more appropriate study.
Where an alternate study was suggested, the medical society guideline behind the scoring was displayed; while physicians were not required to order the recommended study, they were required to attest that they were aware of the guidelines.
If they selected an inappropriate study, they had to justify their choice from a pick list of options such as: the guideline does not apply to the patient’s condition; the physician does not agree with the evidence base; a radiologist recommended the study; and other choices required by the MID.
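Taken together, the Henry Ford intervention flow resembled the following Python sketch, in which the function names, the `choose` callback standing in for the ROE interface, and the exact pick-list wording are hypothetical.

```python
# Hypothetical sketch of the intervention flow described above: display the
# score, suggest an alternate for uncertain/inappropriate orders, require an
# attestation, and collect a justification when the original order stands.
from typing import Callable, Optional

JUSTIFICATIONS = [  # illustrative pick-list wording, not the MID's exact text
    "guideline does not apply to the patient's condition",
    "physician does not agree with the evidence base",
    "recommendation from radiologist",
    "other (as required by MID)",
]

def intervene(score: str, alternate: Optional[str],
              choose: Callable[[str, list], str]) -> dict:
    """`choose(prompt, options)` stands in for the ROE user interface."""
    result = {"score": score}
    if score in ("uncertain", "inappropriate") and alternate:
        switch = choose(f"Suggested alternate: {alternate}. Switch to it?",
                        ["yes", "no"])
        result["switched"] = switch == "yes"
        # attestation of guideline awareness is required either way
        result["attested"] = True
        if not result["switched"]:
            result["justification"] = choose("Justify original order:",
                                             JUSTIFICATIONS)
    return result

# Example: a clinician keeps an inappropriate order and must justify it.
print(intervene("inappropriate", "MR lumbar spine without contrast",
                lambda prompt, options: options[1] if "Switch" in prompt else options[0]))
```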
Challenges faced
Halabi and colleagues faced a number of challenges, beginning with the inability of physicians to find their clinical indication among the clinical scenarios, a particular problem for subspecialty physicians and surgical specialists.
“One thing we tried to do to improve usability was to be able to search against decision support, so if somebody typed in osteoporosis you wouldn’t have to go through all of the menus,” he says.
Educating referrers about what is in the guidelines also presented a challenge. “Think about how many radiologists know what is in the ACR AC, and then push that to the primary care physicians and specialties,” he says. “That’s a big educational leap.”
This problem was exacerbated by the incidence of proxy ordering, which Halabi acknowledges is common at Henry Ford. He cited a recent paper² that compared diagnostic imaging ordering patterns between advanced practice clinicians, such as nurse practitioners and physician assistants, and primary care physicians following office-based evaluation and management visits.
“This is my take-home point for all of you,” he emphasizes. “We have a lot of proxy ordering in our health system: A majority of non-physicians are ordering on behalf of physicians. The conclusion of this paper was that there was a higher incidence of image orders when a non-physician was ordering. When you think about implementing imaging decision support you have to think about who is going to be using this guidance and what changes in ordering patterns you are trying to effect.”
Halabi and colleagues presented a paper at the 2014 RSNA meeting that reviewed 69,000 requests before and after CDS implementation in the inpatient setting. After identifying which specialties were ordering the advanced studies (neurology, internal medicine, neurosurgery, hospitalists), they looked at the percentage of orders placed by proxies for the physicians. One quarter of the 43 specialties had a 100% rate of proxy requests; 19 had proxy-ordering rates in excess of 90%; and six had lower rates, around 50%.
“When we look at the physician data (in the MID report), they said, ‘Well this doesn’t affect my practice, I know what to order’,” he notes. “But this may be very helpful where proxies are ordering on behalf of physicians and they have no idea what the right exam to order is.”
In conclusion, Halabi shares two maxims for a successful CDS implementation: First, respect the ordering provider workflow; second, establish consequences for ignoring CDS recommendations that will enhance the educational impact of imaging CDS.
Weill Cornell: Scalpel, not a shotgun
“The take-home point from our side—and what I hope to convince you of—is that in order to be effective, CDS really needs to be initiated with a surgical approach rather than a shotgun approach,” Keith Hentel, MD, executive vice chair at Weill Cornell Medical Center and principal investigator for the MID, relates.
Weill Cornell Medical Center, staffed by a large academic medical practice, joined the Brigham & Women’s MID convener consortium and enthusiastically broadened the CDS across all patient populations. “We thought that what was good for our Medicare patients would be good for everybody, so we decided to implement it across the board for all of our patients,” Hentel explains. “Quickly, it became apparent that we had to scale back, and we did scale back, very shortly after we started, to just do our Medicare patients.”
One problem was that the vast majority of orders were not scored against the evidence. Another was an onerous workflow associated with the implementation, which was integrated with a broadly deployed EHR.
To apply CDS surgically, it’s important to know how not to implement, Hentel says. He identified three flashpoints.
Conditions where no evidence-based guidelines exist. “One of the real flaws with the design of the MID is that patients don’t present with CPT codes,” Hentel explains. “Evidence doesn’t exist for CPT codes. People don’t come in because they need a CT of the belly; they come in because they have suspected appendicitis or suspected pancreatitis. The AC were not meant to be all-encompassing and cover every possible indication that the patient came in for, so the fact that we thought we could apply decision support to every patient who came in for a particular CPT code was a big flaw and resulted in so many of the non-scored indications.”
Advice for those who do not need it. “I am a bone radiologist by training,” Hentel says. “I deal with orthopedists and rheumatologists. Not all physicians are the same. Throughout the MID it became apparent that physicians of different types differed both in their ordering and in their response to and acceptance of decision support. To think you can apply the same guidelines and expect them to be effective for all physicians, I think, was a flaw.”
Questionable and even bad advice. “There was not good evidence to cover an entire CPT code and, truthfully, there are different states of evidence at different sites,” Hentel says. “The MID required us to use the AC from the medical societies. When we actually went through the exercise of looking at these pieces of evidence and comparing them to what we considered our local best practice at my institution, there was a direct conflict with a significant number of these pieces of evidence.
“That may be because we have specialty services or advanced equipment that may not be available at other sites, but I think that every medical practice is different and every medical practice may have strengths that another practice may not have. That led to some confusion with some of our physicians; some of our physicians were a little more vocal—it was a little bit more than confusion.”
This is why Hentel advocates taking a surgical approach to CDS. Weill Cornell is:
- Selecting clinical conditions in patient populations where there is strong evidence and a real opportunity for improvement.
- Tailoring implementations to the ordering provider and to payors with different requirements.
- Sticking to the mantra (for now) that less is more.
“We now know what we are doing,” he says. “We are doing a surgical approach. We are hoping to cut out all of the noise so you can more clearly see the appropriate information and the appropriate advice.”
Brigham and Women’s: The voice of experience
The Brigham and Women’s Hospital (BWH) convenership included all physicians at BWH, as well as Geisinger Health System, the University of Pennsylvania Health System and the aforementioned Weill Cornell Medical College. However, the MID data from BWH were excluded from the final analysis and the report submitted to Congress. This was due to the organization’s 22 years of experience with CDS, explains Khorasani, BWH principal investigator. The fact that BWH received permission from CMS to suppress evidence that was contrary to local practices—20% to 30% of the MID evidence—was another factor in that decision.
“The Brigham implementation of CDS actually has very minimal use of the ACR AC,” he notes. “One or two of those pieces of evidence are actually live in our environment out of a total of 800.”
BWH ran the MID project concurrently with its existing radiology order entry system, which was developed in-house at BWH and whose content has been licensed to a CDS vendor. The system is integrated with the EHR, optimized for radiology workflow (unlike those of the other conveners, who dealt with clunky EHR integrations) and allowed the MID evidence to be turned on for specified physicians.
Khorasani makes a point of distinguishing the CDS content from the delivery method, which he said should require iterative interaction in order to capture the appropriate information from users. “If one of our emergency physicians puts headache into the ordering system, the next screen that pops up is ‘does the patient have loss of consciousness,’” he explains. “You have to answer these questions.”
To illustrate his contention that it is not possible to determine the appropriateness of an exam from one piece of information, he shared a complicated decision support algorithm for CT of the head. Yes to the first question results in ordering the exam, but no triggers another set of questions; if the ordering physician answers none of the above, the following message appears: “For inpatients with minor head trauma, based on the information you have provided, the CT is not going to help you.”
“These are not appropriateness criteria,” Khorasani says. “These are unique pieces of evidence that are published in the literature, that are accepted by our clinician colleagues in the emergency medical world, that they will understand and appreciate the content as presented to them. They can cancel the exam or proceed.”
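That iterative, question-driven flow can be modeled as a small decision tree, as in the Python sketch below. Only the first question and the final advisory message come from the talk; the intermediate branch is an invented placeholder, and none of this reproduces the BWH algorithm itself.

```python
# Hypothetical sketch of the iterative CT-head flow: each answer either
# resolves the order or triggers the next question, rather than rating
# the exam from a single piece of information.
CT_HEAD_TREE = {
    "question": "Does the patient have loss of consciousness?",
    "yes": "proceed with CT head",
    "no": {
        "question": "Any other high-risk feature present?",  # placeholder branch
        "yes": "proceed with CT head",
        "no": ("For inpatients with minor head trauma, based on the information "
               "you have provided, the CT is not going to help you."),
    },
}

def walk(tree, answer):
    """answer(question) -> 'yes' or 'no'; returns the final disposition."""
    node = tree
    while isinstance(node, dict):
        node = node[answer(node["question"])]
    return node

# A clinician answering 'no' to every question sees the advisory message
# rather than a bare appropriateness score; they can cancel or proceed.
print(walk(CT_HEAD_TREE, lambda q: "no"))
```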
While Khorasani maintains that the impact of imaging CDS at the Brigham has been quite substantial, with a 30% reduction in use of CT per 1,000 patients in the ED, his data from the MID project show no change in appropriateness ratings after the intervention. In fact, he believes the goal of the MID—to label orders as appropriate, uncertain or inappropriate—is beside the point of CDS, referring to the “appropriate” and “not rated” messages as noise. “It is completely contrary to what decision support is supposed to do,” he says.
What’s more, Khorasani called into question the low number of exams across all conveners that were rated uncertain or inappropriate (Figure 3). “Rates of inappropriate and uncertain exams are incredibly low based on professional guidelines deployed through MID,” he notes. “I don’t know what portion of the studies you read are inappropriate, unnecessary or redundant. I suspect it is larger than the numbers we see on these graphs.”
In conclusion, Khorasani said that CDS works when the evidence is good and is presented in a way that is actionable. “The take home message for me and our practice is that although CDS is deployed through IT, the quality of the evidence should be the primary focus of interventions designed to change physician behavior,” he says. “While CMS and commercial payors may focus on use of CDS for cost savings, CDS-enabled clinical programs will not succeed if focus is primarily financial in response to federal regulatory requirements. Buying an IT system and putting it on the ground isn’t going to change things very much.”
References
1. Timbie JW, Hussey PS, Burgette LF, et al. Report to Congress: Medicare Imaging Demonstration Final Evaluation. Santa Monica, CA: RAND Corporation; 2014. Accessed January 6, 2015: http://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR706/RAND_RR706.pdf.
2. Hughes DR, Jiang M, Duszak R. A comparison of diagnostic imaging ordering patterns between advanced practice clinicians and primary care physicians following office-based evaluation and management visits. JAMA Intern Med. 2015;175(1):101-107.