Big Data and Machine Learning in Healthcare: How, Why, and When

So first, thank you very much, Tom, Chris, Mike, everyone who's put this together. It's not easy to organize a conference focused around new ideas, bringing together people from a few different slices of industry. They've done a fantastic job and have been incredibly helpful throughout, so thank you guys very much for the invitation and for putting this together. This is my... all right, good.

So first, my disclaimers. I'm a co-founder of a for-profit venture called Shift, and an assistant professor at Brigham and Women's Hospital and Harvard Medical School. I also have affiliations or vested interests with every other organization pictured on the slide, so feel free to not trust a word that I say.

I have spent the last dozen years or so in the health system innovation space, working with a number of organizations, trying to help them use their data differently to improve the quality of care, the quality of clinical operations, even the quality of scientific discovery. I feel very fortunate to have been able to work on some pretty cutting-edge projects, like, at the Department of Veterans Affairs, the Million Veteran Program, which has just hit 550,000-plus veterans enrolled, to do genomic science very differently. I also had the good fortune of leading the Point-of-Care Clinical Trials program, where we jammed clinical trials into the electronic medical record for the first time, in an effort to reduce the number of years between scientific discovery and implementation from what is now an estimated 17 down to the minute you have enough evidence: you can flip the switch and turn that into decision support. More recently, I worked with Atul Gawande at Ariadne Labs, and we got to do some really fun stuff in Uttar Pradesh, India, where we were taking data feedback loops captured from cell phones, sending the data up to an Amazon cloud, and then delivering it back to birth attendants and coaches, so that the coaches could work with the birth attendants to improve upon really basic life-saving behaviors like hand-washing and
breastfeeding and skin-to-skin contact.

And while I feel blessed to have been able to do that work, it's also been bittersweet, if not disappointing. I think the reason we choose to apply these skills to make data matter for healthcare is that we all believe there is an enormous opportunity to improve lives by doing this in healthcare as opposed to in other industries. And if you've been in the field for some time, I think you may also share my frustration in watching other industries fly past us when it comes to using their data most effectively.

Every time you walk down the aisles of a supermarket, every item you're seeing, where it's positioned, when it's restocked, is dictated by the collective experience of shoppers who have come before you. They use their data to understand: what's happening? To whom is it happening? What should we be doing more of? What should we be doing less of? Casinos know exactly when to offer their players some kind of incentive to keep them at the table a little longer: a steak dinner, a free room to get them to come down to Connecticut, or whatever other part of the country you're in. Even waste management has figured out how to optimize their routes to make sure that they're spending every dollar they have as effectively as possible. They can predict landfill overfill years in advance based on the collective experience of neighborhoods like yours.

And the terrible tragedy that we're all overly familiar with now is that, despite the fact that no industry counts more than healthcare, it truly is the one industry that's a matter of life and death, you'd be hard-pressed to find an industry that counts less than health care. We simply have not used our data, our collective experience from delivering care, to improve the quality of care we deliver. I think the set of numbers that is probably most damning and disappointing revolves around the number of people that die of a preventable death each year. So the Institute of Medicine
estimated that number could be as high as 98,000 some 15 years ago. Then, actually just a couple of years ago, the Journal of Patient Safety had a study that said that number might be as high as 210,000 people dying by accident at the hands of the US healthcare system, and the upper bound of that estimate may be 400,000. There is considerable debate around what is an accidental death, and health care is complex. Have you heard the analogy that it's the equivalent of two airplanes, a DC-10 and a 747, going down every day for a year? That's the equivalent of the number of deaths. The third leading cause of death now in America is preventable death at the hands of the US health care system.

That number may be 98,000, it may be 400,000, it may be 210,000. I would argue that the most concerning aspect of this number, and the conversation around it, is that we don't bother to count it. Right? If something counts, we should count it. And if we're off by a margin of two to three hundred thousand, it would indicate that it just doesn't matter. I cannot be convinced that, considering the minds in health care in the US, we can't come up with a reasonable way to approximate this number with a tighter margin of error than 300,000.

And so we've watched other industries from within healthcare. The light at the end of the tunnel is real, but it has seemed distant for at least two decades.

The points of my talk today, there are three. Number one: I am now convinced, and I'm a skeptic by nature, that this is happening, and it's happening now, as compared to the dozens of false starts that I and you have experienced up till this point. The second point I'm going to make today is that the return on investment of big data approaches will be even higher in healthcare than in other industries. That may seem a stretch, because right now we don't even do the math on the number of people that
are dying by accident, but I'm going to argue that in a few years' time these big data technologies will be even more important for us than for waste management, retail, casinos, telecom, etc. And the final point I'm going to make today is that our current approach to analytics will change dramatically as a result, to the point that I would estimate that in less than five years we will look back and laugh, and maybe be a little uncomfortable, at what we were considering analytics right now, today.

So let's start with: why now, as compared to the past? Two conditions are necessary for data to take on new life in healthcare. Number one, access to the data. Number two, incentives. To understand why this is happening right now, we start with a health care policy story. Health care policy is terrifying and boring. I'm going to do my best to make this matter in three minutes or less. And the reason you should care about it is that in health care, policy eats culture and strategy for breakfast. If you don't know health care policy and you're in the healthcare business, enjoy your stay. It will be temporary.

OK, here we go. President Obama takes office, and one of his top priorities is to improve the quality and lower the cost of health care provided. Why? Because somewhere around the 80s we moved from an HMO model to more of a fee-for-service model. In a fee-for-service model, you get paid for the number of units sold. The quality, the efficiency: not part of the formula. So, understandably, the cost of care goes through the roof, and we're by far and away number one in the world in cost, and now 37th in terms of the quality of care we provide.

An opportunity window opens in 2008. The Great Recession hits, there's blood in the economic streets: a great opportunity for a politician to push through a number of policy agendas. He starts with data access. He starts with a carrot-and-stick policy that ensures that this country begins to adopt electronic medical records at rates that have never before
been even considered. Right? So, the American Recovery and Reinvestment Act: $17,000 per adopting clinician, but if you wait four years, it's minus four thousand dollars. What happens? Everyone gets an electronic medical record in less than a decade. When I started this work, we had 6.7 percent adoption of electronic medical records. Now it's up in the mid-90s, depending on how you count it. So we've got data in ways that we never had in the past.

What about incentive? Right around the time all of this adoption of electronic medical records is happening, the same administration introduces payment reform in ways that we've never seen before. Now, it's not smooth, it's not easy, it's not even one flavor of payment reform, but the writing is on the wall: suddenly the quality and the efficiency of care provided will be part of the formula that dictates how you get paid.

So, we did health care policy. Is everyone still with me? Yes? OK, good, because I have more health care policy for you.

OK, I made the argument that data is now accessible. I made the argument that we now have incentive. Why am I arguing that the ROI will be even bigger in healthcare than in other industries? Let's pick up policy again, but let's start by saying that every other industry has quantitative and at least somewhat reliable data. Right? I trust the number of iPads I sold in Wisconsin. I trust the share price of that particular stock. Health care data, as you all know, is very different.

Why is it so different? When you incentivize the rapid adoption of electronic medical records, when the message is that this is completely transformative, that you need it in place in just a few years' time, and that you need to do it now, which electronic medical record vendors do you think get the money? It's the folks that have been in the field for 30 years. And they did not grow up creating electronic medical record systems designed for value-based care. They created electronic medical records for fee-for-service care. What are
the three reasons for being for electronic medical records in a fee-for-service world? Number one: capture the number of units delivered so you can get paid. That's claims data, right? Number two: communication among clinicians. I'm a nurse, I'm picking up a shift from another nurse. How am I going to communicate across clinicians what happened? How do we communicate as humans? Free text, narrative, storytelling. Greater than 50 percent of the electronic medical record is free text: unstructured, inaccessible to a lot of the techniques we currently use. Third reason for being: legal documentation. So not only am I dictating in free text, but I'm adding a layer of ambiguity that would protect me and my institution from legal action in case my opinion or my actions are not correct. So I'm taking free text and I'm making it even harder. Notice what's completely missing from the reasons for being of the electronic medical record that we have been adopting over the last 30 years, and aggressively today: learning, doing the math, counting, improvement.

Value-based care is not introduced with one agreed-upon set of guidelines. Instead, dozens of policies are introduced all at the same time, causing health care to exist in a quantum state of payment reform. There are several flavors of ACO. Commercial payers come in behind that and take advantage of this: finally, an opportunity to have some control over cost and efficiency. They introduce AQCs, right? CMS puts out APMs, alternative payment models. The same hospital caring for the same population of diabetics could be held accountable under five to seven different methods of reimbursement, depending on who's paying. Each one of these comes with a set of reporting requirements that leads to checkboxes, drop-downs, and measures, which take our already poorly designed 30-year-old electronic medical record system and add to it another layer of confusion and frustration, another layer of data not designed to help you do your job in analytics, getting us to this value-based
future. So this leaves us with data that is just plain harder and different than the data we see in other fields, which leads to the third argument that I wanted to make today: these big data methods will matter even more for us than they have mattered for any other industry that has already made terrific use of them.

So this is happening in healthcare. Let's now talk about big data. Raise your hand if you're involved in big data in any way. Please leave your hand up if you've explained to someone that you're involved in machine learning. Leave your hand up if you've called it artificial intelligence, cognitive computing, data science. We have a naming issue here, right? We've just created this umbrella term that means anything that has to do with data that maybe isn't in Excel. It's confusing. It's confusing the customer. Frankly, it doesn't have to be that hard. We're talking about a new set of math plus computation. It has with it a set of pros and cons. It is an objective set of technologies that, applied to the right problem, can lead to answers to questions that we can't answer with our current approaches.

So let's ground our understanding of what this is with a pretty simple example, looking at the way we currently learn from data. As we move to a value-based model, this is one of the thousands of questions that will dictate the success of your organization. It's no longer about what happened in the past. It will soon be about what is likely to happen.

So what are our two methods of learning from clinical data today? And by clinical I mean claims or EMR data, or whatever data we have. Method number one is reporting. Reporting was designed to give us a view of the past, and it has served us well. It's perfectly well suited to looking at claims data to answer questions like: how many beds? How many pills? How many cuts? These are the questions that I trust claims data to answer, because that's what they were designed to
answer. And so our clinical team now comes to us, the analysts, and asks us to answer questions like: how many diabetics are we likely to see? Which diabetics, exactly, are most likely to...? And we take the limited structured data we have, and we attempt to use intuition, clinical intuition, good old-fashioned smarts, to map that question to the structured data that we have in our databases. Right?

Our second method of answering that really important question is traditional, statistically adjusted, regression-based risk scores: LACE, LACE+, Johns Hopkins. There are a dozen-plus vendor products. Is this familiar to folks? OK. It's always dangerous to try to drop a whole bunch of methods into just two buckets, but at a high level, what we're doing with risk scores is working with one or more institutions, mostly with claims data, and attempting to say in advance that there are probably ten, five, seven, a dozen things that matter here. We're moving each one of those items a little bit to the left, a little bit to the right, and we're seeing if that affects our statistical significance, if it changes the underlying characteristics of the population even just a little bit, and we're validating that mathematically. Then, because we want to take that risk score and apply it as widely as possible, to as many organizations as possible, we turn it into a set of rules: one point for this, two points for this, three points for a readmission within six months, five points for glioblastoma, or whatever it is. Right?

So I want to talk about some of the limitations of these methods, and why I think some of the newer methods will become very important complements to, and in some cases replacements for, the way we currently learn from our data.

Number one: there is a heavy dependence on disease codes. I say this as a researcher who has been doing this for some time. I'll give you a quick anecdote. We worked with six major academic medical centers, very shiny ones, well-known winners of some best-in-the-country awards, to try to
understand and improve the quality of colorectal cancer care being delivered. Every effort to do things better starts with: who should be in the cohort? What do we use to figure out who has a disease? ICD-9 codes, right? Across six academic medical centers, eighty percent of the people that had the codes for colorectal cancer never had the cancer. Right? All of them had colonoscopies; twenty percent had the cancer. Having worked as a researcher in data for a dozen years, I can tell you that this is not the exceptional case. Just a couple of months ago I was working with an ACO, and I told this story. The CMO said, "Nah, no way." The guy responsible for revenue recognition had just finished an audit on heart failure and said, "50%. 50% inaccurate in heart failure, today, in our organization." That's disease codes. That's a problem if we're going to try to use them to predict what's likely to happen in the future.

Relatively few data points are considered. Ask your clinician to look across a population at who needs their help. I would be amazed if they pulled up claims data and looked at five to seven data points, as opposed to looking at the gestalt of the medical record that they have at their disposal, starting with the notes.

Risk scores are often one-size-fits-all. Clinicians would not apply one set of thinking, one algorithm, one scoring system to an entire population to understand risk. If you're using a risk-score-based approach that considers a high-risk pregnancy with exactly the same math as it would a geriatric patient nearing end of life, you're probably being a bit less granular than I think the patient population would require.

We're ignoring free text today with most of these methods. And I hear this a lot in the work that I do. Doctors tell us: I know who's sick; I need to know who's actionable. I know who cost more last year; I need to know which of my diabetics is most likely to end up in the emergency department soonest. And one step further: I already know these five people. I'm going to see them either
way. Tell me who's pre-diabetic and heading toward diabetes, the ones I can do something about. That's very different from one-size-fits-all risk models, where you build a model down in rural Pennsylvania and you tell folks in Manhattan that it's good for everyone for understanding what's likely to happen. This is one of the approaches that I think we're going to get a kick out of looking back at in just a few years.

The final thing that's been heavily influential in my thinking about risk in health care is that the patient population and the interventions we deliver are dynamic in nature. This study came out of Colorado; it was published just last year. They created a cohort of super-utilizers in their Medicaid population. Super-utilizers end up consuming the vast majority of the resources. What they found is that just seven months after defining this cohort, 50% of that cohort had moved on. They got better, they died, they left. One year later, only 28% were left. Two years later, 14%. If you're building a model that assumes it understands the way the world looks today, and it gives you a list of patients today, you need to know that that model is outdated just a few months later.

So this is why, some eight years ago, I began my research into what methods might help us understand what's likely to happen more effectively. Supervised machine learning, as a set of tools, has characteristics that are objectively better at answering some of the types of questions that will matter more in a value-based world. So I want to move away from big data and talk about supervised machine learning specifically, which I believe is one of the most important sets of technologies under the big data umbrella.

Rather than begin with a guess at who will utilize, supervised machine learning begins with the definition of a cohort, and it splits it in two: give me the diabetics that did end up utilizing in the very recent past, and give me the diabetics that did not. And then, rather than assume that it's going to be these four drugs,
these three conditions, these two comorbidities that may matter, supervised machine learning takes an inductive view of the world. It learns mathematically what it is that makes the folks that end up in the ED different from the folks that do not. And in doing so, it can consider all available data, from your socioeconomic status to your labs to your EMR and, if you combine it with natural language processing, the free text: nurses' impressions, call center notes, and so on and so forth. It maps each of these millions of data elements in space using probabilities, depending on the algorithm. And honestly, many of these algorithms are completely interchangeable. What really matters is the robustness of the data. And when we say big data, I can tell you we can do this work with as few as a few hundred records, which is contrary to popular belief, if for no other reason than that we use the expression "big data". Really it's about rich data, and learning from that rich data what matters, or at least what is correlated with one outcome versus the other.

Now, if you build these models and you validate these models, then you can apply them to a population in which you don't know the answer, and you can get out of this a rank-ordered list of the patients most likely to be the diabetics that are going to end up in the ED.

So what are the characteristics of these types of approaches that make them more attractive for some of the work we now need to do in healthcare? Number one: they consider more data than traditional risk scores. The limited data you see on, I don't know if it's the left or the right, I guess your left: this is right out of LACE, which is one of the more popular risk scores. If the person we're trying to predict this for is a high-risk pregnancy, they're a ghost to your risk score, because they don't have those criteria that you see on the left. Right? If instead, regardless of the condition, we're trying to predict who's going to end up in trouble, it makes more sense to
have more data to attempt to predict something. They're also more tolerant of missing and inaccurate data, and by the way, if you work in health care, that's our data.

And then, remember we talked about how the population moves? If you do something to improve upon a condition, if you turn your case management team loose to try to prevent admissions of folks with COPD, I can tell you that, based on what they do, the reasons people come back in and get admitted with COPD are going to change. Right? It's going to work. They're going to start winning and solving the problem, and then the reasons you need to pay attention to are going to shift. These models, by design, evolve. Machine learning, right? They don't take a static view of the world. They can be retrained with each new experience, or periodically, so that as your clinical teams are doing their jobs, the models can keep giving them the most actionable people, as opposed to the complaints that we're probably all used to hearing now, which are "I already know about that person, there's nothing I can do," or "that's old and outdated."

Now, I think it's important to qualify that we're not talking about anything new here. Machine learning, and the math we're talking about, has been around for 40-plus years. In fact, in medicine, across dozens of clinical domains, from surgery to oncology to radiology, there have been hundreds of studies showing that these methods outperform traditional methods when it comes to predicting what's likely to happen. Right? The math has been done. The game is sort of over. And if we're going to move toward a value-based model, we need to move past what happened, and up and to the right, toward what's likely to happen.

The number of insights that we need to generate, I mean, you can't put a number on it. Think of the number of questions you'll need to be able to answer when you are given $25,000 per patient with knee surgery to get them healthy again. You will not be able to settle for "who's at risk?" You'll need to be able to answer tons
of questions, and you would not presume to be able to answer them with a few elements from your claims data. You will need to be able to make sense of all of your data. In fact, for the first time, it will become your competitive advantage, and if you can't take advantage of your data, you will be at a disadvantage.

So, in my opinion, we've covered the incentive, we've covered access to data, we've covered the tool. I believe the last barrier that needs to be overcome is this idea of democratization. Every technology starts out mystical and accessible to only a few PhDs or experts who live in white, sterile rooms and have white coats. They're the only people that get to use the mainframe computer, and we don't really know what that mainframe does. Right? It might be good for accounting, but I'm not so sure. One thing I'm sure of is that personal computing will never be important. Right? I mean, think about it: it's mystical and magical and exclusive. Eventually it becomes the desktop computer, right? And then eventually a laptop. And then we get to a point where we're in a room full of people that care about big data, and not one of them is walking around without a cell phone, and not one of those cell phones doesn't have dozens of apps on it that let you, the person who has the problems you want to solve, use the technology to solve the problem quickly, inexpensively, and based on your priorities, not on the few priorities dictated by others, whether those others are government or payers, etc. That's where we need to get in order to truly make health care count.

This has been the problem that I've been focused on as a researcher for about eight years now: democratization. Because math is math, but to make it work for health care there's a whole set of issues that have yet to be tackled, issues that I and many others have been working on. And so I found this quote from a paper that I wrote in 2011, and I guess it's appropriate, but it's also
disappointing. I started that paper by saying: despite 40 years of promising performance, very few natural language processing or machine learning systems contribute to medical science or care.

This is some of the work that my team and I have been involved with: 15 studies, work with 250-plus hospitals, 18 health plans. Every one of these exercises has been an effort not really to move the needle up on prediction, but to move the technology down to the folks that can make use of it. And these are some of the problems you need to solve. You have to become useful. You cannot settle for a one-size-fits-all approach to understanding data. Could you imagine for a minute if Toyota used exactly the same approach to their data as Honda in trying to build cars better, faster, cheaper, and at nicer price points? Where's the competitive advantage in that? But that's been the norm in health care. If we switch to value-based care and we truly create incentives toward quality, it's really going to come down to who can use their data to do a better job, faster and cheaper. You can't ignore the free text. It must be interpretable: the downside of these methods is that they do lend themselves to black-box results, and that must be solved in order to make this stuff useful. And finally, math is math; software is a different animal. In order to make this work, it must be software.

So we worked as researchers for a number of years, and then this one paper was an interesting milestone for us. We do these bake-offs in computer science, where everyone gets three months to take the same set of training data and do the best that they can to solve the same problem, and then they submit the results, blinded, to judges, which is always scary, but it's really the only way you figure out how good you are. I was early on in my research career and was a little nervous when I submitted the results, because we took the training data the day before the results were due, and we were competing
against 22 teams from across the world, from very prestigious universities. We were trying to automatically identify, from the free text of three hospitals' electronic medical record systems, all problems, all treatments, and all tests, and we would be measured, blinded, by the same bar as everyone else. We took the data the day before the results were due, we pressed a button, we submitted the results, and I prayed to God that I wouldn't be laughed out of my academic field. We did not win, but we came in eleventh, a couple of points off the leader. And that was the argument that we have been trying to make: we're past the point of this set of technologies needing to be exclusive to a few. It's time to democratize it.

So we made it open source. We gave talks. I wrote papers. I very naively thought I would change the world. I eventually became frustrated with the fact that that's not how the world works. We commercialized this stuff, and I'm going to talk to you a little bit about some of the work that we're doing. Some of these results are from research, some are from industry. In any case, I want to show you what happens when you start to democratize this stuff and you don't force people to pick from a menu of what other people think matters.

Number one: you start to focus on really specific, narrow problems, to answer questions like "what really happened?" We worked with six different hospitals to understand which veterans with PTSD are getting best-practice care. There are three types of care: two are best practice, and the other is this big bucket of who-knows-what that we know isn't the best type of care to deliver. The six hospitals claimed to be delivering 75 percent best-practice care. We took hundreds of thousands of free-text behavioral health care notes and, with 96% accuracy, one day later told the researchers, who then told their institutions, that actually, no, that number is more like 23%. They write a paper, but hopefully, if they
do their job right, the institution begins to change the quality of care delivered based on, for the first time, a real quantitative understanding of what's really happening.

Second, you start to move past readmission, or utilization, and you start to think about really specific problems. We're working with an organization that's helping hospitals understand narrower things, like non-routine discharge following a very specific type of surgery: total knee, total hip. We took claims data and a couple of elements of the EMR data, and within a very short amount of time we were able to turn around results that were six or seven points more accurate than anything that had been published in the literature. But it's really not about a few more points. The reason they're working with us is that they're going to go after hundreds of different problems, and they're starting to roll in electronic medical record data. They know that the way this industry is going requires them to be more flexible, adaptive, and democratized than the industry currently is.

An interesting thing that one of our customers is starting to do, and this is around revenue recognition and HCC work, is daisy-chaining models. If you want to solve any problem, but particularly around revenue recognition, first you've got to identify the diabetics. That's a model, and it's a patient-level model. Then you want to take all of the data from that model, all of the patients that were diabetics, and go after the diabetics with complications, because that's where you're leaving money on the table: you're already providing complicated care, but you're not getting paid for it. But now your unit of analysis isn't even the patient. The unit of analysis is the document. And then you swap out the unit of analysis again, from the document to free-text evidence of the fact that a clinician delivered a continuous glucose monitor read, which means they should get paid an additional $550. If you can do
this across millions of records, suddenly you can help health care institutions get paid in this value-based model, and you can do it automatically, and not just for one problem but for many problems.

The final example, and this is what we're doing with a health plan, is their Medicaid plan. They invest a lot of money to stabilize the health of a population. This will be very familiar to anyone who's working in a risk model. If the members in that plan leave after one year of enrollment, they take a real financial hit: not only the acquisition cost but all the cost to stabilize the health of that member. They have no idea. And also, if people leave after one year, it may be safe to say that they're not happy, which affects star ratings and all these other things that health plans care about. So they used this approach to, number one, identify why people might leave, which is not something that you can answer with a query or with traditional approaches. And what they learned using this approach is a couple of really interesting things. I guess it starts as evidence, but then you follow up on it and it turns into anecdote, and then you follow up on that and it turns into improved care or, in this case, improved operations.

So it turns out that they hot-spotted a bunch of clinicians that were losing a bunch of members very quickly. It wasn't about the clinicians. Those clinicians were in one part of Chinatown, and then they went out to the field and saw that their competition was blanketing three large apartment buildings with benefits that they couldn't offer, or at least hadn't thought to offer, and there was a mass exodus just in that part of town. They also discovered that there was a real issue with the way one pharmacy was handling this sort of healthy-benefit gift card that they were giving out. It turned out that when their members were coming up to the counter and dropping the items that they wanted to purchase, the pharmacy was using judgment to say some of those
items are not healthcare related and to deny payment, right? So their case managers knew this, but it never would have come up to management. It's sort of anecdotal until the data teaches you to look there. And the other thing that they're doing now is they're producing rank-ordered lists of the members most likely to leave, and for the first time they've created proactive call scripts to get out ahead of the problem. That was a big challenge for them. They basically are used to inbound complaints, but now they've changed their model because of data to do outbound proactive outreach, which is just a very different but real strategic advantage.

So this paper came out two months ago. It's a paper, and papers are kind of boring, but it was super exciting for us, because a team of researchers at Dartmouth published a paper using an early research version of the software that we had created from the bake-off and all that, and they used it to automatically detect falls from inpatient stays using the free text of the electronic medical record. Their sample size was in the small hundreds, and nobody on their team was a data scientist. Nobody knew natural language processing, nobody knew machine learning. These guys are friends; we worked with them six, seven years ago. They didn't even tell me that they wrote this paper, and the last time I was working with them it was on behavioral health issues. And this to me is what's most exciting, because this is where we need to be: a group of people that understand the problem and the data, using a tool like you would use your cell phone, to identify the people that needed their help so they could provide better care. And if we move toward a value-based world, the number of questions that we will need to answer absolutely requires that we, the experts, get the hell out of the way as quickly as possible.

So, in trying to wrap this up, our hope is that I've convinced you in some way, shape, or form that, number one, this is happening right now. Just show
of hands: do you guys believe that this is happening right now? Some skeptics, or you're just bored because I've been talking for so long. Number two, do you believe, show of hands, that the ROI in health care will be greater even than the ROI in other industries? That's a tough one to swallow, because we've watched that light at the end of the tunnel seem distant for so long. I guess the last question is, do you believe that it will fundamentally change what we consider analytics today? Show of hands. Those that don't have your hands up, get a good look at the people with their hands up. You'll be asking them for a job in five years. I'm just kidding. Listen, obviously I love this stuff. I've been eating, sleeping, breathing it for a long time. If there's anything I can do to help anyone, whether through bouncing ideas or making connections or sharing what I've done, sharing the mistakes that I've made so you can avoid stepping on the same landmines, it would be my absolute pleasure to do so. There's my contact information. Thank you for your time and attention. [Applause]

Thank you for a very interesting presentation. I completely understand and appreciate what you were conveying. It seems to me the problem isn't about analytics, though, and it's not even necessarily about healthcare. It's an economics issue, and the issue is that most consumers don't understand when they receive good healthcare or bad healthcare. People like my father were raised to accept the health care you get and not question it, because that could be detrimental, and we're not informed. How do you solve it from an economic standpoint, so that people care about and understand and engage to shift the vested interests in the healthcare system we have in the US? I mean, you know, that's really off the cuff, just a softball. Thanks.

Yeah, so, I mean, that's the question, right? I mean, the reason why health care in general can't count the number of people that die by accident is because we have not been incentivized
to do so. Price transparency, definitions of quality: these are the big challenges that our policymakers struggle with, right? APMs, HCCs, ACOs, 20 different flavors, moving the date back one year in MACRA. I mean, it's a real struggle to create policy that can incentivize healthcare toward quality in ways that we haven't been able to do. If I had the answer, you know, I'd give it to you, because all of us need this to be answered as quickly as possible. And I debated how far down this road to go in the design of this talk, because really I do believe that's everything. As I said, economics is everything. Policy in healthcare dictates our economics, and policy eats culture and strategy for breakfast in healthcare. I don't have the answer. What I can say is we're closer now to paying for quality, which means analytics will move away from simply reporting on what happened.

One thing, very quickly: the greatest threat to the type of progress I'm describing here is a reversal back toward fee-for-service. Right now, if you're a policy nerd and you're paying attention to very specific policies which could dictate whether analytics remains reporting based on mandated measures versus true competitive advantage based on analytics, the one bit of policy that I'm watching that is a fork in the road is MACRA. You have on one hand MIPS, which is really largely loaded up with more reporting and dropdowns and checkboxes and frustration, in my opinion. The alternative to that are the APMs, where they define an outcome, they define a type of therapy or treatment, and, hopefully, they add risk to this, which they haven't done effectively yet. But once they layer in the risk of the patient, whether they're sicker or healthier, then you'll start saying, "We're going to give you X dollars to get it right." And that's the closest thing to "I'm going to give you $25,000 for a car, now you figure out how to make it most effective."

Yes? Hi.

Hi. First of all, I love the fact
that you're talking about the notes that come from the electronic medical record. I know that nurses often do the right thing, but they don't always document. So how do you drive documentation in order to do your work? Do you think natural language processing, meaning the notes, will make that better?

Yeah, so I have long said that clinicians need to dictate in binary, but no one listens to me. No, to be clear, I mean, the point is, if we in the analytics space think that we can dictate what gets documented and how, we can't. What gets documented will be dictated by the incentives, and the incentives are dictated by policy. It's our job to not wait for a perfect world. If you want to chase the windmill and try to get nurses to document more, or more effectively, I can tell you that many a really good person has lost a career to that, and I just didn't feel like waiting, right? And I think we're going to have to start using the data we have. Clinicians are furious with the quality of electronic medical records and the experience that they're having with them. I just recently wrote an article where I made the argument that analytics can help maybe make it a little bit more worth it, by taking the data we have today and turning it into insight to help doctors provide higher-quality care. That's an opportunity, as opposed to complaining about the lack of quality. But you cannot do that using reports or queries or SQL, right? You can't do that with traditional statistics. So if we're going to move now, you have to be able to accommodate messy, inaccurate, unstructured data, which gets back to my argument that we need to start using these technologies, and gets back to not my argument but the objective fact that we have 40 years' worth of empirical evidence suggesting that we are past the time, or the challenge, of proving that this stuff can work.

Hi.

Hi, thank you very much. It's a very great presentation, thank you, sir, and also for the work you have done and are doing for this field so far. It's excellent. I
appreciate it. Probably this question is more subjective. The question is, in the field of machine learning, data matters a lot, right? The more data you have, the more accurate the results are and the more trends you're going to get. Along that line, all our data so far is mostly related to ICD-9, and we have been moving to ICD-10, right? So if you take ICD-9 codes and do your training, and I'm mostly talking about supervised learning, and you apply this on an ICD-10 set, how are you going to get accurate results?

Yeah, so there's a couple of loaded issues in there. Number one is the assumption that we should be working with claims data at all, and I would say that it is a piece of the puzzle. There's also a number of studies that have shown that your assumption that more data leads to better predictions is not necessarily accurate, and here's why I say that. Studies have shown that having 10,000 versus a billion rows of claims data doesn't matter, depending on the problem, right? Because claims data is such a loose surrogate of what's really wrong with a patient that having more of it doesn't necessarily get us past this rather disappointing area under the ROC curve that we see time and time again, between 0.68 and 0.75. And we play games by focusing on the top decile and saying, "Hey, we've got a 0.85 there," and maybe that's okay, depending on what you're trying to do with the results. So here is what we have found, in that example of trying to predict which members are likely to disenroll, or even the work we're doing right now where we're focusing on preventable readmissions for five different disease categories as part of a case management effort: we're using claims data, and we're using home visit notes, HRAs, ADLs, Likert-scale survey data, like whether or not a person is capable of being mobile or toileting themselves, and that is where real value is gained. It's not in having more and more and more claims data. We can do really
accurate work with only ten thousand, five thousand, and, in the falls study, only three or four hundred, if the data is richer and wider as opposed to being narrower and longer.

So, the ICD-9 to ICD-10 problem. In machine learning, it is absolutely the truth that if you train on apples, if you train on EMR data, you can't just apply it to oranges, ICD-9 claims data, right? So that is a limitation of these approaches: if you're going to try to prevent readmissions, whatever data you bring in is the data you then need to apply it to. And so what everyone who's worked with claims data for any problem has had to do is have mappings from ICD-9 to ICD-10, right? But here's the thing: if you're using a rules-based approach to map from ICD-9 to ICD-10, and then you're applying the rules-based logic to ICD-10, be aware that every mistake you made, or your clinicians made in capturing that code, and then the analyst made in mapping it, propagates. So listen, don't be surprised if you go from 13 and a half thousand codes that are between 20 and 50 percent accurate to then adding another 55,000 codes, and your accuracy drops off a little bit, especially when the people who are actually coding this stuff hate the fact that they have to do it. And you won't be surprised to learn that I was at a conference the other day and a clinician stood up and said, I won't name the EMR vendor, but they said, "No matter what you do, if you're using this EMR tool, never admit that you're dealing with a diabetic, because you'll spend the next 20 minutes clicking yourself out of the EMR to get back to focusing on the patient." That's a good question. These are real problems.

Yeah, my second question is for those who are planning or willing to try this machine learning using open-source tools. I'm sorry if it is the wrong question for you. What kind of tools do you recommend, especially for healthcare? One is, I mean, in terms of machine learning, probably you can choose some of the libraries that
are available, and the second part is NLP, which is definitely a major factor in healthcare. So what kind of open-source tools do you recommend, especially for healthcare?

Yeah, I'm going to be here for a couple more hours, and it depends on the problems you're trying to solve. So if you want to come see me, I'll lay out my favorites.

Okay, thank you.

No problem.

A lot of smart people work at institutions that have regrettably adopted Epic, and I'm wondering what you'd have to say about how people who want to take initiative to use analytics to solve problems at an institution that uses Epic can, practically speaking, get at the data and do the work without going through years of administrative tasks.

Yes, you're trying to get me to step on a landmine and talk poorly about Epic. So, um, here's what you need, it's what everyone needs: time. I've spent years, right? Every single organization I have joined and problem that I have chosen to focus on started with one question: will I have access to data? And surprisingly, the answer is not always yes. So you have to be very strategic about where you work and what problems you focus on. The reason why every IT department in healthcare right now is not super excited about the requests of individuals, even high-level individuals, trying to get access to the data is because of this incentivization of the rapid adoption of electronic medical records. We still get paid for keeping these systems running and for seeing more patients more quickly. The vast majority of healthcare is still paid in a fee-for-service model. In that model, if you can collect seventeen thousand dollars to adopt Epic, I can tell you that Epic and that hospital are working as hard as they can, as fast as they can, to simply install and not break anything. And so they just don't have time to focus on the things that don't matter, like improving the quality of care.

Yeah, move to Canada. I mean, we're here. No, if the
incentive isn't there, it's really hard to get organizations to give the data. I get a huge kick out of people talking about data sharing as though all hospitals could. I mean, they don't even share it with the people inside the institution that are trying to improve care yet. But if we keep going to value-based care, they won't do it out of the kindness of their hearts; they'll do it because their competitive advantage is entirely predicated on their ability to use the data to become more effective. And, you know, I mean, over a beer we can talk about how we can trick some people in your organization, but I don't know how far to go with that. [Applause]
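
As a rough illustration of the daisy-chained models described in the talk (a patient-level model, then a document-level model, then a search for free-text evidence), a minimal Python sketch might look like the following. This is not the speaker's software: the keyword checks are hypothetical stand-ins for trained models, and every function name, keyword, and sample note is an assumption made up for illustration. The point is only how the unit of analysis changes from patient to document to sentence as each stage feeds the next.

```python
from dataclasses import dataclass

# All names, keywords, and records below are hypothetical, for illustration only.


@dataclass
class Note:
    patient_id: str
    text: str


def is_diabetic(patient_notes):
    """Stage 1 (patient level): stand-in for a model that flags diabetic patients."""
    return any("diabetes" in n.text.lower() for n in patient_notes)


def mentions_complication(note):
    """Stage 2 (document level): stand-in for a model that flags complications."""
    keywords = ("neuropathy", "retinopathy", "nephropathy")
    return any(k in note.text.lower() for k in keywords)


def cgm_evidence(note):
    """Stage 3 (sentence level): sentences mentioning a continuous glucose monitor."""
    sentences = [s.strip() for s in note.text.split(".") if s.strip()]
    return [s for s in sentences if "continuous glucose monitor" in s.lower()]


def daisy_chain(notes):
    """Run the three stages in order, each one narrowing the input of the next."""
    by_patient = {}
    for n in notes:
        by_patient.setdefault(n.patient_id, []).append(n)

    findings = []
    for pid, patient_notes in by_patient.items():
        if not is_diabetic(patient_notes):        # stage 1 filters patients
            continue
        for note in patient_notes:
            if not mentions_complication(note):   # stage 2 filters documents
                continue
            for sentence in cgm_evidence(note):   # stage 3 extracts evidence
                findings.append((pid, sentence))
    return findings


notes = [
    Note("p1", "Type 2 diabetes with neuropathy. Continuous glucose monitor read reviewed today."),
    Note("p1", "Routine follow-up. No new complaints."),
    Note("p2", "Knee pain after a fall. Continuous glucose monitor not applicable."),
]
findings = daisy_chain(notes)
print(findings)
```

In practice each stage would presumably be a separately trained and validated model, and errors at an early stage carry through to the later ones, so each link in the chain would need its own accuracy check.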


  1. You can’t quantify people in care

  2. This guy is out of touch with health care

  3. Awesome answer on: What are some of the best resources for learning machine learning, specifically as it relates to the healthcare industry
    read here :

  4. lol the picture for the great recession at 8:18 shows the german stock market with the well known Dirk Müller. I wonder if he knows where his picture is shown 😀

  5. Hello, I am working on data mining in health care but I am not able to implement it. Will you suggest something?

  6. Science gives people more opportunities, while math and measurement make things more challenging. When we can measure and compare the total value of our tasks, it takes away our ignorance of being unproductive and forces us to work harder

  7. I am new in Big Data Analytics and healthcare…and I would love to grow even more….right now I work as a Medical Safety Surveillance Analyst. I love it but I really wish I could grow more…Any thoughts?
