
The Electronic Medical Record – What is it Good For?

April 10, 2012

Most large cohort studies supported by NHLBI have recruited participants from communities.  Over recent decades, we have seen increasing use of large health plans to assemble cohorts with data collection based on electronic medical records (EMRs).  In what settings can data obtained in this way substitute for the more precise phenotyping characteristic of the traditional cohort study?  What types of research questions can be better explored in electronic medical records and what types better explored in population-based cohort studies?

We welcome comments on the wise use and pitfalls of EMRs for epidemiologic research.

Posted by the Epidemiology Branch, NHLBI

18 Comments
  1. Richard Cooper
    April 13, 2012 1:14 pm

    As an extension of the previous discussion on new study design, I strongly believe the EMR will be CV epi of the future. I would like to share an experience in this regard.

    For the last couple of years we have been working with a group in the province of Valencia, Spain. The total pop of 10 million is fully registered in a single database – including all outpatient visits, lab data, pharmacy records, hospitalizations, of course death records, and in addition social information about unemployment, disability, etc. We have written a couple of manuscripts about what I call studies of the whole population, where one can estimate prevalence of risk factors, incidence of new events, outcome of procedures, etc. on the whole population, including children. In this region the pop is very stable, so there are life course records. I think this opportunity is relatively unique. While Scandinavia has many population-based registries and cohorts, and the UK has a volunteer database of EMR and of course the biobank, none include the whole population. In addition, in Valencia they have sampled sub-groups for more intensive specialized exams and follow-up.

    I don’t see a way to post papers but if anyone would like a copy of those that are published and the ones in draft form please contact me. Also, the main problem in Spain at the moment – in addition to the economic crisis – is the weak scientific infrastructure, so if others would like to assist in this effort I think they would be welcome. I realize the US doesn’t have this system, and will not likely have one in the near future, but I believe much can be learned.

    Of course enormous problems must be overcome, not the least of which is confidentiality and quality of data, but that is the nature of new research.

  2. April 14, 2012 1:48 am

    EMR has limited use and should be used mostly for exploratory searches for new hypotheses testable through more rigorous research. A few potential uses of the EMR are:

    1. identifying cases in CC studies for rare or unusual conditions. This is hard to impossible to do with cohort studies. Medical Records are useful here.

    2. Exploring data bases for relationships between exposures and outcomes that are likely to be large. However, care must be used in interpretations because of large biases involving referral for care.

    3. Getting incidence and prevalence of cases that will be predominantly hospitalized, for example serious pediatric congenital conditions. Here also, cases may be missed because of non-referral.

  3. Richard Cooper
    April 15, 2012 2:09 pm

    I don’t entirely agree with Bob. As the quality of the record improves the uses will extend well beyond the search for cases. This is very clear already in the VA, although of course the sample is not helpful for population inferences. Extensive use has been made of the national EMR in Iceland for genetic studies, including families, and there seems to be good validity. In Spain patients use the outpatient clinic 3-4 x’s a year and very large proportions have basic screening, like blood sugar and cholesterol. While there will always be some biases, and some of those biases can be controlled, it would seem to me that samples of that size can provide robust estimates of risk factor levels and their trends. All surveys, including NHANES, have only about 60% participation. A universal EMR has much higher participation than that. Furthermore, by drawing even smallish random samples from the lists one can determine whether the total EMR is biased, and address more refined questions.

    I think the days of large research cohorts – like ARIC, F’ham etc – are over. There are no etiologic research questions in CVD left that can be – or need to be – answered that way. (I realize that is a bit of a ‘strong hypothesis’ . . .). Two main issues that a universal EMR can address are: 1) to answer Lenfant’s plea that we “use what we know”, ie, monitor quality of care, and 2) to find unexpected associations requiring large samples, such as the recent discovery that statins increase risk of diabetes, which emerged first (I think) from the population data in Denmark. The large consortia for genetic studies such as GIANT and ICBP also rely heavily on EMR-based samples – in Europe of course.

    The US is at a huge disadvantage here, and needs to partner with European systems. While the NHGRI eMERGE project is building reasonably sized databases with extensive lab and imaging data that will soon be a tool for at least one level of EMR-based analyses, these are hospital-based samples. Nonetheless, from what I can see almost all of the new and interesting findings in CVD are coming out of studies using Medicare data, etc.

  4. Veronique Roger
    April 15, 2012 8:16 pm

    I think that Richard shares an appealing vision while Bob depicts today’s reality. Electronic medical records (EMR) present both challenges and opportunities to the community of researchers doing epidemiology work.
    Potential advantages of the EMR are intuitive including the efficiency and reduced cost of data collection, the ability to carry out longitudinal follow-up, and the vision of linking EMR data with biospecimen repositories for genomics, proteomics and other bio-marker studies.
    While all these constructs are appealing, research within an EMR system will require robust standardization to be fully operationalized and will be hindered by the fragmentation of care unless it is deployed in a single record system. The reliance on medical documentation for the ascertainment of certain types of exposures and outcomes will inherently underascertain patient-centered measures, thereby limiting the scope of the research that can be performed. It is important to delineate the goal of using the EMR: case finding, or ascertainment of exposure or outcomes. Outcomes, particularly death, hospitalizations, and healthcare utilization, can be captured with reasonable accuracy from the EMR. Certain exposures are more problematic; selected examples include, but are not limited to, measures of physical activity, many behavioral characteristics, and biological or imaging markers that are obtained in clinical settings, and therefore only through a clinical order, and hence are subject to referral bias. In the current era, EMR-based research would seem to be best combined with partial manual data collection to balance efficiency and accuracy.

    • Phil Greenland
      April 17, 2012 12:48 pm

      I think Dr. Roger has summed this up nicely. I agree with her (the pros and the cons). Phil

  5. Richard Cooper
    April 17, 2012 1:21 pm

    I don’t want to take up more than my fair share of space here . . . but I think what has made me excited about the potential for EMR is to see a future with universal coverage. That doesn’t remove any of the obstacles – well described by Bob and Veronique – but in my view it transforms the potential value/power of such a system. Obviously such a system would need component parts – biobank, structure for recruitment for trials, cohorts with specialized measurements etc.

    So there are two key ideas which have not been part of population science before – 1) “epidemiology of the whole population”, and, of course related, 2) a single system. In the US we create endless, separate overlapping systems for surveillance and analysis (Nat Hosp Discharge Survey; NHANES; MEDICARE files; Biopreparedness networks, etc). These are extremely inefficient and incomplete. The class of observations that then become possible (temporal trends in CHD events associated with air quality) would have value not primarily by adding to our knowledge base of individual-level attributes and associations, but of aggregate effects. In a sense, it would reconnect epidemiology, public health and clinical research into a single discipline.

  6. Joe Coresh
    April 24, 2012 1:38 pm

    We should combine Epidemiologic Cohort Data with EMR data

    I’d like to propose that EMR systems and, even more importantly, real-time data collection techniques (soon everyone will be carrying a device nearly everywhere) are a wonderful opportunity to enhance our epidemiologic cohorts. A hybrid model with gold standard validated data collection during visits, which inherently are infrequent, can be complemented by interim data collection using the increasingly available array of tools.

    In many ways, this isn’t “new” since the large NHLBI cohorts have used medical chart data to obtain events and adjudicate them for a long time. The opportunity is to get more detailed data and often the “raw” data (echocardiograms, angiograms, CTs, MRIs, lab results) in a much more efficient manner. Likewise, to the extent that outpatient data become more uniformly available, they can be integrated. Finally, electronic data collection in the home and during people’s daily activities is becoming increasingly feasible in terms of receiving, processing and analyzing the full volume of information. I think we need to test a wide range of methods for data collection and models for consent and high participation. My cell phone provider knows where I am all the time and has nearly no limits on their use of the data (probably not optimal). Should medical researchers be able to access such data? Can the rules for merging databases for epidemiologic research be simplified with more uniform guidelines on data protection?

    I think the hybrid model will become increasingly more powerful with increasing technology.

    “Pure” EMR data can lead to tremendously large studies which provide generalizability, power and subgroup analysis. We should be cautious that most standard statistical techniques don’t formally acknowledge sources of error other than sampling error. In the real world as sample size increases, data errors often increase. I like the idea of looking for consistency across study types. In looking at CVD risk and mortality for the basic associations in the CKD Prognosis Consortium we found impressive consistency across different types of studies (administrative EMR with labs, cohorts and trials).

  7. Steve Sidney
    April 27, 2012 12:45 am

    I generally agree with Dr. Roger’s assessment. EMR data certainly has a great deal of utility now and potentially much more utility in the future. EMR systems are extraordinarily complex. Assuming that data are entered into the EMR accurately, the utility of the data is dependent on the careful efforts of trained computer programmers and data analysts who ideally have input from the users of the data, e.g. clinicians and researchers. I have had the opportunity and challenge over the past few years to oversee the development of a large cardiovascular disease database from 15 health care systems from around the United States for the purpose of surveillance and for establishing a platform on which to carry out comparative effectiveness and disparities research projects. The amount of effort that has gone into assuring data quality and in resolving data quality issues from some of the sites has been enormous. Many, if not most, electronic diagnoses benefit from validation efforts which still require individual case review. As the EMR becomes more sophisticated and techniques such as natural language processing mature, we may overcome much of this need. Using data from multiple sites for the same purpose requires standardization of the data elements.

    The bottom line is that the utility of an EMR database for research is dependent on the efforts that are made to assure high-quality data. This being said, once these issues are ironed out, the utility of EMR data-based studies to address a number of etiological and comparative effectiveness research questions is quite substantial. As others have said, these databases can also be used to identify individuals or cohorts on which to perform additional data collection of variables that are not commonly part of the EMR.

    The challenge of cardiovascular disease / risk factor surveillance deserves special mention here. Last year, the Institute of Medicine published a report titled, “A Nationwide Framework for Surveillance of Cardiovascular and Chronic Lung Diseases.” The report recommended that the Secretary of HHS establish and provide adequate resources for a standing national working group to oversee and coordinate cardiovascular and chronic pulmonary disease surveillance activity. As EMRs develop and become a source of data for tens of millions of people, the potential to use them for broad-based CVD surveillance will increase substantially. Besides the issues I noted earlier, much of the challenge in using EMRs for this purpose lies in defining the denominator. In our work with the 15 health care systems, we have calculated the number of years of membership in a particular stratum (e.g., 35-44 year old men) in a given year as the denominator for determining rates. In most instances, membership is recorded in the EMR on a monthly basis, so that, for example, 4 people who each add 9 months of membership in a given year would provide 3 person-years of membership (4 people x 9 months = 36 months = 3 years). In other words, you might have 180,000 people who were members in a given year who provided 150,000 years of follow-up. So the issue of starting and stopping membership in a given health care system provides an interesting question in the interpretation of data for surveillance. I believe that the opportunities and challenges for surveillance provided by the ever increasing number of EMRs should have serious discussion. The NHLBI, which has conducted workshops on CVD surveillance in the past, or the CDC should consider calling for a meeting to address the current state of CVD surveillance and these new opportunities and challenges presented by EMRs.
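
    The denominator arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the 15-system database: the function names are hypothetical, and the event count of 450 is made up purely to show how much the choice of denominator moves a rate.

```python
from fractions import Fraction

def person_years(member_months):
    """Sum each member's months of enrollment and convert to person-years."""
    return Fraction(sum(member_months), 12)

def rate_per_100k(events, denominator):
    """Event rate per 100,000 units of the chosen denominator."""
    return float(Fraction(events) / Fraction(denominator) * 100_000)

# Example from the comment: 4 people with 9 months each = 36 months = 3 person-years.
stratum_py = person_years([9, 9, 9, 9])

# Hypothetical event count (an assumption, not from the source), applied to
# the comment's figures of 180,000 members and 150,000 person-years.
events = 450
rate_by_person_years = rate_per_100k(events, 150_000)  # per 100,000 person-years
rate_by_headcount = rate_per_100k(events, 180_000)     # per 100,000 members

print(stratum_py, rate_by_person_years, rate_by_headcount)
```

    With these illustrative numbers, counting heads instead of person-years understates the rate (250 versus 300 per 100,000), which is why members who join or leave partway through a year have to be handled explicitly.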

  8. Richard Cooper
    April 28, 2012 8:10 pm

    I was unaware of the report on the “National Framework . . ” that Steve pointed out – but I certainly should have known about it. Having just read it, obviously it summarizes much of the background information and thinking required to address the question being posed in this blog. Likewise, it addresses in substantial detail under “emerging data sources” the issues that were discussed here related to the EMR.

    It seems to me that the report reinforces the impression I had that two separate questions were really being discussed here. The first is “where are we now?” In that context the EMR and HIT are clearly limited, and as Steve points out only a hybrid system – relying on both routine electronic sources and “research-specific curating” – could give data of adequate quality. The second question, however, is more about “What is likely to be the situation in 5, 10 or 15 years?” In terms of the second question it just seems hard for me to believe that the models used in cohort studies, registries, etc that have been the bread and butter of CVD epi for 50 years are likely to remain viable.

    Having said that, as the “National Framework” report forcefully points out, without a unique identifier for a patient and without the capacity or willingness to share data, any attempt at creating an electronic system with sufficient, if not universal, coverage is unthinkable. From the research perspective that obstacle ranks up there with the challenge for actual reform of the US system in general . . . .

    But if we are trying to think beyond the constraints of existing systems, I find the UK experience (THIN – the 454 linked practices) or the Spanish system (unique identifier, single database for all aspects of health care for the entire population) concrete evidence of what the infrastructure for CVD research must someday look like.

    Finally, one last brief comment on the primary obstacle that researchers see (quite correctly) in electronic systems – data quality. Research studies or national surveys (like NHANES) can obviously produce high quality data for many traits. However, given our interest in exposure-outcome relationships, we overlook the fact that these samples are always highly biased (eg, much higher survival rates than the general population). Second, many of the questions we are interested in in CVD epi have been answered (ie, we don’t need additional high quality data), and many of the ones that remain cannot be answered with methods currently used in surveys. For example, much emphasis is placed on “physical activity guidelines” and monitoring what proportion of the population meets them. Surveys normally suggest that ~ half of the population meets the guidelines; however when objective measurement is made (eg, with accelerometry in NHANES), the guidelines are being met by only 1 or 2% of the population. I realize this is a broad generalization, but the point I am making is that there is an equally urgent need to ask hard questions about the quality and usefulness of data generated by the myriad of current surveys and surveillance systems described in the “framework” report (eg, BRFSS, etc).

  9. NHLBI Moderator
    April 30, 2012 11:45 am

    The comments from this question have been exceptional and have stimulated considerable discussion within the NHLBI. As mentioned, there are advantages and disadvantages for research at our current level of progress in establishing uniform, valid and comprehensive electronic medical records (EMR). We encourage continued discussion, not only on the ideal possibilities, but on the practical reality of EMR development.

  10. May 14, 2012 1:09 pm

    I generally agree with the comments from Veronique Roger and Steve Sidney. The EMR presents considerable potential for research, but it does not come free. There are few incentives in the clinical setting to ensure that data are accurate or complete. Some recent regulation–such as Meaningful Use–may improve this situation for select variables, but the change is likely to come slowly. Veronique makes a very important point about variables that are missing from the EMR, such as behavior and, often, counseling encounters. Natural language processing can extract some of these data, but the quality and completeness of medical notes are even more problematic than those of variables such as blood pressure. Advances in recording of patient-reported outcomes will lessen this problem, but without mandates and very efficient ways to collect these variables, this will come slowly. Primary care clinics are overwhelmingly busy these days, so medical assistants are already pressed to take a good blood pressure and ask about smoking.

    Another limitation is the dependence on claims data, especially for hospitalizations, even in fairly fully-developed integrated care systems, because at least some hospitals are not owned by the system.

    I also want to emphasize Steve’s point that obtaining the data that are there is not a push-button enterprise. Every data run discovers problems and issues that must be resolved. Talented and experienced programmers are needed. It is certainly less expensive than a traditional cohort study, but it may not be as inexpensive as NHLBI hopes.

    Richard is correct that the future potential for using clinical data for research holds great promise. The current political climate makes the advent of universal coverage seem quite remote, and until that happens, there will be many problems, including bias from under-representation of disadvantaged populations (although as Richard points out cohort studies and NHANES suffer the same bias for different reasons). I chaired a panel on surveillance for NHLBI some 20 years ago and we made recommendations similar to the IOM’s (though less comprehensive), but nothing came of it, presumably due at least in part to privacy concerns at NCHS. At the time I felt that the future of epidemiology would shift to European countries with universal coverage and national identification numbers (Sweden is another example). That hasn’t entirely happened, but we do risk slipping behind.

    I do not want to be too pessimistic, however. Using EMR data for epidemiology (and clinical trials) is possible, even now, but with realism and careful selection of appropriate studies and outcomes. Going forward, we need to help the clinical systems develop ways to collect data that are more valid and reliable but no more time-consuming or expensive for them.

  11. May 19, 2012 9:32 am

    I am surprised at the amount of skepticism towards EMRs mentioned by so many of the discussants. I come from the perspective that it is inevitable (already happening) and kind of a no-brainer.

    Some of the trends driving this are cost related – the need for large numbers (as in pharmacoepidemiology) to assess adverse events (as in the FDA Sentinel project), or to do sub-group analysis. It isn’t feasible to design and implement a cohort study for every research question or topic, and perhaps efforts to upgrade the quality of data in EMRs will yield results in many areas of health care and public health.

    Another IOM report that touches on this issue (indirectly), The Clinical Trials Enterprise in the United States: A Call for Disruptive Innovation, envisions research converging with practice through the use of EMRs.

    At minimum, we might say that an EMR study should always precede more expensive study designs such as cohort studies or randomized trials, and then identify the factors that might require moving from the EMR study to the cohort study or randomized trial. It would be interesting to know how many times results from EMR studies lead to further studies, and how many times EMR studies resolve the discussion about a medical issue and lead to clinical guidelines or policy decisions.

  12. Denise S-M
    June 3, 2012 12:41 am

    Caveat: my comments represent my personal views.

    Issues for me are practical and depend on the purpose of the research. I am skeptical, as a couple folks mentioned, about the quality of EMR data. We know from experience with study adjudication committees that MI, HF, stroke, etc are not measured/diagnosed consistently, nor is cause of death. Adjudication committees in clinical trials spend hours reviewing charts to assure accuracy of the data. It’s not clear why we would think those very same data are useful for research–without the adjudication. Think of the 2-3-hour or so measurement sessions in cohort studies like CARDIA and ARIC. How can clinically-derived data possibly match the accuracy and quantity of those measures?

    Perhaps huge sample sizes could make inaccurate data useful–and smooth out the variations. Regardless of sample size, data from more than one EMR will need to be combined and formatted for statistical analyses.

    Then there is the study purpose. Conceptually, I don’t see how EMRs could be useful for etiologic epidemiology looking at causes of disease. Apparently-healthy people–who you want to measure in order to examine disease onset–don’t go to the doctors all that frequently. Also health care doesn’t measure and record things you may want to study in etiology studies–behavioral risk factors like diet or physical activity, stress, environmental factors.

    I can see using EMRs for clinical epidemiology to examine factors associated with prognosis in patients who have disease. Diagnosed patients do go to the doctor (where the data are collected) and studies could examine which factors are associated with better prognosis–characteristics of care provided as well as patient and disease characteristics. Maybe EMRs can be used to examine factors associated with delivery of evidence-based care derived from systematic reviews and clinical guidelines. Factors that could be examined include physician and setting characteristics, access to guidelines, and local efforts to encourage, incentivise, or remind physicians of evidence-based recommendations — like reminders built into the EMR and other clinical-decision-support tools.

    So EMRs may possibly be useful for studies that focus on patients with disease and on healthcare delivery. In other words, clinical epidemiology, health-services research, and clinical implementation research. I’m not sure about other types of studies. And I’m not sure using EMRs would be less costly, because larger sample sizes will be needed, nor as robust, because of inaccurate data.

  13. June 6, 2012 9:44 am

    I suggest that readers view the webcast of HealthDatapalooza (going on now) to hear the latest apps developed for our world – many using electronic medical records. Kathleen Sebelius just spoke about the impact of electronic medical records on diabetes outcomes. Kind of funny that the conversation on this blog is taking place while the world is racing ahead!
    Here’s the link to the webcast:

  14. July 6, 2012 9:28 am

    People might also take a look at the webinar from the recent PCORI meeting, National Workshop to Advance Use of Electronic Data – Interesting discussions, and relevant to the discussion here.

  15. July 8, 2012 2:16 pm

    Here is an example of some of the methodologic work taking place with regard to electronic medical records. The example is from the field of rheumatoid arthritis, but similar work must be taking place regarding cardiovascular conditions and endpoints. JAMIA (Journal of the American Medical Informatics Association) has a monthly journal webinar series. This month they discussed an article, “Portability of an algorithm to identify rheumatoid arthritis in electronic health records”. Both the article and the webinar are available online.

  16. November 9, 2012 12:16 pm

    I very much agree with Veronique and Denise regarding data quality of EHR. The majority of EHRs in the US now use EPIC. This is because it is the “best” EHR on the market, yet it is very un-user friendly, non-intuitive, and non-uniform. Because it is so clunky, individual medical systems, such as Kaiser, do their own layered-on programming to fit their needs. Measured productivity in my medical system continues to be 20% less at three years after implementation. Given the economic issues at play in healthcare, this results in regular errors and nonsensical information accepted in the interest of time by staff, nurses, even docs. Did you know that Epic was started by an ophthalmologist and his daughter rather than computer programmers? When I have discussed this with software companies, they say that there are not enough users (medical systems) to generate a profit compared to games or business software, so they have no incentive to get in the game. Also, EPIC is a closed system, i.e. not open to the innovative apps that computer folks might build for free out of desperation.

    MY SUGGESTION – NHLBI should commission and fund a major initiative to unify and standardize US EHR. It is only our healthcare system (and biomedical epidemiology research) at stake.

