Should the NHLBI fund data collection and data analysis separately?

May 1, 2012

Would it be more cost efficient and maximize productivity to fund data collection separately from data analysis in NHLBI-initiated projects? Consider the National Health and Nutrition Examination Survey (NHANES) model: the survey content is determined mostly by government agencies, the data collection is performed by government contractors who are highly experienced in data collection, limited analyses are conducted by the funding agencies, the public-use data files are rapidly available for distribution from easily accessible public websites, and most analyses are conducted by researchers outside of the government.

What would be gained if we adopt a system that separates data collectors and data analyzers, funding the strongest bidders for each? What would we lose? Would separation of functions produce better science? What conditions would need to be satisfied to make the funding structure optimal?

Posted by the Epidemiology Branch, NHLBI

10 Comments
  1. Matt Gillman
    May 1, 2012 2:49 pm

    From an epidemiologist:
    Separating data collection and analysis works well for simpler data sets with excellent documentation; NHANES is a good example. And pretty well for bioassays. Often in complex or longer-term studies, the data collectors have a nuanced view of what each data element can bring to the table. The cross-talk between the collectors and analyzers, I believe, helps to avoid threats to validity. I don’t subscribe to the tack some projects have taken–post all the data and let the analyzers have a crack at them, assuming that seemingly contradictory results will come out in the (long-term) wash.

  2. Steven R. Levine, M.D.
    May 1, 2012 3:07 pm

    For large national studies (NHANES), I think it makes good sense to separate. For local studies where data are collected in one city, medical center, or even region, I am not sure it is the best model. Investigators who design and conduct the study should have a large role in the analytic plan and in writing up and presenting the data.

  3. Barry Popkin
    May 1, 2012 4:43 pm

    There are two traditions. The epidemiology-biomedical way is to control and limit access to data to give the group collecting them every advantage. But when the data are funded by public NIH funds, this means they are underutilized and science loses. I come from a social science tradition of data sharing, where the analysis is what counts and competition for ideas wins out in the use of the data.

    I run two huge, complex, multipurpose surveys as separate data collection and dissemination efforts–the China Health and Nutrition Survey and the Russian Longitudinal Monitoring Survey. The former will now be disseminating an array of fasting blood and toenail biomarkers, and later a full GWAS, along with a vast array of intermediate behaviors [diet, physical activity/inactivity, smoking, drinking] and measures at the individual, household and community level. It has 10,000 downloads of data (some users download multiple times; this figure counts unique users). This is as complex as any survey. Separating data collection forces the data into the public domain and then allows scholars, myself included, to compete for analysis. I have an advantage, but the usage of this survey is far beyond that of other cohorts in the US just because the data are immediately free in the public domain and there is no control over how to interpret each piece of analysis. I believe that when analyses are tied to collection, dissemination is always stifled and science loses.

    Epidemiology has a long tradition of each group living off the data it collects and not sharing them. This means the public goods–the funded data–are used in very narrow and limited ways by others. For instance, if CARDIA and all the other cohorts were open data sets, with many confidentiality controls, they would certainly be more widely used. And there are many cohorts that are used only by the scholars involved, though they are supposed to be open to others. I believe this is, in some ways, a misuse and abuse of public funding.
    I suggest NHLBI try the other option for one of its cohorts. NHLBI will find amazing differences in the science that emerges. For example, my China data are used for hundreds of dissertations each year by scholars who would not otherwise have access to such rich data, as they are not at the privileged institutions or with professors with access to the data.

  4. May 2, 2012 3:44 pm

    Pushing this debate a bit further: why doesn’t NHLBI position itself as a prime worldwide generator of open-source data related to cardiovascular (and other chronic) conditions? Very good examples arise from the Demographic and Health Surveys (DHS), which have promoted wide use of data in various sectors, not only health. See:

  5. NHLBI Moderator
    May 3, 2012 4:38 pm

    Thanks for the excellent discussion on this important question; please continue.
    For those who are unaware, the NHLBI has an extensive list of data sets that are available on request to researchers. Please see:

  6. Denise Simons-Morton
    June 3, 2012 8:25 pm

    Caveat: opinions are solely my own.

    Separating data collection from analyses might make sense–if there are no hypotheses to drive the data collection. Once you hire data collectors who do not need to develop the data collection design and measures based on a research purpose, or on hypotheses, it changes the whole enterprise.

    One could, of course, collect every measure one can think of–and can afford. And epidemiologists can ask a research question of a dataset, or go on a “fishing expedition” looking at data this way and that to see what pops out. These happen all the time.

    But shouldn’t we always have at least a few hypotheses to warrant collecting the data in the first place? Wouldn’t we get more out of research if it had some focus?

    If the data collectors have or need a scientific rationale for which data to collect, they should have the first opportunity to analyze the data. Oh…that’s what is done now. But maybe if you want data sharing, you could make the datasets available. Again, what is done now.

    Maybe there should be easier access to the datasets and more outreach to encourage their use. But I’m skeptical that changing the whole system, which has been highly successful, would improve the science.

  7. June 5, 2012 2:41 pm

    I would also be very careful about making a wholesale change along these lines. When investigators and analyses are divorced from data collection, there is a risk that the data will not be of sufficient quality. I agree with Matt Gillman that this is a reasonable model for small, well-defined datasets using well-established methods and prescribed quality control. More innovative studies, however, may be exploring new methods, and the analyses will help to define the data collection.

    Certainly this is true now for data that are being collected from electronic medical records. Clinically-collected data are full of problems and the analytic process is reiterative.

    Many years ago I worked with an economist who wanted to analyze data we were collecting in a large prevention trial. He did not want to include anyone from our study as an author on the paper that would result. I learned in that process that economists are used to analyzing standard datasets provided publicly and cared not a whit about the quality of the data. Perhaps this is why economics is called the “dismal science”.

    Also keep in mind that analyses of fully public data, such as NHANES, are not coordinated. One can start an analysis and spend a fair amount of time on it only to see it published by someone else. That discourages use of such data.

    I do agree that data that are collected with public funds need to be made public after a reasonable delay to allow the primary investigators the first chance at publishing them. This is the model for many of the current NHLBI-supported epidemiology studies such as ARIC. I agree with Denise Simons-Morton that there may be a need to publicize these datasets more widely.

  8. Christopher Sempos
    June 6, 2012 4:21 pm

    To a certain extent, doesn’t NHLBI already have the ability to fund data collection and analysis separately with the R01 and R21 mechanisms? I do not know if any additional method for separating the funding of them is necessary. In any event, what I do believe would be helpful is for NHLBI to encourage PI’s to increase their percent effort on R01’s. In addition, try to develop a system which discourages large numbers of Co-Is on R01’s. Besides trying to provide funding for colleagues, padding of the “budget” with collaborators is done to short-circuit review criticisms that the needed expertise is not there. If PI’s could obtain more support from a single R01, they might have more time to read, think and come up with thoughtful ways to look at the data they have collected.

  9. July 6, 2012 9:37 am

    With the growing availability of public data, funds for secondary analysis would be very helpful, and could be very productive. The cost of data analysis is much less than the cost of data collection, yet large amounts of data are unanalyzed, even in data sets such as HCUP. Increased funding for secondary data analysis may be one way to produce value for fewer dollars.

    Some methodologic issues may arise as larger and larger data sets become available. I notice reviewers continuing to emphasize statistical significance in huge data sets. This is a mistaken attempt to implement what they learned in stat 101, when they learned how to interpret studies with sample sizes in the 10s and 100s. In large, large data sets we often find everything to be statistically significant, and the challenge is to identify meaningful trends or differences, and interpret the differences. Funding for these and other methodologic issues would be very constructive.
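
To illustrate the point about statistical significance in very large data sets, here is a minimal Python sketch (it is not part of the original comment). It assumes a hypothetical 0.5 mmHg difference in mean blood pressure between two equal-sized groups with a known, common standard deviation of 15 mmHg, and applies a two-sided z-test. The function name and all numbers are illustrative assumptions, not values from any NHLBI data set.

```python
import math


def z_test_two_means(mean_diff, sd, n_per_group):
    """Two-sided z-test for a difference in means between two equal-sized groups.

    Assumes a known, common standard deviation `sd` (a simplification).
    Returns (z statistic, two-sided p-value, standardized effect size d).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    effect_size = mean_diff / sd  # Cohen's d; does not depend on n
    return z, p_value, effect_size


# Hypothetical values, chosen only for illustration.
MEAN_DIFF_MMHG = 0.5
SD_MMHG = 15.0

for n in (100, 10_000, 1_000_000):
    z, p, d = z_test_two_means(MEAN_DIFF_MMHG, SD_MMHG, n)
    print(f"n per group = {n:>9,}  z = {z:6.2f}  p = {p:.3g}  d = {d:.3f}")
```

The standardized effect size is identical in every row; only the p-value shrinks as n grows. That is why effect sizes and confidence intervals tend to say more than significance alone when the sample is very large.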

