Data Coordination Center Best Practices

March 26, 2015

A Case Study from the Li-Fraumeni Exploration (LiFE) Consortium

The success of any multi-site collaborative research project involving high-volume data, or “Big Data,” requires an effective data aggregation and harmonization strategy built upon community consensus and rigorous project management. Data Coordination Centers (DCCs) lift the burden of implementing a data aggregation and harmonization protocol from the individual research institutions within a collaborative project. However, very little published guidance is available for establishing appropriate protocols for DCC development and success.

The application of DCC practices to large-scale, multi-institutional cohorts strengthens epidemiological studies by generating validated and standardized data about the relationship between various exposures (e.g., genetic, behavioral, environmental) and the disease or condition of interest. DCCs help to strengthen data quality, validity, and precision, and they allow for more complex analyses of the associations between exposures and diseases or conditions. Pooling data from multiple sites within a consortium through a DCC, especially for rare diseases or conditions, adds to the scientific rigor of resulting studies and provides tremendous value for retrospectively studying diseases with long induction or latent periods, such as cancer.

ESAC, Inc. has substantial experience in the development of cancer DCCs, including those for the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the Breast and Colon Cancer Family Registries (BC-CFR). Most recently, ESAC has applied successful strategies and lessons learned to the development of the Li-Fraumeni Exploration (LiFE) Data Coordination Center, funded by the National Cancer Institute (NCI). The LiFE DCC serves as a central data repository where researchers pool and analyze their collective data gathered from patients with Li-Fraumeni Syndrome (LFS), a rare inherited disorder leading to a higher risk of developing multiple cancers throughout one’s lifetime.

Since October 2013, ESAC has worked with seven LiFE Consortium sites to develop a DCC focusing on the following challenges: collaboration and communications infrastructure development; data harmonization and management; data use, sharing, and access; database and web portal development and implementation; and development of other necessary software tools. ESAC worked with the participating sites to collect and store the tools, protocols, designs, data dictionaries, and processes used to gather descriptive data and metadata across the consortium.

Achieving the goal of large-scale harmonization of all data collected across all participating sites required an extensive time commitment from both the LiFE Consortium sites and the LiFE DCC developed by ESAC, Inc., which worked together through frequent meetings and communication to establish standards and consensus. The LiFE DCC staff worked with technical representatives from participating LiFE sites to retrospectively and prospectively harmonize all data variables into a single LiFE DCC data dictionary, which involved defining data fields and establishing the format for data entry into each field. Once each LiFE site had submitted its data, ESAC began populating a relational database, whose framework was developed from the approved LiFE DCC data dictionary, with the data pertinent to the harmonized variables. Database development included the creation of modularized components as described by the NCI and the LiFE principal investigators (PIs), loading of data, quality control testing, tracking of data versions, and incorporation of new prospective data. The LiFE DCC, in coordination with the LiFE PIs, can generate tailored datasets for collaborative projects (by approved internal and external investigators) supported by the LiFE Consortium.
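As a rough illustration of what harmonizing site data against a shared data dictionary can look like, the following Python sketch maps two sites' local variable names, codes, and date formats onto one set of harmonized fields. It is not the LiFE DCC's actual implementation: the field names, site names, codes, and the `harmonize` function are all hypothetical and were chosen only to show the idea of defining fields and entry formats once and validating every site's records against them.

```python
from datetime import datetime

# Hypothetical harmonized data dictionary: each field has a type and entry rules.
DATA_DICTIONARY = {
    "sex": {"type": "coded", "allowed": {"M", "F", "U"}},
    "age_at_diagnosis": {"type": "int", "min": 0, "max": 120},
    "diagnosis_date": {"type": "date", "format": "%Y-%m-%d"},
}

# Hypothetical per-site mappings from local variables to harmonized fields.
SITE_MAPPINGS = {
    "site_a": {
        "fields": {"gender": "sex", "dx_age": "age_at_diagnosis", "dx_date": "diagnosis_date"},
        "codes": {"sex": {"male": "M", "female": "F"}},
        "date_format": "%m/%d/%Y",
    },
    "site_b": {
        "fields": {"SEX": "sex", "AGE_DX": "age_at_diagnosis", "DATE_DX": "diagnosis_date"},
        "codes": {"sex": {"1": "M", "2": "F", "9": "U"}},
        "date_format": "%d.%m.%Y",
    },
}


def harmonize(site, record):
    """Return (harmonized_record, errors) for one site-specific record."""
    mapping = SITE_MAPPINGS[site]
    harmonized, errors = {}, []
    for source_name, value in record.items():
        target = mapping["fields"].get(source_name)
        if target is None:
            continue  # variable is not part of the harmonized dictionary
        spec = DATA_DICTIONARY[target]
        try:
            if spec["type"] == "coded":
                # Translate the site's local code, then check it is allowed.
                key = str(value).strip().lower()
                value = mapping["codes"][target].get(key, value)
                if value not in spec["allowed"]:
                    raise ValueError(f"unrecognized code {value!r}")
            elif spec["type"] == "int":
                value = int(value)
                if not spec["min"] <= value <= spec["max"]:
                    raise ValueError(f"value {value} out of range")
            elif spec["type"] == "date":
                # Parse the site's date format and emit the dictionary's format.
                parsed = datetime.strptime(value, mapping["date_format"])
                value = parsed.strftime(spec["format"])
        except (ValueError, TypeError) as exc:
            errors.append(f"{source_name} -> {target}: {exc}")
            continue
        harmonized[target] = value
    return harmonized, errors


record_a, errors_a = harmonize(
    "site_a",
    {"gender": "Male", "dx_age": "34", "dx_date": "03/26/2015", "notes": "n/a"},
)
record_b, errors_b = harmonize(
    "site_b", {"SEX": 2, "AGE_DX": 200, "DATE_DX": "26.03.2015"}
)
```

In this sketch, records that fail validation are reported rather than silently loaded, which mirrors the quality control step described above; a real pipeline would also need to track data versions and load the validated rows into the relational database.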

ESAC also provides training to LiFE Consortium investigators, technical representatives, and NCI personnel to familiarize them with the LiFE DCC’s tools and resources. LiFE DCC staff instituted a web-based query tool for exploring summary data and establishing the feasibility of planned collaborative studies. The LiFE DCC also maintains a web portal to house and manage consortium-related public and semi-public information, including publications, ongoing projects, tracking of data use requests, descriptive data tables, and related resources. This resource serves as a tool for LFS researchers to analyze the latest aggregate research data to understand disease mechanisms, which will ultimately enable scientists to target new therapeutic approaches to fight this rare disease. We hope the DCC proves to be an invaluable resource to the larger LFS research community, allows medical professionals and researchers to attain a more complete understanding of LFS through this shared resource, and assists in conducting effective studies that can produce better treatments for individuals and families afflicted by the condition.



