March 26, 2015
A Case Study from the Li-Fraumeni Exploration (LiFE) Consortium
The success of any multi-site collaborative research project involving high-volume data, or “Big Data,” requires an effective data aggregation and harmonization strategy built upon community consensus and rigorous project management. Data Coordination Centers (DCCs) lift the burden of implementing a data aggregation and harmonization protocol from the individual research institutions within a collaborative project. However, very little published guidance is available on establishing appropriate protocols for DCC development and success.
The application of DCC practices to large-scale, multi-institutional cohorts strengthens epidemiological studies by generating validated and standardized data about the relationship between various exposures (e.g., genetic, behavioral, environmental) and the disease or condition of interest. DCCs help to improve data quality, validity, and precision, and they allow for more complex analyses of the associations between exposures and diseases or conditions. Pooling data from multiple sites within a consortium through a DCC, especially for rare diseases or conditions, adds to the scientific rigor of the resulting studies and provides tremendous value for retrospectively studying diseases with long induction or latency periods, such as cancer.
ESAC, Inc. has substantial experience in the development of cancer DCCs, including the Clinical Proteomic Tumor Analysis Consortium (CPTAC) DCC and the Breast and Colon Cancer Family Registries (BC-CFR), and has most recently applied the strategies and lessons learned to the development of the Li-Fraumeni Exploration (LiFE) Data Coordination Center, funded by the National Cancer Institute (NCI). The LiFE DCC serves as a central data repository where researchers can pool and analyze data gathered from patients with Li-Fraumeni Syndrome (LFS), a rare inherited disorder that leads to a higher risk of developing multiple cancers over a lifetime.
Since October 2013, ESAC has worked with seven LiFE Consortium sites to develop a DCC that addresses the following challenges: collaboration and communications infrastructure development; data harmonization and management; data use, sharing, and access; database and web portal development and implementation; and development of other necessary software tools. ESAC worked with the participating sites to collect and store the tools, protocols, designs, data dictionaries, and processes used to gather descriptive data and metadata across the Consortium.
Achieving large-scale harmonization of all data collected across the participating sites required an extensive time commitment from both the LiFE Consortium sites and the ESAC-developed LiFE DCC, which worked together through frequent meetings and communication to establish standards and consensus. The LiFE DCC staff worked with technical representatives from participating LiFE sites to retrospectively and prospectively harmonize all data variables into a single LiFE DCC data dictionary, which involved defining data fields and establishing the format for data entry into each field. Once every LiFE site had submitted its data, ESAC began populating a relational database, whose framework was built from the approved LiFE DCC data dictionary, with data pertinent to the harmonized variables. Database development included the creation of modularized components as described by the NCI and the LiFE principal investigators (PIs), data loading, quality control testing, tracking of data versions, and incorporation of new prospective data. In coordination with the LiFE PIs, the LiFE DCC can generate tailored datasets for collaborative projects supported by the LiFE Consortium and conducted by approved internal and external investigators.
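To make the harmonization step more concrete, the short Python sketch below shows one way site-specific records could be mapped onto a shared data dictionary and loaded into a relational table. It is an illustration only, not the LiFE DCC's actual implementation: the field names, site names, code mappings, and table layout are all hypothetical, and SQLite stands in for whatever database the DCC uses.

```python
import sqlite3

# Hypothetical harmonized data dictionary: each field has a type and,
# optionally, a closed set of allowed values.
DATA_DICTIONARY = {
    "participant_id": {"type": str, "allowed": None},
    "sex": {"type": str, "allowed": {"F", "M", "U"}},
    "age_at_diagnosis": {"type": int, "allowed": None},
}

# Hypothetical per-site mappings from local column names and local codes
# to the harmonized names and codes.
SITE_MAPPINGS = {
    "site_a": {
        "columns": {"pid": "participant_id", "gender": "sex",
                    "dx_age": "age_at_diagnosis"},
        "codes": {"sex": {"female": "F", "male": "M"}},
    },
    "site_b": {
        "columns": {"subject": "participant_id", "sex_code": "sex",
                    "age_dx": "age_at_diagnosis"},
        "codes": {"sex": {"1": "M", "2": "F", "9": "U"}},
    },
}


def harmonize(site, record):
    """Translate one site-specific record into the harmonized format."""
    mapping = SITE_MAPPINGS[site]
    out = {}
    for local_name, value in record.items():
        field = mapping["columns"].get(local_name)
        if field is None:
            continue  # local variable with no harmonized counterpart
        # Recode site-specific values where a code mapping exists.
        value = mapping["codes"].get(field, {}).get(str(value).lower(), value)
        spec = DATA_DICTIONARY[field]
        value = spec["type"](value)  # enforce the dictionary's data type
        if spec["allowed"] is not None and value not in spec["allowed"]:
            raise ValueError(f"{site}: invalid value {value!r} for {field}")
        out[field] = value
    missing = set(DATA_DICTIONARY) - set(out)
    if missing:
        raise ValueError(f"{site}: missing fields {sorted(missing)}")
    return out


def load(conn, site, records):
    """Harmonize a batch of records and insert them into the pooled table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS participant ("
        " site TEXT NOT NULL,"
        " participant_id TEXT NOT NULL,"
        " sex TEXT NOT NULL,"
        " age_at_diagnosis INTEGER NOT NULL,"
        " PRIMARY KEY (site, participant_id))"
    )
    rows = [dict(harmonize(site, r), site=site) for r in records]
    conn.executemany(
        "INSERT INTO participant (site, participant_id, sex, age_at_diagnosis)"
        " VALUES (:site, :participant_id, :sex, :age_at_diagnosis)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
load(conn, "site_a", [{"pid": "A-001", "gender": "Female", "dx_age": "34"}])
load(conn, "site_b", [{"subject": "B-017", "sex_code": 2, "age_dx": 41}])
pooled = conn.execute(
    "SELECT site, participant_id, sex, age_at_diagnosis"
    " FROM participant ORDER BY site"
).fetchall()
```

In this sketch, two sites that record sex with different column names and codes end up in one table with a single coding scheme, and a value that falls outside the dictionary is rejected before it reaches the database. In practice, the agreed definitions come out of the consensus process described above, not out of code.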
ESAC also provides training to LiFE Consortium investigators, technical representatives, and NCI personnel to familiarize them with the LiFE DCC’s tools and resources. LiFE DCC staff instituted a web-based query tool for exploring summary data and establishing the feasibility of planned collaborative studies. The LiFE DCC also maintains a web portal to house and manage consortium-related public and semi-public information, including publications, ongoing projects, tracking of data use requests, descriptive data tables, and related resources. This resource allows LFS researchers to analyze the latest aggregate research data to understand disease mechanisms, which will ultimately help scientists pursue new therapeutic approaches to this rare disease. We hope the DCC proves to be an invaluable resource to the larger LFS research community, helps medical professionals and researchers reach a more complete understanding of LFS, and supports effective studies that can produce better treatments for individuals and families affected by the condition.