Meta-analysis in the Big-data era

October 7, 2015 Research

Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. (In Greek, ‘meta’ means ‘with, across, or after’, and here it refers to a level above or beyond the individual study.) By combining multiple data sources, one can achieve higher statistical power, more accurate estimation, and greater reproducibility. To date, more than 97,000 publications in the PubMed database contain the term ‘meta-analysis’. The role of meta-analysis in biomedical research will only grow as we enter the big-data era.
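
As a rough illustration of why pooling helps, the sketch below applies standard inverse-variance (fixed-effect) weighting to three per-study estimates of a single effect. The numbers and the function name pool_fixed_effect are made up for illustration and do not come from the paper. The only point is that the pooled standard error is smaller than that of any single study.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study estimates."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    # The variance of the pooled estimate is 1 / (sum of weights).
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical summary statistics from three studies of the same effect.
estimates = [0.30, 0.10, 0.20]
std_errors = [0.10, 0.10, 0.10]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

With equally precise studies, as assumed here, the pooled standard error shrinks by a factor of the square root of the number of studies.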

Traditionally, meta-analysis can handle only a handful of covariates. Modern high-throughput genomic technology, however, has generated an enormous amount of data with a large number of genomic features. (In statistics, data with a large number of features are often described as high-dimensional.) In this setting, it is imperative to develop variable-selection methods for meta-analysis. (Here, variable selection means picking out the most important or predictive features from among a much larger set of noise features.) Incorporating variable selection into meta-analysis will improve model interpretation, reduce prediction errors, and provide better prioritization of genomic features for follow-up studies.

Existing variable-selection methods require direct access to the raw data (i.e., patient-level data). Unfortunately, raw data are often unavailable because of high cost, logistical difficulties, time constraints, IRB restrictions, and other study policies. Summary statistics, by contrast, compress the raw data substantially and are much easier to access and manage. Taking genome-wide association studies (GWAS) as an example, virtually all meta-analyses to date have been conducted at the summary-statistics level rather than the raw-data level. The emergence of big data, such as next-generation sequencing data, makes the collation of raw data even more challenging. A question naturally arises: is it possible to conduct effective variable selection using only summary statistics?

In this article, we propose a new approach, Sparse Meta-Analysis (SMA), in which variable selection for meta-analysis is based solely on summary statistics. (Here, ‘sparse’ describes the model after variable selection, in which only a few features are retained.) Remarkably, we find that SMA is as efficient as an analysis of the raw data, provided that the correlation information of the summary statistics from each study is available. Hence, SMA bypasses the traditional bottleneck of data aggregation and provides an innovative tool for conducting high-dimensional meta-analysis. In addition, SMA can harness information shared by different studies while allowing for heterogeneity among studies. In the era of big data, SMA will be extremely useful when it is impractical to collect or store all of the raw data, because SMA needs only summary statistics, which are much more manageable. Thus, by using summary statistics for variable selection, SMA makes possible research that would be infeasible with other meta-analysis approaches.
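
To make the idea of selecting variables from summary statistics concrete, here is a minimal sketch. It is not the SMA estimator from the paper. It assumes, for simplicity, that all studies share one common effect vector (so it ignores the between-study heterogeneity that SMA allows), that each study reports its estimates together with their covariance matrix, and that a plain lasso penalty is used. The function names, the penalty level lam, and all numbers are hypothetical. The sketch minimizes a penalized quadratic form built only from the reported estimates and covariances, which for linear regression matches the raw-data least-squares objective up to a constant.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def sparse_pool(estimates, covariances, lam, n_iter=200):
    """Lasso-penalized pooling from summary statistics only.

    Minimizes 0.5 * sum_k (b - bhat_k)' V_k^{-1} (b - bhat_k) + lam * ||b||_1
    by cyclic coordinate descent (a common-effect simplification).
    """
    precisions = [np.linalg.inv(V) for V in covariances]
    A = sum(precisions)  # pooled information matrix
    c = sum(P @ bhat for P, bhat in zip(precisions, estimates))
    b = np.zeros(len(c))
    for _ in range(n_iter):
        for j in range(len(b)):
            # Linear term for coordinate j with the other coordinates held fixed.
            r = c[j] - A[j] @ b + A[j, j] * b[j]
            b[j] = soft_threshold(r, lam) / A[j, j]
    return b


# Hypothetical summary statistics: two studies, three features.
# Features 0 and 1 have correlated estimates (correlation 0.2).
V = 0.01 * np.array([[1.0, 0.2, 0.0],
                     [0.2, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
estimates = [np.array([0.50, 0.02, -0.01]),
             np.array([0.40, -0.03, 0.02])]
covariances = [V, V]

selected = sparse_pool(estimates, covariances, lam=5.0)
print(selected)
```

In this toy example only the first feature has a pooled signal large enough to survive the penalty, so the other two coefficients are set exactly to zero. Note that the off-diagonal entries of the covariance matrices enter the computation, which is where the correlation information of the summary statistics is used.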

Publication

Sparse meta-analysis with high-dimensional data.
He Q, Zhang HH, Avery CL, Lin DY
Biostatistics. 2015 Sep 21

genomics, model, sparsity