Microbiome sequencing data are known to be biased; the measured relative abundances of taxa can be systematically distorted from their true values at every step in the experimental and analysis workflow. If this bias is not accounted for, it can lead to spurious discoveries and invalid conclusions. Unfortunately, measuring bias requires samples for which the true relative abundances are known, such as model or mock community samples. In this chapter, we propose a log-linear model for the biases observed when analyzing model community data. Our model extends the recent work of McLaren, Willis and Callahan (MWC) [eLife, 8:e46923, 2019], which proposed a multiplicative bias structure for microbiome data. Our extension of the MWC model is general enough to allow testing of complex hypotheses, and readily handles situations in which samples contain different numbers of bacteria by design. An F-test with permutation-based hypothesis testing is proposed to assess statistical significance. We conduct simulations to show the validity and power of our method, and demonstrate its utility through an analysis of a complex model community dataset that allows us to directly test the multiplicative bias assumption of the MWC model. An R package implementing the proposed work is publicly available at https://github.com/zhaoni153/MicroBias
Learning Objectives:
1. Understanding the multiplicative bias-generation procedure in microbiome sequencing.
2. Understanding how we can measure the sequencing bias as ratios of abundances.
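To make these objectives concrete, the following minimal Python sketch illustrates the multiplicative bias structure described by MWC: each taxon's true abundance is multiplied by a taxon-specific efficiency and the result is renormalized. It is only an illustration, not the log-linear model of this chapter and not the API of the MicroBias package. The function names `observe` and `ratio_bias` and all numerical values (efficiencies and community compositions) are hypothetical.

```python
import numpy as np

def observe(true_abundance, efficiency):
    """Multiply each taxon by its efficiency, then renormalize to sum to 1."""
    distorted = true_abundance * efficiency  # absent taxa (0) stay at 0
    return distorted / distorted.sum()

def ratio_bias(observed, true_abundance, i, j):
    """Bias of taxon i relative to taxon j: observed ratio over true ratio."""
    return (observed[i] / observed[j]) / (true_abundance[i] / true_abundance[j])

# Hypothetical per-taxon measurement efficiencies (assumed values).
efficiency = np.array([1.0, 4.0, 0.5])

# Two hypothetical mock communities; the second lacks taxon 1 by design.
true_a = np.array([0.5, 0.3, 0.2])
true_b = np.array([0.2, 0.0, 0.8])

obs_a = observe(true_a, efficiency)
obs_b = observe(true_b, efficiency)

# Under multiplicative bias, the ratio-based bias between taxa 0 and 2
# equals efficiency[0] / efficiency[2] in both communities.
bias_a = ratio_bias(obs_a, true_a, 0, 2)
bias_b = ratio_bias(obs_b, true_b, 0, 2)
print(bias_a, bias_b)
```

The observed relative abundances differ between the two communities, yet the bias expressed as a ratio of abundances is the same in both. This is why, under the multiplicative assumption, bias is measured on ratios rather than on individual relative abundances.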