Metagenomic analysis has been used extensively in recent years to study microbiota.
QINDAO (Quantitative INDex of Alpha diversity Overview) provides gene functions, bacterial features, and diversities from the results of shotgun metagenomic or 16S ribosomal RNA gene analysis.
How to start?
QINDAO provides predicted orthologues (KEGG orthology) and pathways (KEGG pathway) based on files of ‘amplicon sequence variants’ and ‘type of 16S ribosomal RNA gene’ from the QIIME2 results. It uses the PICRUSt2 (Douglas et al. Nat. Biotechnol. 2020) algorithm internally.
Differential abundance analyses are performed using MaAsLin2 (Mallick et al. PLoS Comput. Biol. 2021) for the categorical or continuous variables in the metadata. QINDAO employs total sum scaling to normalize the explanatory variables in the model: the read counts and relative abundances of orthologues and pathways are divided by the sum of their respective values. Variables detected in more than 10% of all samples are selected and log-transformed. After these pre-processing steps, QINDAO evaluates p-values, corrected p-values (q-values, Benjamini-Hochberg method), and regression coefficients for orthologues and pathways.
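The pre-processing and correction steps above can be sketched in a few lines. This is an illustration only, not QINDAO's or MaAsLin2's code: the function names and the feature labels `K1` and `K2` are hypothetical, the 10% filter is read here as "present in more than 10% of samples", and the log-transformation and model fitting are left out.

```python
def total_sum_scale(counts):
    """Divide each feature count by the sample total (relative abundance)."""
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in counts}
    return {feature: value / total for feature, value in counts.items()}


def prevalent_features(samples, min_prevalence=0.10):
    """Keep features with abundance > 0 in more than min_prevalence of samples.

    Assumption: this is how the 10% threshold is applied.
    """
    n_samples = len(samples)
    features = set().union(*[sample.keys() for sample in samples])
    kept = []
    for feature in sorted(features):
        present = sum(1 for sample in samples if sample.get(feature, 0) > 0)
        if present / n_samples > min_prevalence:
            kept.append(feature)
    return kept


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives q-values of approximately 0.02, 0.04, 0.04, and 0.02.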
The example of result
Comparison of each KEGG pathway
After analysis by QINDAO, you will find the result as below.
Click the ‘Pathway’ tab (A), then click the ‘Table’ tab (B).
You will obtain a table as below.
Statistically significant pathways (p<0.05) are listed in order, starting with the most strongly differential pathway. You can also sort by p-values and q-values (C). Keyword search is also available (D).
If you click the ‘Pathway’ button, you will see the differentially expressed genes in each pathway (E).
Click the ‘Pathway’ tab (F), then click the ‘Plot’ tab (G). You will obtain a plot as below.
The regression coefficients in the KEGG pathways with p-values less than 0.05 will be shown. The y-axis indicates pathways, the x-axis indicates regression coefficients, and error bars indicate 95% confidence intervals.
QINDAO annotates bacterial species (NCBI Taxonomy IDs) and bacterial features (e.g., Gram staining, motility, oxygen demand, spore-forming capacity, morphology) using the Genomes OnLine Database (Mukherjee et al. NAR. 2021). QINDAO uses an original script to convert the input taxonomy IDs into NCBI Taxonomy IDs and combine them with the metadata of bacterial characteristics.
Each bacterial Taxonomy ID was aggregated to the species-level taxon and annotated when there was a one-to-one relationship between the bacterial species and the bacterial characteristic. For example, Bifidobacterium longum was annotated as 'Gram-positive' because all known strains, including strains D2957 and X-95, are Gram-positive. On the other hand, for Bacteroides fragilis, strain HMW 616 is Gram-positive but strain YCH46 is Gram-negative; therefore, Bacteroides fragilis is labeled 'mixed'. If the metadata of the bacterial species was not in the Genomes OnLine Database or was blank, 'Unknown' was assigned.
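The labeling rule can be written as a small function. This is a minimal sketch, not QINDAO's original script: the function name is hypothetical, and it assumes that blank strain records are ignored whenever at least one strain of the species has a recorded value.

```python
def annotate_species(strain_values):
    """Collapse strain-level trait values into one species-level label.

    strain_values: trait values recorded for the strains of one species,
    e.g. ["Gram-positive", "Gram-positive"]. None or "" means blank.
    """
    # Assumption: blank records are dropped before comparing strains.
    known = {value for value in strain_values if value}
    if not known:
        return "Unknown"          # species missing from the database, or blank
    if len(known) == 1:
        return next(iter(known))  # one-to-one relationship
    return "mixed"                # strains disagree
```

With this rule, `annotate_species(["Gram-positive", "Gram-negative"])` returns `'mixed'`, and an empty list returns `'Unknown'`.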
The example of result
Click the ‘Feature’ tab (H), then select the item of phenotype (I).
You will obtain a table as below.
α-diversity has two perspectives: 'richness' and 'evenness.' The richness perspective reflects how many different types of taxonomy or gene function are present in the sample, while the evenness perspective indicates how equally those types are represented. Based on the above perspectives, QINDAO calculates the following α-diversities.
The total number of amplicon sequence variant (ASV), KEGG orthology (KO), and Pathway types was defined as ASV Richness, KO Richness, and Pathway Richness, respectively.
Shannon indices (ASV Shannon, KO Shannon, and Pathway Shannon) as the diversities of evenness and richness were calculated as follows: \begin{align} H^\prime=-\sum_{i=1}^{S}{p_i\log_2{p_i}} \end{align} where p_i is defined as the ratio of each feature, and S is defined as Richness.
Pielou evenness indices (ASV Pielou, KO Pielou, and Pathway Pielou) as the diversities of evenness were calculated as follows: \begin{align} J=\frac{H^\prime}{\log{S}} \end{align} where H' is defined as the Shannon index, and S is defined as Richness.
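The three conventional indices can be computed from a single abundance vector. The sketch below is illustrative and uses hypothetical function names. It assumes the logarithm in the Pielou denominator has base 2, matching the Shannon formula, so that J lies between 0 and 1; the source formula does not state the base.

```python
import math


def richness(abundances):
    """Number of feature types with non-zero abundance (S)."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances):
    """Shannon index H' with base-2 logarithm."""
    total = sum(abundances)
    ratios = [a / total for a in abundances if a > 0]
    return -sum(p * math.log2(p) for p in ratios)


def pielou(abundances):
    """Pielou evenness J = H' / log(S).

    Assumption: base-2 logarithm, the same base as in shannon().
    """
    s = richness(abundances)
    if s < 2:
        return float("nan")  # log(1) = 0, so J is undefined
    return shannon(abundances) / math.log2(s)
```

For four equally abundant features, such as `[10, 10, 10, 10]`, Richness is 4, the Shannon index is 2.0, and the Pielou index is 1.0.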
In addition to the conventional α-diversities described above, QINDAO calculates α-diversities of gene function from new perspectives. Our laboratory proposed a 'Pathway Connecting Index (PCI)' based on the connections of KEGG Pathways inferred from taxonomy on a metabolic pathway map and 'Potential Compounds' based on the substances involved in the KEGG pathways. These are defined as follows:
Unweighted PCI as the diversity of connections between KEGG Pathways (pathway) was calculated as follows: \begin{align} uPCI=\sum_{i=1}^{S1}\sum_{k=1}^{S2}1 \end{align} where S1 is defined as the number of pathway types, and S2 is defined as the number of pathway types connected to the i-th pathway.
Weighted PCI was calculated as follows: \begin{align} wPCI=\sum_{i=1}^{S1}\sum_{k=1}^{S2}\left(\frac{2p_ip_k}{p_i+p_k}\right) \end{align} where S1 is defined as the number of pathway types, S2 is defined as the number of pathway types connected to the i-th pathway, and p_i and p_k are defined as the ratios of the i-th pathway and of the k-th pathway connected to it.
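Both indices follow directly from the double sum. The sketch below is one possible reading, not QINDAO's implementation. The function name and the pathway labels in the usage note are hypothetical. It assumes that only connections between pathways observed in the sample are counted, and that each connection is counted once from each side, as the double sum over i and k implies.

```python
def pathway_connecting_indices(ratios, connections):
    """Return (uPCI, wPCI) for one sample.

    ratios: pathway -> relative abundance in the sample.
    connections: pathway -> iterable of pathways linked to it on the map.
    """
    observed = {pathway for pathway, ratio in ratios.items() if ratio > 0}
    upci = 0
    wpci = 0.0
    for i in observed:
        for k in connections.get(i, ()):
            # Assumption: skip partners that are absent from the sample.
            if k == i or k not in observed:
                continue
            upci += 1
            # Harmonic mean of the two pathway ratios.
            wpci += 2 * ratios[i] * ratios[k] / (ratios[i] + ratios[k])
    return upci, wpci
```

For three pathways A, B, and C with ratios 0.5, 0.25, and 0.25, where A connects to B and B connects to C, the unweighted PCI is 4 and the weighted PCI is 7/6.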
The number of compounds was defined as the number of potential compounds based on the information of compounds (KEGG compound) that exist as nodes of each KEGG pathway observed in the sample.
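Counting Potential Compounds amounts to taking the union of compound sets over the observed pathways. This is a minimal sketch with a hypothetical function name and hypothetical identifiers. It assumes that a compound shared by several pathways is counted once.

```python
def potential_compounds(observed_pathways, pathway_compounds):
    """Count distinct compounds that appear as nodes of the observed pathways.

    observed_pathways: pathways detected in the sample.
    pathway_compounds: pathway -> iterable of compound identifiers.
    """
    compounds = set()
    for pathway in observed_pathways:
        # Assumption: shared compounds are counted only once.
        compounds |= set(pathway_compounds.get(pathway, ()))
    return len(compounds)
```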
In summary, QINDAO calculates and evaluates the following α-diversities.
feature | | perspective | α-diversity |
---|---|---|---|
Taxonomy | ASV | richness | ASV Richness |
Taxonomy | ASV | richness+evenness | ASV Shannon |
Taxonomy | ASV | evenness | ASV Pielou |
Gene functions | KEGG orthology | richness | KO Richness |
Gene functions | KEGG orthology | richness+evenness | KO Shannon |
Gene functions | KEGG orthology | evenness | KO Pielou |
Gene functions | KEGG pathway | richness | Pathway Richness |
Gene functions | KEGG pathway | richness+evenness | Pathway Shannon |
Gene functions | KEGG pathway | evenness | Pathway Pielou |
Gene functions | KEGG pathway | connection (weighted) | Weighted PCI |
Gene functions | KEGG pathway | connection (unweighted) | Unweighted PCI |
Gene functions | KEGG pathway | substrate | Potential Compounds |
Differences in α-diversity are tested with both the Mann-Whitney U-test and Welch's t-test when the metadata associated with each sample is a binary categorical variable. If the metadata is a categorical variable with three or more values, a one-way ANOVA test is performed.
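To illustrate the two-group case, the sketch below computes the Welch's t statistic and its Welch-Satterthwaite degrees of freedom with the Python standard library. The function name is hypothetical, and this is not QINDAO's code. Converting the statistic to a p-value requires the t distribution and is omitted here.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Sample variance of each group divided by its size.
    term_a = statistics.variance(group_a) / n_a
    term_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(term_a + term_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (term_a + term_b) ** 2 / (term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1))
    return t, df
```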
The example of result
Click the 'α-Diversity' tab (J), then click the 'Plots' tab (K).
You will obtain plots as below.
If the metadata is a binary categorical variable, Cohen's d is calculated as the effect size of the difference in α-diversity between the two groups.
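Cohen's d can be sketched as the difference in group means divided by a pooled standard deviation. The function name is hypothetical. The pooled-variance form used here is the common textbook definition and is an assumption, since the exact variant used by QINDAO is not stated.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
```

For example, `cohens_d([2, 4, 6], [1, 3, 5])` is 0.5: the means differ by 1 and the pooled standard deviation is 2.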
Click the 'α-Diversity' tab (L), then click the 'Effect sizes' tab (M).
You will obtain a table as below.
While the α-diversity analysis evaluates the diversity in a single sample, the β-diversity analysis evaluates the diversity among samples. Specifically, the similarity matrix is obtained by pairwise calculation of the similarity of components and compositions among all samples.
QINDAO employs Jaccard and Bray-Curtis as similarity coefficients. For example, the Jaccard coefficient between samples A and B is \begin{align} Jaccard(A,\ B)=\frac{|A\cap B|}{|A\cup B|} \end{align} where A and B are the sets of elements contained in samples A and B, respectively. This gives the similarity based on the types of constituent elements in samples A and B.
Also, Bray-Curtis (BC) is \begin{align} {BC}_{AB}=\frac{\sum{|n_{Ai}-n_{Bi}|}}{\sum{(n_{Ai}+n_{Bi})}} \end{align} where n_{Ai} and n_{Bi} are the relative abundances of the i-th component of samples A and B, respectively. This compares samples A and B based on both the types of components and their respective proportions. As written, the formula is a dissimilarity: it equals 0 when the two samples have identical compositions and 1 when they share no components.
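The two coefficients can be computed for a pair of samples as follows. This is an illustrative sketch with hypothetical function names. It treats a component as present when its abundance is greater than zero.

```python
def jaccard(sample_a, sample_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| on the components present."""
    present_a = {k for k, v in sample_a.items() if v > 0}
    present_b = {k for k, v in sample_b.items() if v > 0}
    union = present_a | present_b
    if not union:
        return float("nan")  # undefined when both samples are empty
    return len(present_a & present_b) / len(union)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis value: sum |n_Ai - n_Bi| / sum (n_Ai + n_Bi)."""
    components = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(k, 0) - sample_b.get(k, 0)) for k in components)
    denominator = sum(sample_a.get(k, 0) + sample_b.get(k, 0) for k in components)
    return numerator / denominator
```

For two samples that each hold two equally abundant components and share one of them, the Jaccard coefficient is 1/3 and the Bray-Curtis value is 0.5.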
These similarity coefficients can be considered as non-Euclidean distances and can be dimensionally reduced by Principal Coordinate Analysis (PCoA). With the above method, Jaccard, Bray-Curtis calculations and principal coordinates analysis based on the relative abundance of each of the ASVs, orthologs and pathways are performed, allowing the similarity of each sample to be evaluated.
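feature | | β-diversity |
---|---|---|
Taxonomy | ASV | ASV Jaccard |
Taxonomy | ASV | ASV Bray-Curtis |
Gene functions | KEGG orthology | KO Jaccard |
Gene functions | KEGG orthology | KO Bray-Curtis |
Gene functions | KEGG pathway | Pathway Jaccard |
Gene functions | KEGG pathway | Pathway Bray-Curtis |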
If the metadata are categorical variables, a nonparametric analysis of variance, permutational multivariate analysis of variance (PERMANOVA), is performed.
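The idea behind PERMANOVA can be sketched with the pseudo-F statistic computed from a distance matrix, followed by label permutation. This is a simplified one-factor illustration, not QINDAO's implementation. The function names, the default of 999 permutations, and the fixed seed are assumptions made for the example.

```python
import random


def pseudo_f(dist, labels):
    """Pseudo-F statistic from a square distance matrix and group labels."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    n_groups = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for members in groups.values():
        m = len(members)
        ss_within += sum(
            dist[members[x]][members[y]] ** 2
            for x in range(m) for y in range(x + 1, m)
        ) / m
    ss_among = ss_total - ss_within
    return (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))


def permanova(dist, labels, n_permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labeling counts as one permutation.
    return observed, (hits + 1) / (n_permutations + 1)
```

The sketch assumes at least two groups and a non-zero within-group sum of squares; otherwise the ratio is undefined.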
The example of result
Click the 'β-Diversity' tab (N), then click the 'Amplicon sequence variant', 'KEGG Orthology', or 'KEGG Pathway' tab (O).
You will obtain a table as below.