
Citation: | Chuanyin Dai, Fumin Lei. 2024: Mitogenomic variation in the Black-throated Tit (Aegithalos concinnus): Conserved structure, concerted evolution of duplicate control regions and multiple distinct evolutionary lineages. Avian Research, 15(1): 100210. DOI: 10.1016/j.avrs.2024.100210 |
The mitochondrial genome is a prominent research topic due to its indispensable role in organisms and its application in many research disciplines. However, few studies have investigated intraspecies mitogenomic variation. In this study, 69 mitogenomes of the Black-throated Tit (Aegithalos concinnus) were assembled and annotated from a large number of short reads generated using high-throughput sequencing technology. Comparative analyses revealed that mitogenomic characteristics such as length, gene and nucleotide composition, codon usage, and duplicated control regions were relatively conserved despite substantial intraspecies morphological changes. Yet, all the individuals from the subspecies A. c. iredalei had one more nucleotide in the 12S rRNA than the other studied subspecies. Phylogenetic analyses showed five distinct lineages based on the complete mitogenomes and the 13 combined protein-coding genes, whereas only four lineages were observed when using the duplicate control regions. Most interestingly, each lineage had both copies of the control regions of the comprising individuals, indicating that the paralogous control regions were more similar than the orthologous sequences from the distinct lineages. This suggested the control regions had undergone concerted evolution. The Black-throated Tit has complex evolutionary history and needs further investigating the taxonomic status of these lineages, as well as the underlying evolutionary processes. Our findings call for more research on intraspecies mitogenomic variation.
The mitochondrion, an organelle in eukaryotic cells that is thought to have originated from an endosymbiotic event, has its own genome (van Valen and Maiorana, 1980; Roger et al., 2017). In contrast to nuclear genomes, which tend to expand in size, including through an increase in the number of introns, proliferation of transposons, enlargement of intergenic regions, and duplication of genes (Lynch, 2006), the number of genes in the mitochondrion has reduced significantly along with the evolution of diverse eukaryotic lineages. Although the original ancestral bacteria of the mitochondrion had approximately 3000 genes, the mitochondrial genome now has only 37 genes in most animals, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs (Hill, 2015), as most of its original genes have been transferred into the nuclear genome.
The mitochondrion performs a wide suite of functions in eukaryotic cells (Boore, 1999; Detmer and Chan, 2007; Chong and Mueller, 2013; Ballard and Pichaud, 2014; Roger et al., 2017). However, those functions can only be achieved with the intimate and synchronized interactions between the products of the mitochondrial and nuclear genomes (Rand et al., 2004; Wolff et al., 2014) due to its unique and fascinating evolutionary history. For example, the mitochondrion is well-known as a power house that generates approximately 95% of total adenosine triphosphate (ATP) through the process of oxidative phosphorylation by the electron transport system (ETS) (Das, 2006), which is formed by five multi-subunit complexes embedded in the inner mitochondrial membrane (Saraste, 1999). The 13 mitochondrial genes encode proteins for 4 subunits of the ETS complexes (I, III, IV, and V), while more than 75 nuclear genes exclusively encode all proteins of ETS complex II and co-encode the proteins for the other 4 complexes (Burton and Barreto, 2012; Hill, 2017). There are also about 105 nuclear genes encoding non-protein products that co-function in the transcriptional, translational, and DNA replication mechanisms that enable the production of ETS proteins (Burton and Barreto, 2012). Consequently, a change in a mitogenome can trigger a cascading effect of malfunctions if the nuclear genes do not change accordingly (Hill, 2017). As such, the mitogenome is evolutionarily conserved in terms of both the order and the number of genes (Françoso et al., 2023), being a circular double-stranded molecule generally containing no introns and large intergenic spacers in animals (Wolstenholme, 1992).
However, exceptions to this exist, as gene duplication and rearrangement have been observed in various animal lineages (Macey et al., 1997; Mindell et al., 1998; Mueller and Boore, 2005; Gissi et al., 2010; Kong et al., 2020; Zardoya, 2020). For example, besides the 37 genes, the typical animal mitogenome usually contains one large non-coding control region involved in regulating and initiating mitochondrial replication and transcription, but recent studies have found duplicated control regions—as well as even more duplication with its adjacent genes—in many birds (Mackiewicz et al., 2019; Urantówka et al., 2021). The changes are more substantial in invertebrates; for example, a recent study found a whole-mitogenome duplication in two Australian Stingless Bees (Françoso et al., 2023).
Mitochondrial DNA (mtDNA) has a long history of use in studies of molecular phylogenetics, phylogeography, and species identification (e.g., DNA barcodes) (Hebert et al., 2003; Zink and Barrowclough, 2008). However, studies solely based on mitochondrial DNA have been criticized due to several well-known limitations of mtDNA, such as evidence of pervasive natural selection and the inability of maternal inheritance to capture the whole evolutionary histories of the relevant species (Edwards et al., 2005; Edwards and Bensch, 2009). Yet, in recent years, the mitonuclear compatibility species concept (Hill, 2017), which defines a species with a set of coadapted mtDNA and nuclear genes participating in ETS, argued that the mitochondrial genotype was the most efficient and best biomarker for diagnosing species (Hill, 2016, 2017, 2019). This is particularly suitable for avian species, since this evolutionary lineage has the greatest need for efficient ATP production due to its highly energy-demanding life strategies, such as higher body temperature and the movement pattern in flying (Hill, 2015, 2017). On the other hand, gene order and rearrangement can also provide phylogenetic insights, especially at higher taxonomic levels, as recent studies have indicated that these changes are likely lineage-specific (Schirtzinger et al., 2012). For example, the gene order in avian species is thought to have originated from the typical vertebrate gene order by a single duplication of a fragment ranging from nd5 to tRNA-Phe genes, and then random losses of one copy of the duplicated genes and non-coding regions in different avian lineages (Schirtzinger et al., 2012). This has resulted in the different gene composition and order appearing in distinct avian orders, families or genera (Mackiewicz et al., 2019; Urantówka et al., 2021), although an alternative opinion has been suggested (Mindell et al., 1998).
The Black-throated Tit (Aegithalos concinnus) is a small and abundant Long-tailed Tit inhabiting various vegetation types, such as pine forest, shrubs, and open broadleaf forest at mid-range altitudes (600–3200 m). It is distributed along the Himalayan foothills in Pakistan, Nepal, Bhutan, India, and Myanmar, and over much of eastern China, including Taiwan island, as well as along mountain ridges in tropical Indochina, such as in Thailand, Laos, and Vietnam (Fig. 1). An isolated population inhabits the neighboring region in southern Laos, central Vietnam, and Cambodia (Fig. 1). Although all A. concinnus share a similar pattern and arrangement of plumage, at least six subspecies (A. c. iredalei, A. c. manipurensis, A. c. talifuensis, A. c. pulchellus, A. c. concinnus, and A. c. annamensis) can be distinguished via their morphological traits, such as color differences in the crown, belly, and flank, and the color and width of the postocular lines. Moreover, analyses have found that genetic divergence is remarkable not only among subspecies, but also within some subspecies when evaluated with mtDNA (Päckert et al., 2010; Dai et al., 2011, 2017). Most interestingly, the Pygmy Tit (Psaltria exilis) was found to be more closely related than a lineage predominately comprising the nominate subspecies to the other subspecies of the Black-throated Tit (Johansson et al., 2016).
Recently, complete mitogenomes have been reported for one bird each from the subspecies A. c. talifuensis and A. c. concinnus, in which duplications were observed; that is, two control regions existed within each bird (Wang et al., 2015a). The two paralogous control regions are not equal in length, and the majority of sequences are similar and thought to be under concerted evolution. Concerted evolution is defined as the situation in which paralogous genes or sequences within a species are more similar than the orthologous ones between species (Liao, 1999; Ogoh and Ohmiya, 2007; Morris-Pocock et al., 2010; Li et al., 2015a; Shi et al., 2020). The duplicated control regions have been regarded to increase the copy number and expression of mitochondrial genes (Kumazawa et al., 1996) and therefore to meet energy requirements (Eberhard and Wright, 2016). However, as only two individuals were sequenced in that report (Wang et al., 2015a), whether the observed pattern occurred at individual, population, or species level is unclear since the control region has been recorded as the most variable area in the avian mitogenome (Sammler et al., 2013). For example, the pattern of tandem repeats varies sharply in the control region of the Yellow-brown Tit (Sylviparus modestus) (Wang et al., 2015b). Moreover, as has been mentioned before, the genetic divergence between and even within some subspecies (such as in A. c. talifuensis and A. c. iredalei) is notable, and occurs to an extent comparable with that between independent species (Päckert et al., 2010). This further raises the very interesting question of whether the observed gene rearrangement is the case for the other unanalyzed A. concinnus subspecies, and whether the pattern of concerted evolution proposed for the duplicated control regions is applicable with respect to the multiple deeply split lineages.
Thanks to the unremitting efforts of software developers, it is now possible to obtain complete mitogenomes from the high-throughput short reads generated by next-generation sequencing technology. In this study, we assembled complete mitogenome of the Black-throated Tit from short reads available in GenBank, and then carried out a comparative and phylogenetic analysis to elucidate this species' mitogenomic characteristics in a phylogeographic context. As such, the specific aims of this study were as follows: 1) to detect the intraspecific mitogenomic variation in view of the sharp divergence both in genetics and morphology within the species; 2) to elucidate whether the divergence revealed by a partial mitochondrial sequence can be confirmed by the complete mitogenome; 3) to observe whether the duplicated control regions are indeed evolved in concerted evolution in terms of the different subspecies or divergent lineages.
The short reads generated by next-generation sequencing technology were downloaded from GenBank, with a total of 70 individuals represented. These samples covered three subspecies distributed in mainland China, of which seven individuals were A. c. iredalei, 30 A. c. talifuensis, and 33 A. c. concinnus. The sampling locations are shown in Fig. 1. The short reads of two individuals of Aegithalos iouschistos were also downloaded and used as the outgroups in the phylogenetic analyses. The accession numbers were SRR10467124 to SRR10467170, SRR15193483 to SRR15193500, and SRR15193502 to SRR15193509.
The program GetOrganelle 1.7.6 (Jin et al., 2020) was used to assemble the mitogenomes. The assembling command used was the program developers' recommended recipe, which was slightly modified in some cases to use the whole available short reads for assembly. Only one sample from A. c. iredalei failed to be assembled as a completed circular sequence, and it was therefore excluded from subsequent analyses. The assembled mitogenomes were annotated with MitoZ 3.6 (Meng et al., 2019).
The nucleotide composition of each mitogenomic sequence was calculated, while nucleotide composition bias was estimated using the following formulae: AT-skew = [A – T]/[A + T] and GC-skew = [G – C]/[G + C]. The 13 protein-coding genes were extracted and then the codon usage, relative synonymous codon usage (RSCU), and nucleotide composition for the 1st, 2nd, 3rd, and all three codon positions for the combined sequences were calculated. The rRNAs, tRNAs, and control regions were also extracted, and their nucleotide compositions and nucleotide composition biases were calculated. All these calculations were carried out using the PhyloSuite v1.2.3 program (Zhang et al., 2020; Xiang et al., 2023).
Three datasets were used for phylogenetic analyses: the complete mitogenomic sequences, the merged sequence of 13 protein-coding genes, and the sequences of the duplicated control regions. For the former two datasets, outgroups were used. All the datasets were aligned using the MAFFT (Rozewicki et al., 2019) plugin with the L-INS-I algorithm, as implemented in PhyloSuite (Zhang et al., 2020; Xiang et al., 2023). Phylogenetic trees were constructed using both maximum likelihood (ML) and Bayesian inference (BI). Maximum likelihood phylogenies were inferred using IQ-TREE (Minh et al., 2020) under the partition schemes, while the best-fit models were determined by ModelFinder (Kalyaanamoorthy et al., 2017) using the corrected Akaike information criterion (AICc). Node support was evaluated using 1000 ultrafast bootstrap replicates. Bayesian inference phylogenies were inferred using MrBayes (Ling et al., 2015) with two independent runs, each with four Markov chain Monte Carlo chains (one cold) for 5,000,000 generations, with sampling every 1000 generations. The first 25% of the runs were discarded as burn-in prior to estimating posterior probabilities for branch support. The best-fit partitioning schemes and substitution models were determined using PartitionFinder (Lanfear et al., 2012) via the "greedy" algorithm, with branch lengths estimated as "linked, " and AICc. Convergence was assessed as the standard deviation of split frequencies < 0.01. All these analyses were carried out in PhyloSuite with the corresponding plugins.
The alignable sequences of the duplicated control regions were extracted with the Gblock (Castresana, 2000) plugin, with the default settings implemented in PhyloSuite (Zhang et al., 2020; Xiang et al., 2023). Then the phylogenetic trees were constructed by ML and BI as mentioned above for the other two datasets. Phylogenetic analyses were also carried out using a dataset in which a few nucleotides unique to each copy were further trimmed both at the beginning and end of the alignable sequence, with a length of 1048 bp.
Kimura 2-parameter distances among mitochondrial lineages defined by phylogenetic analyses were computed based on the complete mitogenome and the combined sequences of the 13 protein-coding genes in MEGA X (Kumar et al., 2018).
The lengths of the complete mitogenomic sequences ranged from 17,939 to 17,941 bp (Table 1 and Appendix A), without subspecies-specific clustering in length. The mitogenome structures for all the individuals were identical and comprised 37 genes, including 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and 2 control regions (Fig. 2). Most genes, including 12 protein-coding genes, 14 tRNAs, and 2 rRNAs, as well as the 2 control regions, were on the H-strand, while one protein-coding gene (nd6) and 8 tRNAs (trnA, trnC, trnE, trnN, trnP, trnQ, trnS2, and trnY) were encoded on the L-strand (Fig. 2). The proportions of nucleotide composition were as follows: A = 29.56 ± 0.05% (range: 29.5–29.6%), T = 24.95 ± 0.15% (range: 24.8–25.2%), G = 14.38 ± 0.04% (range: 14.3–14.4%), C = 31.12 ± 0.14% (range: 30.8–31.2%) (Table 1). The AT content (mean: 54.51 ± 0.12%, range: 54.3–54.9%) was substantially higher than that of GC (45.5 ± 0.13%, range: 45.1–45.6%). The overall AT skew and GC skew ranged from 0.078 to 0.089, with a mean of 0.085, and from −0.371 to −0.362, with a mean of −0.368, respectively (Table 1).
Length | A% | T% | G% | C% | AT% | GC% | AT-skew | GC-skew | |
mDNA | 17,939–17,941 | 29.56 ± 0.05 | 24.95 ± 0.15 | 14.38 ± 0.04 | 31.12 ± 0.14 | 54.51 ± 0.12 | 45.5 ± 0.13 | 0.085 ± 0.003 | −0.368 ± 0.002 |
PCGs | 11,394 | 28.16 ± 0.07 | 25.60 ± 0.20 | 14.05 ± 0.06 | 32.20 ± 0.19 | 53.75 ± 0.14 | 46.25 ± 0.15 | 0.047 ± 0.004 | −0.392 ± 0.005 |
tRNA | 1545 | 29.68 ± 0.20 | 28.20 ± 0.04 | 21.60 ± 0.18 | 20.56 ± 0.07 | 57.88 ± 0.20 | 42.16 ± 0.17 | 0.026 ± 0.004 | 0.025 ± 0.005 |
rRNA | 2580–2581 | 32.89 ± 0.03 | 21.43 ± 0.06 | 20.68 ± 0.04 | 25.03 ± 0.06 | 54.32 ± 0.06 | 45.71 ± 0.07 | 0.211 ± 0.001 | −0.096 ± 0.001 |
CR1 | 1157–1158 | 24.34 ± 0.25 | 31.38 ± 0.15 | 14.08 ± 0.31 | 30.21 ± 0.22 | 55.71 ± 0.30 | 44.29 ± 0.30 | −0.126 ± 0.005 | −0.364 ± 0.011 |
CR2 | 1205–1207 | 25.67 ± 0.25 | 31.04 ± 0.17 | 13.34 ± 0.21 | 29.96 ± 0.25 | 56.71 ± 0.33 | 43.29 ± 0.33 | −0.095 ± 0.005 | −0.384 ± 0.006 |
The total length of protein-coding genes for each individual was identical, at 11,394 bp (Table 1). All protein-coding genes initiated strictly with ATG as the start codon, except for nd3, which initiated with an ATA codon. However, stop codons varied among genes, with TAA being the most prevalent. Seven genes (cox2, atp6, atp8, nd2, nd3, cytb, and nd4L) ended with TAA, whereas three genes (cox1, nd1, and nd6) ended with AGG, and nd5 ended with AGA. The incomplete stop TA was present in cox3, while T appeared in nd4. The genes atp8 and atp6 shared 10 nucleotides, while nd4L and nd4 overlapped by 7 nucleotides. The total number of codons in the protein-coding genes was 3787. Codons encoding Cys were the rarest (0.77 ± 0.004%), while those encoding Leu1 occurred most frequently (15.95 ± 0.09%) (Appendix B). Overall, the five most abundant codons were CUA (310.46 ± 3.25), AUC (201 ± 3.38), UUC (162.88 ± 5.2), CUC (155.81 ± 4.54), and GCC (142.75 ± 1.06), while CUA had the highest RSCU value (2.81 ± 0.03), followed by CGA (2.17 ± 0.04) and GAA (1.89 ± 0.04). The proportions of nucleotide composition of the combined protein-coding gene sequence were 28.16 ± 0.07% for A, 25.6 ± 0.2% for T, 14.05 ± 0.06% for G, and 32.2 ± 0.19% for C. The AT content (53.75 ± 0.14%) was higher than that of GC (46.25 ± 0.15%) (Table 1). AT skew and GC skew were 0.047 ± 0.004 and −0.392 ± 0.005, respectively (Table 1). The proportions of nucleotide composition for the 1st codon position were 27.900 ± 0.030% for A, 19.969 ± 0.110% for T, 23.184 ± 0.037% for G, and 28.954 ± 0.099% for C, whereas AT content, AT skew, and GC skew were 47.869 ± 0.127%, 0.165 ± 0.002, and −0.111 ± 0.002, respectively. The proportions of nucleotide composition for the 2nd codon position were 18.566 ± 0.057% for A, 40.112 ± 0.033% for T, 12.833 ± 0.056% for G, and 28.490 ± 0.031% for C, whereas AT content, AT skew, and GC skew were 58.678 ± 0.067%, −0.368 ± 0.001, and −0.377 ± 0.001, respectively. The proportions of nucleotide composition for the 3rd codon position were 38.013 ± 0.167% for A, 16.667 ± 0.496% for T, 6.118 ± 0.119% for G, and 39.201 ± 0.436% for C, whereas AT content, AT skew, and GC skew were 54.681 ± 0.355%, 0.391 ± 0.014, and −0.730 ± 0.007, respectively.
The 12S rRNA gene was 979 bp long for most individuals, with an average AT content of 51.63 ± 0.10% (Table 1). However, in all six individuals from the subspecies A. c. iredalei, it was 980 bp long. The 16S rRNA gene was 1601 bp in length for all samples, with an AT content of 55.91 ± 0.10% (Table 1). The total length of tRNAs was 1545 bp, with each tRNA gene ranging in size from 66 bp (trnS1) to 75 bp (trnL2). The AT content of the 22 tRNAs was 57.88 ± 0.20% (Table 1). All the tRNA genes showed the expected cloverleaf secondary structures with normal base pairing, including trnS1, which does not have the entire dihydrouridine arm in most metazoan mitogenomes.
Two larger non-coding control regions (CR1 and CR2) were located between trnE and trnF, of 1157–1158 bp in length (CR1), and between trnT and trnP, of 1205–1207 bp (CR2) in length (Table 1). The base composition of CR1 was 24.34 ± 0.25% for A, 31.38 ± 0.15% for T, 14.08 ± 0.31% for G, and 30.21 ± 0.22% for C (Table 1). The AT content (55.71 ± 0.30%) was higher than the GC content (44.29 ± 0.30%) in CR1. The base composition of CR2 was 25.67 ± 0.25% for A, 31.04 ± 0.17% for T, 13.34 ± 0.21% for G, and 29.96 ± 0.25% for C, and the AT content (56.71 ± 0.33%) was higher than GC content (43.29 ± 0.33%) (Table 1).
A congruent phylogenetic tree with five distinct lineages was reconstructed within the sampled Black-throated Tit data based on the complete mitogenomes and the combined sequences of 13 protein-coding genes, as well as by using the different BI and ML methods (Fig. 3). The supporting values of the divergent nodes were very high. The birds formed two major clades, in which one clade was composed of the individuals from the nominate subspecies and a few from A. c. talifuensis, and the other clade consisted of the majority of A. c. talifuensis and all the A. c. iredalei. The former clade further split into two lineages, of which one lineage consisted exclusively of the broadly distributed nominate A. c. concinnus (lineage A) while the other was composed of individuals of both subspecies, but with a rather limited distribution (lineage B). The clade consisting of A. c. talifuensis and A. c. iredalei further diverged into three lineages, in which the A. c. talifuensis (lineage C) is sister lineage to the eastern A. c. iredalei (lineage D), whereas the western A. c. iredalei (lineage E) occupied the basal position of the clade and was sister to the group comprising A. c. talifuensis and the eastern A. c. iredalei.
The BI phylogenetic tree constructed by the alignable duplicated CR sequences was slightly different from the above-mentioned trees. It also had two clades, but split into only four lineages (Fig. 4C). The clade consisting of all the individuals from the nominate subspecies and a few from A. c. talifuensis (lineages A and B) was collapsed into one lineage. The other clade had the same three lineages, in which the comprising individuals and the relationships among the lineages were consistent with those identified using the other two datasets. In contrast, although ML inference also identified the same four lineages, the relationships among those lineages had substantially changed in the ML phylogeny. For example, one lineage from the subspecies A. c. iredalei (lineage D) was sister to the lineage consisting of A. c. talifuensis (lineage C), and their sister group was the lineage composed of the individuals from the nominate subspecies and a few from A. c. talifuensis (the collapsed lineage A and B), whereas another lineage from the subspecies A. c. iredalei (lineage E) was located at the basal position (Fig. 4A). Although the supporting values for each lineage were very high, the relationships among these lineages were not well supported in both phylogenetic trees. Within each lineage, orthologous control regions usually clustered as separate groups. Interestingly, when using the further-trimmed alignment with 1048 bp, most of the paralogous control regions of the same individuals were clustered as a sister pair (Fig. 4B and D).
The genetic distances among the lineages identified in the phylogenetic analyses ranged from 0.0046 to 0.0453 based on the complete mitogenomes. However, only one pair had a value of 0.0046, whereas the others had values higher than 0.037. The values of genetic distance were slightly higher when computed with the 13 protein-coding gene sequences, which ranged from 0.0056 to 0.0595. Similarly, only one pair had a value of 0.0056, and the other pairs had values higher than 0.04. Details are listed in Table 2.
Lineage | A | B | C | D | E |
A | 0.0056 | 0.0548 | 0.0588 | 0.0585 | |
B | 0.0046 | 0.0541 | 0.0584 | 0.0575 | |
C | 0.0409 | 0.0408 | 0.0495 | 0.0595 | |
D | 0.0453 | 0.0452 | 0.0378 | 0.0588 | |
E | 0.0451 | 0.0442 | 0.0451 | 0.0461 | |
Upper right are the distances computed based on 13 protein-coding gene sequences, whereas the lower left are the distances computed based on complete mitogenome sequences. |
Currently, the mitogenome is a popular research topic due to its key functional role in organisms and its application in many research disciplines (Li et al., 2015b; Sun et al., 2020a, 2020b). However, the majority of studies have reported the mitogenome structure, and its associated phylogenetic implications of one or only a few species (Li et al., 2015b; Shi et al., 2020, 2023; Sun et al., 2020a, 2020b; Yang et al., 2023). While a few studies have indeed investigated the mitogenomic variation within a species, they generally paid attention to the genetic divergence and its applications within the species but not to the mitogenome itself (Johnsen et al., 2017; Fields et al., 2018; Lait et al., 2018; Lombardo et al., 2022; Lubbe et al., 2022). This study represents the first to investigate the differences in mitogenomic characteristics within a species characterized by morphologically distinct composing members, and found two interesting outcomes.
Firstly, the total length of the mitogenome of the Black-throated Tit is relatively conserved, with only a 3-bp variation, as are other characteristics such as nucleotide composition, AT content, and codon usages. The length of the previously reported mitogenomes compiled via Sanger sequencing technology fall into the length range of this study (Wang et al., 2015a), indicating that the mitogenomes assembled from the short reads were accurate. This is understandable, as the avian mitogenome is selectively conserved and compact, meaning that larger changes may occur only in the non-coding region. In the Black-throated Tit, the non-coding regions do not show much length variation. Although the differences between the two paralogous control regions ranged from 50 to 53 bp within an individual, the differences between the orthologous control regions were minor between individuals. This differs from the findings of a previous study that revealed extreme variation in the length of the orthologous control regions within species; for example, the length of the control region varied from 1374 bp to 1609 bp in the Yellow-browed Tit (Wang et al., 2015b). However, the Yellow-browed Tit has one control region, while the Black-throated Tit has duplicated control regions. Due to few complete intraspecies mitogenomes being available at present, whether this is the underlying reason for the length change of the control region is unclear and could be tested in future.
Secondly, although almost all the mitochondrial genes were of identical length among the sampled individuals, interestingly, six individuals exhibited a difference in length at the 12S rRNA gene. All these six birds were from the subspecies A. c. iredalei, with a 980-bp 12S rRNA gene, while the other individuals had lengths of 979 bp. This additional site was not located at the same position, but at two different locations. This precludes the possibility of this difference arising from sequencing or assembling errors. As such, this result indicates that a change in gene length can occur within a species. Such change has never been reported in Aves before, which highlights the necessity of conducting comparative mitogenomic analyses within species, especially for those exhibiting notable changes in morphology.
While many avian species have one control region, two control regions have been observed in the Black-throated Tit. The control region contains the initiation site of mitogenome replication and transcription (Shadel and Clayton, 1997), and the maintenance of two control regions is most likely a result of the functionality of both regions. Otherwise, the non-functional copy would degenerate and eventually be lost due to the selectively compact nature of the avian mitogenome to maintain optimal coding efficiency and replication rate. An additional control region could be one more location for the initiation of transcription and replication, which could increase both the copy number and expression of mitochondrial genes, making mitogenomes with duplicated control regions selectively advantageous (Kumazawa et al., 1996). Mitogenomes with two control regions would begin replicating at a faster rate than those with only one control region if the replication of the mitogenome were to start simultaneously at both control regions (Shao et al., 2005). Moreover, two control regions have been regarded to provide efficient and simultaneous replication and transcription from multiple loci to meet energy requirements (Eberhard and Wright, 2016).
The Black-throated Tit is a small bird weighing 4.8–8.0 g (Li et al., 1982), and moves very actively through vegetation. During the wintering periods, it usually flocks in a foraging group of medium-to-large size (approximately 8–20 individuals or even more) that searches for food on trees one by one, spanning a large region with each search (Personal observations). In the breeding season, on the other hand, feeding activity is extremely energy-intensive due to a much larger clutch size, generally ranging from five to eight (usually seven) eggs (Li et al., 2012; Lv et al., 2023). It has been reported that the number of nestling-feeding activities performed by a breeding pair was as high as 303 in one diurnal day (Li et al., 1982). In addition, except for the nominate subspecies, the other Black-throated Tit subspecies almost exclusively live in mountainous regions at medium-to-high altitudes. This may imply a high energy requirement for maintaining the species' survival and life activities. Therefore, two functional control regions would be selectively favored in this bird.
Previous studies have indicated that the two functional copies of control regions can undergo concerted or independent evolution in animals (Li et al., 2015a; Liao, 1999; Ogoh and Ohmiya, 2007; Tatarenkov and Avise, 2007), where concerted evolution is that where the sequence similarity between the paralogous copies within species is much smaller than that between the orthologous copies from different species (Morris-Pocock et al., 2010; Akiyama et al., 2017). A previous study found that the relationship between paralogous and orthologous control regions was dependent on the nucleotide portion used in the Black-throated Tit (Wang et al., 2015a). If using all the alignable nucleotides, the orthologous and paralogous copies were clustered respectively, while the paralogous copies of an individual clustered when a few variable nucleotides at the beginning and end of the alignment were excluded (Wang et al., 2015a). Therefore, the authors suggested that some portion of the control regions evolved via independent evolution, while others had undergone concerted evolution (Wang et al., 2015a).
In the present study, the phylogeny based on the alignable control regions had four highly supported lineages. Each lineage contained both copies of the control regions for the individuals involved. Such a phylogeny indicates the paralogous control regions are more similar than the orthologous sequences from distinct lineages, highlighting that the duplicated control regions evolved in concert. The pattern became more evident in the phylogeny constructed after a few nucleotides unique to each copy were excluded both at the beginning and end of the alignment, in which the paralogous sequences of an individual were closely related and clustered together in most of the birds. This finding is slightly different from that of the previously mentioned study (Wang et al., 2015a), highlighting the importance of including more individuals within a species in the analysis.
The use of the complete mitogenome and the combined protein-coding sequences revealed an identical phylogenetic constitution, with five highly supported lineages within the sampled Black-throated Tits in spite of only three subspecies being analyzed. Such a notable genetic divergence in this species has been reported in previous studies. For example, the uncorrected p-distance was found to range from 4.3% between the two lineages within the subspecies A. c. iredalei to 5.9% between the subspecies A. c. manipurensis and A. c. concinnus based on a partial sequence of the cytb gene (Päckert et al., 2010), whereas genetic distance of as high as 4.8% was reported within the subspecies A. c. talifuensis based on three mitochondrial protein-coding genes (Dai et al., 2011). The current study therefore confirms these previous findings, which could indicate the coexistence of several distinct evolutionary lineages and a complex demographical history for the Black-throated Tit. However, it is worth noting that the phylogenetic analyses using the duplicated control regions showed only four lineages. The two lineages identified with multiple genes or the complete mitogenome collapsed into one lineage because of the shallow divergence, with a genetic distance of no more than 1% (Dai et al., 2011).
Within this context, the taxonomic status of these deeply divergent lineages and their underlying evolutionary processes merit further investigation. Unfortunately, a phylogenetic analysis involving all the subspecies is absent due to the difficulty in obtaining high-quality samples, as some are distributed in politically unstable Southeast Asian countries. Yet interesting findings have been reported in recent years. For example, a multilocus phylogenetic study showed that the Black-throated Tit has diverged into two major evolutionary groups, in which the Pygmy Tit was nested in and closely related to one group of this species (Johansson et al., 2016). However, a later multilocus genetic analysis revealed the absence of reproductive isolation between the two non-sister and deeply diverged lineages, since individuals sampled at the contact zone were almost identified as F2 hybrids (Dai et al., 2017).
Although the mitonuclear compatibility species concept strongly suggested that mitochondrial genotype could be used to diagnose species (Hill, 2015, 2016, 2017, 2019), the actual situation should be elucidated or further confirmed by using more data, such as nuclear genome data and samples covering all subspecies and a wide distribution. In addition, the random breeding between the two non-sister and deeply diverged lineages merits further investigation, as it may represent a case of historical mitogenome introgression from an extinct species (Zhang et al., 2019). Moreover, this lineage pair is also a good system for testing the hypothesis of mitonuclear compatibility; for example, by answering whether differences exist in the nuclear genomes, and whether the functions of these divergent nuclear regions are linked to the mitochondrion (Morales et al., 2018).
This study represents the first to investigate the differences in mitogenomic characteristics within a species characterized by morphologically distinct composing members and notable genetic divergence. It has been observed that mitogenomic characteristics such as length, gene and nucleotide composition, codon usage, and duplicated control regions were relatively conserved. However, the length of 12S rRNA exhibited subspecies-specific changes. Both the complete mitogenomes and the 13 combined protein-coding genes showed 5 distinct lineages, indicating its complex evolutionary history and the need for further investigating the taxonomic status of these lineages, as well as the underlying evolutionary processes. The duplicated control regions may be selectively favored to meet the high energy requirements of this species and evolve in concert. These peculiar findings call for more research on intraspecies mitogenomic variations.
Chuanyin Dai: Writing – original draft, Software, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Fumin Lei: Writing – review & editing, Supervision.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
We are grateful to two anonymous reviewers for their valuable comments, which have significantly improved this manuscript.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.avrs.2024.100210.
Abel Souriau, Jorma Sorjonen, Adam Petrusek, Tereza Petrusková. 2025: Local song evolution after three decades in a complex songster, the Thrush Nightingale. Avian Research, 16(1): 100224. DOI: 10.1016/j.avrs.2025.100224 | |
Qi Lu, Pengcheng Wang, Jiang Chang, De Chen, Shenghan Gao, Jacob Höglund, Zhengwang Zhang. 2024: Population genomic data reveal low genetic diversity, divergence and local adaptation among threatened Reeves’s Pheasant (Syrmaticus reevesii). Avian Research, 15(1): 100156. DOI: 10.1016/j.avrs.2023.100156 | |
Wenqing Zang, Zhiyong Jiang, Per G.P. Ericson, Gang Song, Sergei V. Drovetski, Takema Saitoh, Fumin Lei, Yanhua Qu. 2023: Evolutionary relationships of mitogenomes in a recently radiated Old World avian family. Avian Research, 14(1): 100097. DOI: 10.1016/j.avrs.2023.100097 | |
Per G.P. Ericson, Martin Irestedt. 2022: Comparative population genomics reveals glacial cycles to drive diversifications in tropical montane birds (Aves, Timaliidae). Avian Research, 13(1): 100063. DOI: 10.1016/j.avrs.2022.100063 | |
Prashant Ghimire, Nishma Dahal, Ajit K. Karna, Surendra Karki, Sangeet Lamichhaney. 2021: Exploring potentialities of avian genomic research in Nepalese Himalayas. Avian Research, 12(1): 57. DOI: 10.1186/s40657-021-00290-5 | |
George P. Tiley, Akanksha Pandey, Rebecca T. Kimball, Edward L. Braun, J. Gordon Burleigh. 2020: Whole genome phylogeny of Gallus: introgression and data-type effects. Avian Research, 11(1): 7. DOI: 10.1186/s40657-020-00194-w | |
Chuang Zhou, Shuai Zheng, Xue Jiang, Wei Liang, Megan Price, Zhenxin Fan, Yang Meng, Bisong Yue. 2018: First complete genome sequence in Arborophila and comparative genomics reveals the evolutionary adaptation of Hainan Partridge (Arborophila ardens). Avian Research, 9(1): 45. DOI: 10.1186/s40657-018-0136-3 | |
Jente Ottenburghs, Robert H. S. Kraus, Pim van Hooft, Sipke E. van Wieren, Ronald C. Ydenberg, Herbert H. T. Prins. 2017: Avian introgression in the genomic era. Avian Research, 8(1): 30. DOI: 10.1186/s40657-017-0088-z | |
Keren R. Sadanandan, David J. X. Tan, Kolbjørn Schjølberg, Philip D. Round, Frank E. Rheindt. 2015: DNA reveals long-distance partial migratory behavior in a cryptic owl lineage. Avian Research, 6(1): 25. DOI: 10.1186/s40657-015-0035-9 | |
Lina Wu, Yanfeng Sun, Juyong Li, Yaqing Li, Yuefeng Wu, Dongming Li. 2015: A phylogeny of the Passerida (Aves: Passeriformes) based on mitochondrial 12S ribosomal RNA gene. Avian Research, 6(1): 1. DOI: 10.1186/s40657-015-0010-5 |
Length | A% | T% | G% | C% | AT% | GC% | AT-skew | GC-skew | |
mDNA | 17,939–17,941 | 29.56 ± 0.05 | 24.95 ± 0.15 | 14.38 ± 0.04 | 31.12 ± 0.14 | 54.51 ± 0.12 | 45.5 ± 0.13 | 0.085 ± 0.003 | −0.368 ± 0.002 |
PCGs | 11,394 | 28.16 ± 0.07 | 25.60 ± 0.20 | 14.05 ± 0.06 | 32.20 ± 0.19 | 53.75 ± 0.14 | 46.25 ± 0.15 | 0.047 ± 0.004 | −0.392 ± 0.005 |
tRNA | 1545 | 29.68 ± 0.20 | 28.20 ± 0.04 | 21.60 ± 0.18 | 20.56 ± 0.07 | 57.88 ± 0.20 | 42.16 ± 0.17 | 0.026 ± 0.004 | 0.025 ± 0.005 |
rRNA | 2580–2581 | 32.89 ± 0.03 | 21.43 ± 0.06 | 20.68 ± 0.04 | 25.03 ± 0.06 | 54.32 ± 0.06 | 45.71 ± 0.07 | 0.211 ± 0.001 | −0.096 ± 0.001 |
CR1 | 1157–1158 | 24.34 ± 0.25 | 31.38 ± 0.15 | 14.08 ± 0.31 | 30.21 ± 0.22 | 55.71 ± 0.30 | 44.29 ± 0.30 | −0.126 ± 0.005 | −0.364 ± 0.011 |
CR2 | 1205–1207 | 25.67 ± 0.25 | 31.04 ± 0.17 | 13.34 ± 0.21 | 29.96 ± 0.25 | 56.71 ± 0.33 | 43.29 ± 0.33 | −0.095 ± 0.005 | −0.384 ± 0.006 |
Lineage | A | B | C | D | E |
A | 0.0056 | 0.0548 | 0.0588 | 0.0585 | |
B | 0.0046 | 0.0541 | 0.0584 | 0.0575 | |
C | 0.0409 | 0.0408 | 0.0495 | 0.0595 | |
D | 0.0453 | 0.0452 | 0.0378 | 0.0588 | |
E | 0.0451 | 0.0442 | 0.0451 | 0.0461 | |
Upper right are the distances computed based on 13 protein-coding gene sequences, whereas the lower left are the distances computed based on complete mitogenome sequences. |