Ancient rivers shaped the current genetic diversity of the wood mouse (Apodemus speciosus) on the islands of the Seto Inland Sea, Japan

The current distributions of organisms have been shaped by both current and past geographical barriers. However, it remains unclear how past geographical factors—currently cryptic on the sea floor—affected the current distributions of terrestrial animals. Here, we examined the effects of currently cryptic ancient rivers on current genetic differentiation of the large Japanese wood mouse, Apodemus speciosus, which inhabits islands in the Seto Inland Sea, Japan. Genome-wide polymorphisms were identified by GRAS-Di (Genotyping by Random Amplicon Sequencing, Direct) analysis of 92 A. speciosus individuals. Maximum-likelihood analysis was performed with 94,142 single nucleotide polymorphisms (SNPs) identified by GRAS-Di analyses. Ancient rivers were visualized by Geographic Information System (GIS) channel analysis. Maximum-likelihood analysis showed strong support for the monophyly of each population in the islands in the Seto Inland Sea; it also showed close relationships between Innoshima-Ikuchijima, Ohmishima-Hakatajima-Oshima, Ohmishima-Hakatajima, Ohsakikamijima-Ohsakishimojima, Kamikamagarijima-Shimokamagarijima, and Kurahashijima-Etajima islands. The principal component analyses of the SNPs also supported these relationships. Furthermore, individuals from islands located on the east and west sides of the main stream of the ancient river were clustered on each side with strong support. These phylogenetic relationships were completely congruent with the paleogeographic relationships inferred from ancient rivers. In conclusion, the findings demonstrated that the current distribution of genetically distinct island lineages was shaped by ancient rivers that are currently submerged beneath the Seto Inland Sea, Japan.

nonglacial cycles induced sea level fluctuations [2,3]. Therefore, islands surrounded by shallow sea floor would have been considerably affected by changes in sea level [4,5], whereby the current sea floor may have characteristics that contributed to the current biogeographic patterns of terrestrial animals. Although the effects of changes in the island-area and/or connectivity due to Late Quaternary sea-level oscillations on the current biodiversity were examined previously (e.g., plants [6]; insects [7]), those of currently submerged structures have not been assessed. An understanding of these cryptic geographical factors would provide novel insights into the lineage differentiation of organisms that cannot be understood in the framework of contemporary biogeography.
The Japanese archipelago is composed of 6852 islands with more than 0.1 km of coastline (Geospatial Information Authority of Japan); this provides the opportunity to explore the effects of cryptic geographical factors because of the arrangement of the islands, surrounded by shallow seas. Among the islands in the Japanese archipelago, the Seto Inland Sea (the largest inland sea in Japan) possesses more than 700 islands (Ministry of Environment, Japan). Because the mean depth of the inland sea is ca. 38 m (Ministry of Environment, Japan), the current Seto Inland Sea was not present during the last glacial period, when the sea level dropped below this depth; the global sea level was ca. 120-135 m lower than today at the last glacial maximum (LGM) [2,3,8]. Therefore, the origin of the sea is comparatively young, beginning in the earlier Holocene because of the marine transgression (i.e., the Jomon transgression; [9]). Previous geological and ostracod paleobiogeographical studies inferred that the islands in the Seto Inland Sea began formation because of sea water invasion at ca. 11 ka (kilo annum); the current form was established by ca. 8 ka [10][11][12]. However, the geological history of the islands of the Seto Inland Sea is poorly known because of limited geological information present at the sea floor. Two major paleo-drainages (hereafter referred to as ancient rivers; Additional File 1) were inferred from the seafloor topography, the Hoyo and Kitan Rivers; these may have flowed into the Pacific Ocean through Bungo (between Shikoku and Kyushu islands) and Kii (between Shikoku and Honshu islands) channels, respectively, affecting the formation process of the islands of the Seto Inland Sea [13,14]. These ancient rivers are assumed to have already been present during LGM and played an initial role to form the islands in the transgression. Previous biogeographic studies of a plant (Rhododendron ripense; [15]), snails (Semisulcospira spp.; [16]), and fish (Japanese rosy bittering Rhodeus ocellatus kurumeus; [17]) in the main Japanese islands (Honshu, Shikoku, and Kyushu) suggested the influence of ancient rivers on the current population genetic structures of these species. To our knowledge, there have been no reports regarding the relationships between ancient rivers and the phylogeography of organisms inhabiting the islands in the Seto Inland Sea; the formation of this phylogeography was presumably directly influenced by ancient rivers. Clarifying the effects of the ancient river on the population genetic structure of organisms in the Seto Inland Sea requires additional lines of evidence.
Terrestrial organisms on islands are potential candidates to understand the geological relationships among islands. The large Japanese wood mouse, Apodemus speciosus, is an old terrestrial mammal in Japan, originating at around 6 Ma [18,19] (see [20] for review of origins of mammals in Japan); it is distributed in major and adjacent small islands throughout the Japanese archipelago except the Ryukyu Islands [21]. It is also adapted to the deciduous forest ecosystems in the islands of the Seto Inland Sea [22,23]. Therefore, analysis of the wood mice in these islands is expected to provide information concerning the geological history of the islands in the Seto Inland Sea. Sato et al. (2017) examined the genetic variations of the wood mouse populations in the islands of the Seto Inland Sea using a portion of the nucleotide sequence of the mitochondrial Dloop region (ca. 300 bp); they reported that the genetic diversity was reduced on smaller islands and that the Dloop haplotypes were not shared among islands [24]. These results suggested that the wood mouse populations are genetically homogeneous on each island because of genetic drift and differentiated from each other by island separation [24]. However, interrelationships among the Dloop haplotypes from these islands were unclear because of the low resolution related to the use of only a short fragment of the Dloop region (ca. 300 bp).
Because of recent progress in next-generation sequencing (NGS) technology, this methodology can be used to obtain data regarding large numbers of polymorphisms in the genome. Several genome-wide reduced-representation sequencing approaches have been developed, such as RAD-seq [25], MIG-seq [26], and GRAS-Di [27,28]. The GRAS-Di method (i.e., Genotyping by Random Amplicon Sequencing, Direct) was first reported in conference proceedings by Toyota Motor Corporation [27,28]; it is a PCR-based genotyping-by-sequencing technique that uses only 3-bp random multiplexed primers. Although it has attracted little attention, GRAS-Di is considered a useful alternative to other genotyping-bysequencing methods, such as RAD-seq or MIG-seq [29]. It can detect thousands of SNPs in the genome, usually larger than the SNPs detected by MIG-seq, and does not require a large amount of high-quality DNA, in contrast to RAD-seq [29]. This method has only recently been used for various organisms in the ecological and evolutionary context, including vertebrates [29], invertebrates [30], and plants [31]. For example, Hosoya et al. (2019) examined the estuarine mangrove fishes around the Ryukyu Islands using the GRAS-Di method; they also resolved the population genetic structures among islands for some mangrove fish species, suggesting that this is a powerful method to assess intraspecific genetic variations [29]. The ability of this method to identify genomewide polymorphisms could be informative for wood mice on the islands of the Seto Inland Sea.
In this study, we examined genetic variations among wood mice inhabiting the islands of the Seto Inland Sea using the GRAS-Di method; we tested the hypothesis that ancient rivers could explain the genetic relationships among the wood mice on these islands.

Samples examined in this study
We used DNA samples from 92 individuals of the large Japanese wood mouse, A. speciosus, consisting of 80 samples extracted in our previous studies [24,32] and 12 newly obtained samples from individuals in the field survey described below (Table 1). Wood mouse individuals were captured in Kure (Kre) on Honshu Island, Kurahashijima Island (Kra), and Etajima Island (Eta) using Sherman live traps baited with oats (Avena sativa) in 2019 ( Fig. 1B; Table 1). After the mice had been captured, we collected several tissue samples using an ear punch for genetic analyses; we then released the animals to each sampling point. We obtained permission from Hiroshima Prefecture to perform the field survey and laboratory experiments on the wood mice; the study protocol was approved by the Animal Care and Use Committee of Fukuyama University (H30-Animal-8). We also followed the guidelines of the Procedure of Obtaining Mammal Specimens established by the Mammal Society of Japan.

DNA extraction, PCR, and sequencing
DNA was extracted from ear tissue samples using a DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany), in accordance with the manufacturer's instructions. The extracted DNA concentration was calculated using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and used to determine the amount of DNA input for PCR analyses. Two-step PCR was used to construct a library for NGS via the GRAS-Di method. Two consecutive rounds of PCR were performed using a PrimeSTAR ® HS DNA Polymerase kit (Takara, Kusatsu, Japan) on a LifeECO thermal cycler (Bioer Technology, Hangzhou, China). The first PCR mixture (25 μL) contained 5 × PrimeS-TAR Buffer (Mg 2+ ), 0.2 mM of each dNTP, 40 pM of multiplexed primer mix (for each random primer, see Additional File 2), PrimeSTAR HS DNA Polymerase (0.625 U/sample), template DNA (15 ng), and PCR-grade water. The first round of PCR was performed with an initial denaturation step at 98 °C for 2 min, followed by 30 cycles of denaturation at 98 °C for 10 s, annealing at 50 °C for 15 s, and extension at 72 °C for 20 s.
The second round of PCR was performed directly on PCR products obtained from the first round of PCR in a mixture (50 μL The concentrations of the libraries were assessed using a Synergy H1 microplate reader (BioTek, Winooski, VT, USA) with QuantiFluor dsDNA System (Promega, Madison, WI, USA); the qualities of these libraries were examined using a Fragment Analyzer (Agilent Technologies, Santa Clara, CA, USA) with a dsDNA 915 Reagent Kit (Agilent Technologies). Finally, NGS was performed using an Illumina HiSeq X Ten with HiSeq X Ten Reagent Kit v2 (150 cycles, paired-end). Raw FASTQ data for each sample generated from the NGS run were deposited in the DDBJ SRA database (DRA013162). Experiments from PCR to sequencing were performed by Bioengineering Lab. Co., Ltd. (Sagamihara, Japan).

Data filtering and SNP calling
For FASTQ data generated from the NGS run and demultiplexed into each individual, Sickle version 1.33 [33] was used to trim sequence data into only data with quality > Q30 and length > 100 bp. FASTX-Toolkit version 0.0.14 (http:// hanno nlab. cshl. edu/ fastx_ toolk it/ index. html, Accessed 2 December 2021) was then used to unify the length of the examined sequence to 100 bp (fastx_trimmer).We then used Bowtie2 version 2.3.4.1  [34] to map these trimmed sequences to the reference genome of Apodemus speciosus (Accession code in NCBI: Aspe_assembly01 [35]) and obtained ".sam" file, followed by conversion of ".sam" into sorted ".bam" file in Samtools version 1.7 [36]. With these sorted ".bam" files, we applied STACKS version 2.59 [37,38] with the commands "ref_ map.pl" and "populations" to obtain the SNPs among samples and the input files for the subsequent phylogenetic and principal component analyses. We extracted a randomly selected SNP from each locus using the "writerandom-snp" option in "populations". We also omitted SNPs sites that showed more than 0.5 heterozygosity and less than 0.05 mean allele frequency. Other parameters in the STACKS were set by default. a The locality codes are consistent with those in Fig. 1 b Total reads obtained from NGS c Filtered reads with more than Q30 quality and with length more than100 bp d Samples newly obtained in this study e Sequences examined in STACKS for each sample (base pair)

Phylogenetic inferences
For phylogenetic analysis, Iqtree version 2.0.3 [39] was used to construct the maximum likelihood tree with the GTR + G substitution model; nodal support in the phylogenetic tree was evaluated using the Ultrafast bootstrap proportion method (10,000 replications) [40]. Because there was no outgroup information in our samples, the midpoint rooting method was used to construct the phylogeny.
We then used tidyverse library in the R environment [42] to depict the PCA graphs with ggplot function.

Assessment of correlation between genetic and geographic distances
To assess the correlation between genetic and geographic distances, we compared p-distances calculated in MEGA X [43] based on the SNP data and geographic distances calculated from coordinates (Additional File 3) captured in QGIS version 3.4.12 [44]. The calculation of the geographic distances from coordinates was performed with geosphere library in the R environment. We used vegan library in R to conduct a Mantel test to evaluate the statistical significance between genetic and geographic distances with the pearson method and 9999 permutations.

Visualization of ancient rivers
To visualize ancient rivers from the geological data, channel analysis was performed using QGIS and SAGA GIS version 2.3.2 [45]. The dataset used in this analysis was GEBCO_2019 Grid [46], a continuous terrain model for ocean and land with a spatial resolution of 15 arc seconds. This dataset is composed of digital elevation data obtained by satellites, bathymetry data obtained with multi-beam sonar from ships, or the sea floor topography estimated from satellite altimeters. The SAGA GIS hydrological modeling tool was used for channel analysis. The tool Fill sinks xxl [47] was used to fill surface depressions in the Digital Elevation Model, thereby generating a depression-less elevation layer. A Strahler stream order raster was created from the filled elevation data using the Strahler order tool. Finally, using the SAGA GIS tool, Terrain Analysis Channels, tributaries with Strahler's stream order 4-8 were extracted.

Extracted data
From GRAS-Di sequencing with the Illumina NGS platform, we obtained a mean (standard deviation) of 531,306 (242,846) total reads among the 92 samples examined (Table 1). These data were filtered for quality > Q30 and length > 100 bp. Consequently, the filtered data yielded a mean (standard deviation) of 401,983 (186,770) reads (Table 1). Sequences examined in the STACKS analyses had a mean (standard deviation) length of 2,688,332 (932,656) bp (Table 1).

Phylogenetic relationships
Through SNP-calling in STACKS analyses, we obtained 94,142 SNPs among 92 individuals. The maximum likelihood tree inferred from these 94,142 SNPs showed that all individuals on each island in the Seto Inland Sea were clustered (Fig. 1A). These clades were supported with high bootstrap values (100%; Fig. 1A; Additional File 4). Individuals in the larger Shikoku Island (Ima, Koc, and Tsu) were also clustered in a clade with high bootstrap value (97%; Fig. 1A; Additional File 4). With regard to the interrelationships among island individuals, we observed six highly supported relationships among adjacent islands ( Fig. 1A and B; labeled A-F): A, Inn (Innoshima)-Iku (Ikuchijima); B, Ohm (Ohmishima)-Hak (Hakatajima)-Ohs (Ohshima); C, Ohm-Hak; D, Osk (Ohsakikamijima)-Oss (Ohsakishimojima); E, Kk (Kamikamagarijima)-Sk (Shimokamagarijima); F, Kra (Kurahashijima)-Eta (Etajima). These relationships were supported by high bootstrap values (100%; Fig. 1A; Additional File 4). Additionally, individuals from eastern islands (A + B; 99% for Inn, Iku, Ohm, Hak, and Ohs) and western islands (D + E; 95% for Osk, Oss, Kk, and Sk) located in each side of the main stream of the hypothetical ancient Hoyo river were clustered in each clade with strong support (Fig. 1A; Additional File 4). Individuals from Muk (Mukaishima) and Kra-Eta were not included in the eastern and western clades, respectively, and were closely related to those from Honshu (Fuk, Ono, and Kre). See Additional File 4 for all bootstrap values in the phylogeny.

Principal component analysis
The result of the PCA analysis was consistent with that observed in the phylogenetic analyses (Fig. 1C). Five of six clusters that were observed in the phylogeny (A, B, C, D, E in Fig. 1A) were also clustered in the PCA plot (Fig. 1C). The cluster F including Kra and Eta individuals was weakly supported in the comparison excluding the individuals within the clade A, B, C, D, and E (Fig. 1D).

Correlation between genetic and geographic distances
There was no correlation between genetic and geographic distances (Additional File 5). Mantel test provided a coefficient of r = -0.08681, which was statistically not significant (P = 0.9413). These results show that the close relationships among island lineages cannot be explained by geographical proximity between the islands in the Seto Inland Sea.

Ancient rivers detected
Through channel analyses, we detected ancient rivers that were presumed to be present in the Seto Inland Sea. We assessed the Strahler's stream order through 4-8 (Additional File 6) and focused on major streams by combining streams concerned with the island relationships for visual reasons (Fig. 1B). See Additional File 6 for precise information of extracted rivers from each Strahler's stream order. The major stream (hypothesized to be an ancient river known as Hoyo River) was found to have divided several islands; their division patterns were completely congruent with the phylogenetic relationships estimated above (Fig. 1A and B).

Discussion
Compared with our previous study using a portion of the mitochondrial Dloop sequence (ca. 300 bp; [24]), we obtained much a larger data set (2,688,332 bp) for phylogenetic analysis. The number of SNPs (94,142) extracted from these data was much greater than that in a previous study of estuarine mangrove fishes, which detected 4000-9000 SNPs after quality filtering [29]. This may be because we used reference-genome based SNP calling. Our initial analysis with de novo SNP calling provided 7678 SNPs that were consistent with those in the previous study. The large number of SNPs generated by the GRAS-Di analysis provided novel findings concerning the implications of the ancient river in shaping the current genetic diversity of A. speciosus on the islands of the Seto Inland Sea.
We used a concatenated SNPs dataset for the phylogenetic analyses. Concatenating the data might ignore coalescent variance and assume that all loci share the same genealogy [48]. Especially, incomplete lineage sorting (ILS) is often a problem for inferring the phylogenetic relationships. However, the phylogenetic node where the ILS is concerned is an "anomaly zone", which has a tight span between consecutive divergences (e.g., due to rapid radiation) and in a condition in which the ancestral population size was large [49]. Such a node is difficult to resolve using either concatenation or ILS-aware methods, often producing low support values. As inferred from the high bootstrap values obtained in this study, the supported relationships should not have been affected by the ILS. Furthermore, considering the small population sizes in the island populations as inferred from the low genetic diversity of each island population compared with the Honshu or Shikoku populations [24], it is not likely that ancestral polymorphisms have been considerably maintained in such a small population as observed in mtDNA analyses where haplotypes in islands were not observed in Honshu and Shikoku [24]. Moreover, it is difficult to detect the ILS in this study because a small fragment sequence of less than 150 bp from each locus (HiSeq sequencer produces 150 bp at maximum) would have poor phylogenetic information [50]. In other words, the topology obtained from the concatenated sequences could not be rejected by any gene-tree discordances from each locus. It should also be noted that there is much hidden phylogenetic information in each locus that would not be emergent in the single locus analysis [50]. Using the concatenation method that would reveal the hidden information in each locus, a correct phylogenetic relationship would be supported even under the presence of gene tree discordance [50]. Extraction of only one SNP randomly from each locus might provide open and/or hidden phylogenetic information, enabling fair analyses. For these reasons, we performed a concatenated phylogenetic analysis.
As shown in Fig. 1B, we found complete consistency between phylogenetic affinity among some island lineages and the geographic relationship predicted by ancient rivers that were detected by channel analyses. The PCA analyses also provided consistent results (Fig. 1C, D). Overall, the results strongly support the hypothesis of the presence of the ancient Hoyo River flowing on the west side of the Seto Inland Sea. Although it is not clear that the fluid water flowed in the cold environment during LGM at around 20 ka, the channel analysis using sea level 120 m lower than today suggested the presence of the structure of ancient rivers in the Seto Inland Sea at that time (Additional file 1). We therefore assumed that the gene flows should have been restricted by the ancient rivers during the LGM. To supplement the geological evidence, which has been difficult to explore without largescale investigation on the sea floor surface, the presence of ancient rivers in the Seto Inland Sea has been suggested by biological (particularly biogeographical) data. Kondo et al. (2009) examined microsatellite variations in a plant species, Rhododendron ripense, which has a seed dispersal strategy involving water; they suggested that populations were genetically different among river systems on Honshu, Shikoku, and Kyushu Islands that could be interconnected via two major ancient rivers in the Seto Inland Sea [15]. The current genetic structure of East Asian freshwater snails in Japan was presumably affected by past mtDNA introgressions that were facilitated by ancient rivers in the Seto Inland Sea [16]. Furthermore, Takahashi et al. (2020) showed genetic differentiation of the Japanese rosy bitterling, Rhodeus ocellatus kurumeus, between Honshu-Shikoku and Kyushu lineages using the Double Digest RAD-seq technique; they suggested the effect of ancient rivers on the lineage differentiation [17]. These previous studies, considered with the present study, supported the presence of the ancient river, thereby suggesting that the ancient river hypothesis is correct. Although interrelationships among the island lineages were not resolved in our previous study through the analysis of short Dloop sequences [24], much more abundant sequence data from the GRAS-Di analyses in the present study greatly increased the resolution concerning phylogenetic relationships among island lineages in the Seto Inland Sea. Notably, the individuals from islands on the east and west sides of the Hoyo River were closely related on each side, further supporting the ancient river hypothesis. It is also remarkable that Muk and Kra-Eta individuals were not clustered with the eastern and western clades, respectively, and instead were closely related to the Honshu individuals, also highlighting the effect of the ancient river. In future studies, application of the same methodology used in the present study for the other island populations of wood mice would illuminate the overall formation process of islands in the Seto Inland Sea. Furthermore, understanding the effects of current rivers on the genetic diversity of the wood mice would also be expected.
The observation that the island lineages were clustered in each clade in the phylogeny suggested that the effects of genetic drift through the bottleneck were stronger on the smaller islands in the Seto Inland Sea, making islandspecific lineages in combination with independent mutations that occurred on each island after separation of the islands or with differential fixation of ancestral genetic variations. Such an island trend is also congruent with the results reported by Sato et al. [24], whereby the Dloop haplotypes detected on each island were not shared among different islands. However, the trend that the island haplotypes were more closely related to Honshu haplotypes than to Shikoku haplotypes, as observed in the Dloop analyses [24], was not confirmed in the present study, except for Muk and Kra-Eta islands, which would not have been separated from Honshu by ancient rivers. This may have been related to the stochastic effect caused by the low resolution of the Dloop sequences or the difference between mitochondrial DNA genealogy and phylogeny based on genome-wide SNP data. Further studies with more samples from Honshu and Shikoku Islands and high-throughput genome sequencing are needed to test these trends. Such studies should also clarify the effect of ancient rivers on the Honshu and Shikoku Islands. It should be further noted that effective population size of the ancestral population has affected the degree of maintenance of ancestral genetic variation, thereby explaining the long terminal branches in the phylogeny. However, it is difficult to estimate the population size parameter in this study with only a few individuals from each island. Analyses of more samples from islands would clarify the reasons for the island specificity in more detail.

Conclusions
Using the GRAS-Di high-throughput sequencing method, we clarified the genetic diversity of the large Japanese wood mouse that would have been shaped by an ancient river, the Hoyo River. To our knowledge, this is the first study to clarify the implications of an ancient river-currently submerged on the sea floor-in the genetic differentiation of island organisms. Application of the same method to various terrestrial organisms would elucidate the geological history of the islands of the Seto Inland Sea and the mechanisms that have shaped ecosystems on each island.