biological
entity. Microbial communities, which carry out the biogeochemical
processes of nature, are made up of
populations (often assumed to be species) which in turn are made
up of ecotypes, and then of genotypes. To understand the composition
and function of communities we must also understand the diversity
within species since many functional differences in populations
can occur at this level. We have used the more than 150 sequenced
microbial genomes to explore how much variation in gene content
occurs among closely related organisms since this indicates how
similar their phenotypes should be. Using average nucleotide identity
(ANI) of the conserved genes between any pairs of related strains
as a measure of evolutionary distance, we found that members of
some species may vary by up to 25% in common gene content. This
high degree of gene diversity within a species suggests that a
common phenotype cannot be reliably predicted for such species.
Other examples show that evolutionary divergence as measured by
ANI is quite high among members of these species but that the
common gene content is unusually high. In this case the niche of
the organism appears to have provided strong selection for the
same gene content. Together these results indicate that an organism’s
ecology must be considered in the species concept and that the
current species definition is too liberal if one reasonably expects
a species to reflect phenotype and where the organism lives. Advancing
the prokaryotic species concept from genomic insights will help
us understand the extensive terra diversity and aid in predicting
behavior and function of natural populations.
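The two quantities compared above, ANI of the conserved genes and the
fraction of common gene content, can be illustrated with a minimal
sketch. It is not the pipeline used in this work. It assumes that
orthologous genes have already been identified and aligned, that ANI
is taken as the unweighted mean of per-gene identities, and that common
gene content is expressed relative to the smaller genome. All function
names and the dictionary layout are hypothetical.

```python
def nucleotide_identity(seq_a, seq_b):
    """Fraction of identical bases between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches / compared


def average_nucleotide_identity(genes_a, genes_b):
    """Unweighted mean identity over genes present in both strains.

    genes_a, genes_b: dict mapping gene name -> aligned sequence
    (assumed layout, for illustration only).
    """
    shared = sorted(set(genes_a) & set(genes_b))
    if not shared:
        raise ValueError("no shared genes")
    total = sum(nucleotide_identity(genes_a[g], genes_b[g]) for g in shared)
    return total / len(shared)


def shared_gene_fraction(genes_a, genes_b):
    """Shared genes as a fraction of the smaller gene set (an assumption)."""
    smaller = min(len(genes_a), len(genes_b))
    if smaller == 0:
        raise ValueError("empty gene set")
    return len(set(genes_a) & set(genes_b)) / smaller
```

A pair of strains with high ANI but a low shared gene fraction would
correspond to the first case described above; the reverse pattern
corresponds to the second.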
Biogeography of terra populations is important for understanding
functional capacities of the landscape and reveals the organism’s
ecology. It gives the characterization of diversity a spatial component.
An important aspect of biogeography is to understand whether populations
are cosmopolitan or endemic, and if endemic to what degree, i.e.
at the level of species, ecotype or genotype. Endemism is to be
expected when the rate of genetic change in a local environment
is greater than the rate of dispersal of the organism. This is
expected to be different for different organisms, reflecting their
different biology, and for different habitats, reflecting their
accessibility, resources and stability. We have used fluorescent
Pseudomonas and Burkholderia cepacia complex as two models with
different biology and genome features to explore this question
on a global scale. As expected, populations are not endemic at
the level of the 16S rRNA gene, unless dispersal has been blocked,
because this is a highly conserved gene. However, genome structure,
as measured by rep-PCR, shows endemism at local scales, i.e. under
200 m, and a high degree of endemism at continental scales. As intermediate
measures of endemism we have used whole genome DNA-DNA hybridization,
microarrays of 1 kb genome fragments and multi-locus sequence typing
of faster evolving core genes such as recA and gyrB. All fill the
gap between the above two methods with the microarrays providing
the most resolution. All methods indicate moderate to weak endemism
in these two target populations.
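As a simple illustration of what a measure of endemism could look
like, the sketch below counts the share of genotypes that were found
at only one site. This is an assumed summary statistic, not one stated
in this work, and the input format of (genotype, site) pairs is
hypothetical. The genotype label could come from any of the typing
methods above, which is why the apparent endemism depends on the
resolution of the method.

```python
from collections import defaultdict


def endemic_fraction(observations):
    """Fraction of genotypes observed at exactly one site.

    observations: iterable of (genotype, site) pairs; repeated pairs
    are allowed and count once.
    """
    sites_by_genotype = defaultdict(set)
    for genotype, site in observations:
        sites_by_genotype[genotype].add(site)
    if not sites_by_genotype:
        raise ValueError("no observations")
    # A genotype is treated as endemic if it was seen at a single site.
    endemic = sum(1 for sites in sites_by_genotype.values() if len(sites) == 1)
    return endemic / len(sites_by_genotype)
```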
To advance gene-based infrastructure for exploring key genes in
the unknown terra microbiology, we have developed a Functional
Gene Database whose gene retrieval is based on protein searches
using a Hidden Markov Model (HMM). This has the major advantage
over BLAST in that it is a model-based search and it is known to
be the best indicator of common function. Online alignment and
neighbor-joining (NJ) treeing functions are the currently available
analytical tools, with a probe match under development. The retrieved sequences
are scored relative to the model for full length and conserved
motifs. This tool will be helpful in characterizing metagenome
sequences and targeted functional genes retrieved from soil DNA,
and may help in subdividing gene families according to functional
differences.
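One way to picture scoring a retrieved sequence for full length and
conserved motifs is sketched below. It is not the database's actual
scoring scheme. It assumes that the HMM search reports which model
positions a hit aligns to, and that conserved motifs are given as
ranges of model positions. The class, the function names and the 0.9
coverage threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelHit:
    model_start: int   # first model position matched (1-based, inclusive)
    model_end: int     # last model position matched (inclusive)
    model_length: int  # total number of positions in the model


def model_coverage(hit):
    """Fraction of the model spanned by the hit."""
    return (hit.model_end - hit.model_start + 1) / hit.model_length


def motifs_covered(hit, motifs):
    """For each (start, end) motif range, is it inside the hit's span?"""
    return [hit.model_start <= start and end <= hit.model_end
            for start, end in motifs]


def is_full_length(hit, motifs, min_coverage=0.9):
    """True if the hit spans enough of the model and every motif range.

    min_coverage is an illustrative threshold, not a published value.
    """
    return model_coverage(hit) >= min_coverage and all(motifs_covered(hit, motifs))
```

Spanning a motif's coordinates only shows that the region is present;
checking that the motif residues themselves are conserved would need
the aligned sequence as well.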
Challenges for advancing terra microbiology center on understanding
the microbe in relation to its habitat, or in the genomic context,
in defining the interaction of the genome with the organism’s
ecology. The following topics are important to advancing this goal:
a) determining biogeography of populations, b) defining the density
and dynamics of those populations, c) defining gene-phenotype linkages,
d) understanding “genes” of unknown function, e) characterizing
environmental control of expression, f) conceptualizing models
of “species” evolution. These goals will be greatly
aided by more genome sequence, of both organisms and metagenomes.
For ascribing gene function to ecology, we would be greatly aided
by more sequences from closely related organisms that are successful
in different niches and/or exhibit different phenotypes. With such
information we stand a better chance of interpreting genome information
in an ecological context.