Title: Genetic polymorphisms associated with Alzheimer's Disease, methods of detection and uses thereof
United States Patent: 7,695,911
Issued: April 13, 2010
Inventors: Li; Yonghong (Palo Alto, CA), Grupe; Andrew (Orinda, CA)
Assignee: Celera Corporation (Alameda, CA)
Appl. No.: 11/586,427
Filed: October 24, 2006
Abstract
The present invention is based on the
discovery of genetic polymorphisms that are associated with Alzheimer's
Disease. In particular, the present invention relates to nucleic acid
molecules containing the polymorphisms, variant proteins encoded by such
nucleic acid molecules, reagents for detecting the polymorphic nucleic
acid molecules and proteins, and methods of using the nucleic acid and
proteins as well as methods of using reagents for their detection.
Description of the Invention
The present invention provides SNPs
associated with Alzheimer's Disease, nucleic acid molecules containing
these SNPs, methods and reagents for the detection of the SNPs disclosed
herein, uses of these SNPs for the development of detection reagents, and
assays or kits that utilize such reagents. The AD-associated SNPs
disclosed herein are useful for diagnosing, screening for, and evaluating
predisposition to Alzheimer's Disease and other neurological pathologies
in humans. Furthermore, such SNPs and their encoded products are useful
targets for the development of therapeutic agents in treating Alzheimer's
Disease and other neurological pathologies.
A large number of SNPs have been identified from re-sequencing DNA from 39
individuals, and they are indicated as "Applera" SNP source in Tables 1-2 (see Original Patent).
Their allele frequencies, observed in each of the Caucasian and
African-American ethnic groups, are provided. Additional SNPs included
herein were previously identified during shotgun sequencing and assembly
of the human genome, and they are indicated as "Celera" SNP source in
Tables 1-2 (see Original Patent). Furthermore, the information provided in
Tables 1-2, particularly the allele frequency information obtained from 39
individuals and the identification of the precise position of each SNP
within each gene/transcript, allows haplotypes (i.e., groups of SNPs that
are co-inherited) to be readily inferred. The present invention
encompasses SNP haplotypes, as well as individual SNPs.
Thus, the present invention provides individual SNPs associated with
Alzheimer's Disease, as well as combinations of SNPs and haplotypes in
genetic regions associated with Alzheimer's Disease, polymorphic/variant
transcript sequences (SEQ ID NOS:1-5) and genomic sequences (SEQ ID
NOS:16-19) containing SNPs, encoded amino acid sequences (SEQ ID NOS:
6-10), and both transcript-based SNP context sequences (SEQ ID NOS: 11-15)
and genomic-based SNP context sequences (SEQ ID NOS:20-31) (transcript
sequences, protein sequences, and transcript-based SNP context sequences
are provided in Table 1 (see Original Patent) and the Sequence Listing;
genomic sequences and genomic-based SNP context sequences are provided in
Table 2 (see Original Patent) and the Sequence Listing), methods of
detecting these polymorphisms in a test sample, methods of determining the
risk of an individual of having or developing Alzheimer's Disease, methods
of screening for compounds useful for treating neurological pathologies
such as Alzheimer's Disease associated with a variant gene/protein,
compounds identified by these screening methods, methods of using the
disclosed SNPs to select a treatment strategy, methods of treating a
disorder associated with a variant gene/protein (i.e., therapeutic
methods), and methods of using the SNPs of the present invention for human
identification.
The present invention provides novel SNPs associated with Alzheimer's
Disease, as well as SNPs that were previously known in the art, but were
not previously known to be associated with Alzheimer's Disease.
Accordingly, the present invention provides novel compositions and methods
based on the novel SNPs disclosed herein, and also provides novel methods
of using the known, but previously unassociated, SNPs in methods relating
to Alzheimer's Disease (e.g., for diagnosing Alzheimer's Disease). In
Tables 1-2, known SNPs are identified based on the public database in
which they have been observed, which is indicated as one or more of the
following SNP types: "dbSNP"=SNP observed in dbSNP, "HGBASE"=SNP observed
in HGBASE, and "HGMD"=SNP observed in the Human Gene Mutation Database (HGMD).
Particular SNP alleles of the present invention can be associated with
either an increased risk of having or developing Alzheimer's Disease, or a
decreased risk of having or developing Alzheimer's Disease. SNP alleles
that are associated with a decreased risk of having or developing
Alzheimer's Disease may be referred to as "protective" alleles, and SNP
alleles that are associated with an increased risk of having or developing
Alzheimer's Disease may be referred to as "susceptibility" alleles, "risk"
alleles, or "risk factors." Thus, whereas certain SNPs (or their encoded
products) can be assayed to determine whether an individual possesses a
SNP allele that is indicative of an increased risk of having or developing
Alzheimer's Disease (i.e., a susceptibility allele), other SNPs (or their
encoded products) can be assayed to determine whether an individual
possesses a SNP allele that is indicative of a decreased risk of having or
developing Alzheimer's Disease (i.e., a protective allele). Similarly,
particular SNP alleles of the present invention can be associated with
either an increased or decreased likelihood of responding to a particular
treatment or therapeutic compound, or an increased or decreased likelihood
of experiencing toxic effects from a particular treatment or therapeutic
compound. The term "altered" may be used herein to encompass either of
these two possibilities (e.g., an increased or a decreased
risk/likelihood).
Those skilled in the art will readily recognize that nucleic acid
molecules may be double-stranded molecules and that reference to a
particular site on one strand refers, as well, to the corresponding site
on a complementary strand. In defining a SNP position, SNP allele, or
nucleotide sequence, reference to an adenine, a thymine (uridine), a
cytosine, or a guanine at a particular site on one strand of a nucleic
acid molecule also defines the thymine (uridine), adenine, guanine, or
cytosine (respectively) at the corresponding site on a complementary
strand of the nucleic acid molecule. Thus, reference may be made to either
strand in order to refer to a particular SNP position, SNP allele, or
nucleotide sequence. Probes and primers may be designed to hybridize to
either strand, and SNP genotyping methods disclosed herein may generally
target either strand. Throughout the specification, in identifying a SNP
position, reference is generally made to the protein-encoding strand, only
for the purpose of convenience.
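The strand equivalence described above can be illustrated with a minimal sketch. The sequence, the SNP position, and the function names below are hypothetical and are not taken from the Tables or the Sequence Listing. The sketch assumes DNA (an RNA strand would use uridine in place of thymine) and 1-based positions counted 5' to 3' on each strand.

```python
# Watson-Crick pairing for DNA; an RNA strand would use U in place of T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def snp_on_other_strand(position, alleles, seq_length):
    """Map a 1-based SNP position and its alleles onto the complementary strand."""
    mirrored_position = seq_length - position + 1
    mirrored_alleles = tuple(COMPLEMENT[allele.upper()] for allele in alleles)
    return mirrored_position, mirrored_alleles


forward = "ACGTTAGC"  # hypothetical sequence, for illustration only
print(reverse_complement(forward))                        # GCTAACGT
print(snp_on_other_strand(3, ("G", "A"), len(forward)))   # (6, ('C', 'T'))
```

A G/A SNP at position 3 of the hypothetical strand is therefore the same polymorphism as a C/T SNP at position 6 of its complement.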
References to variant peptides, polypeptides, or proteins of the present
invention include peptides, polypeptides, proteins, or fragments thereof,
that contain at least one amino acid residue that differs from the
corresponding amino acid sequence of the art-known
peptide/polypeptide/protein (the art-known protein may be interchangeably
referred to as the "wild-type," "reference," or "normal" protein). Such
variant peptides/polypeptides/proteins can result from a codon change
caused by a nonsynonymous nucleotide substitution at a protein-coding SNP
position (i.e., a missense mutation) disclosed by the present invention.
Variant peptides/polypeptides/proteins of the present invention can also
result from a nonsense mutation (i.e., a SNP that creates a premature stop
codon), from a SNP that generates a read-through mutation by abolishing a
stop codon, or from any SNP disclosed by the present invention that otherwise
alters the structure, function/activity, or expression of a protein, such
as a SNP in a regulatory region (e.g., a promoter or enhancer) or a SNP
that leads to alternative or defective splicing, such as a SNP in an
intron or a SNP at an exon/intron boundary. As used herein, the terms
"polypeptide," "peptide," and "protein" are used interchangeably.
Isolated Nucleic Acid Molecules and SNP Detection Reagents & Kits
Tables 1 and 2 provide a variety of information about each SNP of the
present invention that is associated with Alzheimer's Disease, including
the transcript sequences (SEQ ID NOS:1-5), genomic sequences (SEQ ID
NOS:16-19), and protein sequences (SEQ ID NOS:6-10) of the encoded gene
products (with the SNPs indicated by IUB codes in the nucleic acid
sequences). In addition, Tables 1 and 2 include SNP context sequences,
which generally include 100 nucleotides upstream (5') plus 100 nucleotides
downstream (3') of each SNP position (SEQ ID NOS:11-15 correspond to
transcript-based SNP context sequences disclosed in Table 1, and SEQ ID
NOS:20-31 correspond to genomic-based context sequences disclosed in Table
2), the alternative nucleotides (alleles) at each SNP position, and
additional information about the variant where relevant, such as SNP type
(coding, missense, splice site, UTR, etc.), human populations in which the
SNP was observed, observed allele frequencies, information about the
encoded protein, etc.
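As an illustration of how a context sequence of this shape could be assembled, the sketch below takes a sequence, a 1-based SNP position, and the two alleles, and returns up to 100 nucleotides on each side with the SNP written as a two-base IUB (IUPAC) ambiguity code. The function name and the toy input are assumptions made for illustration, only biallelic SNPs are handled, and this is not presented as the procedure used to build Tables 1-2.

```python
# Two-base IUB/IUPAC ambiguity codes.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def context_sequence(seq, snp_pos, alleles, flank=100):
    """Return flank + ambiguity code + flank around a 1-based SNP position."""
    idx = snp_pos - 1
    upstream = seq[max(0, idx - flank):idx]      # 5' flank (shorter near the start)
    downstream = seq[idx + 1:idx + 1 + flank]    # 3' flank (shorter near the end)
    code = IUPAC[frozenset(allele.upper() for allele in alleles)]
    return upstream + code + downstream


# Hypothetical 301-nucleotide sequence with a G/A SNP at position 151.
toy = "A" * 150 + "G" + "C" * 150
ctx = context_sequence(toy, 151, ("G", "A"))
print(len(ctx), ctx[100])  # 201 R
```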
Isolated Nucleic Acid Molecules
The present invention provides isolated nucleic acid molecules that
contain one or more SNPs disclosed in Table 1 and/or Table 2. Isolated
nucleic acid molecules containing one or more SNPs disclosed in at least
one of Tables 1-4 (see Original Patent) may be interchangeably referred to
throughout the present text as "SNP-containing nucleic acid molecules."
Isolated nucleic acid molecules may optionally encode a full-length
variant protein or fragment thereof. The isolated nucleic acid molecules
of the present invention also include probes and primers (which are
described in greater detail below in the section entitled "SNP Detection
Reagents"), which may be used for assaying the disclosed SNPs, and
isolated full-length genes, transcripts, cDNA molecules, and fragments
thereof, which may be used for such purposes as expressing an encoded
protein.
As used herein, an "isolated nucleic acid molecule" generally is one that
contains a SNP of the present invention or one that hybridizes to such a
molecule, such as a nucleic acid with a complementary sequence, and is
separated from most other nucleic acids present in the natural source of
the nucleic acid molecule. Moreover, an "isolated" nucleic acid molecule,
such as a cDNA molecule containing a SNP of the present invention, can be
substantially free of other cellular material, or culture medium when
produced by recombinant techniques, or chemical precursors or other
chemicals when chemically synthesized. A nucleic acid molecule can be
fused to other coding or regulatory sequences and still be considered
"isolated." Nucleic acid molecules present in non-human transgenic
animals, which do not naturally occur in the animal, are also considered
"isolated." For example, recombinant DNA molecules contained in a vector
are considered "isolated." Further examples of "isolated" DNA molecules
include recombinant DNA molecules maintained in heterologous host cells,
and purified (partially or substantially) DNA molecules in solution.
Isolated RNA molecules include in vivo or in vitro RNA transcripts of the
isolated SNP-containing DNA molecules of the present invention. Isolated
nucleic acid molecules according to the present invention further include
such molecules produced synthetically.
Generally, an isolated SNP-containing nucleic acid molecule comprises one
or more SNP positions disclosed by the present invention with flanking
nucleotide sequences on either side of the SNP positions. A flanking
sequence can include nucleotide residues that are naturally associated
with the SNP site and/or heterologous nucleotide sequences. Preferably the
flanking sequence is up to about 500, 300, 100, 60, 50, 30, 25, 20, 15,
10, 8, or 4 nucleotides (or any other length in-between) on either side of
a SNP position, or as long as the full-length gene or entire
protein-coding sequence (or any portion thereof such as an exon),
especially if the SNP-containing nucleic acid molecule is to be used to
produce a protein or protein fragment.
For full-length genes and entire protein-coding sequences, a SNP flanking
sequence can be, for example, up to about 5 kb, 4 kb, 3 kb, 2 kb, or 1 kb on
either side of the SNP. Furthermore, in such instances, the isolated
nucleic acid molecule comprises exonic sequences (including protein-coding
and/or non-coding exonic sequences), but may also include intronic
sequences. Thus, any protein coding sequence may be either contiguous or
separated by introns. The important point is that the nucleic acid is
isolated from remote and unimportant flanking sequences and is of
appropriate length such that it can be subjected to the specific
manipulations or uses described herein such as recombinant protein
expression, preparation of probes and primers for assaying the SNP
position, and other uses specific to the SNP-containing nucleic acid
sequences.
An isolated SNP-containing nucleic acid molecule can comprise, for
example, a full-length gene or transcript, such as a gene isolated from
genomic DNA (e.g., by cloning or polymerase chain reaction [PCR]
amplification), a cDNA molecule, or an mRNA transcript molecule.
Polymorphic transcript sequences are provided in Table 1 and in the
Sequence Listing (SEQ ID NOS: 1-5), and polymorphic genomic sequences are
provided in Table 2 and in the Sequence Listing (SEQ ID NOS:16-19).
Furthermore, fragments of such full-length genes and transcripts that
contain one or more SNPs disclosed herein are also encompassed by the
present invention, and such fragments may be used, for example, to express
any part of a protein, such as a particular functional domain or an
antigenic epitope.
Thus, the present invention also encompasses fragments of the nucleic acid
sequences provided in Tables 1-2 (transcript sequences are provided in
Table 1 as SEQ ID NOS:1-5, genomic sequences are provided in Table 2 as
SEQ ID NOS:16-19, transcript-based SNP context sequences are provided in
Table 1 as SEQ ID NOS:11-15, and genomic-based SNP context sequences are
provided in Table 2 as SEQ ID NOS:20-31) and their complements. A fragment
typically comprises a contiguous nucleotide sequence of at least about eight
nucleotides, more preferably at least about twelve nucleotides, and even
more preferably at least about sixteen
nucleotides. Further, a fragment could comprise at least about 18, 20, 22,
25, 30, 40, 50, 60, 80, 100, 150, 200, 250 or 500 nucleotides in length,
or any other number in between. The length of the fragment will be based
on its intended use. For example, the fragment can encode epitope-bearing
regions of a variant peptide or regions of a variant peptide that differ
from the normal/wild-type protein, or can be useful as a polynucleotide
probe or primer. Such fragments can be isolated using the nucleotide
sequences provided in Table 1 and/or Table 2 for the synthesis of a
polynucleotide probe. A labeled probe can then be used, for example, to
screen a cDNA library, genomic DNA library, or mRNA to isolate nucleic
acid corresponding to the coding region. Further, primers can be used in
amplification reactions, such as for purposes of assaying one or more SNP
sites or for cloning specific regions of a gene.
An isolated nucleic acid molecule of the present invention further
encompasses a SNP-containing polynucleotide that is the product of any one
of a variety of nucleic acid amplification methods, which are used to
increase the copy numbers of a polynucleotide of interest in a nucleic
acid sample. Such amplification methods are well known in the art, and
they include, but are not limited to, polymerase chain reaction (PCR) (U.S.
Pat. Nos. 4,683,195 and 4,683,202; PCR Technology: Principles and
Applications for DNA Amplification, ed. H. A. Erlich, Freeman Press, New
York, N.Y. [1992]), ligase chain reaction (LCR) (Wu and Wallace, Genomics
4:560 [1989]; Landegren et al., Science 241:1077 [1988]), strand
displacement amplification (SDA) (U.S. Pat. Nos. 5,270,184 and 5,422,252),
transcription-mediated amplification (TMA) (U.S. Pat. No. 5,399,491),
linked linear amplification (LLA) (U.S. Pat. No. 6,027,923), and the like,
and isothermal amplification methods such as nucleic acid sequence based
amplification (NASBA), and self-sustained sequence replication (Guatelli
et al., Proc. Natl. Acad. Sci. USA 87: 1874 [1990]). Based on such
methodologies, a person skilled in the art can readily design primers in
any suitable regions 5' and 3' to a SNP disclosed herein. Such primers may
be used to amplify DNA of any length so long as it contains the SNP of
interest in its sequence.
As used herein, an "amplified polynucleotide" of the invention is a SNP-containing
nucleic acid molecule whose amount has been increased at least two fold by
any nucleic acid amplification method performed in vitro as compared to
its starting amount in a test sample. In other preferred embodiments, an
amplified polynucleotide is the result of at least ten fold, fifty fold,
one hundred fold, one thousand fold, or even ten thousand fold increase as
compared to its starting amount in a test sample. In a typical PCR
amplification, a polynucleotide of interest is often amplified at least
fifty thousand fold in amount over the unamplified genomic DNA, but the
precise amount of amplification needed for an assay depends on the
sensitivity of the subsequent detection method used.
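To relate the fold increases listed above to PCR cycle counts, the following rough sketch assumes an idealized exponential reaction in which each cycle multiplies the amount of product by (1 + efficiency). The efficiency parameter and the function name are assumptions introduced here for illustration. Real reactions plateau and rarely sustain perfect doubling, so the figures are lower bounds on the number of cycles.

```python
def cycles_for_fold(target_fold, efficiency=1.0):
    """Smallest cycle count reaching `target_fold` under idealized PCR.

    efficiency=1.0 means perfect doubling per cycle (an assumption).
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    cycles, fold = 0, 1.0
    while fold < target_fold:
        fold *= 1.0 + efficiency
        cycles += 1
    return cycles


for fold in (2, 10, 50, 100, 1_000, 10_000, 50_000):
    print(fold, cycles_for_fold(fold))
```

Under perfect doubling, a two fold increase needs 1 cycle, a one thousand fold increase needs 10 cycles, and a fifty thousand fold increase needs 16 cycles.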
Generally, an amplified polynucleotide is at least about 16 nucleotides in
length. More typically, an amplified polynucleotide is at least about 20
nucleotides in length. In a preferred embodiment of the invention, an
amplified polynucleotide is at least about 30 nucleotides in length. In a
more preferred embodiment of the invention, an amplified polynucleotide is
at least about 32, 40, 45, 50, or 60 nucleotides in length. In yet another
preferred embodiment of the invention, an amplified polynucleotide is at
least about 100, 200, 300, 400, or 500 nucleotides in length. While the
total length of an amplified polynucleotide of the invention can be as
long as an exon, an intron or the entire gene where the SNP of interest
resides, an amplified product is typically up to about 1,000 nucleotides
in length (although certain amplification methods may generate amplified
products greater than 1000 nucleotides in length). More preferably, an
amplified polynucleotide is not greater than about 600-700 nucleotides in
length. It is understood that irrespective of the length of an amplified
polynucleotide, a SNP of interest may be located anywhere along its
sequence.
In a specific embodiment of the invention, the amplified product is at
least about 201 nucleotides in length, and comprises one of the
transcript-based context sequences or the genomic-based context sequences
shown in Tables 1-2. Such a product may have additional sequences on its
5' end or 3' end or both. In another embodiment, the amplified product is
about 101 nucleotides in length, and it contains a SNP disclosed herein.
Preferably, the SNP is located at the middle of the amplified product
(e.g., at position 101 in an amplified product that is 201 nucleotides in
length, or at position 51 in an amplified product that is 101 nucleotides
in length), or within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, or 20
nucleotides from the middle of the amplified product (however, as
indicated above, the SNP of interest may be located anywhere along the
length of the amplified product).
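The positional arithmetic in this embodiment can be checked with a short sketch. Positions are 1-based, and the helper names are hypothetical. For an even-length product the sketch takes the lower of the two central positions, which is an assumption the text does not address.

```python
def middle_position(length):
    """1-based middle position, e.g. 101 for a 201-nucleotide product."""
    return (length + 1) // 2


def snp_is_near_middle(snp_pos, length, max_offset=0):
    """True if the SNP lies within `max_offset` nucleotides of the middle."""
    return abs(snp_pos - middle_position(length)) <= max_offset


print(middle_position(201), middle_position(101))      # 101 51
print(snp_is_near_middle(101, 201))                    # True
print(snp_is_near_middle(115, 201, max_offset=15))     # True
print(snp_is_near_middle(130, 201, max_offset=20))     # False
```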
The present invention provides isolated nucleic acid molecules that
comprise, consist of, or consist essentially of one or more polynucleotide
sequences that contain one or more SNPs disclosed herein, complements
thereof, and SNP-containing fragments thereof.
Accordingly, the present invention provides nucleic acid molecules that
consist of any of the nucleotide sequences shown in Table 1 and/or Table 2
(transcript sequences are provided in Table 1 as SEQ ID NOS:1-5, genomic
sequences are provided in Table 2 as SEQ ID NOS:16-19, transcript-based
SNP context sequences are provided in Table 1 as SEQ ID NOS:11-15, and
genomic-based SNP context sequences are provided in Table 2 as SEQ ID
NOS:20-31), or any nucleic acid molecule that encodes any of the variant
proteins provided in Table 1 (SEQ ID NOS:6-10). A nucleic acid molecule
consists of a nucleotide sequence when the nucleotide sequence is the
complete nucleotide sequence of the nucleic acid molecule.
The present invention further provides nucleic acid molecules that consist
essentially of any of the nucleotide sequences shown in Table 1 and/or
Table 2 (transcript sequences are provided in Table 1 as SEQ ID NOS:1-5,
genomic sequences are provided in Table 2 as SEQ ID NOS:16-19,
transcript-based SNP context sequences are provided in Table 1 as SEQ ID
NOS:11-15, and genomic-based SNP context sequences are provided in Table 2
as SEQ ID NOS:20-31), or any nucleic acid molecule that encodes any of the
variant proteins provided in Table 1 (SEQ ID NOS:6-10). A nucleic acid
molecule consists essentially of a nucleotide sequence when such a
nucleotide sequence is present with only a few additional nucleotide
residues in the final nucleic acid molecule.
The present invention further provides nucleic acid molecules that
comprise any of the nucleotide sequences shown in Table 1 and/or Table 2
or a SNP-containing fragment thereof (transcript sequences are provided in
Table 1 as SEQ ID NOS:1-5, genomic sequences are provided in Table 2 as
SEQ ID NOS:16-19, transcript-based SNP context sequences are provided in
Table 1 as SEQ ID NOS:11-15, and genomic-based SNP context sequences are
provided in Table 2 as SEQ ID NOS:20-31), or any nucleic acid molecule that
encodes any of the variant proteins provided in Table 1 (SEQ ID NOS:6-10).
A nucleic acid molecule comprises a nucleotide sequence when the
nucleotide sequence is at least part of the final nucleotide sequence of
the nucleic acid molecule. In such a fashion, the nucleic acid molecule
can be only the nucleotide sequence or have additional nucleotide
residues, such as residues that are naturally associated with it or
heterologous nucleotide sequences. Such a nucleic acid molecule can have
one to a few additional nucleotides or can comprise many more additional
nucleotides. A brief description of how various types of these nucleic
acid molecules can be readily made and isolated is provided below, and
such techniques are well known to those of ordinary skill in the art
(Molecular Cloning: A Laboratory Manual, Sambrook and Russell, Cold Spring
Harbor Press, New York [2000]).
The isolated nucleic acid molecules can encode mature proteins plus
additional amino or carboxyl-terminal amino acids or both, or amino acids
interior to the mature peptide (when the mature form has more than one
peptide chain, for instance). Such sequences may play a role in processing
of a protein from precursor to a mature form, facilitate protein
trafficking, prolong or shorten protein half-life, or facilitate
manipulation of a protein for assay or production. As generally is the
case in situ, the additional amino acids may be processed away from the
mature protein by cellular enzymes.
Thus, the isolated nucleic acid molecules include, but are not limited to,
nucleic acid molecules having a sequence encoding a peptide alone, a
sequence encoding a mature peptide and additional coding sequences such as
a leader or secretory sequence (e.g., a pre-pro or pro-protein sequence),
a sequence encoding a mature peptide with or without additional coding
sequences, plus additional non-coding sequences, for example introns and
non-coding 5' and 3' sequences such as transcribed but untranslated
sequences that play a role in, for example, transcription, mRNA processing
(including splicing and polyadenylation signals), ribosome binding, and/or
stability of mRNA. In addition, the nucleic acid molecules may be fused to
heterologous marker sequences encoding, for example, a peptide that
facilitates purification.
Isolated nucleic acid molecules can be in the form of RNA, such as mRNA,
or in the form of DNA, including cDNA and genomic DNA, which may be obtained,
for example, by molecular cloning or produced by chemical synthetic
techniques or by a combination thereof (Molecular Cloning: A Laboratory
Manual, Sambrook and Russell, Cold Spring Harbor Press, New York [2000]).
Furthermore, isolated nucleic acid molecules, particularly SNP detection
reagents such as probes and primers, can also be partially or completely
in the form of one or more types of nucleic acid analogs, such as peptide
nucleic acid (PNA) (U.S. Pat. Nos. 5,539,082; 5,527,675; 5,623,049;
5,714,331). The nucleic acid, especially DNA, can be double-stranded or
single-stranded. Single-stranded nucleic acid can be the coding strand
(sense strand) or the complementary non-coding strand (anti-sense strand).
DNA, RNA, or PNA segments can be assembled, for example, from fragments of
the human genome (in the case of DNA or RNA) or single nucleotides, short
oligonucleotide linkers, or from a series of oligonucleotides, to provide
a synthetic nucleic acid molecule. Nucleic acid molecules can be readily
synthesized using the sequences provided herein as a reference;
oligonucleotide and PNA oligomer synthesis techniques are well known in
the art (see, e.g., Corey, "Peptide nucleic acids: expanding the scope of
nucleic acid recognition," Trends Biotechnol. 15[6]:224-9 [June 1997], and
Hyrup et al., "Peptide nucleic acids [PNA]: synthesis, properties and
potential applications," Bioorg. Med. Chem. 4[1]:5-23 [January 1996]).
Furthermore, large-scale automated oligonucleotide/PNA synthesis
(including synthesis on an array or bead surface or other solid support)
can readily be accomplished using commercially available nucleic acid
synthesizers, such as the Applied Biosystems (Foster City, Calif.) 3900
High-Throughput DNA Synthesizer or Expedite 8909 Nucleic Acid Synthesis
System, and the sequence information provided herein.
The present invention encompasses nucleic acid analogs that contain
modified, synthetic, or non-naturally occurring nucleotides or structural
elements or other alternative/modified nucleic acid chemistries known in
the art. Such nucleic acid analogs are useful, for example, as detection
reagents (e.g., primers/probes) for detecting one or more SNPs identified
in Table 1 and/or Table 2. Furthermore, kits/systems (such as beads,
arrays, etc.) that include these analogs are also encompassed by the
present invention. For example, PNA oligomers that are based on the
polymorphic sequences of the present invention are specifically
contemplated. PNA oligomers are analogs of DNA in which the phosphate
backbone is replaced with a peptide-like backbone (Lagriffoul et al.,
Bioorganic & Medicinal Chemistry Letters 4:1081-1082 [1994], Petersen et
al., Bioorganic & Medicinal Chemistry Letters 6:793-796 [1996], Kumar et
al., Organic Letters 3[9]:1269-1272 [2001], WO96/04000). PNA hybridizes to
complementary RNA or DNA with higher affinity and specificity than
conventional oligonucleotides and oligonucleotide analogs. The properties
of PNA enable novel molecular biology and biochemistry applications
unachievable with traditional oligonucleotides and peptides.
Additional examples of nucleic acid modifications that improve the binding
properties and/or stability of a nucleic acid include the use of base
analogs such as inosine, intercalators (U.S. Pat. No. 4,835,263) and the
minor groove binders (U.S. Pat. No. 5,801,115). Thus, references herein to
nucleic acid molecules, SNP-containing nucleic acid molecules, SNP
detection reagents (e.g., probes and primers), and
oligonucleotides/polynucleotides
include PNA oligomers and other nucleic acid analogs. Other examples of
nucleic acid analogs and alternative/modified nucleic acid chemistries
known in the art are described in Current Protocols in Nucleic Acid
Chemistry, John Wiley & Sons, New York (2002).
The present invention further provides nucleic acid molecules that encode
fragments of the variant polypeptides disclosed herein as well as nucleic
acid molecules that encode obvious variants of such variant polypeptides.
Such nucleic acid molecules may be naturally occurring, such as paralogs
(different locus) and orthologs (different organism), or may be
constructed by recombinant DNA methods or by chemical synthesis.
Non-naturally occurring variants may be made by mutagenesis techniques,
including those applied to nucleic acid molecules, cells, or organisms.
Accordingly, the variants can contain nucleotide substitutions, deletions,
inversions and insertions (in addition to the SNPs disclosed in Tables
1-2). Variation can occur in either or both the coding and non-coding
regions. The variations can produce conservative and/or non-conservative
amino acid substitutions.
Further variants of the nucleic acid molecules disclosed in Tables 1-2,
such as naturally occurring allelic variants (as well as orthologs and
paralogs) and synthetic variants produced by mutagenesis techniques, can
be identified and/or produced using methods well known in the art. Such
further variants can comprise a nucleotide sequence that shares at least
70-80%, 80-85%, 85-90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%
sequence identity with a nucleic acid sequence disclosed in Table 1 and/or
Table 2 (or a fragment thereof) and that includes a novel SNP allele
disclosed in Table 1 and/or Table 2. Further, variants can comprise a
nucleotide sequence that encodes a polypeptide that shares at least
70-80%, 80-85%, 85-90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%
sequence identity with a polypeptide sequence disclosed in Table 1 (or a
fragment thereof) and that includes a novel SNP allele disclosed in Table
1 and/or Table 2. Thus, an aspect of the present invention that is
specifically contemplated is isolated nucleic acid molecules that have a
certain degree of sequence variation compared with the sequences shown in
Tables 1-2, but that contain a novel SNP allele disclosed herein. In other
words, as long as an isolated nucleic acid molecule contains a novel SNP
allele disclosed herein, other portions of the nucleic acid molecule that
flank the novel SNP allele can vary to some degree from the specific
transcript, genomic, and context sequences shown in Tables 1-2, and can
encode a polypeptide that varies to some degree from the specific
polypeptide sequences shown in Table 1.
To determine the percent identity of two amino acid sequences or two
nucleotide sequences of two molecules that share sequence homology, the
sequences are aligned for optimal comparison purposes (e.g., gaps can be
introduced in one or both of a first and a second amino acid or nucleic
acid sequence for optimal alignment and non-homologous sequences can be
disregarded for comparison purposes). In a preferred embodiment, at least
30%, 40%, 50%, 60%, 70%, 80%, or 90% or more of the length of a reference
sequence is aligned for comparison purposes. The amino acid residues or
nucleotides at corresponding amino acid positions or nucleotide positions
are then compared. When a position in the first sequence is occupied by
the same amino acid residue or nucleotide as the corresponding position in
the second sequence, then the molecules are identical at that position (as
used herein, amino acid or nucleic acid "identity" is equivalent to amino
acid or nucleic acid "homology"). The percent identity between the two
sequences is a function of the number of identical positions shared by the
sequences, taking into account the number of gaps, and the length of each
gap, which need to be introduced for optimal alignment of the two
sequences.
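By way of illustration only, the percent identity calculation described
above can be sketched in a few lines of Python. The sketch assumes that the
two sequences have already been aligned and padded to equal length with a
gap character. It adopts one possible convention, in which the number of
identical positions is divided by the total number of alignment columns, so
that every gap position lowers the result. The function name
percent_identity and the "-" gap character are hypothetical choices made for
this example and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns.

    Assumption: both inputs are rows of the same alignment, so they have
    equal length, and columns containing a gap count against identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column is identical when both rows carry the same non-gap residue.
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)
```

Other conventions (for example, dividing by the length of the shorter
sequence or by the number of ungapped columns) give different values for
the same alignment, so the convention used should be stated alongside any
reported figure.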
The comparison of sequences and determination of percent identity between
two sequences can be accomplished using a mathematical algorithm
(Computational Molecular Biology, ed. A. M. Lesk, Oxford University Press,
New York [1988]; Biocomputing: Informatics and Genome Projects, ed. Smith,
D. W., Academic Press, New York [1993]; Computer Analysis of Sequence
Data, Part 1, eds. A. M. Griffin and H. G. Griffin, Humana Press, New
Jersey [1994]; Sequence Analysis in Molecular Biology, G. von Heijne,
Academic Press [1987]; and Sequence Analysis Primer, eds. M. Gribskov and
J. M. Devereux, Stockton Press, New York [1991]). In a preferred
embodiment, the percent identity between two amino acid sequences is
determined using the Needleman and Wunsch algorithm (J. Mol. Biol.
48:444-453 [1970]) which has been incorporated into the GAP program in the
GCG software package, using either a BLOSUM62 matrix or a PAM250 matrix,
and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1,
2, 3, 4, 5, or 6.
In yet another preferred embodiment, the percent identity between two
nucleotide sequences is determined using the GAP program in the GCG
software package (J. Devereux et al., Nucleic Acids Res. 12[1]:387
[1984]), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70,
or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment,
the percent identity between two amino acid or nucleotide sequences is
determined using the algorithm of E. Myers and W. Miller (CABIOS, 4:11-17
[1989]) which has been incorporated into the ALIGN program (version 2.0),
using a PAM120 weight residue table, a gap length penalty of 12, and a gap
penalty of 4.
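For readers unfamiliar with global alignment, the following Python sketch
shows the dynamic programming idea behind the Needleman and Wunsch
algorithm. It is a simplified illustration and not the GAP program: it
assumes a flat match/mismatch score instead of a substitution matrix, and a
single linear gap penalty instead of separate gap and length weights. The
function name and the default scores are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The two aligned strings returned by this sketch have equal length and can
be compared column by column to obtain a percent identity.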
The nucleotide and amino acid sequences of the present invention can
further be used as a "query sequence" to perform a search against sequence
databases to, for example, identify other family members or related
sequences. Such searches can be performed using the NBLAST and XBLAST
programs (version 2.0) of Altschul et al. (J. Mol. Biol. 215:403-10
[1990]). BLAST nucleotide searches can be performed with the NBLAST
program, score=100, wordlength=12 to obtain nucleotide sequences
homologous to the nucleic acid molecules of the invention. BLAST protein
searches can be performed with the XBLAST program, score=50, wordlength=3
to obtain amino acid sequences homologous to the proteins of the
invention. To obtain gapped alignments for comparison purposes, Gapped
BLAST can be utilized as described in Altschul et al. (Nucleic Acids Res.
25[17]:3389-3402 [1997]). When utilizing BLAST and gapped BLAST programs,
the default parameters of the respective programs (e.g., XBLAST and NBLAST)
can be used. In addition to BLAST, examples of other search and sequence
comparison programs used in the art include, but are not limited to, FASTA
(Pearson, Methods Mol. Biol. 25, 365-389 [1994]) and KERR (Dufresne et
al., Nat. Biotechnol. 20[12]:1269-71 [December 2002]). For further
information regarding bioinformatics techniques, see Current Protocols in
Bioinformatics, John Wiley & Sons, Inc., New York.
The present invention further provides non-coding fragments of the nucleic
acid molecules disclosed in Table 1 and/or Table 2. Preferred non-coding
fragments include, but are not limited to, promoter sequences, enhancer
sequences, intronic sequences, 5' untranslated regions (UTRs), 3'
untranslated regions, gene modulating sequences and gene termination
sequences. Such fragments are useful, for example, in controlling
heterologous gene expression and in developing screens to identify
gene-modulating agents.
SNP Detection Reagents
In a specific aspect of the present invention, the SNPs disclosed in Table
1 and/or Table 2, and their associated transcript sequences (provided in
Table 1 as SEQ ID NOS:1-5), genomic sequences (provided in Table 2 as SEQ
ID NOS:16-19), and context sequences (transcript-based context sequences
are provided in Table 1 as SEQ ID NOS:11-15; genomic-based context
sequences are provided in Table 2 as SEQ ID NOS:20-31), can be used for
the design of SNP detection reagents. As used herein, a "SNP detection
reagent" is a reagent that specifically detects a specific target SNP
position disclosed herein, and that is preferably specific for a
particular nucleotide (allele) of the target SNP position (i.e., the
detection reagent preferably can differentiate between different
alternative nucleotides at a target SNP position, thereby allowing the
identity of the nucleotide present at the target SNP position to be
determined). Typically, such a detection reagent hybridizes to a target SNP-containing
nucleic acid molecule by complementary base-pairing in a sequence specific
manner, and discriminates the target variant sequence from other nucleic
acid sequences such as an art-known form in a test sample. An example of a
detection reagent is a probe that hybridizes to a target nucleic acid
containing one or more of the SNPs provided in Table 1 and/or Table 2. In
a preferred embodiment, such a probe can differentiate between nucleic
acids having a particular nucleotide (allele) at a target SNP position
from other nucleic acids that have a different nucleotide at the same
target SNP position. In addition, a detection reagent may hybridize to a
specific region 5' and/or 3' to a SNP position, particularly a region
corresponding to the context sequences provided in Table 1 and/or Table 2
(transcript-based context sequences are provided in Table 1 as SEQ ID
NOS:11-15; genomic-based context sequences are provided in Table 2 as SEQ
ID NOS:20-31). Another example of a detection reagent is a primer which
acts as an initiation point of nucleotide extension along a complementary
strand of a target polynucleotide. The SNP sequence information provided
herein is also useful for designing primers, e.g. allele-specific primers,
to amplify (e.g., using PCR) any SNP of the present invention.
In one preferred embodiment of the invention, a SNP detection reagent is
an isolated or synthetic DNA or RNA polynucleotide probe or primer or PNA
oligomer, or a combination of DNA, RNA and/or PNA, that hybridizes to a
segment of a target nucleic acid molecule containing a SNP identified in
Table 1 and/or Table 2. A detection reagent in the form of a
polynucleotide may optionally contain modified base analogs, intercalators
or minor groove binders. Multiple detection reagents such as probes may
be, for example, affixed to a solid support (e.g., arrays or beads) or
supplied in solution (e.g., probe/primer sets for enzymatic reactions such
as PCR, RT-PCR, TaqMan assays, or primer-extension reactions) to form a
SNP detection kit.
A probe or primer typically is a substantially purified oligonucleotide or
PNA oligomer. Such an oligonucleotide typically comprises a region of
complementary nucleotide sequence that hybridizes under stringent
conditions to at least about 8, 10, 12, 16, 18, 20, 22, 25, 30, 40, 50,
55, 60, 65, 70, 80, 90, 100, 120 (or any other number in-between) or more
consecutive nucleotides in a target nucleic acid molecule. Depending on
the particular assay, the consecutive nucleotides can either include the
target SNP position, or be a specific region in close enough proximity 5'
and/or 3' to the SNP position to carry out the desired assay.
Other preferred primer and probe sequences can readily be determined using
the transcript sequences (SEQ ID NOS:1-5), genomic sequences (SEQ ID
NOS:16-19), and SNP context sequences (transcript-based context sequences
are provided in Table 1 as SEQ ID NOS:11-15; genomic-based context
sequences are provided in Table 2 as SEQ ID NOS:20-31) disclosed in the
Sequence Listing and in Tables 1-2. It will be apparent to one of skill in
the art that such primers and probes are directly useful as reagents for
genotyping the SNPs of the present invention, and can be incorporated into
any kit/system format.
In order to produce a probe or primer specific for a target SNP-containing
sequence, the gene/transcript and/or context sequence surrounding the SNP
of interest is typically examined using a computer algorithm which starts
at the 5' or at the 3' end of the nucleotide sequence. Typical algorithms
will then identify oligomers of defined length that are unique to the
gene/SNP context sequence, have a GC content within a range suitable for
hybridization, lack predicted secondary structure that may interfere with
hybridization, and/or possess other desired characteristics or lack
other undesired characteristics.
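A minimal Python sketch of such a scan is given below, under stated
assumptions. It slides a window of fixed length along a context sequence
from the 5' end, keeps oligomers whose GC fraction falls in a chosen range,
and discards any oligomer that occurs more than once in the context
sequence or that appears in optional background sequences. The function
names, the 40-60% GC range, and the default length of 20 are hypothetical.
Secondary structure prediction is not attempted here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (seq must be non-empty)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    return sum(
        1
        for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    )


def candidate_oligos(context, length=20, gc_min=0.40, gc_max=0.60,
                     background=()):
    """Return (start, oligo) pairs that pass the GC and uniqueness filters."""
    context = context.upper()
    hits = []
    for start in range(len(context) - length + 1):
        oligo = context[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue  # GC content outside the chosen range
        if count_occurrences(context, oligo) != 1:
            continue  # not unique within the context sequence
        if any(oligo in other.upper() for other in background):
            continue  # also found in a background sequence
        hits.append((start, oligo))
    return hits
```

In practice the uniqueness check would be run against a much larger
background (for example, a genome-wide search), which this sketch only
approximates with the background argument.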
A primer or probe of the present invention is typically at least about 8
nucleotides in length. In one embodiment of the invention, a primer or a
probe is at least about 10 nucleotides in length. In a preferred
embodiment, a primer or a probe is at least about 12 nucleotides in
length. In a more preferred embodiment, a primer or probe is at least
about 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 nucleotides in length.
While the maximal length of a probe can be as long as the target sequence
to be detected, depending on the type of assay in which it is employed, it
is typically less than about 50, 60, 65, or 70 nucleotides in length. In
the case of a primer, it is typically less than about 30 nucleotides in
length. In a specific preferred embodiment of the invention, a primer or a
probe is between about 18 and about 28 nucleotides in length. However,
in other embodiments, such as nucleic acid arrays and other embodiments in
which probes are affixed to a substrate, the probes can be longer, such as
on the order of 30-70, 75, 80, 90, 100, or more nucleotides in length (see
the section below entitled "SNP Detection Kits and Systems").
For analyzing SNPs, it may be appropriate to use oligonucleotides specific
for alternative SNP alleles. Such oligonucleotides which detect single
nucleotide variations in target sequences may be referred to by such terms
as "allele-specific oligonucleotides," "allele-specific probes," or
"allele-specific primers." The design and use of allele-specific probes
for analyzing polymorphisms is described in, e.g., Mutation Detection: A
Practical Approach, ed. Cotton et al., Oxford University Press [1998];
Saiki et al., Nature 324, 163-166 [1986]; Dattagupta, EP235,726; and
Saiki, WO 89/11548.
While the design of each allele-specific primer or probe depends on
variables such as the precise composition of the nucleotide sequences
flanking a SNP position in a target nucleic acid molecule, and the length
of the primer or probe, another factor in the use of primers and probes is
the stringency of the condition under which the hybridization between the
probe or primer and the target sequence is performed. Higher stringency
conditions utilize buffers with lower ionic strength and/or a higher
reaction temperature, and tend to require a more perfect match between
probe/primer and a target sequence in order to form a stable duplex. If
the stringency is too high, however, hybridization may not occur at all.
In contrast, lower stringency conditions utilize buffers with higher ionic
strength and/or a lower reaction temperature, and permit the formation of
stable duplexes with more mismatched bases between a probe/primer and a
target sequence. By way of example and not limitation, exemplary
high stringency hybridization conditions using an allele-specific probe
are as follows: prehybridization with a solution containing 5× standard
saline phosphate EDTA (SSPE) and 0.5% NaDodSO4 (SDS) at 55°C, and
incubation of the probe with target nucleic acid molecules in the same
solution at the same temperature, followed by washing with a solution
containing 2× SSPE and 0.1% SDS at 55°C or room temperature.
Moderate stringency hybridization conditions may be used for
allele-specific primer extension reactions with a solution containing,
e.g., about 50 mM KCl at about 46°C. Alternatively, the reaction
may be carried out at an elevated temperature such as 60°C. In
another embodiment, a moderately stringent hybridization condition
suitable for oligonucleotide ligation assay (OLA) reactions wherein two
probes are ligated if they are completely complementary to the target
sequence may utilize a solution of about 100 mM KCl at a temperature of
46°C.
In a hybridization-based assay, allele-specific probes can be designed
that hybridize to a segment of target DNA from one individual but do not
hybridize to the corresponding segment from another individual due to the
presence of different polymorphic forms (e.g., alternative SNP
alleles/nucleotides) in the respective DNA segments from the two
individuals. Hybridization conditions should be sufficiently stringent
that there is a significant detectable difference in hybridization
intensity between alleles, and preferably an essentially binary response,
whereby a probe hybridizes to only one of the alleles or significantly
more strongly to one allele. While a probe may be designed to hybridize to
a target sequence that contains a SNP site such that the SNP site aligns
anywhere along the sequence of the probe, the probe is preferably designed
to hybridize to a segment of the target sequence such that the SNP site
aligns with a central position of the probe (e.g., a position within the
probe that is at least three nucleotides from either end of the probe).
This design of probe generally achieves good discrimination in
hybridization between different allelic forms.
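The centered design described above can be sketched as follows. The
Python function is illustrative only. It assumes that a SNP context
sequence is supplied as a 5' flank, a set of alternative alleles, and a 3'
flank, and it builds one probe per allele with the SNP at the exact middle
of an odd-length probe. The function name, the argument layout, and the
default length of 21 are hypothetical. Consistent with the text, the sketch
rejects probes in which the SNP would lie fewer than three nucleotides from
either end.

```python
def centered_probe_pair(flank5, alleles, flank3, length=21):
    """Return {allele: probe} with the SNP at the central probe position."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the SNP can sit at the center")
    half = length // 2
    if half < 3:
        raise ValueError("SNP must be at least three nucleotides from each end")
    if len(flank5) < half or len(flank3) < half:
        raise ValueError("flanking sequence is too short for this probe length")
    left = flank5[len(flank5) - half:].upper()   # bases immediately 5' of the SNP
    right = flank3[:half].upper()                # bases immediately 3' of the SNP
    return {a.upper(): left + a.upper() + right for a in alleles}
```

The probes returned for a two-allele SNP differ from each other only at the
central position, so they can serve as a matched pair in a
hybridization-based assay. The probes are written in the same orientation
as the supplied context sequence; a probe for the opposite strand would be
the reverse complement.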
In another embodiment, a probe or primer may be designed to hybridize to a
segment of target DNA such that the SNP aligns with either the 5'-most end
or the 3'-most end of the probe or primer. In a specific preferred
embodiment which is particularly suitable for use in an oligonucleotide
ligation assay (U.S. Pat. No. 4,988,617), the 3'-most nucleotide of the
probe aligns with the SNP position in the target sequence.
Oligonucleotide probes and primers may be prepared by methods well known
in the art. Chemical synthetic methods include, but are not limited to,
the phosphotriester method described by Narang et al., Methods in
Enzymology 68:90 [1979]; the phosphodiester method described by Brown et
al., Methods in Enzymology 68:109 [1979]; the diethylphosphoramidite method
described by Beaucage et al., Tetrahedron Letters 22:1859 [1981]; and the
solid support method described in U.S. Pat. No. 4,458,066.
Allele-specific probes are often used in pairs (or, less commonly, in sets
of 3 or 4, such as if a SNP position is known to have 3 or 4 alleles,
respectively, or to assay both strands of a nucleic acid molecule for a
target SNP allele), and such pairs may be identical except for a one
nucleotide mismatch that represents the allelic variants at the SNP
position. Commonly, one member of a pair perfectly matches a reference
form of a target sequence that has a more common SNP allele (i.e., the
allele that is more frequent in the target population) and the other
member of the pair perfectly matches a form of the target sequence that
has a less common SNP allele (i.e., the allele that is rarer in the target
population). In the case of an array, multiple pairs of probes can be
immobilized on the same support for simultaneous analysis of multiple
different polymorphisms.
In one type of PCR-based assay, an allele-specific primer hybridizes to a
region on a target nucleic acid molecule that overlaps a SNP position and
only primes amplification of an allelic form to which the primer exhibits
perfect complementarity (Gibbs, Nucleic Acids Res. 17:2427-2448 [1989]).
Typically, the primer's 3'-most nucleotide is aligned with and
complementary to the SNP position of the target nucleic acid molecule.
This primer is used in conjunction with a second primer that hybridizes at
a distal site. Amplification proceeds from the two primers, producing a
detectable product that indicates which allelic form is present in the
test sample. A control is usually performed with a second pair of primers,
one of which shows a single base mismatch at the polymorphic site and the
other of which exhibits perfect complementarity to a distal site. The
single-base mismatch prevents amplification or substantially reduces
amplification efficiency, so that either no detectable product is formed
or it is formed in lower amounts or at a slower pace. The method generally
works most effectively when the mismatch is at the 3'-most position of the
oligonucleotide (i.e., the 3'-most position of the oligonucleotide aligns
with the target SNP position) because this position is most destabilizing
to elongation from the primer (see, e.g., WO 93/22456). This PCR-based
assay can be utilized as part of the TaqMan assay, described below.
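The placement of the SNP at the 3'-most position of an allele-specific
primer can be sketched in Python as shown below. This is a minimal
illustration under stated assumptions. The SNP context is supplied as a 5'
flank, a set of alleles, and a 3' flank, all written on the same strand. A
"+" primer has the same sequence as that strand and ends on the allele. A
"-" primer is the reverse complement of the allele plus the downstream
flank, so that its 3'-most base is the complement of the allele. The
function names and defaults are hypothetical, and no melting temperature
or specificity checks are performed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def allele_specific_primers(flank5, alleles, flank3, length=20, strand="+"):
    """Return {allele: primer} with the SNP base at the 3' end of each primer."""
    if length < 2:
        raise ValueError("length must be at least 2")
    anchor_len = length - 1  # bases contributed by the flanking sequence
    flank5, flank3 = flank5.upper(), flank3.upper()
    if strand == "+":
        if len(flank5) < anchor_len:
            raise ValueError("5' flank is too short for this primer length")
        anchor = flank5[len(flank5) - anchor_len:]
        return {a.upper(): anchor + a.upper() for a in alleles}
    if strand == "-":
        if len(flank3) < anchor_len:
            raise ValueError("3' flank is too short for this primer length")
        anchor = flank3[:anchor_len]
        return {a.upper(): reverse_complement(a.upper() + anchor) for a in alleles}
    raise ValueError("strand must be '+' or '-'")
```

Each allele-specific primer would be paired with a common primer at a
distal site. Only the primer whose 3' base matches the allele present in
the sample is expected to prime efficient amplification.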
In a specific embodiment of the invention, a primer of the invention
contains a sequence substantially complementary to a segment of a target
SNP-containing nucleic acid molecule except that the primer has a
mismatched nucleotide in one of the three nucleotide positions at the
3'-most end of the primer, such that the mismatched nucleotide does not
base pair with a particular allele at the SNP site. In a preferred
embodiment, the mismatched nucleotide in the primer is the second from the
last nucleotide at the 3'-most position of the primer. In a more preferred
embodiment, the mismatched nucleotide in the primer is the last nucleotide
at the 3'-most position of the primer.
In another embodiment of the invention, a SNP detection reagent of the
invention is labeled with a fluorogenic reporter dye that emits a
detectable signal. While the preferred reporter dye is a fluorescent dye,
any reporter dye that can be attached to a detection reagent such as an
oligonucleotide probe or primer is suitable for use in the invention. Such
dyes include, but are not limited to, Acridine, AMCA, BODIPY, Cascade
Blue, Cy2, Cy3, Cy5, Cy7, Dabcyl, Edans, Eosin, Erythrosin, Fluorescein,
6-Fam, Tet, Joe, Hex, Oregon Green, Rhodamine, Rhodol Green, Tamra, Rox,
and Texas Red.
In yet another embodiment of the invention, the detection reagent may be
further labeled with a quencher dye such as Tamra, especially when the
reagent is used as a self-quenching probe such as a TaqMan (U.S. Pat. Nos.
5,210,015 and 5,538,848) or Molecular Beacon probe (U.S. Pat. Nos.
5,118,801 and 5,312,728), or other stemless or linear beacon probe (Livak
et al., PCR Methods Appl. 4:357-362 [1995]; Tyagi et al., Nature
Biotechnology 14:303-308 [1996]; Nazarenko et al., Nucl. Acids Res.
25:2516-2521 [1997]; U.S. Pat. Nos. 5,866,336 and 6,117,635).
The detection reagents of the invention may also contain other labels,
including but not limited to, biotin for streptavidin binding, hapten for
antibody binding, and oligonucleotide for binding to another complementary
oligonucleotide such as pairs of zipcodes.
The present invention also contemplates reagents that do not contain (or
that are complementary to) a SNP nucleotide identified herein but that are
used to assay one or more SNPs disclosed herein. For example, primers that
flank, but do not hybridize directly to a target SNP position provided
herein are useful in primer extension reactions in which the primers
hybridize to a region adjacent to the target SNP position (i.e., within
one or more nucleotides from the target SNP site). During the primer
extension reaction, a primer is typically not able to extend past a target
SNP site if a particular nucleotide (allele) is present at that target SNP
site, and the primer extension product can be detected in order to
determine which SNP allele is present at the target SNP site. For example,
particular ddNTPs are typically used in the primer extension reaction to
terminate primer extension once a ddNTP is incorporated into the extension
product (a primer extension product which includes a ddNTP at the 3'-most
end of the primer extension product, and in which the ddNTP is a
nucleotide of a SNP disclosed herein, is a composition that is
specifically contemplated by the present invention). Thus, reagents that
bind to a nucleic acid molecule in a region adjacent to a SNP site and
that are used for assaying the SNP site, even though the bound sequences
do not necessarily include the SNP site itself, are also contemplated by
the present invention.
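The single-base primer extension readout described above can be modeled in
a few lines of Python. The sketch is a simplified illustration and not a
laboratory protocol. It assumes that the primer has the same sequence as
the strand on which the context sequence is written and that the primer
ends immediately 5' of the SNP, so that the first base added (the
terminating ddNTP) corresponds to the allele. All names are hypothetical.

```python
def extended_ddntp(sense_strand, primer):
    """Return the base that would be added directly after the primer.

    sense_strand is the strand whose sequence the primer shares; the
    extension product copies it, so the next base on this strand is the
    nucleotide that the terminating ddNTP carries.
    """
    sense_strand, primer = sense_strand.upper(), primer.upper()
    pos = sense_strand.find(primer)
    if pos == -1:
        raise ValueError("primer does not match the target sequence")
    nxt = pos + len(primer)
    if nxt >= len(sense_strand):
        raise ValueError("no template base remains after the primer")
    return sense_strand[nxt]


def call_genotype(flank5, flank3, primer, sample_alleles):
    """Return the sorted set of ddNTP bases observed for a sample.

    sample_alleles lists the allele carried by each chromosome copy,
    e.g. ["A", "G"] for a heterozygote.
    """
    observed = {
        extended_ddntp(flank5 + allele + flank3, primer)
        for allele in sample_alleles
    }
    return sorted(observed)
```

In this model a homozygous sample yields a single terminating base and a
heterozygous sample yields two, mirroring how differently labeled ddNTPs
would be distinguished in an actual assay.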
SNP Detection Kits and Systems
A person skilled in the art will recognize that, based on the SNP and
associated sequence information disclosed herein, detection reagents can
be developed and used to assay any SNP of the present invention
individually or in combination, and such detection reagents can be readily
incorporated into one of the established kit or system formats which are
well known in the art. The terms "kits" and "systems", as used herein in
the context of SNP detection reagents, are intended to refer to such
things as combinations of multiple SNP detection reagents, or one or more
SNP detection reagents in combination with one or more other types of
elements or components (e.g., other types of biochemical reagents,
containers, packages such as packaging intended for commercial sale,
substrates to which SNP detection reagents are attached, electronic
hardware components, etc.). Accordingly, the present invention further
provides SNP detection kits and systems, including but not limited to,
packaged probe and primer sets (e.g., TaqMan probe/primer sets),
arrays/microarrays of nucleic acid molecules, and beads that contain one
or more probes, primers, or other detection reagents for detecting one or
more SNPs of the present invention. The kits/systems can optionally
include various electronic hardware components; for example, arrays ("DNA
chips") and microfluidic systems ("lab-on-a-chip" systems) provided by
various manufacturers typically comprise hardware components. Other
kits/systems (e.g., probe/primer sets) may not include electronic hardware
components, but may be comprised of, for example, one or more SNP
detection reagents (along with, optionally, other biochemical reagents)
packaged in one or more containers.
In some embodiments, a SNP detection kit typically contains one or more
detection reagents and other components (e.g., a buffer, enzymes such as
DNA polymerases or ligases, chain extension nucleotides such as
deoxynucleotide triphosphates, and in the case of Sanger-type DNA
sequencing reactions, chain terminating nucleotides, positive control
sequences, negative control sequences, and the like) necessary to carry
out an assay or reaction, such as amplification and/or detection of a SNP-containing
nucleic acid molecule. A kit may further contain means for determining the
amount of a target nucleic acid, and means for comparing the amount with a
standard, and can comprise instructions for using the kit to detect the
SNP-containing nucleic acid molecule of interest. In one embodiment of the
present invention, kits are provided which contain the necessary reagents
to carry out one or more assays to detect one or more SNPs disclosed
herein. In a preferred embodiment of the present invention, SNP detection
kits/systems are in the form of nucleic acid arrays, or compartmentalized
kits, including microfluidic/lab-on-a-chip systems.
SNP detection kits/systems may contain, for example, one or more probes,
or pairs of probes, that hybridize to a nucleic acid molecule at or near
each target SNP position. Multiple pairs of allele-specific probes may be
included in the kit/system to simultaneously assay large numbers of SNPs,
at least one of which is a SNP of the present invention. In some
kits/systems, the allele-specific probes are immobilized to a substrate
such as an array or bead. For example, the same substrate can comprise
allele-specific probes for detecting at least 1; 10; 100; 1000; 10,000;
100,000 (or any other number in-between) or substantially all of the SNPs
shown in Table 1 and/or Table 2.
The terms "arrays," "microarrays," and "DNA chips" are used herein
interchangeably to refer to an array of distinct polynucleotides affixed
to a substrate, such as glass, plastic, paper, nylon or other type of
membrane, filter, chip, or any other suitable solid support. The
polynucleotides can be synthesized directly on the substrate, or
synthesized separate from the substrate and then affixed to the substrate.
In one embodiment, the microarray is prepared and used according to the
methods described in U.S. Pat. No. 5,837,832 (Chee et al.), PCT
application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (Nat.
Biotech. 14:1675-1680 [1996]) and Schena, M. et al. (Proc. Natl. Acad.
Sci. 93:10614-10619 [1996]), all of which are incorporated herein in their
entirety by reference. In other embodiments, such arrays are produced by
the methods described by Brown et al., U.S. Pat. No. 5,807,522.
Nucleic acid arrays are reviewed in the following references: Zammatteo et
al., "New chips for molecular biology and diagnostics," Biotechnol. Annu.
Rev. 8:85-101 (2002); Sosnowski et al., "Active microelectronic array
system for DNA hybridization, genotyping and pharmacogenomic
applications," Psychiatr. Genet. 12(4): 181-92 (December 2002); Heller,
"DNA microarray technology: devices, systems, and applications," Annu.
Rev. Biomed. Eng. 4:129-53 (2002); Kolchinsky et al.,
"Analysis of SNPs and other genomic variations using gel-based chips,"
Hum. Mutat. 19(4):343-60 (April 2002); and McGall et al., "High-density
genechip oligonucleotide probe arrays," Adv. Biochem. Eng. Biotechnol.
77:21-42 (2002).
Any number of probes, such as allele-specific probes, may be implemented
in an array, and each probe or pair of probes can hybridize to a different
SNP position. In the case of polynucleotide probes, they can be
synthesized at designated areas (or synthesized separately and then
affixed to designated areas) on a substrate using a light-directed
chemical process. Each DNA chip can contain, for example, thousands to
millions of individual synthetic polynucleotide probes arranged in a
grid-like pattern and miniaturized (e.g., to the size of a dime).
Preferably, probes are attached to a solid support in an ordered,
addressable array.
A microarray can be composed of a large number of unique, single-stranded
polynucleotides, usually either synthetic antisense polynucleotides or
fragments of cDNAs, fixed to a solid support. Typical polynucleotides are
preferably about 6-60 nucleotides in length, more preferably about 15-30
nucleotides in length, and most preferably about 18-25 nucleotides in
length. For certain types of microarrays or other detection kits/systems,
it may be preferable to use oligonucleotides that are only about 7-20
nucleotides in length. In other types of arrays, such as arrays used in
conjunction with chemiluminescent detection technology, preferred probe
lengths can be, for example, about 15-80 nucleotides in length, preferably
about 50-70 nucleotides in length, more preferably about 55-65 nucleotides
in length, and most preferably about 60 nucleotides in length. The
microarray or detection kit can contain polynucleotides that cover the
known 5' or 3' sequence of a gene/transcript or target SNP site,
sequential polynucleotides that cover the full-length sequence of a
gene/transcript, or unique polynucleotides selected from particular areas
along the length of a target gene/transcript sequence, particularly areas
corresponding to one or more SNPs disclosed in Table 1 and/or Table 2.
Polynucleotides used in the microarray or detection kit can be specific to
a SNP or SNPs of interest (e.g., specific to a particular SNP allele at a
target SNP site, or specific to particular SNP alleles at multiple
different SNP sites), or specific to a polymorphic gene/transcript or
genes/transcripts of interest.
Hybridization assays based on polynucleotide arrays rely on the
differences in hybridization stability of the probes to perfectly matched
and mismatched target sequence variants. For SNP genotyping, it is
generally preferable that stringency conditions used in hybridization
assays are high enough such that nucleic acid molecules that differ from
one another at as little as a single SNP position can be differentiated
(e.g., typical SNP hybridization assays are designed so that hybridization
will occur only if one particular nucleotide is present at a SNP position,
but will not occur if an alternative nucleotide is present at that SNP
position). Such high stringency conditions may be preferable when using,
for example, nucleic acid arrays of allele-specific probes for SNP
detection. Such high stringency conditions are described in the preceding
section, and are well known to those skilled in the art and can be found
in, for example, Current Protocols in Molecular Biology 6.3.1-6.3.6, John
Wiley & Sons, New York (1989).
In other embodiments, the arrays are used in conjunction with
chemiluminescent detection technology. The following patents and patent
applications, which are all hereby incorporated by reference, provide
additional information pertaining to chemiluminescent detection: U.S.
patent application Ser. Nos. 10/620,332 and 10/620,333 describe
chemiluminescent approaches for microarray detection; U.S. Pat. Nos.
6,124,478, 6,107,024, 5,994,073, 5,981,768, 5,871,938, 5,843,681,
5,800,999, and 5,773,628 describe methods and compositions of dioxetane
for performing chemiluminescent detection; and U.S. published application
US2002/0110828 discloses methods and compositions for microarray controls.
In one embodiment of the invention, a nucleic acid array can comprise an
array of probes of about 15-25 nucleotides in length. In further
embodiments, a nucleic acid array can comprise any number of probes, in
which at least one probe is capable of detecting one or more SNPs
disclosed in Table 1 and/or Table 2, and/or at least one probe comprises a
fragment of one of the sequences selected from the group consisting of
those disclosed in Table 1, Table 2, the Sequence Listing, and sequences
complementary thereto, said fragment comprising at least about 8
consecutive nucleotides, preferably 10, 12, 15, 16, 18, 20, more
preferably 22, 25, 30, 40, 47, 50, 55, 60, 65, 70, 80, 90, 100, or more
consecutive nucleotides (or any other number in-between) and containing
(or being complementary to) a novel SNP allele disclosed in Table 1 and/or
Table 2. In some embodiments, the nucleotide complementary to the SNP site
is within 5, 4, 3, 2, or 1 nucleotide from the center of the probe, more
preferably at the center of said probe.
A polynucleotide probe can be synthesized on the surface of the substrate
by using a chemical coupling procedure and an ink jet application
apparatus, as described in PCT application WO95/251116 (Baldeschweiler et
al.) which is incorporated herein in its entirety by reference. In another
aspect, a "gridded" array analogous to a dot (or slot) blot may be used to
arrange and link cDNA fragments or oligonucleotides to the surface of a
substrate using a vacuum system, thermal, UV, mechanical or chemical
bonding procedures. An array, such as those described above, may be
produced by hand or by using available devices (slot blot or dot blot
apparatus), materials (any suitable solid support), and machines
(including robotic instruments), and may contain 8, 24, 96, 384, 1536,
6144 or more polynucleotides, or any other number which lends itself to
the efficient use of commercially available instrumentation.
Using such arrays or other kits/systems, the present invention provides
methods of identifying the SNPs disclosed herein in a test sample. Such
methods typically involve incubating a test sample of nucleic acids with
an array comprising one or more probes corresponding to at least one SNP
position of the present invention, and assaying for binding of a nucleic
acid from the test sample with one or more of the probes. Conditions for
incubating a SNP detection reagent (or a kit/system that employs one or
more such SNP detection reagents) with a test sample vary. Incubation
conditions depend on such factors as the format employed in the assay, the
detection methods employed, and the type and nature of the detection
reagents used in the assay. One skilled in the art will recognize that any
one of the commonly available hybridization, amplification and array assay
formats can readily be adapted to detect the SNPs disclosed herein.
A SNP detection kit/system of the present invention may include components
that are used to prepare nucleic acids from a test sample for the
subsequent amplification and/or detection of a SNP-containing nucleic acid
molecule. Such sample preparation components can be used to produce
nucleic acid extracts (including DNA and/or RNA), proteins or membrane
extracts from any bodily fluids (such as blood, serum, plasma, urine,
saliva, phlegm, gastric juices, semen, tears, sweat, etc.), skin, hair,
cells (especially nucleated cells), biopsies, buccal swabs or tissue
specimens. The test samples used in the above-described methods will vary
based on such factors as the assay format, nature of the detection method,
and the specific tissues, cells or extracts used as the test sample to be
assayed. Methods of preparing nucleic acids, proteins, and cell extracts
are well known in the art and can be readily adapted to obtain a sample
that is compatible with the system utilized. Automated sample preparation
systems for extracting nucleic acids from a test sample are commercially
available, and examples are Qiagen's BioRobot 9600, Applied Biosystems'
PRISM 6700, and Roche Molecular Systems' COBAS AmpliPrep System.
Another form of kit contemplated by the present invention is a
compartmentalized kit. A compartmentalized kit includes any kit in which
reagents are contained in separate containers. Such containers include,
for example, small glass containers, plastic containers, strips of
plastic, glass or paper, or arraying material such as silica. Such
containers allow one to efficiently transfer reagents from one compartment
to another compartment such that the test samples and reagents are not
cross-contaminated, or from one container to another vessel not included
in the kit, and the agents or solutions of each container can be added in
a quantitative fashion from one compartment to another or to another
vessel. Such containers may include, for example, one or more containers
which will accept the test sample, one or more containers which contain at
least one probe or other SNP detection reagent for detecting one or more
SNPs of the present invention, one or more containers which contain wash
reagents (such as phosphate buffered saline, Tris-buffers, etc.), and one
or more containers which contain the reagents used to reveal the presence
of the bound probe or other SNP detection reagents. The kit can optionally
further comprise compartments and/or reagents for, for example, nucleic
acid amplification or other enzymatic reactions such as primer extension
reactions, hybridization, ligation, electrophoresis (preferably capillary
electrophoresis), mass spectrometry, and/or laser-induced fluorescent
detection. The kit may also include instructions for using the kit.
Exemplary compartmentalized kits include microfluidic devices known in the
art (see, e.g., Weigl et al., "Lab-on-a-chip for drug development," Adv.
Drug Deliv. Rev. 55(3):349-77 [Feb. 24, 2003]). In such microfluidic
devices, the containers may be referred to as, for example, microfluidic
"compartments," "chambers," or "channels."
Microfluidic devices, which may also be referred to as "lab-on-a-chip"
systems, biomedical micro-electro-mechanical systems (bioMEMs), or
multicomponent integrated systems, are exemplary kits/systems of the
present invention for analyzing SNPs. Such systems miniaturize and
compartmentalize processes such as probe/target hybridization, nucleic
acid amplification, and capillary electrophoresis reactions in a single
functional device. Such microfluidic devices typically utilize detection
reagents in at least one aspect of the system, and such detection reagents
may be used to detect one or more SNPs of the present invention. One
example of a microfluidic system is disclosed in U.S. Pat. No. 5,589,136,
which describes the integration of PCR amplification and capillary
electrophoresis in chips. Exemplary microfluidic systems comprise a
pattern of microchannels designed onto a glass, silicon, quartz, or
plastic wafer included on a microchip. The movements of the samples may be
controlled by electric, electroosmotic or hydrostatic forces applied
across different areas of the microchip to create functional microscopic
valves and pumps with no moving parts. Varying the voltage can be used as
a means to control the liquid flow at intersections between the
micro-machined channels and to change the liquid flow rate for pumping
across different sections of the microchip. See, for example, U.S. Pat.
Nos. 6,153,073, Dubrow et al., and 6,156,181, Parce et al.
For genotyping SNPs, an exemplary microfluidic system may integrate, for
example, nucleic acid amplification, primer extension, capillary
electrophoresis, and a detection method such as laser induced fluorescence
detection. In a first step of an exemplary process for using such an
exemplary system, nucleic acid samples are amplified, preferably by PCR.
Then, the amplification products are subjected to automated primer
extension reactions using ddNTPs (with a distinct fluorescent label for
each ddNTP) and appropriate oligonucleotide primers that hybridize just
upstream of the targeted SNP. Once the
extension at the 3' end is completed, the primers are separated from the
unincorporated fluorescent ddNTPs by capillary electrophoresis. The
separation medium used in capillary electrophoresis can be, for example,
polyacrylamide, polyethyleneglycol or dextran. The incorporated ddNTPs in
the single nucleotide primer extension products are identified by
laser-induced fluorescence detection. Such an exemplary microchip can be
used to process, for example, at least 96 to 384 samples, or more, in
parallel.
Uses of Nucleic Acid Molecules
The nucleic acid molecules of the present invention have a variety of
uses, especially in the diagnosis and treatment of Alzheimer's Disease.
For example, the nucleic acid molecules are useful as hybridization
probes, such as for genotyping SNPs in messenger RNA, transcript, cDNA,
genomic DNA, amplified DNA or other nucleic acid molecules, and for
isolating full-length cDNA and genomic clones encoding the variant
peptides disclosed in Table 1 as well as their orthologs.
A probe can hybridize to any nucleotide sequence along the entire length
of a nucleic acid molecule provided in Table 1 and/or Table 2. Preferably,
a probe of the present invention hybridizes to a region of a target
sequence that encompasses a SNP position indicated in Table 1 and/or Table
2. More preferably, a probe hybridizes to a SNP-containing target sequence
in a sequence-specific manner such that it distinguishes the target
sequence from other nucleotide sequences which vary from the target
sequence only by which nucleotide is present at the SNP site. Such a probe
is particularly useful for detecting the presence of a SNP-containing
nucleic acid in a test sample, or for determining which nucleotide
(allele) is present at a particular SNP site (i.e., genotyping the SNP
site).
A nucleic acid hybridization probe may be used for determining the
presence, level, form, and/or distribution of nucleic acid expression. The
nucleic acid whose level is determined can be DNA or RNA. Accordingly,
probes specific for the SNPs described herein can be used to assess the
presence, expression and/or gene copy number in a given cell, tissue, or
organism. These uses are relevant for diagnosis of disorders involving an
increase or decrease in gene expression relative to normal levels. In
vitro techniques for detection of mRNA include, for example, Northern blot
hybridizations and in situ hybridizations. In vitro techniques for
detecting DNA include Southern blot hybridizations and in situ
hybridizations (Sambrook and Russell, Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. [2000]).
Probes can be used as part of a diagnostic test kit for identifying cells
or tissues in which a variant protein is expressed, such as by measuring
the level of a variant protein-encoding nucleic acid (e.g., mRNA) in a
sample of cells from a subject or determining if a polynucleotide contains
a SNP of interest.
Thus, the nucleic acid molecules of the invention can be used as
hybridization probes to detect the SNPs disclosed herein, thereby
determining whether an individual with the polymorphisms is at risk for
Alzheimer's Disease or has developed early stage Alzheimer's Disease.
Detection of a SNP associated with a disease phenotype provides a
diagnostic tool for an active disease and/or genetic predisposition to the
disease.
Furthermore, the nucleic acid molecules of the invention are
useful for detecting a gene (gene information is disclosed in Table 2, for
example) which contains a SNP disclosed herein and/or products of such
genes, such as expressed mRNA transcript molecules (transcript information
is disclosed in Table 1, for example), and are thus useful for detecting
gene expression. The nucleic acid molecules can optionally be implemented
in, for example, an array or kit format for use in detecting gene
expression.
The nucleic acid molecules of the invention are also useful as primers to
amplify any given region of a nucleic acid molecule, particularly a region
containing a SNP identified in Table 1 and/or Table 2.
The nucleic acid molecules of the invention are also useful for
constructing recombinant vectors (described in greater detail below). Such
vectors include expression vectors that express a portion of, or all of,
any of the variant peptide sequences provided in Table 1. Vectors also
include insertion vectors, used to integrate into another nucleic acid
molecule sequence, such as into the cellular genome, to alter in situ
expression of a gene and/or gene product. For example, an endogenous
coding sequence can be replaced via homologous recombination with all or
part of the coding region containing one or more specifically introduced
SNPs.
The nucleic acid molecules of the invention are also useful for expressing
antigenic portions of the variant proteins, particularly antigenic
portions that contain a variant amino acid sequence (e.g., an amino acid
substitution) caused by a SNP disclosed in Table 1 and/or Table 2.
The nucleic acid molecules of the invention are also useful for
constructing vectors containing a gene regulatory region of the nucleic
acid molecules of the present invention.
The nucleic acid molecules of the invention are also useful for designing
ribozymes corresponding to all, or a part, of an mRNA molecule expressed
from a SNP-containing nucleic acid molecule described herein.
The nucleic acid molecules of the invention are also useful for
constructing host cells expressing a part, or all, of the nucleic acid
molecules and variant peptides.
The nucleic acid molecules of the invention are also useful for
constructing transgenic animals expressing all, or a part, of the nucleic
acid molecules and variant peptides. The production of recombinant cells
and transgenic animals having nucleic acid molecules which contain the
SNPs disclosed in Table 1 and/or Table 2 allows, for example, effective
clinical design of treatment compounds and dosage regimens.
The nucleic acid molecules of the invention are also useful in assays for
drug screening to identify compounds that, for example, modulate nucleic
acid expression.
The nucleic acid molecules of the invention are also useful in gene
therapy in patients whose cells have aberrant gene expression. Thus,
recombinant cells, which include a patient's cells that have been
engineered ex vivo and returned to the patient, can be introduced into an
individual where the recombinant cells produce the desired protein to
treat the individual.
SNP Genotyping Methods
The process of determining which specific nucleotide (i.e., allele) is
present at each of one or more SNP positions, such as a SNP position in a
nucleic acid molecule disclosed in Table 1 and/or Table 2, is referred to
as SNP genotyping. The present invention provides methods of SNP
genotyping, such as for use in screening for Alzheimer's Disease or
related pathologies, or determining predisposition thereto, or determining
responsiveness to a form of treatment, or in genome mapping or SNP
association analysis, etc.
Nucleic acid samples can be genotyped to determine which allele(s) is/are
present at any given genetic region (e.g., SNP position) of interest by
methods well known in the art. The neighboring sequence can be used to
design SNP detection reagents such as oligonucleotide probes, which may
optionally be implemented in a kit format. Exemplary SNP genotyping
methods are described in Chen et al., "Single nucleotide polymorphism
genotyping: biochemistry, protocol, cost and throughput," Pharmacogenomics
J. 3(2):77-96 (2003); Kwok et al., "Detection of single nucleotide
polymorphisms," Curr. Issues Mol. Biol. 5(2):43-60 (April 2003); Shi,
"Technologies for individual genotyping: detection of genetic
polymorphisms in drug targets and disease genes," Am. J. Pharmacogenomics
2(3):197-205 (2002); and Kwok, "Methods for genotyping single nucleotide
polymorphisms," Annu. Rev. Genomics Hum. Genet. 2:235-58 (2001). Exemplary
techniques for high-throughput SNP genotyping are described in Marnellos,
"High-throughput SNP analysis for genetic association studies," Curr. Opin.
Drug Discov. Devel. 6(3):317-21 (May 2003). Common SNP genotyping methods
include, but are not limited to, TaqMan assays, molecular beacon assays,
nucleic acid arrays, allele-specific primer extension, allele-specific PCR,
arrayed primer extension, homogeneous primer extension assays, primer
extension with detection by mass spectrometry, pyrosequencing, multiplex
primer extension sorted on genetic arrays, ligation with rolling circle
amplification, homogeneous ligation, OLA (U.S. Pat. No. 4,988,617),
multiplex ligation reaction sorted on genetic arrays, restriction-fragment
length polymorphism, single base extension-tag assays, and the Invader
assay. Such methods may be used in combination with detection mechanisms
such as, for example, luminescence or chemiluminescence detection,
fluorescence detection, time-resolved fluorescence detection, fluorescence
resonance energy transfer, fluorescence polarization, mass spectrometry,
and electrical detection.
Various methods for detecting polymorphisms include, but are not limited
to, methods in which protection from cleavage agents is used to detect
mismatched bases in RNA/RNA or RNA/DNA duplexes (Myers et al., Science
230:1242 [1985]; Cotton et al., PNAS 85:4397 [1988]; and Saleeba et al.,
Meth. Enzymol. 217:286-295 [1992]), comparison of the electrophoretic
mobility of variant and wild type nucleic acid molecules (Orita et al.,
PNAS 86:2766 [1989]; Cotton et al., Mutat. Res. 285:125-144 [1993]; and
Hayashi et al., Genet. Anal. Tech. Appl. 9:73-79 [1992]), and assaying the
movement of polymorphic or wild-type fragments in polyacrylamide gels
containing a gradient of denaturant using denaturing gradient gel
electrophoresis (DGGE) (Myers et al., Nature 313:495 [1985]). Sequence
variations at specific locations can also be assessed by nuclease
protection assays such as RNase and S1 protection or chemical cleavage
methods.
In a preferred embodiment, SNP genotyping is performed using the TaqMan
assay, which is also known as the 5' nuclease assay (U.S. Pat. Nos.
5,210,015 and 5,538,848). The TaqMan assay detects the accumulation of a
specific amplified product during PCR. The TaqMan assay utilizes an
oligonucleotide probe labeled with a fluorescent reporter dye and a
quencher dye. When the reporter dye is excited by irradiation at an appropriate
wavelength, it transfers energy to the quencher dye in the same probe via
a process called fluorescence resonance energy transfer (FRET). When
attached to the probe, the excited reporter dye does not emit a signal.
The proximity of the quencher dye to the reporter dye in the intact probe
maintains a reduced fluorescence for the reporter. The reporter dye and
quencher dye may be at the 5'-most and the 3'-most ends, respectively, or
vice versa. Alternatively, the reporter dye may be at the 5'- or 3'-most
end while the quencher dye is attached to an internal nucleotide, or vice
versa. In yet another embodiment, both the reporter and the quencher may
be attached to internal nucleotides at a distance from each other such
that fluorescence of the reporter is reduced.
During PCR, the 5' nuclease activity of DNA polymerase cleaves the probe,
thereby separating the reporter dye and the quencher dye and resulting in
increased fluorescence of the reporter. Accumulation of PCR product is
detected directly by monitoring the increase in fluorescence of the
reporter dye. The DNA polymerase cleaves the probe between the reporter
dye and the quencher dye only if the probe hybridizes to the target SNP-containing
template which is amplified during PCR, and the probe is designed to
hybridize to the target SNP site only if a particular SNP allele is
present.
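As a hedged illustration of how such allele-specific probes could be read out, the following sketch calls a genotype from the fluorescence increase observed for two probes, one per SNP allele. The use of two probes carrying distinguishable reporter dyes in one reaction, the function name, and the threshold value are assumptions made for the example and are not specified above.

```python
# Illustrative sketch only: the two-probe layout, the threshold, and the signal
# values are hypothetical and would need to be established empirically.


def call_genotype(rise_probe_1: float, rise_probe_2: float, threshold: float) -> str:
    """Call a genotype from the increase in reporter fluorescence of two
    allele-specific probes (probe 1 matches allele 1, probe 2 matches allele 2)."""
    allele_1_present = rise_probe_1 >= threshold
    allele_2_present = rise_probe_2 >= threshold
    if allele_1_present and allele_2_present:
        return "heterozygous"
    if allele_1_present:
        return "homozygous allele 1"
    if allele_2_present:
        return "homozygous allele 2"
    return "no call"


# Hypothetical end-point fluorescence increases (arbitrary units).
print(call_genotype(rise_probe_1=1.8, rise_probe_2=0.1, threshold=0.5))
```

A probe that does not hybridize is not cleaved, so its reporter stays quenched and its signal remains below the threshold; a sample in which neither signal rises is reported as "no call" rather than assigned a genotype.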
Preferred TaqMan primer and probe sequences can readily be determined
using the SNP and associated nucleic acid sequence information provided
herein. A number of computer programs, such as Primer Express (Applied
Biosystems, Foster City, Calif.), can be used to rapidly obtain optimal
primer/probe sets. It will be apparent to one of skill in the art that
such primers and probes for detecting the SNPs of the present invention
are useful in diagnostic assays for Alzheimer's Disease and related
pathologies, and can be readily incorporated into a kit format. The
present invention also includes modifications of the TaqMan assay well
known in the art such as the use of Molecular Beacon probes (U.S. Pat.
Nos. 5,118,801 and 5,312,728) and other variant formats (U.S. Pat. Nos.
5,866,336 and 6,117,635).
Another preferred method for genotyping the SNPs of the present invention
is the use of two oligonucleotide probes in an OLA (see, e.g., U.S. Pat.
No. 4,988,617). In this method, one probe hybridizes to a segment of a
target nucleic acid with its 3'-most end aligned with the SNP site. A
second probe hybridizes to an adjacent segment of the target nucleic acid
molecule directly 3' to the first probe. The two juxtaposed probes
hybridize to the target nucleic acid molecule, and are ligated in the
presence of a linking agent such as a ligase if there is perfect
complementarity between the 3' most nucleotide of the first probe and the
SNP site. If there is a mismatch, ligation would not occur. After the
reaction, the ligated probes are separated from the target nucleic acid
molecule, and detected as indicators of the presence of a SNP.
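The ligation logic described above can be sketched, under simplifying assumptions, as follows. For brevity the sketch works with the strand complementary to the target (so that each probe is a plain substring of it) and treats an exact match as a stand-in for hybridization; the sequences, the function name, and the probe lengths are hypothetical.

```python
from typing import Optional

# Illustrative sketch only: sequences and probe lengths are hypothetical.


def ola_product(probe_sense: str, snp_index: int,
                first_probe: str, second_probe: str) -> Optional[str]:
    """Return the ligated product, or None if ligation would not occur.

    `probe_sense` is the strand complementary to the target, written 5'->3',
    so probes that anneal to the target are identical to stretches of it.
    The first probe must end (3'-most nucleotide) at the SNP site and the
    second probe must start immediately 3' to the first probe."""
    start = snp_index - len(first_probe) + 1
    first_matches = start >= 0 and probe_sense[start:snp_index + 1] == first_probe
    second_start = snp_index + 1
    second_matches = (
        probe_sense[second_start:second_start + len(second_probe)] == second_probe
    )
    if first_matches and second_matches:
        return first_probe + second_probe  # juxtaposed probes joined by the ligase
    return None  # mismatch at the 3' end of the first probe: no ligation


sample = "GATTACAGGCTTAACG"   # hypothetical sample carrying G at index 7
common_probe = "GCTTAA"       # anneals directly 3' to the SNP site
print(ola_product(sample, 7, "TTACAG", common_probe))  # probe for the G allele
print(ola_product(sample, 7, "TTACAA", common_probe))  # probe for the A allele
```

Only the allele-specific probe whose 3'-most nucleotide matches the sample yields a ligated product, which is what is subsequently detected.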
The following patents, patent applications, and published international
patent applications, which are all hereby incorporated by reference,
provide additional information pertaining to techniques for carrying out
various types of OLA: U.S. Pat. Nos. 6,027,889, 6,268,148, 5,494,810,
5,830,711, and 6,054,564 describe OLA strategies for performing SNP
detection; WO 97/31256 and WO 00/56927 describe OLA strategies for
performing SNP detection using universal arrays, wherein a zipcode
sequence can be introduced into one of the hybridization probes, and the
resulting product, or amplified product, hybridized to a universal zip
code array; U.S. application Ser. No. 01/17329 (and Ser. No. 09/584,905)
describes OLA (or LDR) followed by PCR, wherein zipcodes are incorporated
into OLA probes, and amplified PCR products are determined by
electrophoretic or universal zipcode array readout; U.S. applications
60/427,818, 60/445,636, and 60/445,494 describe SNPlex methods and
software for multiplexed SNP detection using OLA followed by PCR, wherein
zipcodes are incorporated into OLA probes, and amplified PCR products are
hybridized with a zipchute reagent, and the identity of the SNP determined
from electrophoretic readout of the zipchute. In some embodiments, OLA is
carried out prior to PCR (or another method of nucleic acid
amplification). In other embodiments, PCR (or another method of nucleic
acid amplification) is carried out prior to OLA.
Another method for SNP genotyping is based on mass spectrometry. Mass
spectrometry takes advantage of the unique mass of each of the four
nucleotides of DNA. SNPs can be unambiguously genotyped by mass
spectrometry by measuring the differences in the mass of nucleic acids
having alternative SNP alleles. MALDI-TOF (Matrix Assisted Laser
Desorption Ionization--Time of Flight) mass spectrometry technology is
preferred for extremely precise determinations of molecular mass, such as
those needed to distinguish SNP alleles. Numerous approaches to SNP analysis have been developed based on
mass spectrometry. Preferred mass spectrometry-based methods of SNP
genotyping include primer extension assays, which can also be utilized in
combination with other approaches, such as traditional gel-based formats
and microarrays.
Typically, the primer extension assay involves designing and annealing a
primer to a template PCR amplicon upstream (5') from a target SNP
position. A mix of dideoxynucleotide triphosphates (ddNTPs) and/or
deoxynucleotide triphosphates (dNTPs) is added to a reaction mixture
containing template (e.g., a SNP-containing nucleic acid molecule which
has typically been amplified, such as by PCR), primer, and DNA polymerase.
Extension of the primer terminates at the first position in the template
where a nucleotide complementary to one of the ddNTPs in the mix occurs.
The primer can be either immediately adjacent (i.e., the nucleotide at the
3' end of the primer hybridizes to the nucleotide next to the target SNP
site) or two or more nucleotides removed from the SNP position. If the
primer is several nucleotides removed from the target SNP position, the
only limitation is that the template sequence between the 3' end of the
primer and the SNP position cannot contain a nucleotide of the same type
as the one to be detected, or this will cause premature termination of the
extension primer. Alternatively, if all four ddNTPs alone, with no dNTPs,
are added to the reaction mixture, the primer will always be extended by
only one nucleotide, corresponding to the target SNP position. In this
instance, primers are designed to bind one nucleotide upstream from the
SNP position (i.e., the nucleotide at the 3' end of the primer hybridizes
to the nucleotide that is immediately adjacent to the target SNP site on
the 5' side of the target SNP site). Extension by only one nucleotide is
preferable, as it minimizes the overall mass of the extended primer,
thereby increasing the resolution of mass differences between alternative
SNP nucleotides. Furthermore, mass-tagged ddNTPs can be employed in the
primer extension reactions in place of unmodified ddNTPs. This increases
the mass difference between primers extended with these ddNTPs, thereby
providing increased sensitivity and accuracy, and is particularly useful
for typing heterozygous base positions. Mass-tagging also alleviates the
need for intensive sample-preparation procedures and decreases the
necessary resolving power of the mass spectrometer.
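The termination behavior described above can be illustrated with a minimal sketch. It assumes, for simplicity, that each base is supplied either as a chain-terminating ddNTP or as an extendable dNTP but not both, and it represents the template by the bases that would be incorporated in order after the 3' end of the primer; the function name and sequences are hypothetical.

```python
from typing import Set

# Illustrative sketch only: sequences are hypothetical.


def extend_primer(incorporated_order: str, ddntps: Set[str]) -> str:
    """Return the bases added to the primer before extension terminates.

    `incorporated_order` lists, 5'->3', the bases that the polymerase would
    add after the primer's 3' end. Extension stops as soon as a base that is
    supplied as a ddNTP has been incorporated."""
    added = []
    for base in incorporated_order:
        added.append(base)
        if base in ddntps:
            break  # a ddNTP lacks the 3'-OH needed for further extension
    return "".join(added)


terminators = {"A", "G"}  # the two SNP alleles supplied as ddNTPs; C and T as dNTPs
print(extend_primer("TTG", terminators))   # primer two bases removed, G allele
print(extend_primer("TTA", terminators))   # same primer, A allele
print(extend_primer("TGTA", terminators))  # a G before the SNP ends extension early
print(extend_primer("TTG", {"A", "C", "G", "T"}))  # all four ddNTPs: one base only
```

The third call shows why the sequence between the primer and the SNP position must not contain a nucleotide of the type being detected, and the fourth shows that supplying only ddNTPs limits extension to a single nucleotide.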
The extended primers can then be purified and analyzed by MALDI-TOF mass
spectrometry to determine the identity of the nucleotide present at the
target SNP position. In one method of analysis, the products from the
primer extension reaction are combined with light absorbing crystals that
form a matrix. The matrix is then hit with an energy source such as a
laser to ionize and desorb the nucleic acid molecules into the gas-phase.
The ionized molecules are then ejected into a flight tube and accelerated
down the tube towards a detector. The time between the ionization event,
such as a laser pulse, and collision of the molecule with the detector is
the time of flight of that molecule. The time of flight is precisely
correlated with the mass-to-charge ratio (m/z) of the ionized molecule.
Ions with smaller m/z travel down the tube faster than ions with larger
m/z and therefore the lighter ions reach the detector before the heavier
ions. The time-of-flight is then converted into a corresponding, and
highly precise, m/z. In this manner, SNPs can be identified based on the
slight differences in mass, and the corresponding time of flight
differences, inherent in nucleic acid molecules having different
nucleotides at a single base position. For further information regarding
the use of primer extension assays in conjunction with MALDI-TOF mass
spectrometry for SNP genotyping, see, e.g., Wise et al., "A standard
protocol for single nucleotide primer extension in the human genome using
matrix-assisted laser desorption/ionization time-of-flight mass
spectrometry," Rapid Commun. Mass Spectrom. 17(11):1195-202 (2003).
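The relationship between time of flight and m/z can be made concrete with an idealized sketch. It assumes a linear flight tube in which an ion of charge z accelerated through a potential U acquires kinetic energy zeU, so that t = L * sqrt(m / (2 z e U)); the time spent in the acceleration region and the initial ion velocity are ignored. The accelerating voltage, tube length, and primer mass are hypothetical, and the ddNMP residue masses are approximate average values.

```python
import math

# Illustrative sketch only: instrument parameters and primer mass are hypothetical.
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms

# Approximate average mass (Da) added to a primer by one incorporated ddNMP.
DD_RESIDUE_MASS = {"A": 297.21, "C": 273.18, "G": 313.21, "T": 288.20}


def flight_time(mass_da: float, charge: int = 1,
                voltage: float = 20_000.0, tube_length: float = 1.0) -> float:
    """Idealized time of flight in seconds: t = L * sqrt(m / (2 * z * e * U))."""
    mass_kg = mass_da * DALTON
    return tube_length * math.sqrt(mass_kg / (2 * charge * ELEMENTARY_CHARGE * voltage))


primer_mass = 6000.0  # hypothetical unextended primer mass, Da
for allele in ("T", "A"):
    extended_mass = primer_mass + DD_RESIDUE_MASS[allele]
    t_us = flight_time(extended_mass) * 1e6
    print(f"allele {allele}: {extended_mass:.2f} Da, {t_us:.3f} microseconds")
```

Because the time of flight scales with the square root of m/z, the roughly 9 Da difference between primers extended with ddA and ddT corresponds to only a small difference in arrival time, which is why a smaller overall primer mass and mass-tagged ddNTPs make the alleles easier to resolve.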
The following references provide further information describing mass
spectrometry-based methods for SNP genotyping: Bocker, "SNP and mutation
discovery using base-specific cleavage and MALDI-TOF mass spectrometry,"
Bioinformatics 19 Suppl 1:144-153 (July 2003); Storm et al., "MALDI-TOF
mass spectrometry-based SNP genotyping," Methods Mol. Biol. 212:241-62
(2003); Jurinke et al., "The use of MassARRAY technology for high
throughput genotyping," Adv. Biochem. Eng. Biotechnol. 77:57-74 (2002);
and Jurinke et al., "Automated genotyping using the DNA MassArray
technology," Methods Mol. Biol. 187:179-92 (2002).
SNPs can also be scored by direct DNA sequencing. A variety of automated
sequencing procedures can be utilized (Biotechniques 19:448 [1995]),
including sequencing by mass spectrometry (see, e.g., PCT International
Publication No. WO94/16101; Cohen et al., Adv. Chromatogr. 36:127-162
[1996]; and Griffin et al., Appl. Biochem. Biotechnol. 38:147-159 [1993]).
The nucleic acid sequences of the present invention enable one of ordinary
skill in the art to readily design sequencing primers for such automated
sequencing procedures. Commercial instrumentation, such as the Applied
Biosystems 377, 3100, 3700, 3730, and 3730xl DNA Analyzers (Foster
City, Calif.), is commonly used in the art for automated sequencing.
Other methods that can be used to genotype the SNPs of the present
invention include single-strand conformational polymorphism (SSCP), and
denaturing gradient gel electrophoresis (DGGE) (Myers et al., Nature
313:495 [1985]). SSCP identifies base differences by alteration in
electrophoretic migration of single stranded PCR products, as described in
Orita et al., PNAS 86:2766 [1989]. Single-stranded PCR products can be
generated by heating or otherwise denaturing double stranded PCR products.
Single-stranded nucleic acids may refold or form secondary structures that
are partially dependent on the base sequence. The different
electrophoretic mobilities of single-stranded amplification products are
related to base-sequence differences at SNP positions. DGGE differentiates
SNP alleles based on the different sequence-dependent stabilities and
melting properties inherent in polymorphic DNA and the corresponding
differences in electrophoretic migration patterns in a denaturing gradient
gel (PCR Technology: Principles and Applications for DNA Amplification,
Chapter 7, ed. Erlich, W.H. Freeman and Co., New York [1992]).
Sequence-specific ribozymes (U.S. Pat. No. 5,498,531) can also be used to
score SNPs based on the development or loss of a ribozyme cleavage site.
Perfectly matched sequences can be distinguished from mismatched sequences
by nuclease cleavage digestion assays or by differences in melting
temperature. If the SNP affects a restriction enzyme cleavage site, the
SNP can be identified by alterations in restriction enzyme digestion
patterns, and the corresponding changes in nucleic acid fragment lengths
determined by gel electrophoresis.
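As a simple illustration of how a SNP that affects a restriction site changes the fragment pattern, the following sketch digests two hypothetical amplicons that differ at a single position. The amplicon sequences are invented for the example; EcoRI, which recognizes GAATTC and cuts after the first G, is used merely as a familiar enzyme, and only one strand of a linear fragment is modeled.

```python
from typing import List

# Illustrative sketch only: the amplicon sequences are hypothetical.


def digest_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return the fragment lengths produced by cutting a linear sequence at
    every occurrence of `site`, `cut_offset` bases into the recognition site."""
    cut_positions = []
    pos = seq.find(site)
    while pos != -1:
        cut_positions.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cut_positions + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


allele_1 = "CCGGTAGAATTCAGGTCCAT"  # contains the GAATTC recognition site
allele_2 = "CCGGTAGAGTTCAGGTCCAT"  # single-base change abolishes the site
print(digest_lengths(allele_1, "GAATTC", cut_offset=1))
print(digest_lengths(allele_2, "GAATTC", cut_offset=1))
```

The allele that retains the site yields two shorter fragments, while the allele that has lost it remains uncut; a heterozygous sample would be expected to show both patterns on the gel.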
SNP genotyping can include the steps of, for example, collecting a
biological sample from a human subject (e.g., sample of tissues, cells,
fluids, secretions, etc.), isolating nucleic acids (e.g., genomic DNA,
mRNA or both) from the cells of the sample, contacting the nucleic acids
with one or more primers which specifically hybridize to a region of the
isolated nucleic acid containing a target SNP under conditions such that
hybridization and amplification of the target nucleic acid region occurs,
and determining the nucleotide present at the SNP position of interest,
or, in some assays, detecting the presence or absence of an amplification
product (assays can be designed so that hybridization and/or amplification
will only occur if a particular SNP allele is present or absent). In some
assays, the size of the amplification product is detected and compared to
the length of a control sample; for example, deletions and insertions can
be detected by a change in size of the amplified product compared to a
normal genotype.
SNP genotyping is useful for numerous practical applications, as described
below. Examples of such applications include, but are not limited to, SNP-disease
association analysis, disease predisposition screening, disease diagnosis,
disease prognosis, disease progression monitoring, determining therapeutic
strategies based on an individual's genotype ("pharmacogenomics"),
developing therapeutic agents based on SNP genotypes associated with a
disease or likelihood of responding to a drug, stratifying a patient
population for clinical trial for a treatment regimen, predicting the
likelihood that an individual will experience toxic side effects from a
therapeutic agent, and human identification applications such as
forensics.
Analysis of Genetic Association Between SNPs and Phenotypic Traits
SNP genotyping for disease diagnosis, disease predisposition screening,
disease prognosis, determining drug responsiveness (pharmacogenomics),
drug toxicity screening, and other uses described herein, typically relies
on initially establishing a genetic association between one or more
specific SNPs and the particular phenotypic traits of interest.
Different study designs may be used for genetic association studies
(Modern Epidemiology 609-622, Lippincott Williams & Wilkins [1998]).
Most frequently, observational studies are carried out, in which the
response of the patients is not interfered with. The first type of
observational study identifies a sample of persons in whom the suspected
cause of the disease is present and another sample of persons in whom the
suspected cause is absent, and then the frequency of development of
disease in the two samples is compared. These sampled populations are
called cohorts, and the study is a prospective study. The other type of
observational study is a case-control, or retrospective, study. In typical
case-control studies, samples are collected from individuals with the
phenotype of interest (cases) such as certain manifestations of a disease,
and from individuals without the phenotype (controls) in the population
(target population) about which conclusions are to be drawn. Then the
possible causes of the disease are investigated retrospectively. As the
time and costs of collecting samples in case-control studies are
considerably less than those for prospective studies, case-control studies
are the more commonly used study design in genetic association studies, at
least during the exploration and discovery stage.
In both types of observational studies, there may be potential confounding
factors that should be taken into consideration. Confounding factors are
those that are associated with both the real cause(s) of the disease and
the disease itself, and they include demographic information such as age,
gender, ethnicity as well as environmental factors. When confounding
factors are not matched in cases and controls in a study, and are not
controlled properly, spurious association results can arise. If potential
confounding factors are identified, they should be controlled for by
analysis methods explained below.
In a genetic association study, the cause of interest to be tested is a
certain allele or a SNP or a combination of alleles or a haplotype from
several SNPs. Thus, tissue specimens (e.g., whole blood) from the sampled
individuals may be collected and genomic DNA genotyped for the SNP(s) of
interest. In addition to the phenotypic trait of interest, other
information such as demographic (e.g., age, gender, ethnicity, etc.),
clinical, and environmental information that may influence the outcome of
the trait can be collected to further characterize and define the sample
set. In many cases, these factors are known to be associated with diseases
and/or SNP allele frequencies. There are likely gene-environment and/or
gene-gene interactions as well. Analysis methods to address
gene-environment and gene-gene interactions (for example, the effects of
the presence of both susceptibility alleles at two different genes can be
greater than the effects of the individual alleles at two genes combined)
are discussed below.
After all the relevant phenotypic and genotypic information has been
obtained, statistical analyses are carried out to determine if there is
any significant correlation between the presence of an allele or a
genotype with the phenotypic characteristics of an individual. Preferably,
data inspection and cleaning are first performed before carrying out
statistical tests for genetic association. Epidemiological and clinical
data of the samples can be summarized by descriptive statistics with
tables and graphs. Data validation is preferably performed to check for
data completeness, inconsistent entries, and outliers. Chi-squared tests and
t-tests (Wilcoxon rank-sum tests if distributions are not normal) may then
be used to check for significant differences between cases and controls
for discrete and continuous variables, respectively. To ensure genotyping
quality, Hardy-Weinberg disequilibrium tests can be performed on cases and
controls separately. Significant deviation from Hardy-Weinberg equilibrium
(HWE) in both cases and controls for individual markers can be indicative
of genotyping errors. If HWE is violated in a majority of markers, it is
indicative of population substructure that should be further investigated.
Moreover, Hardy-Weinberg disequilibrium in cases only can indicate genetic
association of the markers with the disease (Genetic Data Analysis, Weir
B., Sinauer [1990]).
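As a minimal sketch of the Hardy-Weinberg check described above, the following assumes a biallelic SNP and a chi-squared goodness-of-fit test with 1 degree of freedom. The text does not prescribe a particular test; the function name and argument names are illustrative only. It would be applied to cases and to controls separately.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (AA, AB, BB) for one biallelic
    marker in one sample set. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # observed frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic; HWE test is undefined")
    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared variable with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For small samples or rare alleles an exact test may be preferable to this large-sample approximation.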
To test whether an allele of a single SNP is associated with the case or
control status of a phenotypic trait, one skilled in the art can compare
allele frequencies in cases and controls. Standard chi-squared tests and
Fisher exact tests can be carried out on a 2×2 table (2 SNP alleles × 2
outcomes in the categorical trait of interest). To test whether genotypes
of a SNP are associated, chi-squared tests can be carried out on a 3×2
table (3 genotypes × 2 outcomes). Score
tests are also carried out for genotypic association to contrast the three
genotypic frequencies (major homozygotes, heterozygotes and minor
homozygotes) in cases and controls, and to look for trends using 3
different modes of inheritance, namely dominant (with contrast
coefficients 2, -1, -1), additive (with contrast coefficients 1, 0, -1)
and recessive (with contrast coefficients 1, 1, -2). Odds ratios for minor
versus major alleles, and odds ratios for heterozygote and homozygote
variants versus the wild type genotypes are calculated with the desired
confidence limits, usually 95%.
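The allelic test and odds ratio described above can be sketched as follows. This is an illustration under stated assumptions, not the applicants' implementation: it uses the Pearson chi-squared statistic without continuity correction and a Woolf (log odds ratio) confidence interval, neither of which is specified in the text, and the function name is hypothetical.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-squared test (1 df) and odds ratio for a 2x2 allele table.

    Inputs are allele counts (two per genotyped individual).
    Returns (chi2, p_value, odds_ratio, (ci_low, ci_high)) with a 95% CI.
    All four cells must be non-zero.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds ratio for the minor allele versus the major allele
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method (assumed here)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.959964  # 97.5th percentile of the standard normal distribution
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return chi2, p_value, odds_ratio, (ci_low, ci_high)
```

A confidence interval that excludes 1 corresponds to a nominally significant allelic association at the 0.05 level.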
In order to control for confounders and to test for interaction and effect
modifiers, stratified analyses may be performed using stratified factors
that are likely to be confounding, including demographic information such
as age, ethnicity, and gender, or an interacting element or effect
modifier, such as a known major gene (e.g., APOE for Alzheimer's Disease
or HLA genes for autoimmune diseases), or environmental factors such as
smoking in lung cancer. Stratified association tests may be carried out
using Cochran-Mantel-Haenszel tests that take into account the ordinal
nature of genotypes with 0, 1, and 2 variant alleles. Exact tests by
StatXact may also be performed when computationally possible. Another way
to adjust for confounding effects and test for interactions is to perform
stepwise multiple logistic regression analysis using statistical packages
such as SAS or R. Logistic regression is a model-building technique in
which the best fitting and most parsimonious model is built to describe
the relation between the dichotomous outcome (for instance, getting a
certain disease or not) and a set of independent variables (for instance,
genotypes of different associated genes, and the associated demographic
and environmental factors). The most common model is one in which the
logit transformation of the outcome probability (the log odds) is expressed as a linear
combination of the variables (main effects) and their cross-product terms
(interactions) (Applied Logistic Regression, Hosmer and Lemeshow, Wiley
[2000]). To test whether a certain variable or interaction is
significantly associated with the outcome, coefficients in the model are
first estimated and then tested for statistical significance of their
departure from zero.
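For illustration, a model of this kind with one genotype variable and one environmental variable could be written as below. The symbols are assumptions introduced here: $\pi$ is the probability of disease, $G$ is a genotype code (for example the number of variant alleles, 0, 1, or 2), and $E$ is an environmental or demographic covariate.

```latex
\operatorname{logit}(\pi) = \ln\frac{\pi}{1-\pi}
  = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 \,(G \times E)
```

Under this sketch, $e^{\beta_1}$ is the odds ratio per unit of $G$ when $E = 0$, and testing whether $\beta_3$ departs from zero is a test of gene-environment interaction.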
In addition to performing association tests one marker at a time,
haplotype association analysis may also be performed to study a number of
markers that are closely linked together. Haplotype association tests can
have better power than genotypic or allelic association tests when the
tested markers are not the disease-causing mutations themselves but are in
linkage disequilibrium with such mutations. The test will be even more
powerful if the disease is indeed caused by a combination of alleles on a
haplotype (e.g., APOE is a haplotype formed by 2 SNPs that are very close
to each other). In order to perform haplotype association effectively,
marker-marker linkage disequilibrium measures, both D' and $r^2$, are
typically calculated for the markers within a gene to elucidate the
haplotype structure. Recent studies (Daly et al., Nature Genetics 29,
232-235 [2001]) in linkage disequilibrium indicate that SNPs within a gene
are organized in block pattern, and a high degree of linkage
disequilibrium exists within blocks and very little linkage disequilibrium
exists between blocks. Haplotype association with the disease status can
be performed using such blocks once they have been elucidated.
Haplotype association tests can be carried out in a similar fashion as the
allelic and genotypic association tests. Each haplotype in a gene is
analogous to an allele in a multi-allelic marker. One skilled in the art
can either compare the haplotype frequencies in cases and controls or test
genetic association with different pairs of haplotypes. It has been
proposed (Schaid et al., Am. J. Hum. Genet. 70, 425-434 [2002]) that score
tests can be done on haplotypes using the program "haplo.score." In that
method, haplotypes are first inferred by EM algorithm and score tests are
carried out with a generalized linear model (GLM) framework that allows
the adjustment of other factors.
An important decision in the performance of genetic association tests is
the determination of the significance level at which significant
association can be declared when the P value of the tests reaches that
level. In an exploratory analysis where positive hits will be followed up
in subsequent confirmatory testing, an unadjusted P value<0.1 (a
significance level on the lenient side) may be used for generating
hypotheses for significant association of a SNP with certain phenotypic
characteristics of a disease. It is preferred that a P value<0.05 (a
significance level traditionally used in the art) is achieved in order for
a SNP to be considered to have an association with a disease. It is more
preferred that a P value<0.01 (a significance level on the stringent side)
is achieved for an association to be declared. When hits are followed up
in confirmatory analyses in more samples of the same source or in
different samples from different sources, adjustment for multiple testing
will be performed so as to avoid an excess number of hits while maintaining the
experiment-wise error rates at 0.05. While there are different methods to
adjust for multiple testing to control for different kinds of error rates,
a commonly used but rather conservative method is Bonferroni correction to
control the experiment-wise or family-wise error rate (Westfall et al.,
Multiple comparisons and multiple tests, SAS Institute [1999]).
Permutation tests to control for the false discovery rate (FDR) can be
more powerful (Benjamini and Hochberg, Journal of the Royal Statistical
Society Series B 57, 289-300 [1995]; Resampling-based Multiple Testing,
Westfall and Young, Wiley [1993]). Such methods to control for
multiplicity would be preferred when the tests are dependent and
controlling for false discovery rates is sufficient as opposed to
controlling for the experiment-wise error rates.
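The two kinds of adjustment contrasted above can be sketched as follows. The second function is the Benjamini-Hochberg step-up procedure applied directly to P values; it is not the permutation-based approach mentioned in the text, and the function names and default levels are illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests significant at family-wise error rate alpha (Bonferroni)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Flag tests significant at false discovery rate q (BH step-up)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * q / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

On the same set of P values, the FDR procedure rejects at least as many hypotheses as Bonferroni correction at the same nominal level, which is the sense in which it is less conservative.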
In replication studies using samples from different populations after
statistically significant markers have been identified in the exploratory
stage, meta-analyses can then be performed by combining evidence of
different studies (Modern Epidemiology 643-673, Lippincott Williams &
Wilkins [1998]). If available, association results known in the art for
the same SNPs can be included in the meta-analyses.
Since both genotyping and disease status classification can involve
errors, sensitivity analyses may be performed to see how odds ratios and P
values would change upon various estimates on genotyping and disease
classification error rates.
It has been well known that subpopulation-based sampling bias between
cases and controls can lead to spurious results in case-control
association studies (Ewens and Spielman, Am. J. Hum. Genet. 62, 450-458
[1995]) when prevalence of the disease is associated with different
subpopulation groups. Such bias can also lead to a loss of statistical
power in genetic association studies. To detect population stratification,
Pritchard and Rosenberg (Pritchard et al., Am. J. Hum. Gen. 65:220-228
[1999]) suggested typing markers that are unlinked to the disease and
using results of association tests on those markers to determine whether
there is any population stratification. When stratification is detected,
the genomic control (GC) method as proposed by Devlin and Roeder (Devlin
et al., Biometrics 55:997-1004 [1999]) can be used to adjust for the
inflation of test statistics due to population stratification. The GC
method is robust to changes in population structure levels and is also
applicable to DNA pooling designs (Devlin et al., Genet. Epidem.
21:273-284 [2001]).
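A minimal sketch of a genomic control adjustment is given below, assuming 1-degree-of-freedom chi-squared statistics have already been computed for a set of unlinked "null" markers and for the candidate SNPs. The inflation factor is estimated as the median null statistic divided by the median of the chi-squared distribution with 1 df (about 0.455). Flooring the factor at 1 is a common convention assumed here, and all names are illustrative.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-squared distribution, 1 df


def genomic_control(null_marker_chi2, candidate_chi2):
    """Estimate the inflation factor and rescale candidate statistics.

    null_marker_chi2: 1-df chi-squared statistics at unlinked markers.
    candidate_chi2:   1-df chi-squared statistics at candidate SNPs.
    Returns (inflation_factor, adjusted_candidate_statistics).
    """
    lam = statistics.median(null_marker_chi2) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate the statistics
    adjusted = [x / lam for x in candidate_chi2]
    return lam, adjusted
```

The adjusted statistics are then referred to the chi-squared distribution with 1 df as usual.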
While Pritchard's method recommended using 15-20 unlinked microsatellite
markers, it suggested using more than 30 biallelic markers to get enough
power to detect population stratification. For the GC method, it has been
shown (Bacanu et al., Am. J. Hum. Genet. 66:1933-1944 [2000]) that about
60-70 biallelic markers are sufficient to estimate the inflation factor
for the test statistics due to population stratification. Hence, 70
intergenic SNPs can be chosen in unlinked regions as indicated in a genome
scan (Kehoe et al., Hum. Mol. Genet. 8:237-245 [1999]).
Once individual risk factors, genetic or non-genetic, have been found for
the predisposition to disease, the next step is to set up a
classification/prediction scheme to predict the category (for instance,
disease or no-disease) that an individual will be in, depending on his or
her genotypes at associated SNPs and other non-genetic risk factors. Logistic
regression for discrete traits and linear regression for continuous traits
are standard techniques for such tasks (Applied Regression Analysis,
Draper and Smith, Wiley [1998]). Moreover, other techniques can also be
used for setting up classification. Such techniques include, but are not
limited to, MART, CART, neural network, and discriminant analyses that are
suitable for use in comparing the performance of different methods (The
Elements of Statistical Learning, Hastie, Tibshirani & Friedman, Springer
[2002]).
Disease Diagnosis and Predisposition Screening
Information on association/correlation between genotypes and
disease-related phenotypes can be exploited in several ways. For example,
in the case of a highly statistically significant association between one
or more SNPs with predisposition to a disease for which treatment is
available, detection of such a genotype pattern in an individual may
justify immediate administration of treatment, or at least the institution
of regular monitoring of the individual. Detection of the susceptibility
alleles associated with serious disease in a couple contemplating having
children may also be valuable to the couple in their reproductive
decisions. In the case of a weaker but still statistically significant
association between a SNP and a human disease, immediate therapeutic
intervention or monitoring may not be justified after detecting the
susceptibility allele or SNP. Nevertheless, the subject can be motivated
to begin simple life-style changes (e.g., diet, exercise) that can be
accomplished at little or no cost to the individual but would confer
potential benefits in reducing the risk of developing conditions for which
that individual may have an increased risk by virtue of having the
susceptibility allele(s).
The SNPs of the invention may contribute to Alzheimer's Disease in an
individual in different ways. Some polymorphisms occur within a protein
coding sequence and contribute to disease phenotype by affecting protein
structure. Other polymorphisms occur in noncoding regions but may exert
phenotypic effects indirectly via influence on, for example, replication,
transcription, and/or translation. A single SNP may affect more than one
phenotypic trait. Likewise, a single phenotypic trait may be affected by
multiple SNPs in different genes.
As used herein, the terms "diagnose," "diagnosis," and "diagnostics"
include, but are not limited to, any of the following: detection of
Alzheimer's Disease that an individual may presently have,
predisposition/susceptibility screening (i.e., determining the increased
risk of an individual in developing Alzheimer's Disease in the future, or
determining whether an individual has a decreased risk of developing
Alzheimer's Disease in the future), determining a particular type or
subclass of Alzheimer's Disease in an individual known to have Alzheimer's
Disease, confirming or reinforcing a previously made diagnosis of
Alzheimer's Disease, pharmacogenomic evaluation of an individual to
determine which therapeutic strategy that individual is most likely to
positively respond to or to predict whether a patient is likely to respond
to a particular treatment, predicting whether a patient is likely to
experience toxic effects from a particular treatment or therapeutic
compound, and evaluating the future prognosis of an individual having
Alzheimer's Disease. Such diagnostic uses are based on the SNPs
individually or in a unique combination or SNP haplotypes of the present
invention.
Haplotypes are particularly useful in that, for example, fewer SNPs can be
genotyped to determine if a particular genomic region harbors a locus that
influences a particular phenotype, such as in linkage disequilibrium-based
SNP association analysis.
Linkage disequilibrium (LD) refers to the co-inheritance of alleles (e.g.,
alternative nucleotides) at two or more different SNP sites at frequencies
greater than would be expected from the separate frequencies of random
occurrence of each allele in a given population. The expected frequency of
co-occurrence of two alleles that are inherited independently is the
frequency of the first allele multiplied by the frequency of the second
allele. Alleles that co-occur at expected frequencies are said to be in
"linkage equilibrium." In contrast, LD refers to any non-random genetic
association between allele(s) at two or more different SNP sites, which is
generally due to the physical proximity of the two loci along a
chromosome. LD can occur when two or more SNP sites are in close physical
proximity to each other on a given chromosome and therefore alleles at
these SNP sites will tend to remain unseparated for multiple generations,
with the consequence that a particular nucleotide (allele) at one SNP site
will show a non-random association with a particular nucleotide (allele)
at another SNP site located nearby. Hence, genotyping one of the SNP sites
will give almost the same information as genotyping the other SNP site
that is in LD. The physical area of the chromosome that contains SNPs in
LD with each other is referred to as an LD block.
Various degrees of LD can be encountered between two or more SNPs with the
result being that some SNPs are more closely associated (i.e., in stronger
LD) than others. Furthermore, the physical distance over which LD extends
along a chromosome differs between different regions of the genome, and
therefore the degree of physical separation between two or more SNP sites
necessary for LD to occur can differ between different regions of the
genome.
For diagnostic purposes and similar uses, if a particular SNP site is
found to be useful for diagnosing Alzheimer's Disease (e.g., has a
significant statistical association with the condition and/or is
recognized as a causative polymorphism for the condition), then the
skilled artisan would recognize that other SNP sites which are in LD with
this SNP site would also be useful for diagnosing the condition. Thus,
polymorphisms (e.g., SNPs and/or haplotypes) that are not the actual
disease-causing (causative) polymorphisms, but are in LD with such
causative polymorphisms, are also useful. In such instances, the genotype
of the polymorphism(s) that is/are in LD with the causative polymorphism
is predictive of the genotype of the causative polymorphism and,
consequently, predictive of the phenotype (e.g., Alzheimer's Disease) that
is influenced by the causative SNP(s). Therefore, polymorphic markers that
are in LD with causative polymorphisms are useful as diagnostic markers,
and are particularly useful when the actual causative polymorphism(s)
is/are unknown.
Examples of polymorphisms that can be in LD with one or more causative
polymorphisms (and/or in LD with one or more polymorphisms that have a
significant statistical association with a condition) and therefore useful
for diagnosing the same condition that the causative/associated SNP(s) is
used to diagnose, include, for example, other SNPs in the same gene,
protein-coding, or mRNA transcript-coding region as the
causative/associated SNP, other SNPs in the same exon or same intron as
the causative/associated SNP, other SNPs in the same haplotype block as
the causative/associated SNP, other SNPs in the same intergenic region as
the causative/associated SNP, SNPs that are outside but near a gene (e.g.,
within 6 kb on either side, 5' or 3', of a gene boundary) that harbors a
causative/associated SNP, etc. Such useful LD SNPs can be selected from
among the SNPs disclosed in Tables 1-2, for example.
Linkage disequilibrium in the human genome is reviewed in: Wall et al., "Haplotype
blocks and linkage disequilibrium in the human genome", Nat Rev Genet.
2003 August; 4(8):587-97; Garner et al., "On selecting markers for
association studies: patterns of linkage disequilibrium between two and
three diallelic loci", Genet Epidemiol. 2003 January; 24(1):57-67; Ardlie
et al., "Patterns of linkage disequilibrium in the human genome", Nat Rev
Genet. 2002 April; 3(4):299-309 (erratum in Nat Rev Genet 2002 July;
3(7):566); and Remm et al., "High-density genotyping and linkage
disequilibrium in the human genome using chromosome 22 as a model", Curr
Opin Chem Biol. 2002 February; 6(1):24-30; Haldane J B S (1919) The
combination of linkage values, and the calculation of distances between
the loci of linked factors. J Genet 8:299-309; Mendel, G. (1866) Versuche
über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereines in
Brünn [Proceedings of the Natural History Society of Brünn]; Lewin B
(1990) Genes IV Oxford University Press, New York, USA; Hartl D L and
Clark A G (1989) Principles of Population Genetics 2nd ed. Sinauer
Associates, Inc. Sunderland, Mass., USA; Gillespie J H (2004) Population
Genetics: A Concise Guide. 2nd ed. Johns Hopkins University Press.
USA; Lewontin R C (1964) The interaction of selection and linkage. I.
General considerations; heterotic models. Genetics 49:49-67; Hoel P G
(1954) Introduction to Mathematical Statistics 2nd ed. John Wiley &
Sons, Inc. New York, USA; Hudson R R (2001) Two-locus sampling
distributions and their application. Genetics 159:1805-1817; Dempster A P,
Laird N M, Rubin D B (1977) Maximum likelihood from incomplete data via
the EM algorithm. J R Stat Soc 39:1-38; Excoffier L, Slatkin M (1995)
Maximum-likelihood estimation of molecular haplotype frequencies in a
diploid population. Mol Biol Evol 12(5):921-927; Tregouet D A, Escolano S,
Tiret L, Mallet A, Golmard J L (2004) A new algorithm for haplotype-based
association analysis: the Stochastic-EM algorithm. Ann Hum Genet 68(Pt
2):165-177; Long A D and Langley C H (1999) The power of association
studies to detect the contribution of candidate genetic loci to variation
in complex traits. Genome Research 9:720-731; Agresti A (1990) Categorical
Data Analysis. John Wiley & Sons, Inc. New York, USA; Lange K (1997)
Mathematical and Statistical Methods for Genetic Analysis. Springer-Verlag
New York, Inc. New York, USA; The International HapMap Consortium (2003)
The International HapMap Project. Nature 426:789-796; The International
HapMap Consortium (2005) A haplotype map of the human genome. Nature
437:1299-1320; Thorisson G A, Smith A V, Krishnan L, Stein L D (2005), The
International HapMap Project Web Site. Genome Research 15:1591-1593;
McVean G, Spencer C C A, Chaix R (2005) Perspectives on human genetic
variation from the HapMap project. PLoS Genetics 1(4):413-418; Hirschhorn
J N, Daly M J (2005) Genome-wide association studies for common diseases
and complex traits. Nat Rev Genet 6:95-108; Schrodi S J (2005) A probabilistic
approach to large-scale association scans: a semi-Bayesian method to
detect disease-predisposing alleles. SAGMB 4(1):31; Wang W Y S, Barratt B
J, Clayton D G, Todd J A (2005) Genome-wide association studies:
theoretical and practical concerns. Nat Rev Genet 6:109-118; Pritchard J
K, Przeworski M (2001) Linkage disequilibrium in humans: models and data.
Am J Hum Genet 69:1-14.
As discussed above, one aspect of the present invention is the discovery
that SNPs which are in certain LD distance with the interrogated SNP can
also be used as valid markers for identifying an increased or decreased
risk of having or developing Alzheimer's Disease. As used herein, the term "interrogated
SNP" refers to SNPs that have been found to be associated with an
increased or decreased risk of disease using genotyping results and
analysis, or other appropriate experimental method as exemplified in the
working examples described in this application. As used herein, the term
"LD SNP" refers to a SNP that has been characterized as associated
with an increased or decreased risk of disease because it is in LD
with the "interrogated SNP" under the methods of calculation described in
the application. Below, applicants describe the methods of calculation
with which one of ordinary skill in the art may determine whether a
particular SNP is in LD with an interrogated SNP. The parameter $r^2$ is
commonly used in the genetics art to characterize the extent of linkage
disequilibrium between markers (Hudson, 2001). As used herein, the term
"in LD with" refers to a particular SNP whose value of a parameter such as
$r^2$, measured with an interrogated SNP, is above a specified threshold.
It is now commonplace to directly observe genetic variants in a sample of
chromosomes obtained from a population. Suppose one has genotype data at
two genetic markers, A and B, located on the same chromosome. Further
suppose that two alleles segregate at each of these two markers
such that alleles $A_1$ and $A_2$ can be found at marker A and alleles
$B_1$ and $B_2$ at marker B. Also assume that these two markers are on
a human autosome. If one examines a specific individual and finds that
the individual is heterozygous at both markers, such that the two-marker
genotype is $A_1A_2B_1B_2$, then there are two possible
configurations: the individual in question could have the alleles
$A_1B_1$ on one chromosome and $A_2B_2$ on the remaining
chromosome; alternatively, the individual could have alleles
$A_1B_2$ on one chromosome and $A_2B_1$ on the other. The
arrangement of alleles on a chromosome is called a haplotype. In this
illustration, the individual could have haplotypes
$A_1B_1/A_2B_2$ or $A_1B_2/A_2B_1$ (see Hartl
and Clark (1989) for a more complete description). The concept of linkage
equilibrium relates the frequency of haplotypes to the allele frequencies.
Assume that a sample of individuals is selected from a larger population.
Considering the two markers described above, each having two alleles,
there are four possible haplotypes: $A_1B_1$, $A_1B_2$, $A_2B_1$ and
$A_2B_2$. Denote the frequencies of these four haplotypes with the
following notation.

$$P_{11} = \mathrm{freq}(A_1B_1) \tag{1}$$
$$P_{12} = \mathrm{freq}(A_1B_2) \tag{2}$$
$$P_{21} = \mathrm{freq}(A_2B_1) \tag{3}$$
$$P_{22} = \mathrm{freq}(A_2B_2) \tag{4}$$

The allele frequencies at the two markers are then sums of haplotype
frequencies, so it is straightforward to write down a similar set of
equations relating single-marker allele frequencies to two-marker
haplotype frequencies:

$$p_1 = \mathrm{freq}(A_1) = P_{11} + P_{12} \tag{5}$$
$$p_2 = \mathrm{freq}(A_2) = P_{21} + P_{22} \tag{6}$$
$$q_1 = \mathrm{freq}(B_1) = P_{11} + P_{21} \tag{7}$$
$$q_2 = \mathrm{freq}(B_2) = P_{12} + P_{22} \tag{8}$$

Note that the four haplotype frequencies, and the allele frequencies at
each marker, must each sum to 1.

$$P_{11} + P_{12} + P_{21} + P_{22} = 1 \tag{9}$$
$$p_1 + p_2 = 1 \tag{10}$$
$$q_1 + q_2 = 1 \tag{11}$$

If there is no correlation between the alleles at the two markers, one
would expect the frequency of each haplotype to be approximately the
product of the frequencies of its constituent alleles. Therefore,

$$P_{11} \approx p_1 q_1 \tag{12}$$
$$P_{12} \approx p_1 q_2 \tag{13}$$
$$P_{21} \approx p_2 q_1 \tag{14}$$
$$P_{22} \approx p_2 q_2 \tag{15}$$

These approximating equations (12)-(15) represent the concept of linkage
equilibrium, where there is independent assortment between the two
markers; that is, the alleles at the two markers occur together at random.
These are represented as approximations because linkage equilibrium and
linkage disequilibrium are concepts typically thought of as properties of
a sample of chromosomes, and as such they are susceptible to stochastic
fluctuations due to the sampling process. Empirically, many pairs of
genetic markers will be in linkage equilibrium, but certainly not all
pairs.
Having established the concept of linkage equilibrium above, applicants
can now describe the concept of linkage disequilibrium (LD), which is the
deviation from linkage equilibrium. Since the frequency of the
$A_1B_1$ haplotype is approximately the product of the allele
frequencies for $A_1$ and $B_1$ under the assumption of linkage
equilibrium as stated mathematically in (12), a simple measure for the
amount of departure from linkage equilibrium is the difference in these
two quantities, D,

$$D = P_{11} - p_1 q_1 \tag{16}$$

D=0 indicates perfect linkage equilibrium. Substantial departures from D=0
indicate LD in the sample of chromosomes examined. Many properties of D
are discussed in Lewontin (1964), including the maximum and minimum values
that D can take. Mathematically, using basic algebra, it can be shown that
D can also be written solely in terms of haplotype frequencies:

$$D = P_{11}P_{22} - P_{12}P_{21} \tag{17}$$

If one transforms D by squaring it and subsequently dividing by the
product of the allele frequencies of $A_1$, $A_2$, $B_1$ and $B_2$, the
resulting quantity, called $r^2$, is equivalent to the square of
Pearson's correlation coefficient commonly used in statistics (e.g. Hoel,
1954):

$$r^2 = \frac{D^2}{p_1 p_2 q_1 q_2} = \frac{(P_{11}P_{22} - P_{12}P_{21})^2}{p_1 p_2 q_1 q_2} \tag{18}$$
As with D, values of $r^2$ close to 0 indicate linkage equilibrium
between the two markers examined in the sample set. As values of $r^2$
increase, the two markers are said to be in linkage disequilibrium. The
range of values that $r^2$ can take is from 0 to 1; $r^2=1$ when
there is a perfect correlation between the alleles at the two markers.
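The quantities D and $r^2$ can be computed directly from the four two-marker haplotype frequencies. The following is a minimal sketch, assuming the frequencies are already known (that is, phase has been resolved); the function name is illustrative.

```python
def ld_measures(P11, P12, P21, P22):
    """Return (D, r2) from two-marker haplotype frequencies.

    P11..P22 are the frequencies of haplotypes A1B1, A1B2, A2B1, A2B2
    and must sum to 1. Both markers must be polymorphic.
    """
    if abs(P11 + P12 + P21 + P22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Single-marker allele frequencies are sums of haplotype frequencies
    p1, p2 = P11 + P12, P21 + P22
    q1, q2 = P11 + P21, P12 + P22
    if min(p1, p2, q1, q2) <= 0:
        raise ValueError("both markers must be polymorphic")
    # D is the departure of P11 from its linkage-equilibrium expectation
    D = P11 - p1 * q1
    # r2 is D squared over the product of the four allele frequencies
    r2 = D * D / (p1 * p2 * q1 * q2)
    return D, r2
```

For example, haplotype frequencies of 0.4, 0.1, 0.1 and 0.4 give D = 0.15 and $r^2$ = 0.36, while four equal frequencies of 0.25 give D = 0 and $r^2$ = 0.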
In addition, the quantities discussed above are sample-specific, and as
such it is necessary to formulate notation specific to the samples
studied. In the approach discussed here, three types of samples are of
primary interest: (i) a sample of chromosomes from individuals affected by
a disease-related phenotype (cases), (ii) a sample of chromosomes obtained
from individuals not affected by the disease-related phenotype (controls),
and (iii) a standard sample set used for the construction of haplotypes
and calculation of pairwise linkage disequilibrium. For the allele
frequencies used in the development of the method described below, an
additional subscript will be added to denote either the case or control
sample sets.

$$p_{1,cs} = \mathrm{freq}(A_1 \text{ in cases}) \tag{19}$$
$$p_{2,cs} = \mathrm{freq}(A_2 \text{ in cases}) \tag{20}$$
$$q_{1,cs} = \mathrm{freq}(B_1 \text{ in cases}) \tag{21}$$
$$q_{2,cs} = \mathrm{freq}(B_2 \text{ in cases}) \tag{22}$$

Similarly,

$$p_{1,ct} = \mathrm{freq}(A_1 \text{ in controls}) \tag{23}$$
$$p_{2,ct} = \mathrm{freq}(A_2 \text{ in controls}) \tag{24}$$
$$q_{1,ct} = \mathrm{freq}(B_1 \text{ in controls}) \tag{25}$$
$$q_{2,ct} = \mathrm{freq}(B_2 \text{ in controls}) \tag{26}$$
As a well-accepted sample set is necessary for robust linkage
disequilibrium calculations, data obtained from the International HapMap
project (The International HapMap Consortium 2003, 2005; Thorisson et al,
2005; McVean et al, 2005) can be used for the calculation of pairwise
$r^2$ values. Indeed, the samples genotyped for the International HapMap
Project were selected to be representative examples from various human
sub-populations with sufficient numbers of chromosomes examined to draw
meaningful and robust conclusions from the patterns of genetic variation
observed. The International HapMap project website (hapmap.org) contains a
description of the project, methods utilized and samples examined. It is
useful to examine empirical data to get a sense of the patterns present in
such data.
Haplotype frequencies were explicit arguments in equation (18) above.
However, knowing the 2-marker haplotype frequencies requires phase to
be determined for doubly heterozygous samples. When phase is unknown in
the data examined, various algorithms can be used to infer phase from the
genotype data. This issue was discussed earlier, where the doubly
heterozygous individual with a 2-SNP genotype of
$A_1A_2B_1B_2$ could have one of two different sets of
chromosomes: $A_1B_1/A_2B_2$ or
$A_1B_2/A_2B_1$. One such algorithm to estimate haplotype
frequencies is the expectation-maximization (EM) algorithm first
formalized by Dempster et al (1977). This algorithm is often used in
genetics to infer haplotype frequencies from genotype data (e.g. Excoffier
and Slatkin, 1995; Tregouet et al, 2004). It should be noted that for the
two-SNP case explored here, EM algorithms have very little error provided
that the allele frequencies and sample sizes are not too small. The impact
on $r^2$ values is typically negligible.
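For the two-SNP case, the EM approach can be sketched as below. This is an illustrative implementation under stated assumptions, not the specific algorithm of the cited references: genotypes are summarized as a 3×3 table of counts, the starting point is uniform, and the function name and parameters are hypothetical. Only double heterozygotes are phase-ambiguous, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
def em_haplotype_frequencies(counts, n_iter=200, tol=1e-10):
    """Estimate (P11, P12, P21, P22) from unphased two-SNP genotype counts.

    counts[a][b] is the number of individuals carrying a copies of allele
    A1 (0, 1 or 2) and b copies of allele B1 (0, 1 or 2).
    """
    n_ind = sum(sum(row) for row in counts)
    if n_ind == 0:
        raise ValueError("no genotypes supplied")
    # Haplotype counts contributed by phase-unambiguous genotypes,
    # in the order A1B1, A1B2, A2B1, A2B2
    base = [
        2 * counts[2][2] + counts[2][1] + counts[1][2],
        2 * counts[2][0] + counts[2][1] + counts[1][0],
        2 * counts[0][2] + counts[1][2] + counts[0][1],
        2 * counts[0][0] + counts[1][0] + counts[0][1],
    ]
    n_dh = counts[1][1]  # double heterozygotes (phase unknown)
    P = [0.25, 0.25, 0.25, 0.25]  # assumed uniform starting values
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is A1B1/A2B2
        cis = P[0] * P[3]
        trans = P[1] * P[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts
        new = [
            (base[0] + n_dh * w) / (2 * n_ind),
            (base[1] + n_dh * (1 - w)) / (2 * n_ind),
            (base[2] + n_dh * (1 - w)) / (2 * n_ind),
            (base[3] + n_dh * w) / (2 * n_ind),
        ]
        converged = max(abs(x - y) for x, y in zip(new, P)) < tol
        P = new
        if converged:
            break
    return tuple(P)
```

The estimated frequencies can then be used to compute D and $r^2$ as defined earlier. EM can converge to a local optimum, so trying several starting points is a reasonable precaution.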
As correlated genetic markers share information, interrogation of SNP
markers in LD with a disease-associated SNP marker can also have
sufficient power to detect disease association (Long and Langley, 1999).
The relationship between the power to directly find disease-associated
alleles and the power to indirectly detect disease-association was
investigated by Pritchard and Przeworski (2001). In a straightforward
derivation, it can be shown that the power to detect disease association
indirectly at a marker locus in linkage disequilibrium with a
disease-association locus is approximately the same as the power to detect
disease-association directly at the disease-association locus if the
sample size is increased by a factor of 1/r.sup.2 (the reciprocal of
equation 18) at the marker in comparison with the disease-association
locus.
Therefore, if one calculated the power to detect disease-association
indirectly with an experiment having N samples, then equivalent power to
directly detect disease-association (at the actual disease-susceptibility
locus) would necessitate an experiment using approximately r.sup.2N
samples. This elementary relationship between power, sample size and
linkage disequilibrium can be used to derive an r.sup.2 threshold value
useful in determining whether or not genotyping markers in linkage
disequilibrium with a SNP marker directly associated with disease status
has enough power to indirectly detect disease-association.
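The sample-size relationship stated above can be expressed as a short Python sketch. The function names are hypothetical and the numerical values are illustrative only. The sketch simply applies the 1/r.sup.2 scaling described in the text and makes no other modeling assumption.

```python
import math


def indirect_sample_size(n_direct, r2):
    """Samples needed at an LD marker to approximate the power obtained
    with n_direct samples at the disease-association locus itself."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return math.ceil(n_direct / r2)


def equivalent_direct_sample_size(n_indirect, r2):
    """Approximate number of samples at the disease-association locus
    giving the same power as n_indirect samples at the LD marker."""
    if not 0 <= r2 <= 1:
        raise ValueError("r2 must be in the interval [0, 1]")
    return r2 * n_indirect


# Illustrative values: a marker with r2 = 0.5 needs twice the samples.
print(indirect_sample_size(1000, 0.5))           # 2000
print(equivalent_direct_sample_size(2000, 0.5))  # 1000.0
```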
To commence a derivation of the power to detect disease-associated markers
through an indirect process, define the effective chromosomal sample size
as
-- see Original Patent.
For example, .PHI.(1.644854)=0.95, where .PHI. denotes the standard
normal cumulative distribution function. The value of r.sup.2 may be
derived to yield a pre-specified minimum amount of power to detect disease
association through indirect interrogation. Note that the LD SNP marker
could itself be the one carrying the disease-associated allele; this
approach therefore constitutes a lower-bound model in which all indirect
power results are expected to be at least as large as those calculated.
Denote by .beta. the error rate for not detecting truly disease-associated
markers. Therefore, 1-.beta. is the classical definition of statistical
power. Substituting the Pritchard-Przeworski result into the sample size,
the power to detect disease association at a significance level of .alpha.
is given by the approximation
-- see Original Patent.
Suppose that r.sup.2 is calculated between an interrogated SNP and a
number of other SNPs with varying levels of LD with the interrogated SNP.
The threshold value r.sub.T.sup.2 is the minimum value of linkage
disequilibrium between the interrogated SNP and the potential LD SNPs such
that the LD SNP still retains a power greater than or equal to T for
detecting disease-association. For example, suppose that
SNP rs200 is genotyped in a case-control disease-association study and it
is found to be associated with a disease phenotype. Further suppose that
the minor allele frequency in 1,000 case chromosomes was found to be 16%
in contrast with a minor allele frequency of 10% in 1,000 control
chromosomes. Given those measurements one could have predicted, prior to
the experiment, that the power to detect disease association at a
significance level of 0.05 was quite high--approximately 98% using a test
of allelic association. Applying equation (32) one can calculate a minimum
value of r.sup.2 to indirectly assess disease association assuming that
the minor allele at SNP rs200 is truly disease-predisposing for a
threshold level of power. If one sets the threshold level of power to be
80%, then r.sub.T.sup.2=0.489 given the same significance level and
chromosome numbers as above. Hence, any SNP with a pairwise r.sup.2 value
with rs200 greater than 0.489 is expected to have greater than 80% power
to detect the disease association. Further, this is assuming the
conservative model where the LD SNP is disease-associated only through
linkage disequilibrium with the interrogated SNP rs200.
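The rs200 example can be checked numerically with the following Python sketch. Because the power and threshold equations are not reproduced in this text, the sketch assumes a standard two-sided normal approximation for a test of allelic association, with the effective chromosome count reduced by a factor of r.sup.2. It also assumes an effective chromosomal sample size of 2ab/(a+b) for a case chromosomes and b control chromosomes. All function names are hypothetical. Under these assumptions the sketch gives values close to the approximately 98% power and the 0.489 threshold quoted above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def effective_chromosomes(case_chromosomes, control_chromosomes):
    """Assumed effective chromosomal sample size; equals the per-group
    count when cases and controls are balanced."""
    return (2 * case_chromosomes * control_chromosomes
            / (case_chromosomes + control_chromosomes))


def allelic_power(q_case, q_control, n_eff, alpha=0.05, r2=1.0):
    """Approximate power of a two-sided allelic association test when the
    genotyped marker has LD r2 with the disease-associated SNP."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance = q_case * (1 - q_case) + q_control * (1 - q_control)
    shift = abs(q_case - q_control) / sqrt(variance / (r2 * n_eff))
    return STD_NORMAL.cdf(shift - z_alpha) + STD_NORMAL.cdf(-shift - z_alpha)


def r2_threshold(q_case, q_control, n_eff, target_power=0.80, alpha=0.05):
    """Smallest r2 at which an LD marker keeps roughly target_power."""
    z_target = STD_NORMAL.inv_cdf(target_power)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance = q_case * (1 - q_case) + q_control * (1 - q_control)
    return ((z_target + z_alpha) ** 2 / n_eff
            * variance / (q_case - q_control) ** 2)


n_eff = effective_chromosomes(1000, 1000)          # 1000.0
print(round(allelic_power(0.16, 0.10, n_eff), 2))  # 0.98
print(round(r2_threshold(0.16, 0.10, n_eff), 3))   # 0.489
```

Evaluating allelic_power at the returned threshold gives a power of approximately 80%, which serves as a consistency check on the two functions.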
The contribution or association of particular SNPs and/or SNP haplotypes
with disease phenotypes, such as Alzheimer's Disease, enables the SNPs of
the present invention to be used to develop superior diagnostic tests
capable of identifying individuals who express a detectable trait, such as
Alzheimer's Disease, as the result of a specific genotype, or individuals
whose genotype places them at an increased or decreased risk of developing
a detectable trait at a subsequent time as compared to individuals who do
not have that genotype. As described herein, diagnostics may be based on a
single SNP or a group of SNPs. Combined detection of a plurality of SNPs
(for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 24, 25, 30, 32, 48, 50, 64, 96, 100, or any other number
in-between, or more, of the SNPs provided in Table 1 and/or Table 2)
typically increases the probability of an accurate diagnosis. For example,
the presence of a single SNP known to correlate with Alzheimer's Disease
might indicate a probability of 20% that an individual has or is at risk
of developing Alzheimer's Disease, whereas detection of five SNPs, each of
which correlates with Alzheimer's Disease, might indicate a probability of
80% that an individual has or is at risk of developing Alzheimer's
Disease. To further increase the accuracy of diagnosis or predisposition
screening, analysis of the SNPs of the present invention can be combined
with that of other polymorphisms or other risk factors of Alzheimer's
Disease, such as disease symptoms, pathological characteristics, family
history, diet, environmental factors or lifestyle factors.
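One way to see how combining SNPs can raise the indicated probability is the following Python sketch, which multiplies per-SNP likelihood ratios onto prior odds. This is an illustration only and not the method of the invention. It assumes that the SNPs contribute independently, which SNPs in linkage disequilibrium with one another would violate. The prior probability of 1/9 and the likelihood ratio of 2 per SNP are hypothetical values chosen solely because they reproduce the illustrative 20% and 80% figures given above.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with per-marker likelihood ratios,
    assuming the markers are conditionally independent (an assumption
    of this sketch, not a statement from the source)."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


HYPOTHETICAL_PRIOR = 1 / 9   # illustrative value only
HYPOTHETICAL_LR = 2.0        # illustrative per-SNP likelihood ratio

one_snp = posterior_probability(HYPOTHETICAL_PRIOR, [HYPOTHETICAL_LR])
five_snps = posterior_probability(HYPOTHETICAL_PRIOR, [HYPOTHETICAL_LR] * 5)
print(round(one_snp, 2), round(five_snps, 2))  # 0.2 0.8
```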
It will, of course, be understood by practitioners skilled in the
treatment or diagnosis of Alzheimer's Disease that the present invention
generally does not intend to provide an absolute identification of
individuals who are at risk (or less at risk) of developing Alzheimer's
Disease, and/or pathologies related to Alzheimer's Disease, but rather to
indicate a certain increased (or decreased) degree or likelihood of
developing the disease based on statistically significant association
results. However, this information is extremely valuable as it can be used
to, for example, initiate preventive treatments or to allow an individual
carrying one or more significant SNPs or SNP haplotypes to foresee warning
signs such as minor clinical symptoms, or to have regularly scheduled
physical exams to monitor for appearance of a condition in order to
identify and begin treatment of the condition at an early stage.
Particularly with diseases that are extremely debilitating or fatal if not
treated on time, the knowledge of a potential predisposition, even if this
predisposition is not absolute, would likely contribute in a very
significant manner to treatment efficacy.
The diagnostic techniques of the present invention may employ a variety of
methodologies to determine whether a test subject has a SNP or a SNP
pattern associated with an increased or decreased risk of developing a
detectable trait or whether the individual suffers from a detectable trait
as a result of a particular polymorphism/mutation, including, for example,
methods which enable the analysis of individual chromosomes for
haplotyping, family studies, single sperm DNA analysis, or somatic
hybrids. The trait analyzed using the diagnostics of the invention may be
any detectable trait that is commonly observed in pathologies and
disorders related to Alzheimer's Disease.
Another aspect of the present invention relates to a method of determining
whether an individual is at risk (or less at risk) of developing one or
more traits or whether an individual expresses one or more traits as a
consequence of possessing a particular trait-causing or trait-influencing
allele. These methods generally involve obtaining a nucleic acid sample
from an individual and assaying the nucleic acid sample to determine which
nucleotide(s) is/are present at one or more SNP positions, wherein the
assayed nucleotide(s) is/are indicative of an increased or decreased risk
of developing the trait or indicative that the individual expresses the
trait as a result of possessing a particular trait-causing or
trait-influencing allele.
In another embodiment, the SNP detection reagents of the present invention
are used to determine whether an individual has one or more SNP allele(s)
affecting the level (e.g., the concentration of mRNA or protein in a
sample, etc.) or pattern (e.g., the kinetics of expression, rate of
decomposition, stability profile, Km, Vmax, etc.) of gene expression
(collectively, the "gene response" of a cell or bodily fluid). Such a
determination can be accomplished by screening for mRNA or protein
expression (e.g., by using nucleic acid arrays, RT-PCR, TaqMan assays, or
mass spectrometry), identifying genes having altered expression in an
individual, genotyping SNPs disclosed in Table 1 and/or Table 2 that could
affect the expression of the genes having altered expression (e.g., SNPs
that are in and/or around the gene(s) having altered expression, SNPs in
regulatory/control regions, SNPs in and/or around other genes that are
involved in pathways that could affect the expression of the gene(s)
having altered expression, or all SNPs could be genotyped), and
correlating SNP genotypes with altered gene expression. In this manner,
specific SNP alleles at particular SNP sites can be identified that affect
gene expression.
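The final step above, correlating SNP genotypes with altered gene expression, can be sketched in Python as follows. The sketch assumes an additive coding of genotype (0, 1 or 2 copies of a chosen allele) and a simple least-squares slope with a Pearson correlation. The function name, the coding scheme and the toy values are hypothetical and are not measurements from the source.

```python
from math import sqrt


def genotype_expression_association(dosages, expression):
    """Return (slope, pearson_r) for expression regressed on allele dosage.

    dosages: copies of the chosen allele per individual (0, 1 or 2).
    expression: matching expression measurements (e.g. mRNA level).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched lists with at least 3 individuals")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(dosages, expression))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary")
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up toy data in which expression rises by 2 units per allele copy.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
slope, r = genotype_expression_association(toy_dosages, toy_expression)
print(slope, r)  # 2.0 1.0
```

In practice such a test would be repeated across many SNPs and genes, so a correction for multiple testing would normally be applied. The sketch omits that step.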
Claim 1 of 36 Claims
1. A method of determining whether a human has an increased risk for
late onset Alzheimer's disease, comprising
testing nucleic acid from said human to determine the nucleotide content
at polymorphism rs4878104 in gene DAPK1, and determining that said human
has an increased risk for late onset Alzheimer's disease if said human has
C or G at rs4878104, wherein said increased risk for late onset
Alzheimer's disease is relative to being homozygous for T or A at
rs4878104.