Title: Systems for expressing toxic proteins, vectors and method of producing toxic proteins
United States Patent: 7,544,775
Issued: June 9, 2009
Inventors: Falson; Pierre (Sainte Foy les Lyon, FR), Penin; Francois (Decines, FR), Montigny; Cedric (Gif sur Yvette, FR)
Assignee: Centre National de la Recherche Scientifique (Paris Cedex, FR)
Appl. No.: 10/528,344
Filed: September 19, 2003
PCT Filed: September 19, 2003
PCT No.: PCT/FR03/02763
371(c)(1),(2),(4) Date: November 08, 2005
PCT Pub. No.: WO2004/027068
PCT Pub. Date: April 01, 2004
Abstract
The present invention relates to a system
for expressing toxic proteins, to an expression vector comprising this
system, to a prokaryotic cell transformed with this system, and also to a
method for synthesizing a toxic protein using this expression system. The
expression system of the invention is characterized in that it comprises
successively, in the 5'-3' direction, a nucleotide sequence encoding the
Asp-Pro dipeptide and a nucleotide sequence encoding a toxic protein.
According to a preferred embodiment of the invention, the expression
system also comprises, upstream of the Asp-Pro sequence, a nucleotide
sequence encoding a soluble protein. The expression system of the
invention makes it possible to construct an expression vector that is
useful for transforming a prokaryotic cell such as E. coli, for example in
a method for synthesizing the toxic protein.
Description of the Invention
FIELD OF THE INVENTION
The present invention relates to systems for expressing toxic proteins, to
expression vectors comprising one of these systems, to prokaryotic cells
transformed with these systems, and also to a method for synthesizing a
toxic protein using these expression systems.
It enables, for example, the overproduction in a prokaryotic cell, for
example Escherichia coli (E. coli), of toxic hydrophobic proteins or
peptides, for example the overproduction of transmembrane domains of viral
envelope proteins.
It finds many applications in particular in research concerning the
mechanisms of viral infections, and in the search for and development of
novel active principles for combating viral infections.
In the description which follows, the references between square brackets [
] refer to the attached reference list.
BACKGROUND OF THE INVENTION
Determination of the three-dimensional (3D) structure is a decisive step
in the structural and functional understanding of proteins.
Very great efforts and means have been, and are being, used to achieve
this aim, and have been amplified with the accumulation of data provided
by the genome sequencing programmes [1].
The two main techniques for establishing these protein structures are
X-ray diffraction, carried out using crystallized proteins, and nuclear
magnetic resonance (NMR) carried out using proteins in solution. NMR,
which is very suitable for studying proteins with a molecular mass of less
than 20 kDa, requires however, like X-ray diffraction, the production of
large amounts of material. It also means, in most cases, that material
enriched in ¹⁵N and/or ¹³C must be prepared.
In this context, the bacterium is a means of production that is widely
used by the scientific community [2]. The overexpression of proteins in
bacteria does not, however, occur without problems. In fact, it gives rise
to three situations:
The first case, which is ideal, is that where the protein is overproduced
in a form that is correctly spatially folded during its synthesis in vivo.
This is not a rare situation, but neither is it frequent. It concerns
essentially soluble proteins that are small, i.e. approximately 20 to 50
kDa.
The second case, the most common, is that where the protein is
overproduced and aggregated in the form of inclusion bodies. This concerns
polytopic and/or large proteins. In this case, the kinetics of folding of
the protein are clearly slower than its rate of biosynthesis. This
promotes exposure of the hydrophobic regions of the protein, that are
normally buried in the core thereof, to the aqueous solvent and generates
non-specific interactions that result in the formation of insoluble
aggregates. According to the degree of disorder of this folding, the
inclusion bodies can be solubilized/unfolded under non-native conditions,
with urea or guanidine. The solubilized protein is then subjected to
various treatments, such as dialysis or dilution, so as to promote,
successfully in certain cases, a native 3D folding.
The third case is that where the expression engenders a varying degree of
toxicity. This goes from an absence of expression product if the bacterium
manages to adapt itself, to death of the bacterium if the product is too
toxic. It is a case which occurs quite frequently and most commonly with
membrane proteins or membrane protein domains, for instance those of the
envelope proteins of the hepatitis C virus [5] or of the human
immunodeficiency virus [6].
The problem of toxicity relates essentially to the expression of membrane
proteins, i.e. proteins having a hydrophobic domain. Now, these proteins
are of growing interest. Firstly, they are relatively numerous since the
establishment of the various genomes confirms that they represent
approximately 30% of the proteins potentially encoded by these genomes
[7]. Secondly, they constitute 70% of the therapeutic targets and their
alteration is the cause of many genetic diseases [8].
It is therefore essential to develop methods that facilitate or allow the
expression of such proteins or of their membrane portion.
Efforts have been made in this respect with, for example, the development
of bacterial strains that either show better tolerance to the expression
of membrane proteins [9, 10], or have a stricter regulation of the
mechanism in the expression, as in the case of the E. coli strain
BL21(DE3)pLysS developed by Stratagene. However, these improvements do not
make it possible to eliminate the toxicity phenomenon in all cases, in
particular in the expression of hydrophobic peptides corresponding to
membrane anchors.
The treatment of hepatitis C currently represents one of the major
high-stakes areas of medicine. Hepatitis C is caused by the hepatitis C
virus (HCV), a member of the family Flaviviridae, which specifically
infects hepatic cells [11]. This virus consists of a positive-strand RNA of approximately
9500 bases which encodes a polyprotein of 3033 residues [13], symbolized
in the attached FIG. 1 (see Original Patent) by the rectangle 1A. This
polyprotein is cleaved, after expression, by endogenous and exogenous
proteases, so as to give rise to 10 different proteins. Two of them,
called E1 and E2, are glycosylated and form the envelope of the virus.
They each have membrane domains called TM, in particular TME1 for the E1
protein and TME2 for the E2 protein. The cleavage positions that generate
them are indicated in FIG. 1 by arrows with, mentioned below, a number
which corresponds to the position in the polyprotein of the first amino
acid of sequence resulting from the cleavage. The E1 and E2 proteins are
symbolized by a rectangle. The white portion of each rectangle corresponds
to the ectodomain (ed) and the shaded domain to the transmembrane region
(TM). The primary sequence of the TMs is indicated at the bottom of the
figure in one-letter-code, with numbers corresponding to the position of
the amino acids in the polyprotein located at the ends of these domains.
The stars indicate the hydrophobic amino acids. These membrane domains or
membrane regions of the virus have particular association properties that
condition the structuring of the viral envelope [12]. In this respect,
they constitute potential therapeutic targets. An understanding of the
mechanism of association of the virus requires studies of the 3D structure
of these domains, in particular by means of the abovementioned techniques,
which involves producing these peptides in abundant amounts, and also
preferably via the biosynthetic pathway in order to allow ¹⁵N and/or
¹³C isotope labelling.
The various E1 expression trials of the prior art, in particular in E.
coli [14][5] or in sf9 insect cells infected with baculoviruses [15], have
not made it possible to overproduce this E1 protein, in particular due to
the toxicity induced by its expression, including in the "resistant" E.
coli BL21(DE3)pLysS strains described above. There has been no E2 protein
overexpression trial in bacteria. These toxicity problems are essentially
due to the C-terminal region of the two proteins, that is rich in
hydrophobic amino acids which form transmembrane domains that provide the
anchoring to the membrane of the endoplasmic reticulum.
There is therefore a real need for a system for expressing toxic proteins
which does not have the drawbacks, limitations, deficiencies and
disadvantages of the techniques of the prior art.
In addition, there is a real need for an expression vector comprising such
a system for expressing toxic proteins, making it possible to carry out a
method for producing toxic proteins which does not have the drawbacks,
limitations, deficiencies and disadvantages of the techniques of the prior
art.
SUMMARY OF THE INVENTION
The aim of the present invention is precisely to provide a system for
expressing a toxic protein, which satisfies, inter alia, the needs
indicated above.
This aim, and others, are achieved, in accordance with the invention, by
means of an expression system characterized in that it comprises
successively, in the 5'-3' direction, a nucleotide sequence encoding the
dipeptide Asp-Pro, referred to below as dp sequence, and a nucleotide
sequence (pt) encoding a toxic protein (Pt). This system will be
identified below by: dp-pt.
DETAILED DESCRIPTION OF THE INVENTION
According to a particularly preferred embodiment of the present invention,
the expression system also comprises, upstream of the dp sequence, a
nucleotide sequence (ps) encoding a soluble protein (Ps). This soluble
protein may be, for example, glutathione S-transferase (GST) or
thioredoxin (TrX) or another equivalent soluble protein. This expression
system according to the invention will be identified below by: ps-dp-pt.
The dp-pt expression system of the present invention, which comprises a
sequence encoding Asp-Pro (DP in one-letter code) placed upstream of the
nucleotide sequence of the toxic protein, makes it possible, entirely
unexpectedly, to suppress the toxic effect of the protein for the host
cell. In addition, the inventors have noted that, entirely surprisingly,
the suppression of toxicity of the protein in the host is even more
effective with the ps-dp-pt expression system when the toxic peptide is
produced as a C-terminal fusion with a soluble protein, for example
glutathione S-transferase or thioredoxin, with the sequence Asp-Pro
inserted between the soluble protein and the toxic peptide.
The dp-pt or ps-dp-pt expression system of the present invention makes it
possible to overproduce toxic proteins in host cells, in particular
hydrophobic proteins, especially peptides which correspond to, or which
comprise, hydrophobic domains of membrane-anchored proteins which may
involve, for example, a membrane protein or a domain of a membrane
protein. It may involve, for example, a protein of a virus, for example of
a hepatitis C virus, of an AIDS virus, or of any other virus that is
pathogenic for humans and, in general, for mammals.
For example, the dp-pt or ps-dp-pt system of the invention makes it
possible to overproduce, in a host such as E. coli, the transmembrane
domains of the E1 and E2 proteins of the hepatitis C virus, called TME1
and TME2, corresponding respectively to the sequences
-- see Original Patent.
The nucleotide sequences that can be used for constituting the dp-pt
system of the invention encoding the TME1 (dp-pt(TME1)) or TME2
(dp-pt(TME2)) proteins can be any of the possible sequences
encoding respectively the DP-TME1 and DP-TME2 fusion proteins. The
sequences encoding the TME1 and TME2 proteins may advantageously be, for
example, SEQ ID NO: 3 and SEQ ID NO: 4, respectively, of the attached
sequence listing. To obtain the dp-pt system, the dp sequence encoding the
dipeptide Asp-Pro (DP) is added to these sequences.
The nucleotide sequences that can be used for constituting the ps-dp-pt
system of the invention encoding the TME1 (ps-dp-pt(TME1)) or TME2
(ps-dp-pt(TME2))
proteins may be any of the possible sequences encoding the Ps-DP-TME1 and
Ps-DP-TME2 fusion proteins, respectively. They may advantageously be, for
example, the sequences ID No. 34, ID No. 35 and ID No. 36 of the attached
sequence listing for TME1, making it possible to obtain a Ps-DP-TME1
chimeric protein. They may advantageously be, for example, the sequences
ID No. 37, ID No. 38 and ID No. 39 of the attached sequence listing for
TME2, making it possible to obtain a Ps-DP-TME2 chimeric protein.
In fact, the abovementioned nucleotide sequences have optimized codons for
the expression of TME1 and TME2 in a bacterium, for example in E. coli.
A large number of HCV RNA sequences producing an infectious phenotype
exist: these sequences can also be used in the present invention.
The sequence encoding the dipeptide Asp-Pro may be, for example: gacccg,
or any other sequence encoding this dipeptide.
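The phrase "any other sequence encoding this dipeptide" can be made concrete with a minimal sketch. It assumes the standard genetic code, in which Asp has two codons and Pro has four, and enumerates the resulting six-nucleotide candidates. The function names are illustrative and do not come from the patent.

```python
from itertools import product

# Standard genetic code entries for the two residues, written as lowercase
# DNA codons to match the "gacccg" example in the text.
ASP_CODONS = ("gat", "gac")
PRO_CODONS = ("cct", "ccc", "cca", "ccg")


def asp_pro_sequences():
    """Return every 6-nt DNA sequence that encodes the dipeptide Asp-Pro."""
    return ["".join(pair) for pair in product(ASP_CODONS, PRO_CODONS)]


def encodes_asp_pro(seq):
    """True if a 6-nt DNA sequence reads as an Asp codon then a Pro codon."""
    seq = seq.lower()
    return len(seq) == 6 and seq[:3] in ASP_CODONS and seq[3:] in PRO_CODONS


candidates = asp_pro_sequences()
for candidate in candidates:
    print(candidate)
```

Which candidate suits a given host best would depend on codon usage in that host; the sketch makes no claim about that choice.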
The sequence encoding GST may be, for example, that present in the pGEXKT
plasmids, the sequence of which corresponds to SEQ ID NO: 29 of the
attached sequence listing, or any equivalent sequence, i.e. encoding this
soluble protein. The sequence encoding TrX may be, for example, that
present in the pET32a+ expression plasmid, the sequence of which
corresponds to SEQ ID NO: 30 of the attached sequence listing, or any
equivalent sequence, i.e. encoding this soluble protein.
For the production of the toxic protein, the dp-pt or ps-dp-pt expression
system of the invention is placed inside a host cell, for example by
cloning in an appropriate plasmid, by means of the usual techniques for
transforming a host in genetic recombination techniques.
The plasmid into which the expression system of the present invention may
be cloned so as to form this vector will be chosen in particular according
to the host cell. It may be, for example, the pT7-7 plasmid (SEQ ID NO: 33
of the attached sequence listing), a plasmid of the pGEX series (for
example of SEQ ID NO: 31 of the attached sequence listing), sold for
example by the company Pharmacia, or a plasmid of the pET32 series (for
example of sequence ID No: 32 of the attached sequence listing), sold for
example by the company Novagen.
The plasmids of the pGEX series and of the pET32 series will
advantageously be used for implementing the present invention. In fact,
they already comprise a ps sequence encoding a soluble protein (Ps),
respectively glutathione S-transferase and thioredoxin. Thus,
advantageously, the dp-pt system will be cloned into these plasmids
downstream of this ps sequence encoding the soluble protein.
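The 5'-3' arrangement ps, then dp, then pt can be sketched as a simple in silico assembly with a reading-frame check. This is only an illustration under stated assumptions: the standard genetic code is used, the segments PS_TOY and PT_TOY are hypothetical toy sequences (not GST, thioredoxin, TME1 or TME2), and the helper names are invented for this example.

```python
from itertools import product

# Standard genetic code, codons ordered t, c, a, g at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def assemble(ps, dp, pt):
    """Join ps, dp and pt in the 5'-3' direction, refusing frame-breaking input."""
    upstream = ps + dp
    if len(upstream) % 3 != 0:
        raise ValueError("ps + dp must be whole codons to keep pt in frame")
    if len(translate(upstream)) != len(upstream) // 3:
        raise ValueError("ps must not contain an in-frame stop codon")
    return upstream + pt


# Hypothetical toy segments, NOT sequences from the patent.
PS_TOY = "atggctaaa"      # Met-Ala-Lys, stands in for a soluble protein
DP = "gacccg"             # Asp-Pro, the example codons given in the text
PT_TOY = "ctgattgtgtaa"   # Leu-Ile-Val + stop, stands in for a toxic peptide

construct = assemble(PS_TOY, DP, PT_TOY)
fusion = translate(construct)
print(fusion)
```

The check on the upstream segment reflects the requirement that the soluble-protein coding sequence must run into dp-pt without a stop codon or frame shift; real cloning would also have to account for the restriction sites and linker residues of the chosen plasmid, which this sketch ignores.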
The present invention therefore also relates to an expression vector
comprising a dp-pt or ps-dp-pt expression system according to the
invention; in particular, a vector comprising a dp-pt expression system
according to the invention and the oligonucleotide sequence of the pT7-7
plasmid, or a vector comprising a ps-dp-pt expression system according to
the invention and the oligonucleotide sequence of a pGEX plasmid or of a
pET32 plasmid.
For example, the expression vectors of the present invention that are
suitable for a bacterial host such as E. coli and that allow
overexpression of the abovementioned TME1 membrane protein may
advantageously have an oligonucleotide sequence chosen from the sequences
ID No. 40 (with pGEXKT), ID No. 42 (with pET32a+) and ID No. 44 (with
pT7-7) of the attached sequence listing.
For example, the expression vectors of the present invention that are
suitable for a bacterial host such as E. coli and that allow
overexpression of the abovementioned TME2 membrane protein may
advantageously have an oligonucleotide sequence chosen from the sequences
ID No. 41 (with pGEXKT), ID No. 43 (with pET32a+) and ID No. 45 (with
pT7-7) of the attached sequence listing.
In fact, the abovementioned expression vectors have codons that are
optimized for the expression of the chimeric proteins of the present
invention, including TME1 and TME2, in a bacterium, for example in E.
coli.
The present invention also relates to a prokaryotic cell transformed with
an expression vector according to the invention. This prokaryotic cell
transformed with the expression vector of the present invention should
preferably allow overexpression of the toxic protein for which the vector
codes. Thus, any host cell capable of expressing the expression vector of
the present invention can be used, for example E. coli, advantageously the
E. coli strain BL21(DE3)pLysS.
The present invention also relates to a method for producing a toxic
protein by genetic recombination, comprising the following steps:
transforming a host cell with an expression vector according to the
invention,
culturing the transformed host cell under culture conditions such that it
produces a fusion protein comprising the dipeptide Asp-Pro followed by the
peptide sequence of the toxic protein from said expression vector,
isolating said fusion protein, and
cleaving said fusion protein so as to recover the toxic protein.
The steps for transforming, culturing and isolating the chimeric protein
produced can be carried out by means of the usual techniques of genetic
recombination, for example by means of techniques such as those that are
described in document [25].
The step consisting in isolating the fusion protein can be carried out by
means of the usual techniques known to those skilled in the art for
isolating a protein from a cell extract.
The fusion protein produced by means of the method of the invention has a
"soluble protein-Asp-Pro-toxic protein" sequence. In the present
description, the dipeptide Asp-Pro is also called DP according to the
one-letter amino acid code.
For example, when the toxic protein is TME1, the fusion protein may have
the SEQ ID NO: 46 of the attached sequence listing, which corresponds to
the GST-DP-TME1 fusion protein; the SEQ ID NO: 48 of the attached sequence
listing, which corresponds to the TrX-DP-TME1 fusion protein; or the SEQ
ID NO: 50 of the attached sequence listing, which corresponds to the
M-DP-TME1 fusion protein.
For example, when the toxic protein is TME2, the fusion protein may have
the SEQ ID NO: 47 of the attached sequence listing, which corresponds to
the GST-DP-TME2 fusion protein; the SEQ ID NO: 49 of the attached sequence
listing, which corresponds to the TrX-DP-TME2 fusion protein; or the SEQ
ID NO: 51 of the attached sequence listing, which corresponds to the
M-DP-TME2 fusion protein.
The step consisting of cleavage of this fusion protein can advantageously
be carried out by means of formic acid, which cleaves the fusion protein
at the dipeptide Asp-Pro. It may be carried out, moreover, by means of any
appropriate technique known to those skilled in the art for recovering a
protein from a sample using a fusion protein.
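The cleavage step can be illustrated with a small in silico sketch that splits a one-letter protein sequence at each Asp-Pro site. It assumes, as is generally described for acid cleavage of Asp-Pro bonds, that scission occurs between the Asp and the Pro, so that the released C-terminal fragment begins with Pro. The sequences used are hypothetical toy strings, not sequences from the sequence listing, and the function name is invented for this example.

```python
def cleave_asp_pro(protein):
    """Split a one-letter protein sequence at every Asp-Pro (D-P) bond.

    Assumption: the bond between D and P is the one broken, so each upstream
    fragment ends with D and each downstream fragment starts with P.
    """
    fragments = []
    start = 0
    idx = protein.find("DP")
    while idx != -1:
        fragments.append(protein[start:idx + 1])  # keep D on the left piece
        start = idx + 1                           # next piece starts at P
        idx = protein.find("DP", start)
    fragments.append(protein[start:])
    return fragments


# Hypothetical fusion: toy "soluble" part MAK, DP linker, toy "toxic" part LIV.
toy_fragments = cleave_asp_pro("MAKDPLIV")
print(toy_fragments)
```

A practical consequence of this model is that any Asp-Pro site already present inside the soluble partner or the target peptide would also be cut, so checking a candidate fusion sequence for additional DP sites before choosing this cleavage route may be worthwhile.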
The inventors are the first to have found a system that is really
effective for producing and even overproducing, in particular in the
Escherichia coli (E. coli) bacterium, hydrophobic peptides corresponding
to the membrane domains of the E1 and E2 proteins of the hepatitis C virus
envelope, the expression of which is lethal for the microorganism.
The field of application of the present invention concerns mainly the
production of hydrophobic peptides on a large scale, in particular for
fundamental and industrial research. In addition, the production of the
chimeric protein consisting of the soluble protein, of the dipeptide
Asp-Pro and of the hydrophobic peptide can be used for a functional
purpose, in particular for obtaining information on the degree of
oligomerization of the membrane domain or else on its heteropolymerization
capacity.
The fusion proteins, or chimeric proteins, are produced from their coding
DNA, which is present, for example, in commercial plasmids, and downstream
of which the DNA encoding the Asp-Pro sequence, followed by that encoding
the toxic peptide, is introduced in frame. This application can be
commercialized in the form of bacterial expression plasmids which will
include the sequence of the Asp-Pro site, downstream of that of the
soluble proteins already present. The corresponding plasmid will be
described, for example, as a tool that facilitates the production, via the
biological pathway, of toxic membrane peptides or proteins.
Thus, the present invention is applicable to any system for overexpressing
recombinant proteins, with or without fusion to a soluble protein such as,
for example, GST or thioredoxin, including a non-natural Asp-Pro sequence
inserted upstream of a sequence encoding a toxic domain of the protein,
for example a membrane domain of a protein.
Claim 1 of 13 Claims
1. An expression system comprising a DNA sequence, wherein said DNA
sequence encodes a fusion protein comprising a sequence selected from the
group consisting of SEQ ID NOS: 46-51.