Title of Invention

AN ISOLATED NUCLEIC ACID SEQUENCE COMPRISING A POLYNUCLEOTIDE ENCODING A PLANT STEROL: CHOLESTEROL ACYLTRANSFERASE-LIKE POLYPEPTIDE

Abstract
The present invention is directed to lecithin:cholesterol acyltransferase-like polypeptides (LCAT) and acyl CoA:cholesterol acyltransferase-like polypeptides (ACAT). The invention provides polynucleotides encoding such cholesterol acyltransferase-like polypeptides, polypeptides encoded by such polynucleotides, and the use of such polynucleotides to alter sterol composition and oil production in plants and host cells. Also provided are oils produced by the plants and host cells containing the polynucleotides, and food products, nutritional supplements, and pharmaceutical compositions containing plants or oils of the present invention. The polynucleotides of the present invention include those derived from plant sources.
This application claims priority to U.S. provisional application Serial No. 60/152,493, filed August 30, 1999, and herein incorporated by reference in its entirety for all purposes.
BACKGROUND
Technical Field
The present invention is directed to plant acyltransferase-like nucleic acid and amino acid sequences and constructs, and methods related to their use in altering sterol composition and/or content, and oil composition and/or content in host cells and plants.
Related Art
Through the development of plant genetic engineering techniques, it is now possible to produce transgenic varieties of plant species to provide plants which have novel and desirable characteristics. For example, it is now possible to genetically engineer plants for tolerance to environmental stresses, such as resistance to pathogens and tolerance to herbicides. It is also possible to improve the nutritional characteristics of the plant, for example to provide improved fatty acid, carotenoid, sterol and tocopherol compositions. However, the number of useful nucleotide sequences for the engineering of such characteristics is thus far limited.
There is a need for improved means to obtain or manipulate compositions of sterols from biosynthetic or natural plant sources. The ability to increase sterol production or alter the sterol compositions in plants may provide for novel sources of sterols for use in human and animal nutrition.
Sterol biosynthesis branches from the farnesyl diphosphate intermediate in the isoprenoid pathway. Sterol biosynthesis occurs via a mevalonate-dependent pathway in mammals and higher plants (Goodwin (1981) Biosynthesis of Isoprenoid Compounds, vol. 1 (Porter, J.W. & Spurgeon, S.L., eds) pp. 443-480, John Wiley and Sons, New York), while in green algae sterol biosynthesis is thought to occur via a mevalonate-independent pathway (Schwender et al. (1997) Physiology, Biochemistry, and Molecular Biology of Plant Lipids (Williams, J.P., Khan, M.U., and Lem, N.W., eds) pp. 180-182, Kluwer Academic Publishers, Norwell, MA).
The solubility characteristics of sterol esters suggest that they are the storage form of sterols (Chang et al. (1997) Annu. Rev. Biochem., 66:612-638). Sterol O-acyltransferase enzymes such as acyl CoA:cholesterol acyltransferase (ACAT) and lecithin:cholesterol acyltransferase (LCAT) catalyze the formation of cholesterol esters and are thus key to controlling intracellular cholesterol storage. In yeast, it has been reported that overexpression of LRO1, a homolog of human LCAT, and phospholipid:diacylglycerol acyltransferase increased lipid synthesis (Oelkers et al. (2000) J. Biol. Chem., 275:15609-15612; Dahlqvist et al. (2000) Proc. Natl. Acad. Sci. USA, 97:6487-6492).
The characterization of various acyltransferase proteins is useful for the further study of plant sterol synthesis systems and for the development of novel and/or alternative sterol sources. Studies of plant mechanisms may provide means to further enhance, control, modify, or otherwise alter the sterol composition of plant cells. Furthermore, such alterations in sterol content and/or composition may provide a means for obtaining tolerance to stress and insect damage. Of particular interest are the nucleic acid sequences of genes encoding proteins which may be useful for applications in genetic engineering.
SUMMARY OF THE INVENTION
The present invention is directed to lecithin:cholesterol acyltransferase-like polypeptides (also referred to herein as LCAT) and acyl CoA:cholesterol acyltransferase-like polypeptides (also referred to herein as ACAT). In particular, the invention is related to polynucleotides encoding such sterol acyltransferases, polypeptides encoded by such polynucleotides, and the use of such polynucleotides to alter sterol composition and oil production. The polynucleotides of the present invention include those derived from plant sources.
One aspect of the invention, therefore, is an isolated nucleic acid sequence encoding a plant lecithin:cholesterol acyltransferase-like polypeptide, a fragment of a plant lecithin:cholesterol acyltransferase-like polypeptide, a plant acyl CoA:cholesterol acyltransferase-like polypeptide or a fragment of a plant acyl CoA:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of SEQ ID NO: 2, 4, 6, 8, 10-29, 43-51, 73 or 75. Also provided is an isolated nucleic acid sequence consisting of SEQ ID NO: 2, 4, 6, 8, 10-29, 43-51, 73 or 75.

Still another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 3 or SEQ ID NO: 3 with at least one conservative amino acid substitution; SEQ ID NO: 2; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 2; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 2; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 2 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Still another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 3 or SEQ ID NO: 3 with at least one conservative amino acid substitution; SEQ ID NO: 2; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 2; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 2; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 2 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 5 or SEQ ID NO: 5 with at least one conservative amino acid substitution; SEQ ID NO: 4; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 4; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 4; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 4 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 5 or SEQ ID NO: 5 with at least one conservative amino acid substitution; SEQ ID NO: 4; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 4; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 4; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 4 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 7 or SEQ ID NO: 7 with at least one conservative amino acid substitution; SEQ ID NO: 6; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 6; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 6; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 6 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 7 or SEQ ID NO: 7 with at least one conservative amino acid substitution; SEQ ID NO: 6; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 6; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 6; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 6 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 9 or SEQ ID NO: 9 with at least one conservative amino acid substitution; SEQ ID NO: 8; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 8; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 8; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 8 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 9 or SEQ ID NO: 9 with at least one conservative amino acid substitution; SEQ ID NO: 8; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 8; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 8; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 8 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 74 or SEQ ID NO: 74 with at least one conservative amino acid substitution; SEQ ID NO: 73; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 73; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 73; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 73 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 74 or SEQ ID NO: 74 with at least one conservative amino acid substitution; SEQ ID NO: 73; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 73; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 73; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 73 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.

Another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 76 or SEQ ID NO: 76 with at least one conservative amino acid substitution; SEQ ID NO: 75; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 75; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 75; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 75 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of an isolated polynucleotide encoding a polypeptide of SEQ ID NO: 76 or SEQ ID NO: 76 with at least one conservative amino acid substitution; SEQ ID NO: 75; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 75; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 75; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 75 and encodes a plant lecithin:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence comprising a polynucleotide selected from the group consisting of SEQ ID NO: 42 or a degenerate variant thereof; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 42; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 42; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 42 and encodes an acyl CoA:cholesterol acyltransferase-like polypeptide.
Another aspect provides an isolated nucleic acid sequence consisting essentially of a polynucleotide of the formula 5' X-(R1)n-(R2)n-(R3)n-Y 3' where X is a hydrogen, Y is a hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of SEQ ID NO: 42 or a degenerate variant thereof; an isolated polynucleotide that has at least 70%, 80%, 90%, or 95% sequence identity with SEQ ID NO: 42; an isolated polynucleotide of at least 10 amino acids that hybridizes under stringent conditions to SEQ ID NO: 42; an isolated polynucleotide complementary to any of the foregoing; and an isolated polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 42 and encodes an acyl CoA:cholesterol acyltransferase-like polypeptide.
Also provided is a recombinant nucleic acid construct comprising a regulatory sequence operably linked to a polynucleotide encoding a lecithin:cholesterol acyltransferase-like polypeptide and/or an acyl CoA:cholesterol acyltransferase-like polypeptide. In one embodiment, the sterol acyltransferases are plant sterol acyltransferases. In another embodiment, the recombinant nucleic acid construct further comprises a termination sequence. The regulatory sequence can be a constitutive promoter, an inducible promoter, a developmentally regulated promoter, a tissue specific promoter, an organelle specific promoter, a seed specific promoter or a combination of any of the foregoing. Also provided is a plant containing this recombinant nucleic acid construct and the seed and progeny from such a plant. This recombinant nucleic acid construct can also be introduced into a suitable host cell to provide yet another aspect of the invention. If the host cell is a plant host cell, the cell can be used to generate a plant to provide another aspect of the invention. Further aspects include seed and progeny from such a plant.
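The arrangement of such a construct can be pictured as an ordered expression cassette. The following is a minimal Python sketch, assuming a simple 5' to 3' ordering of promoter, coding sequence and optional terminator; the class `ExpressionCassette`, its fields, the promoter-class labels and the placeholder sequences are hypothetical and do not represent any construct (such as pCGN9968) described in this application.

```python
from dataclasses import dataclass
from typing import Optional

# Promoter classes listed in the text; the string labels are illustrative only.
PROMOTER_CLASSES = {
    "constitutive", "inducible", "developmentally regulated",
    "tissue specific", "organelle specific", "seed specific",
}


@dataclass
class ExpressionCassette:
    """Hypothetical 5'->3' model: promoter, coding sequence, optional terminator."""
    promoter_class: str
    promoter_seq: str
    coding_seq: str                      # e.g. an LCAT- or ACAT-like reading frame
    terminator_seq: Optional[str] = None

    def __post_init__(self) -> None:
        if self.promoter_class not in PROMOTER_CLASSES:
            raise ValueError(f"unknown promoter class: {self.promoter_class}")
        # Minimal reading-frame check: starts with ATG and contains whole codons.
        cds = self.coding_seq.upper()
        if not cds.startswith("ATG") or len(cds) % 3 != 0:
            raise ValueError("coding sequence is not a complete reading frame")

    def assemble(self) -> str:
        """Join the elements 5'->3' so the promoter lies upstream of the coding sequence."""
        parts = [self.promoter_seq, self.coding_seq]
        if self.terminator_seq:
            parts.append(self.terminator_seq)
        return "".join(parts).upper()


# Placeholder sequences only; these are not real promoter, LCAT or terminator sequences.
cassette = ExpressionCassette(
    promoter_class="seed specific",
    promoter_seq="ttgacaaa",
    coding_seq="atggcttaa",
    terminator_seq="aataaa",
)
print(cassette.assemble())
```

Concatenation captures only the order of the elements; whether a promoter is in fact operably linked to a coding sequence depends on functional features (transcription start, reading frame, polyadenylation) that this sketch does not model.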
Yet another aspect is a purified polypeptide comprising SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 74, SEQ ID NO: 76, or any of the preceding sequences with at least one conservative amino acid substitution.
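The phrase "with at least one conservative amino acid substitution" can be made concrete with a small check. The Python sketch below assumes one commonly used grouping of amino acids into conservative classes; this section of the application does not define the groups, so `CONSERVATIVE_GROUPS` is an assumption, and the short sequences are hypothetical placeholders rather than SEQ ID NO: 3 or any other listed sequence.

```python
from typing import List, Tuple

# ASSUMPTION: one common grouping of residues into conservative-substitution classes.
CONSERVATIVE_GROUPS = [
    set("GAVLI"),  # small / aliphatic
    set("ST"),     # hydroxyl
    set("DE"),     # acidic
    set("NQ"),     # amide
    set("KRH"),    # basic
    set("FYW"),    # aromatic
    set("CM"),     # sulfur-containing
    set("P"),      # proline
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in the same (assumed) group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, reference residue, variant residue) for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def is_conservative_variant(reference: str, variant: str) -> bool:
    """True if the variant has at least one substitution and all of them are conservative."""
    subs = substitutions(reference, variant)
    return bool(subs) and all(is_conservative(r, v) for _, r, v in subs)


# Hypothetical peptides: K->R, V->I and E->D are all within-group changes.
print(substitutions("MKLVSTE", "MRLISTD"))
print(is_conservative_variant("MKLVSTE", "MRLISTD"))
```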
Still another aspect provides a purified immunogenic polypeptide comprising at least 10 consecutive amino acids from an amino acid sequence selected from the group consisting of SEQ ID NO: 3, 5, 7, 9, 74, 76 and any of the preceding sequences containing at least one conservative amino acid substitution. Also provided are antibodies, either polyclonal or monoclonal, that specifically bind the preceding immunogenic polypeptides.
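A protein of length L contains L - 10 + 1 distinct stretches of 10 consecutive amino acids, each a candidate peptide of the kind described above. The Python sketch below simply enumerates those windows; the input is a hypothetical placeholder rather than a listed sequence, and whether a given window is actually immunogenic has to be determined experimentally, which this sketch does not attempt.

```python
from typing import Iterator, Tuple


def peptide_windows(protein: str, length: int = 10) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, peptide) for every run of `length` consecutive residues."""
    if length < 1:
        raise ValueError("length must be positive")
    for start in range(len(protein) - length + 1):
        yield start + 1, protein[start:start + length]


# Hypothetical 22-residue placeholder; yields 22 - 10 + 1 = 13 windows.
windows = list(peptide_windows("MKTAYIAKQRQISFVKSHFSRQ"))
for start, peptide in windows[:3]:
    print(start, peptide)
```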
One aspect provides a method for producing a lecithin:cholesterol acyltransferase-like polypeptide or an acyl CoA:cholesterol acyltransferase-like polypeptide comprising culturing a host cell containing any recombinant nucleic acid construct of the present invention under conditions permitting expression of said lecithin:cholesterol acyltransferase-like polypeptide or acyl CoA:cholesterol acyltransferase-like polypeptide.
Another aspect provides a method for modifying the sterol content of a host cell, comprising transforming a host cell with a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding a lecithin:cholesterol acyltransferase-like polypeptide and culturing said host cell under conditions wherein said host cell expresses a lecithin:cholesterol acyltransferase-like polypeptide such that said host cell has a modified sterol composition as compared to host cells without the recombinant construct.
An additional aspect is a method for modifying the sterol content of a host cell comprising transforming a host cell with a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding an acyl CoA:cholesterol acyltransferase-like polypeptide and culturing said host cell under conditions wherein said host cell expresses an acyl CoA:cholesterol acyltransferase-like polypeptide such that said host cell has a modified sterol composition as compared to host cells without the recombinant construct.
A further aspect is a plant comprising a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding a lecithin:cholesterol acyltransferase-like polypeptide wherein expression of said recombinant construct results in a modified sterol composition of said plant as compared to the same plant without said recombinant construct.
Another aspect provides a plant comprising a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding an acyl CoA:cholesterol acyltransferase-like polypeptide wherein expression of said recombinant construct results in a modified sterol composition of said plant as compared to the same plant without said recombinant construct.
In a further aspect is provided an oil obtained from any of the plants or host cells of the present invention.
In still another aspect is provided a method for producing an oil with a modified sterol composition comprising providing any of the plants or host cells of the present invention and extracting oil from the plant by any known method. Also provided is an oil produced by the preceding method.
Still another aspect provides a method for altering oil production by a host cell comprising transforming a host cell with a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding a lecithin:cholesterol acyltransferase-like polypeptide and culturing the host cell under conditions wherein the host cell expresses a lecithin:cholesterol acyltransferase-like polypeptide such that the host cell has altered oil production as compared to host cells without the recombinant construct.

Another aspect provides a method for altering oil production by a host cell comprising transforming a host cell with a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding an acyl CoA:cholesterol acyltransferase-like polypeptide and culturing the host cell under conditions wherein the host cell expresses an acyl CoA:cholesterol acyltransferase-like polypeptide such that the host cell has altered oil production as compared to host cells without the recombinant construct.
Also provided is a plant comprising a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding a lecithin:cholesterol acyltransferase-like polypeptide wherein expression of said recombinant construct results in an altered production of oil by said plant as compared to the same plant without said recombinant construct.
In a further aspect is provided a plant comprising a recombinant construct containing a regulatory sequence operably linked to a polynucleotide encoding an acyl CoA:cholesterol acyltransferase-like polypeptide wherein expression of said recombinant construct results in an altered production of oil by said plant as compared to the same plant without said recombinant construct.
Additional aspects provide a food, food ingredient or food product comprising any oil, plant or host cell of the present invention; a nutritional or dietary supplement comprising any oil, plant or host cell of the present invention; and a pharmaceutical composition comprising any oil, plant or host cell of the present invention along with a suitable diluent, carrier or excipient.
Additional aspects will be apparent from the descriptions and examples that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims and accompanying figures where:
Figure 1 shows an alignment of yeast, human and rat lecithin:cholesterol acyltransferase protein sequences with Arabidopsis LCAT1, LCAT2, LCAT3, and LCAT4 deduced amino acid sequences.
Figure 2 shows the results of NMR sterol ester analysis on T2 seed from plants expressing LCAT4 under the control of the napin promoter (pCGN9998).

Figure 3 shows the results of HPLC/MS sterol analysis on oil extracted from T2 seed from control lines (pCGN8640) and lines expressing LCAT3 (pCGN9968) under the control of the napin promoter.
Figure 4 shows the results of HPLC/MS sterol analysis on oil extracted from T2 seed from control lines (pCGN8640), and plant lines expressing LCAT1 (pCGN9962), LCAT2 (pCGN9983), LCAT3 (pCGN9968), and LCAT4 (pCGN9998) under the control of the napin promoter. Additionally, data from 3 lines expressing LCAT4 under the control of the 35S promoter (pCGN9996) are shown.
Figure 5 shows the results of NIR analysis of the oil content of T2 seed from control lines (pCGN8640), and plant lines expressing LCAT1 (pCGN9962), LCAT2 (pCGN9983), and LCAT3 (pCGN9968) under the control of the napin promoter. Additionally, data from 16 lines expressing LCAT2 under the control of the 35S promoter (pCGN9981) are shown.
DETAILED DESCRIPTION
The following detailed description is provided to aid those skilled in the art in practicing the present invention. Even so, this detailed description should not be construed to unduly limit the present invention as modifications and variations in the embodiments discussed herein can be made by those of ordinary skill in the art without departing from the spirit or scope of the present inventive discovery.
All publications, patents, patent applications and other references cited in this application are herein incorporated by reference in their entirety as if each individual publication, patent, patent application or other reference were specifically and individually indicated to be incorporated by reference.
The present invention relates to lecithin:cholesterol acyltransferase, particularly the isolated nucleic acid sequences encoding lecithin:cholesterol acyltransferase-like polypeptides (LCAT) from plant sources, and to acyl CoA:cholesterol acyltransferase, particularly the isolated nucleic acid sequences encoding acyl CoA:cholesterol acyltransferase-like polypeptides (ACAT) from plant sources. Lecithin:cholesterol acyltransferase-like as used herein includes any nucleic acid sequence encoding an amino acid sequence from a plant source, such as a protein, polypeptide or peptide, obtainable from a cell source, which demonstrates the ability to utilize lecithin (phosphatidylcholine) as an acyl donor for acylation of sterols or glycerides to form esters under enzyme reactive conditions, along with such proteins, polypeptides and peptides. Acyl CoA:cholesterol acyltransferase-like as used herein includes any nucleic acid sequence encoding an amino acid sequence from a plant source, such as a protein, polypeptide or peptide, obtainable from a cell source, which demonstrates the ability to utilize acyl CoA as an acyl donor for acylation of sterols or glycerides to form esters under enzyme reactive conditions, along with such proteins, polypeptides and peptides. By "enzyme reactive conditions" is meant that any necessary conditions are available in an environment (i.e., such factors as temperature, pH, lack of inhibiting substances) which will permit the enzyme to function.
The term "sterol" as applied to plants refers to any chiral tetracyclic isopentenoid which may be formed by cyclization of squalene oxide through the transition state possessing stereochemistry similar to the trans-syn-trans-anti-trans-anti configuration, for example, protosteroid cation, and which retains a polar group at C-3 (hydroxyl or keto), an all-trans-anti stereochemistry in the ring system, and a side-chain 20R-configuration (Parker et al. (1992) In Nes et al., Eds., Regulation of Isopentenoid Metabolism, ACS Symposium Series No. 497, p. 110; American Chemical Society, Washington, D.C.).
Sterols may or may not contain a C-5-C-6 double bond, as this is a feature introduced late in the biosynthetic pathway. Sterols contain a C8-C10 side chain at the C-17 position.
The term "phytosterol," which applies to sterols found uniquely in plants, refers to a sterol containing a C-5, and in some cases a C-22, double bond. Phytosterols are further characterized by alkylation of the C-17 side chain with a methyl or ethyl substituent at the C-24 position. Major phytosterols include, but are not limited to, sitosterol, stigmasterol, campesterol and brassicasterol. Cholesterol, which lacks a C-24 methyl or ethyl side chain, is found in plants, but is not unique thereto, and is not a "phytosterol."
"Phytostanols" are saturated forms of phytosterols wherein the C-5 and, when present, C-22 double bond(s) is (are) reduced, and include, but are not limited to, sitostanol, campestanol, and 22-dihydrobrassicastanol.
"Sterol esters" are further characterized by the presence of a fatty acid or phenolic acid moiety rather than a hydroxyl group at the C-3 position.
The term "sterol" includes sterols, phytosterols, phytosterol esters, phytostanols, and phytostanol esters.
The term "sterol compounds" includes sterols, phytosterols, phytosterol esters, phytostanols, and phytostanol esters.
The term "phytosterol compound" refers to at least one phytosterol, at least one phytosterol ester, or a mixture thereof.

The term "phytostanol compound" refers to at least one phytostanol, at least one phytostanol ester, or a mixture thereof.
The term "glyceride" refers to a fatty acid ester of glycerol and includes mono-, di-, and triacylglycerols.
As used herein, "recombinant construct" is defined either by its method of production or its structure. In reference to its method of production, e.g., a product made by a process, the process is the use of recombinant nucleic acid techniques, e.g., involving human intervention in the nucleotide sequence, typically selection or production. Alternatively, in terms of structure, it can be a sequence comprising a fusion of two or more nucleic acid sequences which are not naturally contiguous or operatively linked to each other.
As used herein, "regulatory sequence" means a sequence of DNA concerned with controlling expression of a gene, e.g., promoters, operators and attenuators. A "heterologous regulatory sequence" is one which differs from the regulatory sequence naturally associated with a gene.
As used herein, "polynucleotide" and "oligonucleotide" are used interchangeably and mean a polymer of at least two nucleotides joined together by phosphodiester bonds, which may consist of either ribonucleotides or deoxyribonucleotides.
As used herein, "sequence" means the linear order in which monomers appear in a polymer, for example, the order of amino acids in a polypeptide or the order of nucleotides in a polynucleotide.
As used herein, "polypeptide", "peptide", and "protein" are used interchangeably and mean a compound that consists of two or more amino acids that are linked by means of peptide bonds.
As used herein, the terms "complementary" or "complementarity" refer to the pairing of bases, purines and pyrimidines, that associate through hydrogen bonding in double-stranded nucleic acids. For example, the following base pairs are complementary: guanine and cytosine; adenine and thymine; and adenine and uracil. The terms, as used herein, include complete and partial complementarity.
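The distinction between complete and partial complementarity can be illustrated numerically. The Python sketch below uses only the base pairs named above (G-C, A-T and A-U), assumes both strands are written 5' to 3' and laid antiparallel, and reports the fraction of positions that pair; the function names are hypothetical, and G-U wobble pairing is deliberately not counted.

```python
# Base pairs named in the text: G-C, A-T (DNA) and A-U (RNA).
PAIRS = {frozenset("GC"), frozenset("AT"), frozenset("AU")}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the fully complementary strand, written 5'->3'."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    try:
        return "".join(table[base] for base in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"unexpected base: {err.args[0]!r}") from None


def fraction_complementary(top: str, bottom: str) -> float:
    """Fraction of positions that pair when both strands (5'->3') are laid antiparallel.

    1.0 means complete complementarity; values between 0 and 1 mean partial complementarity.
    """
    if len(top) != len(bottom) or not top:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        frozenset((a, b)) in PAIRS
        for a, b in zip(top.upper(), reversed(bottom.upper()))
    )
    return paired / len(top)


print(reverse_complement("ATGC"))                # complete DNA complement
print(fraction_complementary("ATGC", "GCAA"))    # one mismatch out of four
print(fraction_complementary("ATGC", "GCAU"))    # DNA:RNA pairing uses A-U
```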

Isolated Proteins, Polypeptides and Polynucleotides
A first aspect of the present invention relates to isolated LCAT polynucleotides. The polynucleotide sequences of the present invention include isolated polynucleotides that encode the polypeptides of the invention having a deduced amino acid sequence selected from the group of sequences set forth in the Sequence Listing, and other polynucleotide sequences closely related to such sequences and variants thereof.
The invention provides a polynucleotide sequence identical over its entire length to each coding sequence as set forth in the Sequence Listing. The invention also provides the coding sequence for the mature polypeptide or a fragment thereof, as well as the coding sequence for the mature polypeptide or a fragment thereof in a reading frame with other coding sequences, such as those encoding a leader or secretory sequence, or a pre-, pro-, or prepro-protein sequence. The polynucleotide can also include non-coding sequences, including for example, but not limited to, non-coding 5' and 3' sequences, such as the transcribed, untranslated sequences, termination signals, ribosome binding sites, sequences that stabilize mRNA, introns, polyadenylation signals, and additional coding sequence that encodes additional amino acids. For example, a marker sequence can be included to facilitate the purification of the fused polypeptide. Polynucleotides of the present invention also include polynucleotides comprising a structural gene and the naturally associated sequences that control gene expression.
The invention also includes polynucleotides of the formula $X{-}(R_1)_n{-}(R_2){-}(R_3)_n{-}Y$, wherein, at the 5' end, X is hydrogen, and at the 3' end, Y is hydrogen or a metal; $R_1$ and $R_3$ are any nucleic acid residue; n is an integer between 0 and 3000, preferably between 1 and 1000; and $R_2$ is a nucleic acid sequence of the invention, particularly a nucleic acid sequence selected from the group set forth in the Sequence Listing and preferably SEQ ID NOs: 2, 4, 6, 8, 10-29, 33, 42-51, 73 and 75. In the formula, $R_2$ is oriented so that its 5' end residue is at the left, bound to $R_1$, and its 3' end residue is at the right, bound to $R_3$. Any stretch of nucleic acid residues denoted by either R group, where n is greater than 1, may be either a heteropolymer or a homopolymer, preferably a heteropolymer.
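Read as a concatenation rule, the formula places the core sequence R2 between two optional flanking runs. The Python sketch below is a hypothetical illustration, not part of the specification. The helper names are invented. The terminal groups X and Y are chemical end groups that a string cannot represent. Each flank length is checked independently against the stated range of n; a strictly literal reading with one shared n would also require equal flank lengths.

```python
# Hypothetical helper illustrating the polynucleotide formula
# X-(R1)n-(R2)-(R3)n-Y; the end groups X and Y are not modeled.
VALID_NUCLEOTIDES = set("ACGTU")
MAX_N = 3000  # stated upper bound on n


def assemble_polynucleotide(r1: str, r2: str, r3: str) -> str:
    """Return R1 + R2 + R3 written 5' -> 3'.

    r1 and r3 are flanking runs of any nucleic acid residues; r2 is the core
    sequence of the invention. Each flank length is checked against 0..MAX_N
    independently (an assumption; the formula names a single n).
    """
    parts = {"R1": r1.upper(), "R2": r2.upper(), "R3": r3.upper()}
    for name, part in parts.items():
        unexpected = set(part) - VALID_NUCLEOTIDES
        if unexpected:
            raise ValueError(f"{name} has non-nucleotide residues: {sorted(unexpected)}")
    if not parts["R2"]:
        raise ValueError("R2 (the core sequence) must not be empty")
    for name in ("R1", "R3"):
        if len(parts[name]) > MAX_N:
            raise ValueError(f"{name} length {len(parts[name])} exceeds {MAX_N}")
    # R2's 5' residue is bound to R1 and its 3' residue is bound to R3.
    return parts["R1"] + parts["R2"] + parts["R3"]


def is_heteropolymer(run: str) -> bool:
    """True if a run longer than one residue contains more than one residue type."""
    return len(run) > 1 and len(set(run.upper())) > 1


print(assemble_polynucleotide("gc", "ATGGCT", "TAA"))  # GCATGGCTTAA
```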
The invention also relates to variants of the polynucleotides described herein that encode variants of the polypeptides of the invention. Variants that are fragments of the polynucleotides of the invention can be used to synthesize full-length polynucleotides of the invention. Preferred embodiments are polynucleotides encoding polypeptide variants wherein 5 to 10, 1 to 5, 1 to 3, 2, 1 or no amino acid residues of a polypeptide sequence of the invention are substituted, added or deleted, in any combination. Particularly preferred are substitutions, additions, and deletions that are silent, such that they do not alter the properties or activities of the polynucleotide or polypeptide.
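The number of residues substituted, added or deleted between a reference polypeptide and a variant can be counted as an edit (Levenshtein) distance. The following Python sketch shows one illustrative way to compute that count and report the narrowest of the recited ranges that contains it. The function names are hypothetical, and the edit distance is a standard technique swapped in for illustration, not a method stated in the specification.

```python
from typing import Optional, Tuple


def edit_distance(reference: str, variant: str) -> int:
    """Levenshtein distance: minimum number of single-residue substitutions,
    additions (insertions) and deletions turning reference into variant."""
    prev = list(range(len(variant) + 1))
    for i, ref_res in enumerate(reference, start=1):
        curr = [i] + [0] * len(variant)
        for j, var_res in enumerate(variant, start=1):
            curr[j] = min(
                prev[j] + 1,                         # delete ref_res
                curr[j - 1] + 1,                     # add var_res
                prev[j - 1] + (ref_res != var_res),  # substitute, or keep a match
            )
        prev = curr
    return prev[-1]


# Ranges recited in the text, narrowest first.
RECITED_RANGES = [(0, 0), (1, 1), (2, 2), (1, 3), (1, 5), (5, 10)]


def narrowest_recited_range(reference: str, variant: str) -> Optional[Tuple[int, int]]:
    """Return the narrowest recited (low, high) range containing the change count."""
    changes = edit_distance(reference, variant)
    for low, high in RECITED_RANGES:
        if low <= changes <= high:
            return (low, high)
    return None  # more than 10 changes


# Two substitutions (K->R, I->L) plus one added residue (Q)
print(edit_distance("MKTAYIAK", "MRTAYLAKQ"),
      narrowest_recited_range("MKTAYIAK", "MRTAYLAKQ"))  # 3 (1, 3)
```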
Further preferred embodiments of the invention are polynucleotides that are at least 50%, 60%, or 70% identical over their entire length to a polynucleotide encoding a polypeptide of the invention, and polynucleotides that are complementary to such polynucleotides. More preferred are polynucleotides that comprise a region that is at least 80% identical over its entire length to a polynucleotide encoding a polypeptide of the invention, and polynucleotides that are complementary thereto. In this regard, polynucleotides at least 90% identical over their entire length are particularly preferred, and those at least 95% identical are especially preferred. Further, those with at least 97% identity are highly preferred, and those with at least 98% and 99% identity are particularly highly preferred, with those at least 99% identical being the most highly preferred.
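Percent identity over the entire length can be computed from a pairwise alignment. The Python sketch below assumes the two sequences have already been aligned. It treats a gap opposite a residue as a mismatch and ignores columns in which both rows are gaps. Other conventions, such as dividing by the length of the shorter sequence, give different numbers. The threshold list is a subset of the values recited above, and the function names are hypothetical.

```python
from typing import Optional, Sequence

GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Columns where both rows are gaps are ignored; columns with a gap in one
    row count as non-identical (one common convention, assumed here).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == GAP and y == GAP)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != GAP)
    return 100.0 * identical / len(columns)


def highest_threshold_met(pct: float,
                          thresholds: Sequence[float] = (50, 60, 70, 80, 90, 95, 98, 99)
                          ) -> Optional[float]:
    """Largest recited threshold that pct reaches, or None if below all of them."""
    met = [t for t in thresholds if pct >= t]
    return max(met) if met else None


pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pct, highest_threshold_met(pct))  # 90.0 90
```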
Preferred embodiments are polynucleotides encoding polypeptides that retain substantially the same biological function or activity as the mature polypeptides encoded by the polynucleotides set forth in the Sequence Listing, as determined by the methods described herein.
The invention further relates to polynucleotides that hybridize to the above-described sequences. In particular, the invention relates to polynucleotides that hybridize under stringent conditions to the above-described polynucleotides. An example of stringent hybridization conditions is overnight incubation at 42°C in a solution comprising 50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 micrograms/milliliter denatured, sheared salmon sperm DNA, followed by washing the hybridization support in 0.1x SSC at approximately 65°C. Also included are polynucleotides that hybridize under a wash stringency of 0.1x SSC or 0.1x SSPE (at 50°C). Other hybridization and wash conditions are well known and are exemplified in Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, NY (1989), particularly Chapter 11.
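The SSC strengths quoted above scale linearly from the standard 1x composition of 150 mM NaCl and 15 mM trisodium citrate. The short Python sketch below works through that arithmetic for the 5x hybridization solution and the 0.1x wash, and computes stock volumes with C1V1 = C2V2. The 20x stock is an assumed, commonly used starting concentration, and the function names are hypothetical.

```python
from typing import Tuple

NACL_MM_1X = 150.0     # 1x SSC: 150 mM NaCl
CITRATE_MM_1X = 15.0   # 1x SSC: 15 mM trisodium citrate


def ssc_composition(strength_x: float) -> Tuple[float, float]:
    """Return (NaCl mM, trisodium citrate mM) for SSC of the given strength."""
    return strength_x * NACL_MM_1X, strength_x * CITRATE_MM_1X


def stock_volume_ml(stock_x: float, final_x: float, final_volume_ml: float) -> float:
    """Volume of concentrated stock needed, from C1 * V1 = C2 * V2."""
    if final_x > stock_x:
        raise ValueError("final strength cannot exceed stock strength")
    return final_x * final_volume_ml / stock_x


# 5x SSC hybridization solution and 0.1x SSC wash, each from an assumed 20x stock
print(ssc_composition(5))              # (750.0, 75.0)
print(stock_volume_ml(20, 5, 100))     # 25.0 (mL of 20x stock per 100 mL)
print(stock_volume_ml(20, 0.1, 1000))  # 5.0 (mL of 20x stock per 1 L)
```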
The invention also provides a polynucleotide consisting essentially of a polynucleotide sequence obtainable by screening an appropriate library containing the complete gene for a polynucleotide sequence set forth in the Sequence Listing, under stringent hybridization conditions, with a probe having the sequence of said polynucleotide sequence or a fragment thereof, and isolating said polynucleotide sequence. Methods for screening libraries are well known in the art and can be found, for example, in Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, NY (1989), particularly Chapter 8, and Ausubel et al., Short Protocols in Molecular Biology, 3rd ed., Wiley and Sons, 1995, chapter 6. Nucleic acid sequences useful for obtaining such a polynucleotide include, for example, probes and primers as described herein and in particular SEQ ID NOs: 2, 4, 6, 8, 10-29, 33, 42-51, 73 and 75. These sequences are particularly useful in screening libraries obtained from Arabidopsis, soybean and corn for sequences encoding lecithin:cholesterol acyltransferase and lecithin:cholesterol acyltransferase-like polypeptides, and for screening libraries for sequences encoding acyl CoA:cholesterol acyltransferase and acyl CoA:cholesterol acyltransferase-like polypeptides.
As discussed herein regarding polynucleotide assays of the invention, polynucleotides of the invention can be used, for example, as hybridization probes for RNA, cDNA, or genomic DNA to isolate full-length cDNAs or genomic clones encoding a polypeptide, and to isolate cDNA or genomic clones of other genes that have high sequence similarity to a polynucleotide set forth in the Sequence Listing, in particular SEQ ID NOs: 2, 4, 6, 8, 10-29, 33, 42-51, 73 and 75. Such probes will generally comprise at least 15 bases. Preferably such probes will have at least 30 bases and can have at least 50 bases. Particularly preferred probes will have between 30 and 50 bases, inclusive.
The coding region of each gene that comprises or is comprised by a polynucleotide sequence set forth in the Sequence Listing may be isolated by screening with an oligonucleotide probe synthesized from a DNA sequence provided in the Sequence Listing. A labeled oligonucleotide having a sequence complementary to that of a gene of the invention is then used to screen a library of cDNA, genomic DNA or mRNA to identify members of the library which hybridize to the probe. For example, synthetic oligonucleotides are prepared which correspond to the LCAT EST sequences. The oligonucleotides are used as primers in polymerase chain reaction (PCR) techniques to obtain the 5' and 3' terminal sequences of LCAT genes. Alternatively, where oligonucleotides of low degeneracy can be prepared from particular LCAT peptides, such probes may be used directly to screen gene libraries for LCAT gene sequences. In particular, screening of cDNA libraries in phage vectors is useful in such methods due to lower levels of background hybridization.
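Because the labeled probe must be complementary to its target, a probe written 5' to 3' is the reverse complement of the corresponding target strand. The Python sketch below shows that step together with a GC-content helper, a quantity often consulted when choosing probes. The function names and the example target fragment are hypothetical, and the fragment is not taken from the Sequence Listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, both written 5' -> 3'."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected only A, C, G, T")
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


target = "ATGGCTAGCAAGTTCCTG"  # hypothetical 18-base target fragment
probe = reverse_complement(target)
print(probe, gc_percent(probe))  # CAGGAACTTGCTAGCCAT 50.0
```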

Typically, an LCAT sequence obtainable from the use of nucleic acid probes will show 60-70% sequence identity between the target LCAT sequence and the encoding sequence used as a probe. However, lengthy sequences with as little as 50-60% sequence identity may also be obtained. The nucleic acid probes may be a lengthy fragment of the nucleic acid sequence, or may be a shorter oligonucleotide probe. When longer nucleic acid fragments are employed as probes (greater than about 100 bp), one may screen at lower stringencies in order to obtain sequences from the target sample which have 20-50% deviation (i.e., 50-80% sequence homology) from the sequences used as probe. Oligonucleotide probes can be considerably shorter than the entire nucleic acid sequence encoding an LCAT enzyme, but should be at least about 10, preferably at least about 15, and more preferably at least about 20 nucleotides. A higher degree of sequence identity is desired when shorter regions are used as opposed to longer regions. It may thus be desirable to identify regions of highly conserved amino acid sequence to design oligonucleotide probes for detecting and recovering other related LCAT genes. Shorter probes are often particularly useful for polymerase chain reactions (PCR), especially when highly conserved sequences can be identified. (See Gould, et al., PNAS USA (1989) 86:1934-1938.)
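The degeneracy of an oligonucleotide designed from a conserved peptide is the product, over its residues, of the number of codons for each amino acid. Short windows rich in Met and Trp (one codon each) therefore give the fewest variants, while Leu, Ser and Arg (six codons each) give the most. The Python sketch below computes this from the standard genetic code and picks the least degenerate window of a chosen length. The function names and the example peptide are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
CODONS_PER_AA = Counter(CODON_TABLE.values())  # e.g. L -> 6, M -> 1


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(CODONS_PER_AA)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    return prod(CODONS_PER_AA[aa] for aa in peptide)


def least_degenerate_window(peptide: str, length: int) -> Tuple[int, str, int]:
    """Return (start, window, degeneracy) for the window needing the fewest
    oligonucleotide variants; ties go to the earliest window."""
    if not 0 < length <= len(peptide):
        raise ValueError("window length must be between 1 and len(peptide)")
    windows = ((i, peptide[i:i + length]) for i in range(len(peptide) - length + 1))
    return min(((i, w, degeneracy(w)) for i, w in windows), key=lambda t: t[2])


print(least_degenerate_window("LLLMWKDE", 3))  # (3, 'MWK', 2)
```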
Another aspect of the present invention relates to LCAT polypeptides. Such polypeptides include isolated polypeptides set forth in the Sequence Listing, as well as polypeptides and fragments thereof, particularly those polypeptides which exhibit LCAT activity and also those polypeptides which have at least 50%, 60% or 70% identity, preferably at least 80% identity, more preferably at least 90% identity, and most preferably at least 95% identity to a polypeptide sequence selected from the group of sequences set forth in the Sequence Listing, and also include portions of such polypeptides, wherein such portion of the polypeptide preferably includes at least 30 amino acids and more preferably includes at least 50 amino acids.
"Identity", as is well understood in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, "identity" also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as determined by the match between strings of such sequences. "Identity" can be readily calculated by known methods including, but not limited to, those described in Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I, Griffin, A.M. and Griffin, H.G., eds., Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press (1987); Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., Stockton Press, New York (1991); and Carrillo, H., and Lipman, D., SIAM J. Applied Math., 48:1073 (1988). Methods to determine identity are designed to give the largest match between the sequences tested. Moreover, methods to determine identity are codified in publicly available programs. Computer programs which can be used to determine identity between two sequences include, but are not limited to, GCG (Devereux, J., et al., Nucleic Acids Research 12(1):387 (1984)) and the suite of five BLAST programs, three designed for nucleotide sequence queries (BLASTN, BLASTX, and TBLASTX) and two designed for protein sequence queries (BLASTP and TBLASTN) (Coulson, Trends in Biotechnology, 12: 76-80 (1994); Birren, et al., Genome Analysis, 1: 543-559 (1997)). The BLASTX program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH, Bethesda, MD 20894; Altschul, S., et al., J. Mol. Biol., 215:403-410 (1990)). The well-known Smith-Waterman algorithm can also be used to determine identity.
Parameters for polypeptide sequence comparison typically include the following:
Algorithm: Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970)
Comparison matrix: BLOSUM62 from Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915-10919 (1992)
Gap Penalty: 12
Gap Length Penalty: 4
A program which can be used with these parameters is publicly available as the "gap" program from Genetics Computer Group, Madison, Wisconsin. The above parameters, along with no penalty for end gaps, are the default parameters for peptide comparisons.
Parameters for polynucleotide sequence comparison include the following:
Algorithm: Needleman and Wunsch, J. Mol. Biol. 48:443-453 (1970)
Comparison matrix: matches = +10; mismatches = 0
Gap Penalty: 50
Gap Length Penalty: 3
A program which can be used with these parameters is publicly available as the "gap" program from Genetics Computer Group, Madison, Wisconsin. The above parameters are the default parameters for nucleic acid comparisons.
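To make the nucleic acid parameters concrete, the following Python sketch scores a global (Needleman and Wunsch) alignment with affine gap costs, using match = +10, mismatch = 0, gap penalty 50 and gap length penalty 3. It assumes that a gap of length k costs 50 + 3k and that end gaps are penalized like internal ones. GCG's gap program may apply these parameters differently, so scores from this sketch are illustrative and should not be expected to match that program exactly. Only the optimal score is computed (there is no traceback), and the function name is hypothetical.

```python
import math

NEG_INF = -math.inf


def gotoh_global_score(a: str, b: str, match: int = 10, mismatch: int = 0,
                       gap_open: int = 50, gap_extend: int = 3) -> float:
    """Optimal global alignment score with affine gaps (Gotoh's recurrences).

    Assumption: a gap of length k costs gap_open + gap_extend * k, and end
    gaps are penalized the same way as internal gaps.
    """
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)   # leading gap in a
    open_cost = gap_open + gap_extend            # cost of the first gap position
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - open_cost,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - open_cost)
            Y[i][j] = max(M[i][j - 1] - open_cost,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - open_cost)
    return max(M[n][m], X[n][m], Y[n][m])


print(gotoh_global_score("ACGT", "AGT"))  # -23: three matches, one 1-base gap
```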

The invention also includes polypeptides of the formula:
$$X{-}(R_1)_n{-}(R_2){-}(R_3)_n{-}Y$$
wherein, at the amino terminus, X is hydrogen, and at the carboxyl terminus, Y is hydrogen or a metal; $R_1$ and $R_3$ are any amino acid residue; n is an integer between 0 and 1000; and $R_2$ is an amino acid sequence of the invention, particularly an amino acid sequence selected from the group set forth in the Sequence Listing and preferably SEQ ID NOs: 3, 5, 7, 9, 74 and 76. In the formula, $R_2$ is oriented so that its amino terminal residue is at the left, bound to $R_1$, and its carboxy terminal residue is at the right, bound to $R_3$. Any stretch of amino acid residues denoted by either R group, where n is greater than 1, may be either a heteropolymer or a homopolymer, preferably a heteropolymer.
Polypeptides of the present invention include isolated polypeptides encoded by a polynucleotide comprising a sequence selected from the group of sequences contained in SEQ ID NOs: 2, 4, 6, 8, 73 and 75.
The polypeptides of the present invention can be mature proteins or can be part of a fusion protein.
Fragments and variants of the polypeptides are also considered to be a part of the invention. A fragment is a variant polypeptide which has an amino acid sequence that is entirely the same as part, but not all, of the amino acid sequence of the previously described polypeptides. The fragments can be "free-standing" or comprised within a larger polypeptide of which the fragment forms a part or a region, most preferably as a single continuous region. Preferred fragments are biologically active fragments, which are those fragments that mediate activities of the polypeptides of the invention, including those with similar, improved, or decreased activity. Also included are those polypeptides and polypeptide fragments that are antigenic or immunogenic in an animal, particularly a human, and antibodies, either polyclonal or monoclonal, that specifically bind the antigenic fragments. In one preferred embodiment, such antigenic or immunogenic fragments comprise at least 10 consecutive amino acids from the amino acid sequences disclosed herein or such sequences with at least one conservative amino acid substitution. In additional embodiments, such antigenic or immunogenic fragments comprise at least 15, at least 25, at least 50 or at least 100 consecutive amino acids from the amino acid sequences disclosed herein or such sequences with at least one conservative amino acid substitution. Methods for the production of antibodies from polypeptides and polypeptides conjugated to carrier molecules are well known in the art and can be found, for example, in Ausubel et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995, particularly chapter 11.
Variants of the polypeptide also include polypeptides that vary from the sequences set forth in the Sequence Listing by conservative amino acid substitutions, that is, substitution of a residue by another with like characteristics. Those of ordinary skill in the art are aware that modifications in the amino acid sequence of a peptide, polypeptide, or protein can result in equivalent, or possibly improved, second-generation peptides, etc., that display equivalent or superior functional characteristics when compared to the original amino acid sequence. The present invention accordingly encompasses such modified amino acid sequences. Alterations can include amino acid insertions, deletions, substitutions, truncations, fusions, shuffling of subunit sequences, and the like, provided that the peptide sequences produced by such modifications have substantially the same functional properties as the naturally occurring counterpart sequences disclosed herein.
One factor that can be considered in making such changes is the hydropathic index of amino acids. The importance of the hydropathic amino acid index in conferring interactive biological function on a protein has been discussed by Kyte and Doolittle (J. Mol. Biol., 157: 105-132, 1982). It is accepted that the relative hydropathic character of amino acids contributes to the secondary structure of the resultant protein, which in turn affects the interaction of the protein with molecules such as enzymes, substrates, receptors, DNA, antibodies, antigens, etc.
Based on its hydrophobicity and charge characteristics, each amino acid has been assigned a hydropathic index as follows: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate/glutamine/aspartate/asparagine (-3.5); lysine (-3.9); and arginine (-4.5).
As is known in the art, certain amino acids in a peptide or protein can be substituted for other amino acids having a similar hydropathic index or score and produce a resultant peptide or protein having similar biological activity, i.e., which still retains biological functionality. In making such changes, it is preferable that amino acids having hydropathic indices within ±2 are substituted for one another. More preferred substitutions are those wherein the amino acids have hydropathic indices within ±1. Most preferred substitutions are those wherein the amino acids have hydropathic indices within ±0.5.
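The ±2, ±1 and ±0.5 windows can be applied mechanically to the hydropathic indices listed above. The Python sketch below is a hypothetical helper (its name and labels are illustrative) that reports which window, if any, a proposed substitution falls within. A small tolerance guards against floating-point rounding at the window edges.

```python
from typing import Optional

# Kyte-Doolittle hydropathic indices as listed above (one-letter codes)
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}
EPS = 1e-9  # tolerance for floating-point differences at the window edges


def hydropathy_preference(original: str, replacement: str) -> Optional[str]:
    """Classify a substitution by the difference in hydropathic index."""
    diff = abs(HYDROPATHY[original.upper()] - HYDROPATHY[replacement.upper()])
    if diff <= 0.5 + EPS:
        return "most preferred"   # within +/-0.5
    if diff <= 1.0 + EPS:
        return "more preferred"   # within +/-1
    if diff <= 2.0 + EPS:
        return "preferred"        # within +/-2
    return None                   # outside all recited windows


print(hydropathy_preference("I", "V"))  # most preferred (difference 0.3)
print(hydropathy_preference("V", "F"))  # preferred (difference 1.4)
print(hydropathy_preference("I", "R"))  # None (difference 9.0)
```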

Like amino acids can also be substituted on the basis of hydrophilicity. U.S. Patent No. 4,554,101 discloses that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein. The following hydrophilicity values have been assigned to amino acids: arginine/lysine (+3.0); aspartate/glutamate (+3.0 ±1); serine (+0.3); asparagine/glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ±1); alanine/histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine/isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); and tryptophan (-3.4). Thus, one amino acid in a peptide, polypeptide, or protein can be substituted by another amino acid having a similar hydrophilicity score and still produce a resultant protein having similar biological activity, i.e., still retaining correct biological function. In making such changes, amino acids having hydrophilicity values within ±2 are preferably substituted for one another, those within ±1 are more preferred, and those within ±0.5 are most preferred.
As outlined above, amino acid substitutions in the peptides of the present invention can be based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, etc. Exemplary substitutions that take several of the foregoing characteristics into consideration in order to produce conservative amino acid changes resulting in silent changes within the present peptides, etc., can be selected from other members of the class to which the naturally occurring amino acid belongs. Amino acids can be divided into the following four groups: (1) acidic amino acids; (2) basic amino acids; (3) neutral polar amino acids; and (4) neutral non-polar amino acids. Representative amino acids within these various groups include, but are not limited to: (1) acidic (negatively charged) amino acids such as aspartic acid and glutamic acid; (2) basic (positively charged) amino acids such as arginine, histidine, and lysine; (3) neutral polar amino acids such as glycine, serine, threonine, cysteine, cystine, tyrosine, asparagine, and glutamine; and (4) neutral non-polar amino acids such as alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and methionine. It should be noted that changes which are not expected to be advantageous can also be useful if these result in the production of functional sequences.
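The four-class grouping can be encoded as a lookup table and used to flag each substitution between a reference and a variant of equal length as within-class or not. The Python sketch below follows the class membership exactly as listed above, including glycine among the neutral polar residues and histidine among the basic residues. The function names are hypothetical. Within-class status is only a first screen, since the text notes that changes are ultimately judged by whether the resulting sequence remains functional.

```python
from typing import List, Tuple

# Four classes as listed above (one-letter codes; cystine is not separately coded)
CLASSES = {
    "acidic": "DE",
    "basic": "RHK",
    "neutral polar": "GSTCYNQ",
    "neutral non-polar": "ALIVPFWM",
}
RESIDUE_CLASS = {aa: cls for cls, residues in CLASSES.items() for aa in residues}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same one of the four classes."""
    return RESIDUE_CLASS[original.upper()] == RESIDUE_CLASS[replacement.upper()]


def classify_substitutions(reference: str, variant: str) -> List[Tuple[int, str, str, bool]]:
    """List (0-based position, original, replacement, conservative?) for each
    substitution between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [(i, a, b, is_conservative(a, b))
            for i, (a, b) in enumerate(zip(reference.upper(), variant.upper()))
            if a != b]


print(classify_substitutions("MKDL", "MRDI"))  # [(1, 'K', 'R', True), (3, 'L', 'I', True)]
```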
Variants that are fragments of the polypeptides of the invention can be used to produce the corresponding full-length polypeptide by peptide synthesis. Therefore, these variants can be used as intermediates for producing the full-length polypeptides of the invention.

The polynucleotides and polypeptides of the invention can be used, for example, in the transformation of host cells, such as plant cells, animal cells, yeast cells, bacteria, bacteriophage, and viruses, as further discussed herein.
The invention also provides polynucleotides that encode a polypeptide that is a mature protein plus additional amino- or carboxyl-terminal amino acids, or amino acids within the mature polypeptide (for example, when the mature form of the protein has more than one polypeptide chain). Such sequences can, for example, play a role in the processing of a protein from a precursor to a mature form, allow protein transport, shorten or lengthen protein half-life, or facilitate manipulation of the protein in assays or production. It is contemplated that cellular enzymes can be used to remove any additional amino acids from the mature protein.
A precursor protein, having the mature form of the polypeptide fused to one or more prosequences, may be an inactive form of the polypeptide. The inactive precursors generally are activated when the prosequences are removed. Some or all of the prosequences may be removed prior to activation. Such precursor proteins are generally called proproteins.
Preparation of Expression Constructs and Methods of Use
Of interest is the use of the nucleotide sequences in recombinant DNA constructs to direct the transcription, or transcription and translation (expression), of the acyltransferase sequences of the present invention in a host cell. Of particular interest is the use of the polynucleotide sequences of the present invention in recombinant DNA constructs to direct the transcription, or transcription and translation (expression), of the acyltransferase sequences of the present invention in a host plant cell.
The expression constructs generally comprise a regulatory sequence functional in a host cell, operably linked to a nucleic acid sequence encoding a lecithin:cholesterol acyltransferase-like polypeptide or acyl CoA:cholesterol acyltransferase-like polypeptide of the present invention, and a transcriptional termination region functional in a host plant cell. Of particular interest is the use of promoters (also referred to as transcriptional initiation regions) functional in plant host cells.
Those skilled in the art will recognize that there are a number of promoters which are functional in plant cells and have been described in the literature, including constitutive, inducible, tissue-specific, organelle-specific, developmentally regulated and environmentally regulated promoters. Chloroplast- and plastid-specific promoters, chloroplast- or plastid-functional promoters, and chloroplast- or plastid-operable promoters are also envisioned.
One set of promoters are constitutive promoters such as the CaMV35S or FMV35S promoters, which yield high levels of expression in most plant organs. Enhanced or duplicated versions of the CaMV35S and FMV35S promoters are useful in the practice of this invention (Odell, et al. (1985) Nature 313:810-812; Rogers, U.S. Patent Number 5,378,619). Other useful constitutive promoters include, but are not limited to, the mannopine synthase (mas) promoter, the nopaline synthase (nos) promoter, and the octopine synthase (ocs) promoter.
Useful inducible promoters include heat-shock promoters (Ou-Lee et al. (1986) Proc. Natl. Acad. Sci. USA 83: 6815; Ainley et al. (1990) Plant Mol. Biol. 14: 949), a nitrate-inducible promoter derived from the spinach nitrite reductase gene (Back et al. (1991) Plant Mol. Biol. 17: 9), hormone-inducible promoters (Yamaguchi-Shinozaki et al. (1990) Plant Mol. Biol. 15: 905; Kares et al. (1990) Plant Mol. Biol. 13: 905), and light-inducible promoters associated with the small subunit of RuBP carboxylase and LHCP gene families (Kuhlemeier et al. (1989) Plant Cell 1: 471; Feinbaum et al. (1991) Mol. Gen. Genet. 226: 449; Weisshaar et al. (1991) EMBO J. 10: 1777; Lam and Chua (1990) Science 248: 471; Castresana et al. (1988) EMBO J. 7: 1929; Schulze-Lefert et al. (1989) EMBO J. 8: 651).
In addition, it may also be preferred to bring about expression of the acyltransferase gene in specific tissues of the plant, such as leaf, stem, root, tuber, seed, fruit, etc., and the promoter chosen should have the desired tissue and developmental specificity. Examples of useful tissue-specific, developmentally regulated promoters include fruit-specific promoters such as the E4 promoter (Cordes et al. (1989) Plant Cell 1: 1025), the E8 promoter (Deikman et al. (1988) EMBO J. 7: 3315), the kiwifruit actinidin promoter (Lin et al. (1993) PNAS 90: 5939), the 2A11 promoter (Houck et al., U.S. Patent 4,943,674), and the tomato pZ130 promoter (U.S. Patents 5,175,095 and 5,530,185); the β-conglycinin 7S promoter (Doyle et al. (1986) J. Biol. Chem. 261: 9228; Slighton and Beachy (1987) Planta 172: 356); and seed-specific promoters (Knutzon et al. (1992) Proc. Natl. Acad. Sci. USA 89: 2624; Bustos et al. (1991) EMBO J. 10: 1469; Lam and Chua (1991) J. Biol. Chem. 266: 17131; Stayton et al. (1991) Aust. J. Plant Physiol. 18: 507). Fruit-specific gene regulation is discussed in U.S. Patent 5,753,475. Other useful seed-specific promoters include, but are not limited to, the napin, phaseolin, zein, soybean trypsin inhibitor, 7S, ADR12, ACP, stearoyl-ACP desaturase, oleosin, Lesquerella hydroxylase, and barley aldose reductase promoters (Bartels (1995) Plant J. 7: 809-822), the EA9 promoter (U.S. Patent 5,420,034), and the Bce4 promoter (U.S. Patent 5,530,194). Useful embryo-specific promoters include the corn globulin 1 and oleosin promoters. Useful endosperm-specific promoters include the rice glutelin-1 promoter, the promoters for the low-pI α-amylase gene (Amy32b) (Rogers et al. (1984) J. Biol. Chem. 259: 12234) and the high-pI α-amylase gene (Amy 64) (Khursheed et al. (1988) J. Biol. Chem. 263: 18953), and the promoter for a barley thiol protease gene ("Aleurain") (Whittier et al. (1987) Nucleic Acids Res. 15: 2515).
Of particular interest is the expression of the nucleic acid sequences of the present invention from transcription initiation regions which are preferentially expressed in a plant seed tissue. Examples of such seed-preferential transcription initiation sequences include those derived from plant storage protein genes or from genes involved in fatty acid biosynthesis in oilseeds. Examples of such promoters include the 5' regulatory regions from such genes as napin (Kridl et al., Seed Sci. Res. 1: 209-219 (1991)), phaseolin, zein, soybean trypsin inhibitor, ACP, stearoyl-ACP desaturase, the soybean α' subunit of β-conglycinin (soy 7S; Chen et al., Proc. Natl. Acad. Sci. USA 83: 8560-8564 (1986)) and oleosin. Seed-specific gene regulation is discussed in EP 0 255 378 B1 and U.S. Patents 5,420,034 and 5,608,152. Promoter hybrids can also be constructed to enhance transcriptional activity (Hoffman, U.S. Patent No. 5,106,739), or to combine desired transcriptional activity and tissue specificity.
It may be advantageous to direct the localization of proteins conferring LCAT activity to a particular subcellular compartment, for example, to the mitochondrion, endoplasmic reticulum, vacuoles, chloroplast or other plastidic compartment. For example, where the genes of interest of the present invention will be targeted to plastids, such as chloroplasts, for expression, the constructs will also employ sequences to direct the gene to the plastid. Such sequences are referred to herein as chloroplast transit peptides (CTP) or plastid transit peptides (PTP). In this manner, where the gene of interest is not directly inserted into the plastid, the expression construct will additionally contain a gene encoding a transit peptide to direct the gene of interest to the plastid. The chloroplast transit peptides may be derived from the gene of interest, or may be derived from a heterologous sequence having a CTP. Such transit peptides are known in the art. See, for example, Von Heijne et al. (1991) Plant Mol. Biol. Rep. 9: 104-126; Clark et al. (1989) J. Biol. Chem. 264: 17544-17550; della-Cioppa et al. (1987) Plant Physiol. 84: 965-968; Romer et al. (1993) Biochem. Biophys. Res. Commun. 196: 1414-1421; and Shah et al. (1986) Science 233: 478-481.
Depending upon the intended use, the constructs may contain the nucleic acid sequence which encodes the entire LCAT protein, a portion of the LCAT protein, the entire ACAT protein, or a portion of the ACAT protein. For example, where antisense inhibition of a given LCAT or ACAT protein is desired, the entire sequence is not required. Furthermore, where LCAT or ACAT sequences used in constructs are intended for use as probes, it may be advantageous to prepare constructs containing only a particular portion of an LCAT or ACAT encoding sequence, for example a sequence which is discovered to encode a highly conserved region.
The skilled artisan will recognize that there are various methods for the inhibition of expression of endogenous sequences in a host cell. Such methods include, but are not limited to, antisense suppression (Smith, et al. (1988) Nature 334: 724-726), co-suppression (Napoli, et al. (1989) Plant Cell 2: 279-289), ribozymes (PCT Publication WO 97/10328), and combinations of sense and antisense (Waterhouse, et al. (1998) Proc. Natl. Acad. Sci. USA 95: 13959-13964). Methods for the suppression of endogenous sequences in a host cell typically employ the transcription, or transcription and translation, of at least a portion of the sequence to be suppressed. Such sequences may be homologous to coding as well as non-coding regions of the endogenous sequence.
Regulatory transcript termination regions may be provided in plant expression constructs of this invention as well. Transcript termination regions may be provided by the DNA sequence encoding the diacylglycerol acyltransferase or by a convenient transcription termination region derived from a different gene source, for example, the transcript termination region which is naturally associated with the transcript initiation region. The skilled artisan will recognize that any convenient transcript termination region which is capable of terminating transcription in a plant cell may be employed in the constructs of the present invention.
Alternatively, constructs may be prepared to direct the expression of the LCAT or ACAT sequences directly from the host plant cell plastid. Such constructs and methods are known in the art and are generally described, for example, in Svab, et al. (1990) Proc. Natl. Acad. Sci. USA 87: 8526-8530 and Svab and Maliga (1993) Proc. Natl. Acad. Sci. USA 90: 913-917, and in U.S. Patent Number 5,693,507.
A plant cell, tissue, organ, or plant into which the recombinant DNA constructs containing the expression constructs have been introduced is considered transformed, transfected, or transgenic. A transgenic or transformed cell or plant also includes progeny of the cell or plant and progeny produced from a breeding program employing such a transgenic plant as a parent in a cross and exhibiting an altered phenotype resulting from the presence of an LCAT nucleic acid sequence.
Plant expression or transcription constructs having a plant LCAT as the DNA sequence of interest for increased or decreased expression thereof may be employed with a wide variety of plant life, particularly plant life involved in the production of vegetable oils for edible and industrial uses. Most especially preferred are temperate oilseed crops. Plants of interest include, but are not limited to, rapeseed (Canola and High Erucic Acid varieties), sunflower, safflower, cotton, soybean, peanut, coconut and oil palms, and corn. Depending on the method for introducing the recombinant constructs into the host cell, other DNA sequences may be required. Importantly, this invention is applicable to dicotyledonous and monocotyledonous species alike and will be readily applicable to new and/or improved transformation and regulation techniques.
Of particular interest is the use of plant LCAT and ACAT constructs in plants to produce plants or plant parts, including, but not limited to, leaves, stems, roots, reproductive organs, and seeds, with a modified content of lipid and/or sterol esters, and to alter oil production by such plants.
Of particular interest in the present invention is the use of ACAT genes in conjunction with the LCAT sequences to increase the sterol content of seeds. Thus, overexpression of nucleic acid sequences encoding an ACAT and an LCAT in an oilseed crop may find use in the present invention to increase sterol levels in plant tissues and/or increase oil production.
It is contemplated that the gene sequences may be synthesized, either completely or in part, especially where it is desirable to provide plant-preferred sequences. Thus, all or a portion of the desired structural gene (that portion of the gene which encodes the LCAT or ACAT protein) may be synthesized using codons preferred by a selected host. Host-preferred codons may be determined, for example, from the codons used most frequently in the proteins expressed in a desired host species.
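The codon-preference step described above can be sketched as follows. This is a minimal Python illustration, assuming the host's codon usage is tallied from a set of in-frame coding sequences of proteins expressed in that host; the function names and the two toy coding sequences are hypothetical, and practical codon-optimization tools also weigh factors such as GC content and repeats that this sketch ignores.

```python
from collections import Counter

# Standard genetic code built from the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def count_codons(cds_list):
    """Tally codon usage over in-frame host coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for p in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[p:p + 3]] += 1
    return counts


def preferred_codons(counts):
    """Map each amino acid to its most frequently used codon.

    Ties (and amino acids never observed) fall back to the first codon
    in TCAG order.
    """
    best = {}
    for codon, aa in CODE.items():
        n = counts[codon]
        if aa not in best or n > best[aa][1]:
            best[aa] = (codon, n)
    return {aa: codon for aa, (codon, _) in best.items()}


def back_translate(protein, table):
    """Write a protein sequence using only host-preferred codons."""
    return "".join(table[aa] for aa in protein.upper())


# Hypothetical host coding sequences, for illustration only.
host_cds = ["ATGGCTGCTAAATAA", "ATGGCAGCTAAGTAA"]
table = preferred_codons(count_codons(host_cds))
print(back_translate("MAK*", table))  # ATGGCTAAATAA
```

Choosing only the single most frequent codon is the simplest reading of "codons used most frequently"; a partially synthesized gene would apply the same table only to the resynthesized portion.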
One skilled in the art will readily recognize that antibody preparations, nucleic acid probes (DNA and RNA) and the like may be prepared and used to screen and recover "homologous" or "related" sequences from a variety of plant sources. Homologous sequences are found when there is an identity of sequence, which may be determined upon comparison of sequence information, nucleic acid or amino acid, or through hybridization reactions between a known LCAT and a candidate source. Conservative changes, such as Glu/Asp, Val/Ile, Ser/Thr, Arg/Lys and Gln/Asn, may also be considered in determining sequence homology. Amino acid sequences are considered homologous by as little as 25% sequence identity between the two complete mature proteins. (See generally, Doolittle, R.F., Of URFS and ORFS (University Science Books, CA, 1986).)
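The identity and conservative-substitution comparison described above can be sketched in Python as follows. This sketch assumes the two proteins have already been aligned (gaps written as '-') and, as a labeled assumption, divides by the number of alignment columns that are not gap-gap; the text does not specify a denominator, and published identity figures vary with that choice. Only the five conservative pairs named above are counted, and the example fragments are hypothetical.

```python
# The five conservative pairs named in the text, stored order-independently.
CONSERVATIVE = {frozenset(pair) for pair in ("ED", "VI", "ST", "RK", "QN")}


def identity_and_similarity(a, b):
    """Return (% identity, % identity plus conservative) for two aligned proteins."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Assumption: columns where both sequences have a gap are ignored.
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    conservative = sum(1 for x, y in columns
                       if x != y and frozenset((x, y)) in CONSERVATIVE)
    n = len(columns)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


def meets_identity_threshold(a, b, threshold=25.0):
    """Apply the 25% identity criterion for homology stated in the text."""
    return identity_and_similarity(a, b)[0] >= threshold


# Hypothetical aligned fragments: 2 identities and 4 conservative changes
# over 8 columns.
print(identity_and_similarity("MKVSTELQ", "MRISAD-N"))  # (25.0, 75.0)
```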
Thus, other LCATs may be obtained from the specific sequences provided herein. Furthermore, it will be apparent that one can obtain natural and synthetic sequences, including modified amino acid sequences and starting materials for synthetic-protein modeling, from the exemplified LCAT and ACAT sequences and from sequences which are obtained through the use of such exemplified sequences. Modified amino acid sequences include sequences which have been mutated, truncated, increased and the like, whether such sequences were partially or wholly synthesized. Sequences which are actually purified from plant preparations, or are identical or encode identical proteins thereto, regardless of the method used to obtain the protein or sequence, are equally considered naturally derived.
For immunological screening, antibodies to the protein can be prepared by injecting rabbits or mice with the purified protein or a portion thereof, such methods of preparing antibodies being well known to those in the art. Either monoclonal or polyclonal antibodies can be produced, although polyclonal antibodies are typically more useful for gene isolation. Western analysis may be conducted to determine whether a related protein is present in a crude extract of the desired plant species, as determined by cross-reaction with the antibodies to the encoded proteins. When cross-reactivity is observed, genes encoding the related proteins are isolated by screening expression libraries representing the desired plant species. Expression libraries can be constructed in a variety of commercially available vectors, including lambda gt11, as described in Sambrook, et al. (Molecular Cloning: A Laboratory Manual, Second Edition (1989) Cold Spring Harbor Laboratory, Cold Spring Harbor, New York).
To confirm the activity and specificity of the proteins encoded by the identified nucleic acid sequences as acyltransferase enzymes, in vitro assays are performed in insect cell cultures using baculovirus expression systems. Such baculovirus expression systems are known in the art and are described by Lee, et al., U.S. Patent Number 5,348,886, the entirety of which is herein incorporated by reference.
In addition, other expression constructs may be prepared to assay for protein activity utilizing different expression systems. Such expression constructs are transformed into yeast or prokaryotic hosts and assayed for acyltransferase activity. Such expression systems are known in the art and are readily available through commercial sources.
The method of transformation in obtaining such transgenic plants is not critical to the instant invention, and various methods of plant transformation are currently available. Furthermore, as newer methods become available to transform crops, they may also be directly applied hereunder. For example, many plant species naturally susceptible to Agrobacterium infection may be successfully transformed via tripartite or binary vector methods of Agrobacterium-mediated transformation. In many instances, it will be desirable to have the construct bordered on one or both sides by T-DNA, particularly having the left and right borders, more particularly the right border. This is particularly useful when the construct uses A. tumefaciens or A. rhizogenes as a mode for transformation, although the T-DNA borders may find use with other modes of transformation. In addition, techniques of microinjection, DNA particle bombardment, and electroporation have been developed which allow for the transformation of various monocot and dicot plant species.
Normally, included with the DNA construct will be a structural gene having the necessary regulatory regions for expression in a host and providing for selection of transformant cells. The gene may provide for resistance to a cytotoxic agent (e.g. an antibiotic, heavy metal, or toxin), complementation providing prototrophy to an auxotrophic host, viral immunity, or the like. Depending upon the number of different host species into which the expression construct or components thereof are introduced, one or more markers may be employed, where different conditions for selection are used for the different hosts.
Non-limiting examples of suitable selection markers include genes that confer resistance to bleomycin, gentamycin, glyphosate, hygromycin, kanamycin, methotrexate, phleomycin, phosphinothricin, spectinomycin, streptomycin, sulfonamide and sulfonylureas (Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Laboratory Press, 1995, p. 39). Examples of markers also include, but are not limited to, alkaline phosphatase (AP), myc, hemagglutinin (HA), β-glucuronidase (GUS), luciferase, and green fluorescent protein (GFP).
Where Agrobacterium is used for plant cell transformation, a vector may be used which may be introduced into the Agrobacterium host for homologous recombination with T-DNA or the Ti- or Ri-plasmid present in the Agrobacterium host. The Ti- or Ri-plasmid containing the T-DNA for recombination may be armed (capable of causing gall formation) or disarmed (incapable of causing gall formation), the latter being permissible so long as the vir genes are present in the transformed Agrobacterium host. The armed plasmid can give a mixture of normal plant cells and gall.
In some instances where Agrobacterium is used as the vehicle for transforming host plant cells, the expression or transcription construct bordered by the T-DNA border region(s) will be inserted into a broad host range vector capable of replication in E. coli and Agrobacterium, such broad host range vectors being described in the literature. Commonly used is pRK2 or derivatives thereof. See, for example, Ditta, et al. (Proc. Natl. Acad. Sci. U.S.A. (1980) 77:7347-7351) and EPA 0 120 515, which are incorporated herein by reference. Alternatively, one may insert the sequences to be expressed in plant cells into a vector containing separate replication sequences, one of which stabilizes the vector in E. coli, and the other in Agrobacterium. See, for example, McBride and Summerfelt (Plant Mol. Biol. (1990) 14:269-276), wherein the pRiHRI (Jouanin, et al., Mol. Gen. Genet. (1985) 201:370-374) origin of replication is utilized and provides for added stability of the plant expression vectors in host Agrobacterium cells.
Included with the expression construct and the T-DNA can be one or more markers, which allow for selection of transformed Agrobacterium and transformed plant cells. A number of markers have been developed for use with plant cells, such as resistance to chloramphenicol, kanamycin, the aminoglycoside G418, hygromycin, or the like. The particular marker employed is not essential to this invention, one or another marker being preferred depending on the particular host and the manner of construction.
For transformation of plant cells using Agrobacterium, explants may be combined and incubated with the transformed Agrobacterium for sufficient time for transformation, the bacteria killed, and the plant cells cultured in an appropriate selective medium. Once callus forms, shoot formation can be encouraged by employing the appropriate plant hormones in accordance with known methods, and the shoots transferred to rooting medium for regeneration of plants. The plants may then be grown to seed, and the seed used to establish repetitive generations and for isolation of vegetable oils.
Thus, in another aspect of the present invention, methods are provided for modifying the sterol and/or stanol composition of a host cell. Of particular interest are methods for modifying the sterol and/or stanol composition of a host plant cell. In general, the methods involve increasing the levels of sterol ester compounds as a proportion of the total sterol compounds. The method generally comprises the use of expression constructs to direct the expression of the polynucleotides of the present invention in a host cell.
Also provided are methods for reducing the proportion of sterol ester compounds as a percentage of total sterol compounds in a host plant cell. The method generally comprises the use of expression constructs to direct the suppression of endogenous acyltransferase proteins in a host cell.
Of particular interest is the use of expression constructs to modify the levels of sterol compounds in a host plant cell. Most particularly, the methods find use in modifying the levels of sterol compounds in seed oils obtained from plant seeds.
Also of interest is the use of expression constructs of the present invention to alter oil production in a host cell, and in particular to increase oil production. Of particular interest is the use of expression constructs containing nucleic acid sequences encoding LCAT and/or ACAT polypeptides to transform host plant cells and to use these host cells to regenerate whole plants having increased oil production as compared to the same plant not containing the expression construct.
The oils obtained from transgenic plants having modified sterol compound content find use in a wide variety of applications. Of particular interest in the present invention is the use of the oils containing modified levels of sterol compounds in applications involved in improving human nutrition and cardiovascular health. For example, phytostanols are beneficial for lowering serum cholesterol (Ling, et al. (1995) Life Sciences 57:195-206).
Cholesterol-lowering compositions comprise the oils and sterol ester compound compositions obtained using the methods of the present invention. Such cholesterol-lowering compositions include, but are not limited to, foods, food products, processed foods, food ingredients, food additive compositions, or dietary/nutritional supplements that contain oils and/or fats. Non-limiting examples include margarines; butters; shortenings; cooking oils; frying oils; dressings, such as salad dressings; spreads; mayonnaises; and vitamin/mineral supplements. Patent documents relating to such compositions include U.S. Patents 4,588,717 and 5,244,887, and PCT International Publication Nos. WO 96/38047, WO 97/42830, WO 98/06405, and WO 98/06714. Additional non-limiting examples include toppings; dairy products such as cheese and processed cheese; processed meat; pastas; sauces; cereals; desserts, including frozen and shelf-stable desserts; dips; chips; baked goods; pastries; cookies; snack bars; confections; chocolates; beverages; unextracted seed; and unextracted seed that has been ground, cracked, milled, rolled, extruded, pelleted, defatted, dehydrated, or otherwise processed, but which still contains the oils, etc., disclosed herein.
The cholesterol-lowering compositions can also take the form of pharmaceutical compositions comprising a cholesterol-lowering effective amount of the oils or sterol compound compositions obtained using the methods of the present invention, along with a pharmaceutically acceptable carrier, excipient, or diluent. These pharmaceutical compositions can be in the form of a liquid or a solid. Liquids can be solutions or suspensions; solids can be in the form of a powder, a granule, a pill, a tablet, a gel, or an extrudate. U.S. Patent 5,270,041 relates to sterol-containing pharmaceutical compositions.
Thus, by expression of the nucleic acid sequences encoding acyltransferase-like sequences of the present invention in a host cell, it is possible to modify the lipid content and/or composition as well as the sterol content and/or composition of the host cell.
The invention now being generally described, it will be more readily understood by reference to the following examples, which are included for purposes of illustration only and are not intended to limit the present invention.
EXAMPLES
Example 1: RNA Isolations
Total RNA from the inflorescence and developing seeds of Arabidopsis thaliana was isolated for use in construction of complementary DNA (cDNA) libraries. The procedure was an adaptation of the DNA isolation protocol of Webb and Knapp (D.M. Webb and S.J. Knapp (1990) Plant Molec. Reporter 8:180-185). The following description assumes the use of 1 g fresh weight of tissue. Frozen seed tissue was powdered by grinding under liquid nitrogen. The powder was added to 10 ml REC buffer (50 mM Tris-HCl, pH 9, 0.8 M NaCl, 10 mM EDTA, 0.5% w/v CTAB (cetyltrimethylammonium bromide)) along with 0.2 g insoluble polyvinylpolypyrrolidone, and ground at room temperature. The homogenate was centrifuged for 5 minutes at 12,000 x g to pellet insoluble material. The resulting supernatant fraction was extracted with chloroform, and the top phase was recovered.
The RNA was then precipitated by addition of 1 volume RecP (50 mM Tris-HCl, pH 9, 10 mM EDTA and 0.5% (w/v) CTAB) and collected by brief centrifugation as before. The RNA pellet was redissolved in 0.4 ml of 1 M NaCl. The RNA pellet was redissolved in water and extracted with phenol/chloroform. Sufficient 3 M potassium acetate (pH 5) was added to make the mixture 0.3 M in acetate, followed by addition of two volumes of ethanol to precipitate the RNA. After washing with ethanol, this final RNA precipitate was dissolved in water and stored frozen.
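The salt adjustment in this step follows from a simple mass balance: adding a volume v of stock at concentration C_s to a sample of volume V gives C_s·v = C_f·(V + v), so v = C_f·V/(C_s - C_f); for a 3 M stock and a 0.3 M final concentration, v = V/9. The Python sketch below applies this. The 0.45 ml sample volume is a hypothetical value (the text does not state it), and "two volumes" of ethanol is assumed to refer to the acetate-adjusted mixture.

```python
def volume_to_add(sample_ml, stock_molar, final_molar):
    """Volume of stock solution that brings the mixture to final_molar.

    Solves stock_molar * v = final_molar * (sample_ml + v) for v.
    """
    if not 0 < final_molar < stock_molar:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_molar * sample_ml / (stock_molar - final_molar)


sample_ml = 0.45  # hypothetical volume of the phenol/chloroform-extracted RNA
koac_ml = volume_to_add(sample_ml, stock_molar=3.0, final_molar=0.3)
ethanol_ml = 2 * (sample_ml + koac_ml)  # assumed: two volumes of the adjusted mixture
print(f"3 M KOAc: {koac_ml:.3f} ml; ethanol: {ethanol_ml:.3f} ml")
# 3 M KOAc: 0.050 ml; ethanol: 1.000 ml
```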
Alternatively, total RNA may be obtained using TRIzol reagent (BRL-Life Technologies, Gaithersburg, MD) following the manufacturer's protocol. The RNA precipitate was dissolved in water and stored frozen.
Example 2: Identification of LCAT Sequences
Searches were performed on a Silicon Graphics Unix computer using additional Bioccelerator hardware and GenWeb software supplied by Compugen Ltd. This software and hardware enabled the use of the Smith-Waterman algorithm in searching DNA and protein databases using profiles as queries. The program used to query protein databases was profilesearch. This is a search where the query is not a single sequence but a profile based on a multiple alignment of amino acid or nucleic acid sequences. The profile was used to query a sequence data set, i.e., a sequence database. The profile contained all the pertinent information for scoring each position in a sequence, in effect replacing the "scoring matrix" used for standard query searches. The program used to query nucleotide databases with a protein profile was tprofilesearch. Tprofilesearch searches nucleic acid databases using an amino acid profile query. As the search is running, sequences in the database are translated to amino acid sequences in six reading frames. The output file for tprofilesearch is identical to the output file for profilesearch except for an additional column that indicates the frame in which the best alignment occurred.
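tprofilesearch itself is part of the Compugen GenWeb package and its internals are not described here. The Python sketch below shows only the six-frame translation step it relies on, using the standard genetic code; rendering codons that contain characters other than A, C, G or T as 'X' is an assumption of this sketch.

```python
# Standard genetic code in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frames("ATGGCCAAATAG").items():
    print(frame, protein)
```

Each of the six translated strings would then be scored against the amino acid profile, and the frame giving the best alignment is the one reported in the extra output column.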
The Smith-Waterman algorithm (Smith and Waterman (1981) J. Molec. Biol. 147:195-197) was used to search for similarities between one sequence from the query and a group of sequences contained in the database.
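For reference, a minimal Python sketch of Smith-Waterman local alignment is shown below. It uses a simple match/mismatch score and a linear gap penalty; these scoring parameters are hypothetical and chosen for illustration. The searches described above instead scored positions against a profile, which would replace the match/mismatch term with a per-position lookup.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GGATCCAAA", "TTTGGATCC"))  # (12, 'GGATCC', 'GGATCC')
```

In a database search, this score is computed between the query and every database entry, and the highest-scoring entries are reported.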
A protein sequence of lecithin:cholesterol acyltransferase from human (McLean J, et al. (1986) Nucleic Acids Res. 14(23):9397-9406; SEQ ID NO:1) was used to search the NCBI non-redundant protein database using BLAST. Three sequences were identified from Arabidopsis: GenBank accessions AC004557 (referred to herein as LCAT1, SEQ ID NO:2), AC003027 (referred to herein as LCAT2, SEQ ID NO:4), and AL024486 (referred to herein as LCAT3, SEQ ID NO:6). The deduced amino acid sequences are provided in SEQ ID NOs: 3, 5, and 7, respectively.
The profile generated from the queries using PSI-BLAST was excised from the hypertext markup language (HTML) file. The World Wide Web (WWW)/HTML interface to PSI-BLAST at NCBI stores the current generated profile matrix in a hidden field in the HTML file that is returned after each iteration of PSI-BLAST. However, this matrix has been encoded into string62 (s62) format for ease of transport through HTML. String62 format is a simple conversion of the values of the matrix into HTML-legal ASCII characters.
The encoded matrix width (x axis) is 26 characters, comprising the consensus character, the probabilities of each amino acid in the order A, B, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z (where B represents D and N, Z represents Q and E, and X represents any amino acid), the gap creation value, and the gap extension value.
The length (y axis) of the matrix corresponds to the length of the sequences identified by PSI-BLAST. The order of the amino acids corresponds to the conserved amino acid sequence of the sequences identified using PSI-BLAST, with the N-terminal end at the top of the matrix. The probabilities of other amino acids at that position are represented for each amino acid along the x axis, below the respective single-letter amino acid abbreviation.
Thus, each row of the profile consists of the highest scoring (consensus) amino acid, followed by the scores for each possible amino acid at that position in the sequence matrix, the score for opening a gap at that position, and the score for continuing a gap at that position.
The string62 file is converted back into a profile for use in subsequent searches. The gap open field is set to 11 and the gap extension field is set to 1 along the x axis. The gap creation and gap extension values are known, based on the settings given to the PSI-BLAST algorithm. The matrix is exported to the standard GCG profile format. This format can be read by GenWeb.
The algorithm used to convert the string62 formatted file to the matrix is outlined in Table 1.
Table 1
1. If the encoded character is 2, then the value is BLAST score min.
2. If the encoded character is Z, then the value is BLAST score max.
3. Else, if the encoded character is uppercase, then its value is (64 - (ASCII # of char)).
4. Else, if the encoded character is a digit, then the value is ((ASCII # of char) - 48).
5. Else, if the encoded character is not uppercase, then the value is ((ASCII # of char) - 87).
6. All B positions are set to the min of the D and N amino acids at that row in the sequence matrix.
7. All Z positions are set to the min of the Q and E amino acids at that row in the sequence matrix.
8. All X positions are set to the min of all amino acids at that row in the sequence matrix.
9. kBLAST_SCORE_MAX = 999;
10. kBLAST_SCORE_MIN = -999;
11. All gap opens are set to 11.
12. All gap extensions are set to 1.
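The rules in Table 1 can be sketched as a small Python decoder for one string62 row. This is an illustration under stated assumptions, not NCBI's or GenWeb's code: each row is assumed to be 26 characters laid out as described above (consensus residue, 23 encoded scores in the order A, B, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, then gap creation and gap extension). Because rule 1's sentinel is printed as "2", which would collide with the digits handled by rule 4, the sketch uses lowercase 'z' as a hypothetical placeholder for the minimum sentinel.

```python
ORDER = "ABCDEFGHIKLMNPQRSTVWXYZ"  # 23 residue columns, in the order given in the text
BLAST_SCORE_MAX = 999   # kBLAST_SCORE_MAX (rule 9)
BLAST_SCORE_MIN = -999  # kBLAST_SCORE_MIN (rule 10)
MIN_CHAR = "z"  # hypothetical placeholder for the rule 1 sentinel
MAX_CHAR = "Z"  # rule 2 sentinel


def decode_char(c):
    """Decode one string62 character into an integer score (rules 1-5)."""
    if c == MIN_CHAR:
        return BLAST_SCORE_MIN
    if c == MAX_CHAR:
        return BLAST_SCORE_MAX
    if c.isupper():
        return 64 - ord(c)
    if c.isdigit():
        return ord(c) - 48
    return ord(c) - 87


def decode_row(row):
    """Decode one 26-character row into (consensus, scores, gap_open, gap_extend)."""
    if len(row) != 26:
        raise ValueError("a string62 row is expected to be 26 characters long")
    consensus = row[0]
    scores = {aa: decode_char(ch) for aa, ch in zip(ORDER, row[1:24])}
    # Rules 6-8: ambiguity codes take the minimum of the residues they represent.
    scores["B"] = min(scores["D"], scores["N"])
    scores["Z"] = min(scores["Q"], scores["E"])
    scores["X"] = min(v for aa, v in scores.items() if aa not in "BZX")
    # Rules 11-12: the encoded gap fields (row[24:26]) are replaced by fixed values.
    return consensus, scores, 11, 1


consensus, scores, gap_open, gap_extend = decode_row("K" + "AAA" + "5" * 20 + "bc")
print(consensus, scores["A"], scores["B"], scores["K"], gap_open, gap_extend)
# K -1 5 5 11 1
```

Applying decode_row to every row, top to bottom, yields the position-specific matrix that is then written out in GCG profile format.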
The protein sequences of LCAT1, LCAT2, and LCAT3, as well as the PSI-BLAST profile, were used to search public and proprietary databases for additional LCAT sequences. Two EST sequences were identified which appear to be identical to LCAT1 and LCAT3, respectively. One additional Arabidopsis sequence was identified from the proprietary databases, LCAT4 (SEQ ID NO:8). The deduced protein sequence of LCAT4 is provided in SEQ ID NO:9. Two additional genomic sequences were identified using the PSI-BLAST profile from libraries of Arabidopsis ecotypes Columbia and Landsberg, LCAT7 (SEQ ID NO:10) and LCAT8 (SEQ ID NO:11). The LCAT7 sequence was present in both the Columbia and Landsberg genomic libraries, while the LCAT8 sequence was only present in the Columbia library.
An open reading frame was predicted from the genomic sequence of LCAT7 in the Arabidopsis public database, and this sequence was called MSH12 (referred to herein as LCAT5, SEQ ID NO:73). The deduced protein sequence of LCAT5 is provided in SEQ ID NO:74.
The PSI-BLAST profile and the LCAT sequences were used to query the public yeast database and proprietary libraries containing corn and soy EST sequences. The yeast genome contains only one gene, LRO1 (LCAT Related Open reading frame, YNR008W, Figure 1), with distinct similarity to the human LCAT. The DNA sequence of LRO1 is provided in SEQ ID NO:75 and the protein sequence is provided in SEQ ID NO:76. Seven EST sequences were identified from soybean libraries as being LCAT sequences. Two sequences from soy (SEQ ID NOs: 12 and 13) are most closely related to the Arabidopsis LCAT1 sequence, a single sequence was identified as being most closely related to LCAT2 (SEQ ID NO:14), three were closely related to LCAT3 (SEQ ID NOs: 15-17), and an additional single sequence was identified (SEQ ID NO:18). A total of 11 corn EST sequences were identified as being related to the Arabidopsis LCAT sequences. Two corn EST sequences (SEQ ID NOs: 19 and 20) were most closely related to LCAT1, two sequences were identified as closely related to LCAT2 (SEQ ID NOs: 21 and 22), four corn EST sequences were identified as closely related to LCAT3 (SEQ ID NOs: 23-26), and an additional three corn EST sequences were also identified (SEQ ID NOs: 27-29).
Example 3: Identification of ACAT Sequences
Since plant ACATs are unknown in the art, searches were performed to identify known and related ACAT sequences from mammalian sources in public databases. These sequences were then used to search public and proprietary EST databases to identify plant ACAT-like sequences.
A public database containing mouse Expressed Sequence Tag (EST) sequences (dbEST) was searched for ACAT-like sequences. The search identified two sequences (SEQ ID NOs: 30 and 31) which were related to, but divergent from, known ACAT sequences (approximately 20% identical).
In order to identify ACAT-like sequences from other organisms, the two mouse ACAT sequences were used to search public and proprietary databases containing EST sequences from human and rat tissues. Results of the search identified several sequences from the human database and from the rat database which were closely related to the mouse sequences. The human and rat ACAT-like EST sequences were assembled, using the GCG assembly program, to construct a complete inferred cDNA sequence by identifying overlapping sequences (SEQ ID NOs: 32 and 33, respectively).
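The GCG assembly program's own method is not described in the text. As a simplified stand-in, the Python sketch below merges reads greedily by their longest exact suffix-prefix overlap, assuming all ESTs are on the same strand and free of sequencing errors; practical assemblers tolerate mismatches and consider both strands. The reads and the min_len value are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=10):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


ests = ["ATGGCCAAAGGT", "AAAGGTTTCCCA", "TTCCCAGGGTAG"]  # hypothetical reads
print(greedy_assemble(ests, min_len=4))  # ['ATGGCCAAAGGTTTCCCAGGGTAG']
```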
The protein sequence of the human ACAT-like sequence was aligned with known ACAT sequences from human (Chang, et al. (1993) J. Biol. Chem. 268:20747-20755, SEQ ID NO:34), mouse (Uelmen, et al. (1995) J. Biol. Chem. 270:26192-26201, SEQ ID NO:35) and yeast (Yu, et al. (1996) J. Biol. Chem. 271:24157-24163, SEQ ID NO:36, and Yang, et al. (1996) Science 272:1353-1356, SEQ ID NO:37) using MacVector (Oxford Molecular, Inc.). Results of the alignment demonstrated that the sequence was related to the known sequences; however, the related sequence was only about 25% similar to the known sequences.
The protein sequence of the human sterol O-acyltransferase (ACAT, acyl-CoA:cholesterol acyltransferase, Accession number A48026) related sequence was used to search protein and nucleic acid GenBank databases. A single plant homologue was identified in the public Arabidopsis EST database (Accession A042298, SEQ ID NO:38). The protein sequence (SEQ ID NO:39) was translated from the EST sequence and was found to contain a peptide sequence conserved in both mammalian and yeast ACATs (Chang et al. (1997) Ann. Rev. Biochem. 66:613-638).
To obtain the entire coding region corresponding to the Arabidopsis ACAT-like EST, synthetic oligonucleotide primers were designed to amplify the 5' and 3' ends of partial cDNA clones containing ACAT-like sequences. Primers were designed according to the Arabidopsis ACAT-like EST sequence and were used in Rapid Amplification of cDNA Ends (RACE) reactions (Frohman et al. (1988) Proc. Natl. Acad. Sci. USA 85:8998-9002).
Primers (5'-TGCAAATTGACGAGCACACCAACCCCTTC-3' (SEQ ID NO:40) and 5'-AAGGATGCTTTGAGTTCCTGACAATAGG-3' (SEQ ID NO:41)) were designed to amplify the 5' end from the Arabidopsis ACAT EST sequence. Amplification of flanking sequences from cDNA clones was performed using the Marathon cDNA Amplification kit (Clontech, CA).
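The text does not state the design criteria used for these primers. As a generic illustration only, the Python sketch below reports length, GC content and a rough melting temperature from the basic GC-content approximation Tm = 64.9 + 41 × (nG + nC - 16.4) / N. This formula ignores salt and nearest-neighbour effects and is an assumption of the sketch, not the method used here.

```python
def primer_stats(primer):
    """Length, %GC and a rough Tm (deg C) from the basic GC-content formula."""
    seq = primer.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    tm = 64.9 + 41 * (gc - 16.4) / n  # assumed approximation, no salt correction
    return n, 100.0 * gc / n, tm


primers = {
    "SEQ ID NO:40": "TGCAAATTGACGAGCACACCAACCCCTTC",
    "SEQ ID NO:41": "AAGGATGCTTTGAGTTCCTGACAATAGG",
}
for name, seq in primers.items():
    n, gc_pct, tm = primer_stats(seq)
    print(f"{name}: {n} nt, {gc_pct:.1f}% GC, Tm ~ {tm:.1f} C")
```

For the two primers above this gives 29 nt, 51.7% GC and a Tm of about 62.9 C, and 28 nt, 42.9% GC and a Tm of about 58.5 C.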
The sequence derived from the 5'-RACE amplification was used to search proprietary Arabidopsis EST libraries. A single EST accession, LIB25-088-C7 (SEQ ID NO:42), was identified which contained a sequence identical to the 5'-RACE sequence. Furthermore, LIB25-088-C7 was found to contain the complete putative coding sequence for the Arabidopsis ACAT-like product.
The nucleic acid as well as the putative translation product sequences of A042298 were used to search public and proprietary databases. Four EST sequences were identified in both soybean (SEQ ID NOs:43-46) and maize (SEQ ID NOs:47-50) proprietary databases, and a single ACAT-like sequence was identified from Mortierella alpina EST sequences (SEQ ID NO:51).
Sequence alignments between ACAT sequences from several different sources were compared to identify the similarity between the sequences. Nucleotide sequences from known human and mouse ACATs, as well as nucleotide sequences from known yeast ACATs, were compared to the ACAT-like EST sequences from human and Arabidopsis.
Analysis of the sequence alignments revealed several classes of ACATs based on sequence similarity. The known human and mouse ACATs, being 88% similar in nucleotide sequence, formed one class of ACATs. Another class of ACATs included the yeast ACATs, which are less than 20% similar to the known human and mouse class ACATs.
The final class of ACATs included the Arabidopsis and human sequences disclosed in the present invention. This class is approximately 22% similar to the known human and mouse ACAT class and approximately 23% similar to the yeast class of ACATs. Thus, the ACAT sequences disclosed in the present invention represent a novel class of ACAT enzymes. Partial mouse sequences of this class are also provided.
Example 4: Expression Construct Preparation
Constructs were prepared to direct expression of the LCAT1, LCAT2, LCAT3, LCAT4, LCAT5 and the yeast LRO1 sequences in plants and cultured insect cells. The entire coding region of each LCAT was amplified from the appropriate EST clone or an Arabidopsis genomic cDNA library using the following oligonucleotide primers in polymerase chain reactions (PCR). The LCAT1 coding sequence was amplified from the EST clone LIB25-082-Q1-E1-G4 using the primers 5'-GGATCCGCGGCCGCACAATGAAAAAAATATCTTCACATTATTCGG-3' (SEQ ID NO:52) and 5'-GGATCCCCTGCAGGTCATTCATTGACGGCATTAACATTGG-3' (SEQ ID NO:53). The LCAT2 coding sequence was amplified from an Arabidopsis genomic cDNA library using the synthetic oligonucleotide primers 5'-GGATCCGCGGCCGCACAATGGGAGCGAATTCGAAATCAGTAACG-3' (SEQ ID NO:54) and 5'-GGATCCCCTGCAGGTTAATACCCACTrrTATCAAGCTCCC-3' (SEQ ID NO:55). The LCAT3 coding sequence was amplified from the EST clone LIB22-004-Q1-E1-B4 using the synthetic oligonucleotide primers 5'-GGATCCGCGGCCGCACAATGTCTCTATTACTGGAAGAGATC-3' (SEQ ID NO:56) and 5'-GGATCCCCTGCAGGTTATGCATCAACAGAGACACTTACAGC-3' (SEQ ID NO:57). The LCAT4 coding sequence was amplified from the EST clone LIB23-007-Q1-E1-B5 using the synthetic oligonucleotide primers 5'-GGATCCGCGGCCGCACAATGGGCTGGATTCCGTGTCCGTGC-3' (SEQ ID NO:58) and 5'-GGATCCCCTGCAGGTTAACCAGAATCAACTACTTTGTG-3' (SEQ ID NO:59). The LCAT5 coding sequence was amplified from LIB23-053-Q1-E1-E3 using the synthetic oligonucleotide primers 5'-GGATCCGCGGCCGCACAATGCCCCTTATTCATCGG-3' (SEQ ID NO:77) and 5'-GGATCCCCTGCAGGTCACAGCTTCAGGTCAATACG-3' (SEQ ID NO:78).
The yeast LRO1 coding sequence was amplified from genomic yeast DNA using the synthetic oligonucleotide primers 5'-GGATCCGCGGCCGCACAATGGGCACACTGTTTCGAAG-3' (SEQ ID NO:79) and 5'-GGATCCCCTGCAGGTTACATTGGGAAGGGCATCTGAG-3' (SEQ ID NO:80).
The entire coding region of the Arabidopsis ACAT sequence (SEQ ID NO:42) was amplified from the EST clone LIB25-088-C7 using oligonucleotide primers 5'-TCGACCTGCAGGAAGCTTAGAAATGGCGATTTTGGATTC-3' (SEQ ID NO:60) and 5'-GGATCCGCGGCCGCTCATGACATCGATCCTTTTCGG-3' (SEQ ID NO:61) in a polymerase chain reaction (PCR).
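The LCAT and LRO1 forward primers above share the 5' extension GGATCCGCGGCCGCACA placed before the ATG, and the corresponding reverse primers share GGATCCCCTGCAGG placed before the reverse complement of a stop codon. The small Python scan sketched below identifies the recognition sites these extensions carry: BamHI (GGATCC), NotI (GCGGCCGC), Sse8387I (CCTGCAGG) and PstI (CTGCAG, nested inside the Sse8387I site). The enzyme table is limited to these four for illustration. Because each site is palindromic, scanning one strand finds every occurrence.

```python
# Recognition sequences (5'->3') for the enzymes relevant to these primers.
SITES = {
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "Sse8387I": "CCTGCAGG",
    "PstI": "CTGCAG",
}


def find_sites(primer):
    """Return {enzyme: [0-based start positions]} for every site found."""
    seq = primer.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


lcat1_fwd = "GGATCCGCGGCCGCACAATGAAAAAAATATCTTCACATTATTCGG"  # SEQ ID NO:52
lcat1_rev = "GGATCCCCTGCAGGTCATTCATTGACGGCATTAACATTGG"       # SEQ ID NO:53
print(find_sites(lcat1_fwd))  # {'BamHI': [0], 'NotI': [6]}
print(find_sites(lcat1_rev))  # {'BamHI': [0], 'Sse8387I': [6], 'PstI': [7]}
```

This is consistent with the later release of each coding region as a NotI/Sse8387I fragment.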
Each resulting PCR product was subcloned into pCR2.1-TOPO (Invitrogen) and labeled pCGN9964 (LCAT1), pCGN9985 (LCAT2), pCGN9965 (LCAT3), pCGN9995 (LCAT4), pCGN10964 (LCAT5), pCGN10963 (LRO1), and pCGN8626 (ACAT). Double-stranded DNA sequence was obtained to verify that no errors were introduced by the PCR amplification.
4A. Baculovirus Expression Constructs
Constructs were prepared to direct the expression of the Arabidopsis LCAT and yeast LCAT sequences in cultured insect cells. The entire coding region of the LCAT proteins was removed from the respective constructs by digestion with NotI and Sse8387I, followed by gel electrophoresis and gel purification. The fragments containing the LCAT coding sequences were cloned into NotI- and PstI-digested baculovirus expression vector pFastBac1 (Gibco-BRL, Gaithersburg, MD). The resulting baculovirus expression constructs were referred to as pCGN9992 (LCAT1), pCGN9993 (LCAT2), pCGN9994 (LCAT3), pCGN10900 (LCAT4), pCGN10967 (LCAT5), and pCGN10962 (LRO1).
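This cloning step works because Sse8387I (CCTGCA^GG) and PstI (CTGCA^G) leave identical 3' TGCA overhangs, while the NotI ends (GC^GGCCGC) pair with each other. The Python sketch below derives the overhang from a palindromic recognition site and its top-strand cut position. The enzyme table is an illustrative subset, and the approach is valid only for palindromic sites that are cut within the site.

```python
# (recognition site, top-strand cut position counted from the site's 5' end)
ENZYMES = {
    "NotI": ("GCGGCCGC", 2),      # GC^GGCCGC
    "Sse8387I": ("CCTGCAGG", 6),  # CCTGCA^GG
    "PstI": ("CTGCAG", 5),        # CTGCA^G
    "BamHI": ("GGATCC", 1),       # G^GATCC
}


def sticky_end(enzyme):
    """Return (overhang type, overhang sequence) for a palindromic site."""
    site, cut = ENZYMES[enzyme]
    # For a palindromic site the bottom strand is cut at the mirror position.
    bottom = len(site) - cut
    if cut == bottom:
        return ("blunt", "")
    kind = "5'" if cut < bottom else "3'"
    return (kind, site[min(cut, bottom):max(cut, bottom)])


def compatible(enzyme_a, enzyme_b):
    """Ends can be ligated if overhang type and sequence match."""
    return sticky_end(enzyme_a) == sticky_end(enzyme_b)


print(sticky_end("Sse8387I"), sticky_end("PstI"))  # ("3'", 'TGCA') ("3'", 'TGCA')
print(compatible("Sse8387I", "PstI"))              # True
print(compatible("NotI", "PstI"))                  # False
```

Because the insert carries one NotI end and one Sse8387I end, and the vector one NotI end and one PstI end, the fragment can ligate in only one orientation.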
4B. Plant Expression Construct Preparation
A plasmid containing the napin cassette derived from pCGN3223 (described in U.S. Patent No. 5,639,790, the entirety of which is incorporated herein by reference) was modified to make it more useful for cloning large DNA fragments containing multiple restriction sites, and to allow the cloning of multiple napin fusion genes into plant binary transformation vectors. An adapter comprised of the self-annealed oligonucleotide of sequence 5'-CGCGATTTAAATGGCGCGCCCTGCAGGCGGCCGCCTGCAGGGCGCGCCATTTAAAT-3' (SEQ ID NO:62) was ligated into the cloning vector pBC SK+ (Stratagene) after digestion with the restriction endonuclease BssHII to construct vector pCGN7765. Plasmids pCGN3223 and pCGN7765 were digested with NotI and ligated together. The resultant vector, pCGN7770, contained the pCGN7765 backbone with the napin seed-specific expression cassette from pCGN3223.
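Because SEQ ID NO:62 is used as a self-annealed adapter, everything after its first four bases must be its own reverse complement, leaving CGCG single-stranded at both ends to match the 5' CGCG overhang produced by BssHII (G^CGCGC). The Python sketch below checks that property and maps the 8-bp sites the adapter carries; the enzyme list (SwaI, AscI, Sse8387I, NotI) is chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_positions(seq, site):
    """0-based start positions of an exact recognition site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


ADAPTER = "CGCGATTTAAATGGCGCGCCCTGCAGGCGGCCGCCTGCAGGGCGCGCCATTTAAAT"  # SEQ ID NO:62
OVERHANG = 4  # single-stranded 5' bases left when two copies anneal

core = ADAPTER[OVERHANG:]
print("self-complementary core:", core == revcomp(core))  # True
print("5' overhang:", ADAPTER[:OVERHANG])                # CGCG

SITES = {"SwaI": "ATTTAAAT", "AscI": "GGCGCGCC",
         "Sse8387I": "CCTGCAGG", "NotI": "GCGGCCGC"}
for enzyme, site in SITES.items():
    print(enzyme, site_positions(ADAPTER, site))
```

The site map is symmetric about the central NotI site, as expected for a palindromic adapter.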
The cloning cassette pCGN7787 contained essentially the same regulatory elements as pCGN7770, except that the napin regulatory regions of pCGN7770 were replaced with the double CaMV 35S promoter and the tml polyadenylation and transcriptional termination region.
A binary vector for plant transformation, pCGN5139, was constructed from pCGN1558 (McBride and Summerfelt (1990) Plant Molecular Biology 14:269-276). In pCGN5139, the polylinker of pCGN1558 was replaced as a HindIII/Asp718 fragment with a polylinker containing unique restriction endonuclease sites AscI, PacI, XbaI, SwaI, BamHI, and NotI. The Asp718 and HindIII restriction endonuclease sites are retained in pCGN5139.
A series of turbo binary vectors was constructed to allow for the rapid cloning of DNA sequences into binary vectors containing transcriptional initiation regions (promoters) and transcriptional termination regions.
The plasmid pCGN8618 was constructed by ligating oligonucleotides 5'-TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGG-3' (SEQ ID NO:63) and 5'-TCGACCTGCAGGAAGCTTGCGGCCGCGGATCC-3' (SEQ ID NO:64) into SalI/XhoI-digested pCGN7770. A fragment containing the napin promoter, polylinker and napin 3' region was excised from pCGN8618 by digestion with Asp718I; the fragment was blunt-ended by filling in the 5' overhangs with Klenow fragment, then ligated into pCGN5139 that had been digested with Asp718I and HindIII and blunt-ended by filling in the 5' overhangs with Klenow fragment. A plasmid containing the insert oriented so that the napin promoter was closest to the blunted Asp718I site of pCGN5139 and the napin 3' region was closest to the blunted HindIII site was subjected to sequence analysis to confirm both the insert orientation and the integrity of cloning junctions. The resulting plasmid was designated pCGN8622.
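Oligonucleotides SEQ ID NO:63 and SEQ ID NO:64 are designed to anneal into a short duplex whose 5' TCGA overhangs match the ends left by SalI (G^TCGAC) and XhoI (C^TCGAG). The Python sketch below checks that pairing, assuming both oligos are written 5' to 3' and carry 5' overhangs of the same length; it does not handle 3' overhangs such as the AGCT end of SEQ ID NO:67.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def anneal_5prime(top, bottom, overhang):
    """Return the two 5' overhangs if the oligos pair over the rest of their length.

    Assumes each oligo carries a 5' overhang of `overhang` bases; returns None
    if the duplex regions do not pair perfectly.
    """
    if top[overhang:].upper() != revcomp(bottom[overhang:]):
        return None
    return top[:overhang].upper(), bottom[:overhang].upper()


seq63 = "TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGG"
seq64 = "TCGACCTGCAGGAAGCTTGCGGCCGCGGATCC"
print(anneal_5prime(seq63, seq64, 4))  # ('TCGA', 'TCGA')
```

The 28-bp duplex between the overhangs carries BamHI (GGATCC), NotI (GCGGCCGC), HindIII (AAGCTT) and Sse8387I (CCTGCAGG) sites, which is the polylinker introduced into pCGN7770.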
The plasmid pCGN8619 was constructed by ligating oligonucleotides 5'-TCGACCTGCAGGAAGCTTGCGGCCGCGGATCC-3' (SEQ ID NO:65) and 5'-TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGG-3' (SEQ ID NO:66) into SalI/XhoI-digested pCGN7770. A fragment containing the napin promoter, polylinker and napin 3' region was removed from pCGN8619 by digestion with Asp718I; the fragment was blunt-ended by filling in the 5' overhangs with Klenow fragment, then ligated into pCGN5139 that had been digested with Asp718I and HindIII and blunt-ended by filling in the 5' overhangs with Klenow fragment. A plasmid containing the insert oriented so that the napin promoter was closest to the blunted Asp718I site of pCGN5139 and the napin 3' region was closest to the blunted HindIII site was subjected to sequence analysis to confirm both the insert orientation and the integrity of the cloning junctions. The resulting plasmid was designated pCGN8623.
The plasmid pCGN8620 was constructed by ligating oligonucleotides 5'-TCGAGGATCCGCGGCCGCAAGCTTCCTGCAGGAGCT-3' (SEQ ID NO:67) and 5'-CCTGCAGGAAGCTTGCGGCCGCGGATCC-3' (SEQ ID NO:68) into SalI/SacI-digested pCGN7787. A fragment containing the d35S promoter, polylinker and tml 3' region was removed from pCGN8620 by complete digestion with Asp718I and partial digestion with NotI. The fragment was blunt-ended by filling in the 5' overhangs with Klenow fragment, then ligated into pCGN5139 that had been digested with Asp718I and HindIII and blunt-ended by filling in the 5' overhangs with Klenow fragment. A plasmid containing the insert oriented so that the d35S promoter was closest to the blunted Asp718I site of pCGN5139 and the tml 3' region was closest to the blunted HindIII site was subjected to sequence analysis to confirm both the insert orientation and the integrity of the cloning junctions. The resulting plasmid was designated pCGN8624.
The plasmid pCGN8621 was constructed by ligating oligonucleotides 5'-TCGACCTGCAGGAAGCTTGCGGCCGCGGATCCAGCT-3' (SEQ ID NO:69) and 5'-GGATCCGCGGCCGCAAGCTTCCTGCAGG-3' (SEQ ID NO:70) into SalI/SacI-digested pCGN7787. A fragment containing the d35S promoter, polylinker and tml 3' region was removed from pCGN8621 by complete digestion with Asp718I and partial digestion with NotI. The fragment was blunt-ended by filling in the 5' overhangs with Klenow fragment, then ligated into pCGN5139 that had been digested with Asp718I and HindIII and blunt-ended by filling in the 5' overhangs with Klenow fragment. A plasmid containing the insert oriented so that the d35S promoter was closest to the blunted Asp718I site of pCGN5139 and the tml 3' region was closest to the blunted HindIII site was subjected to sequence analysis to confirm both the insert orientation and the integrity of the cloning junctions. The resulting plasmid was designated pCGN8625.

The plasmid construct pCGN8640 is a modification of pCGN8624 described above. A 938 bp PstI fragment isolated from transposon Tn7, which encodes bacterial spectinomycin and streptomycin resistance (Fling et al. (1985), Nucleic Acids Research 13(19):7095-7106), a determinant for E. coli and Agrobacterium selection, was blunt-ended with Pfu polymerase. The blunt-ended fragment was ligated into pCGN8624 that had been digested with SpeI and blunt-ended with Pfu polymerase. The region containing the PstI fragment was sequenced to confirm both the insert orientation and the integrity of the cloning junctions.
The spectinomycin resistance marker was introduced into pCGN8622 and pCGN8623 as follows. A 7.7 kbp AvrII-SnaBI fragment from pCGN8640 was ligated to a 10.9 kbp AvrII-SnaBI fragment from pCGN8623 or pCGN8622, described above. The resulting plasmids were pCGN8641 and pCGN8643, respectively.
The plasmid pCGN8644 was constructed by ligating oligonucleotides 5'-GATCACCTGCAGGAAGCTTGCGGCCGCGGATCCAATGCA-3' (SEQ ID NO:71) and 5'-TTGGATCCGCGGCCGCAAGCTTCCTGCAGGT-3' (SEQ ID NO:72) into BamHI/PstI-digested pCGN8640.
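A similar, purely illustrative check can be sketched for the pCGN8644 oligonucleotides (SEQ ID NO:71 and NO:72). Assuming the shorter oligonucleotide pairs along its full length with the interior of the longer one, the unpaired tails of the longer strand should be a 5'-GATC overhang matching BamHI-cut ends and a 3'-TGCA overhang matching PstI-cut ends. The function name `duplex_ends` is hypothetical, not from the source.

```python
# Sketch: find the single-stranded tails of the pCGN8644 oligonucleotide duplex.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


TOP = "GATCACCTGCAGGAAGCTTGCGGCCGCGGATCCAATGCA"  # SEQ ID NO:71
BOTTOM = "TTGGATCCGCGGCCGCAAGCTTCCTGCAGGT"          # SEQ ID NO:72

BAMHI_5P_OVERHANG = "GATC"  # BamHI (G^GATCC) leaves a 5' GATC overhang
PSTI_3P_OVERHANG = "TGCA"   # PstI (CTGCA^G) leaves a 3' TGCA overhang


def duplex_ends(top: str, bottom: str):
    """Assuming `bottom` pairs along its full length with the interior of
    `top`, return the unpaired (5' tail, 3' tail) of `top`, or None."""
    paired_region = revcomp(bottom)  # top-strand sequence covered by bottom
    start = top.find(paired_region)
    if start < 0:
        return None
    return top[:start], top[start + len(paired_region):]


five_tail, three_tail = duplex_ends(TOP, BOTTOM)
print(five_tail, three_tail)  # GATC TGCA
print(five_tail == BAMHI_5P_OVERHANG and three_tail == PSTI_3P_OVERHANG)  # True
```

Because the pairing region is located with a plain substring search, a mismatch anywhere in either oligonucleotide makes `duplex_ends` return `None` rather than a partial answer.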
4C. Plant LCAT Expression Construct Preparation
The coding sequence of LCAT1 was cloned from pCGN9964 as a NotI/Sse8387I fragment into pCGN8640, pCGN8641, pCGN8643, and pCGN8644 to create the expression constructs pCGN9960, pCGN9961, pCGN9962, and pCGN9963, respectively. The construct pCGN9960 was designed to express the LCAT1 coding sequence in the sense orientation from the constitutive promoter CaMV 35S. The construct pCGN9961 was designed to express the LCAT1 coding sequence in the antisense orientation from the napin promoter. The construct pCGN9962 was designed to express the LCAT1 coding sequence in the sense orientation from the napin promoter. The construct pCGN9963 was designed to express the LCAT1 coding sequence in the antisense orientation from the constitutive promoter CaMV 35S.
The coding sequence of LCAT2 was cloned from pCGN9985 as a NotI/Sse8387I fragment into pCGN8640, pCGN8641, pCGN8643, and pCGN8644 to create the expression constructs pCGN9981, pCGN9982, pCGN9983, and pCGN9984, respectively. The construct pCGN9981 was designed to express the LCAT2 coding sequence in the sense orientation from the constitutive promoter CaMV 35S. The construct pCGN9982 was designed to express the LCAT2 coding sequence in the antisense orientation from the napin promoter. The construct pCGN9983 was designed to express the LCAT2 coding sequence in the sense orientation from the napin promoter. The construct pCGN9984 was designed to express the LCAT2 coding sequence in the antisense orientation from the constitutive promoter CaMV 35S.
The coding sequence of LCAT3 was cloned from pCGN9965 as a NotI/Sse8387I fragment into pCGN8640, pCGN8641, pCGN8643, and pCGN8644 to create the expression constructs pCGN9966, pCGN9967, pCGN9968, and pCGN9969, respectively. The construct pCGN9966 was designed to express the LCAT3 coding sequence in the sense orientation from the constitutive promoter CaMV 35S. The construct pCGN9967 was designed to express the LCAT3 coding sequence in the antisense orientation from the napin promoter. The construct pCGN9968 was designed to express the LCAT3 coding sequence in the sense orientation from the napin promoter. The construct pCGN9969 was designed to express the LCAT3 coding sequence in the antisense orientation from the constitutive promoter CaMV 35S.
The coding sequence of LCAT4 was cloned from pCGN9995 as a NotI/Sse8387I fragment into pCGN8640, pCGN8641, pCGN8643, and pCGN8644 to create the expression constructs pCGN9996, pCGN9997, pCGN9998, and pCGN9999, respectively. The construct pCGN9996 was designed to express the LCAT4 coding sequence in the sense orientation from the constitutive promoter CaMV 35S. The construct pCGN9997 was designed to express the LCAT4 coding sequence in the antisense orientation from the napin promoter. The construct pCGN9998 was designed to express the LCAT4 coding sequence in the sense orientation from the napin promoter. The construct pCGN9999 was designed to express the LCAT4 coding sequence in the antisense orientation from the constitutive promoter CaMV 35S.
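The naming pattern of the LCAT1 to LCAT4 constructs can be summarized as a lookup table. The sketch below encodes only the relationships stated in the text (each coding sequence was cloned into pCGN8640, pCGN8641, pCGN8643 and pCGN8644, respectively) and decodes a construct name into gene, recipient vector, promoter and orientation. The data-structure and function names are hypothetical.

```python
# Recipient vector -> (promoter, orientation of the cloned NotI/Sse8387I insert),
# as implied by the construct descriptions in the text.
RECIPIENTS = {
    "pCGN8640": ("CaMV 35S", "sense"),
    "pCGN8641": ("napin", "antisense"),
    "pCGN8643": ("napin", "sense"),
    "pCGN8644": ("CaMV 35S", "antisense"),
}
RECIPIENT_ORDER = ["pCGN8640", "pCGN8641", "pCGN8643", "pCGN8644"]

# Gene -> (source plasmid, constructs listed "respectively" in recipient order).
LCAT_CONSTRUCTS = {
    "LCAT1": ("pCGN9964", ["pCGN9960", "pCGN9961", "pCGN9962", "pCGN9963"]),
    "LCAT2": ("pCGN9985", ["pCGN9981", "pCGN9982", "pCGN9983", "pCGN9984"]),
    "LCAT3": ("pCGN9965", ["pCGN9966", "pCGN9967", "pCGN9968", "pCGN9969"]),
    "LCAT4": ("pCGN9995", ["pCGN9996", "pCGN9997", "pCGN9998", "pCGN9999"]),
}


def describe(construct: str):
    """Return (gene, recipient vector, promoter, orientation) for a construct."""
    for gene, (_source, names) in LCAT_CONSTRUCTS.items():
        if construct in names:
            recipient = RECIPIENT_ORDER[names.index(construct)]
            promoter, orientation = RECIPIENTS[recipient]
            return gene, recipient, promoter, orientation
    raise KeyError(f"unknown construct: {construct}")


if __name__ == "__main__":
    print(describe("pCGN9983"))  # ('LCAT2', 'pCGN8643', 'napin', 'sense')
```

For example, this decoding identifies the constructs analysed later in Examples 7 and 9 (pCGN9983, pCGN9968, pCGN9998) as sense, napin-driven constructs.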
The coding sequence of LCAT5 was cloned from pCGN10964 as a NotI/Sse8387I fragment into pCGN9977 and pCGN9979 to create the expression constructs pCGN10965 and pCGN10966, respectively. The construct pCGN10965 was designed to express the LCAT5 coding sequence in the sense orientation from the constitutive promoter CaMV 35S. The construct pCGN10966 was designed to express the LCAT5 coding sequence in the sense orientation from the napin promoter.
The coding sequence of LRO1 was cloned from pCGN10963 as a NotI/Sse8387I fragment into pCGN9977 and pCGN9979 to create the expression constructs pCGN10960 and pCGN10961, respectively. The construct pCGN10960 was designed to express the LRO1 coding sequence in the sense orientation from the constitutive promoter CaMV 35S. The construct pCGN10961 was designed to express the LRO1 coding sequence in the sense orientation from the napin promoter.
4D. Plant ACAT Expression Construct Preparation
A fragment containing the Arabidopsis ACAT-like coding region was removed from pCGN8626 by digestion with Sse8387I and NotI. The fragment containing the ACAT-like sequence was ligated into PstI/NotI-digested pCGN8622. The resulting plasmid was designated pCGN8627. DNA sequence analysis confirmed the integrity of the cloning junctions.
A fragment containing the Arabidopsis ACAT-like coding region (SEQ ID NO:42) was removed from pCGN8626 by digestion with Sse8387I and NotI. The fragment was ligated into PstI/NotI-digested pCGN8623. The resulting plasmid was designated pCGN8628. DNA sequence analysis confirmed the integrity of the cloning junctions.
A fragment containing the Arabidopsis ACAT-like coding region was removed from pCGN8626 by digestion with Sse8387I and NotI. The fragment was ligated into PstI/NotI-digested pCGN8624. The resulting plasmid was designated pCGN8629. DNA sequence analysis confirmed the integrity of the cloning junctions.
A fragment containing the Arabidopsis ACAT-like coding region was removed from pCGN8626 by digestion with Sse8387I and NotI. The fragment was ligated into PstI/NotI-digested pCGN8625. The resulting plasmid was designated pCGN8630. DNA sequence analysis confirmed the integrity of the cloning junctions.
An additional expression construct for the suppression of endogenous ACAT-like activity was also prepared. The construct pCGN8660 was constructed by cloning approximately 1 kb of the Arabidopsis ACAT-like coding region from pCGN8626 in the sense orientation, and the full-length Arabidopsis ACAT-like coding region in the antisense orientation, under the regulatory control of the napin transcription initiation sequence.
For expression of the rat ACAT-like sequence in plants, the NotI/Sse8387I fragment of pCGN8592 was cloned into NotI/PstI-digested binary vectors pCGN8621, pCGN8622, and pCGN8624 to yield plasmids pCGN9700, pCGN9701, and pCGN9702, respectively. Plasmid pCGN9700 expresses a sense transcript of the rat ACAT-like cDNA under control of a napin promoter, plasmid pCGN9701 expresses an antisense transcript of the rat ACAT-like cDNA under control of a napin promoter, and plasmid pCGN9702 expresses a sense transcript of the rat ACAT-like cDNA under control of a double 35S promoter. Plasmids pCGN9700, pCGN9701, and pCGN9702 were introduced into Agrobacterium tumefaciens EHA101.
Constructs were prepared to direct the expression of the rat ACAT-like sequence in the seed embryo of soybean and the endosperm of corn. For expression of the rat ACAT-like DNA sequence in soybean, a 1.5 kb NotI/Sse8387I fragment from pCGN8592 containing the coding sequence of the rat ACAT-like sequence was blunt-ended using mung bean nuclease and ligated into the SmaI site of the turbo 7S binary/cloning vector pCGN8809 to create the vector pCGN8817 for transformation into soybean by particle bombardment. The vector pCGN8817 contained the operably linked components of the promoter region of the soybean α' subunit of β-conglycinin (7S promoter; Chen et al., (1986), Proc. Natl. Acad. Sci., 83:8560-8564), the DNA sequence coding for the entire rat ACAT-like protein, and the transcriptional termination region of the pea RuBisCo small subunit, referred to as E9 3' (Coruzzi et al. (1984) EMBO J. 3:1671-1679 and Morelli et al. (1985) Nature 315:200-204). This construct further contained sequences for the selection of transformed plants by screening for resistance to glyphosate, using the CP4 EPSPS gene (U.S. Patent 5,633,435) expressed under the control of the figwort mosaic virus (FMV) promoter (U.S. Patent Number 5,378,619) and the transcriptional termination region of E9.
For expression of the rat ACAT-like sequence in the corn endosperm, a 1.5 kb NotI/Sse8387I fragment from pCGN8592 containing the coding sequence of the rat ACAT-like sequence was blunt-ended using mung bean nuclease and ligated into the BamHI site of the rice pGt1 expression cassette pCGNS592 for expression from the pGt1 promoter (Leisy, D.J. et al., Plant Mol. Biol. 14 (1989) 41-50) and the HSP70 intron sequence (U.S. Patent Number 5,593,874). This cassette also included, downstream of the cloning site, the transcriptional termination region of nopaline synthase, nos 3' (Depicker et al., J. Molec. Appl. Genet. (1982) 1:562-573). A 7.5 kb fragment containing the pGt1 promoter, the DNA sequence encoding the rat ACAT-like protein, and the nos transcriptional termination sequence was cloned into the binary vector pCGN8816 to create the vector pCGN8818 for transformation into corn. This construct also contained sequences for the selection of positive transformants with kanamycin, using the kanamycin resistance gene from transposon Tn5 under the control of the CaMV 35S promoter and the tml transcriptional termination region.

Example 5: Expression in Insect Cell Culture
A baculovirus expression system was used to express the LCAT cDNAs in cultured insect cells.
The baculovirus expression constructs pCGN9992, pCGN9993, pCGN9994, pCGN10900, pCGN10962, and pCGN10967 were transformed and expressed using the BAC-TO-BAC Baculovirus Expression System (Gibco-BRL, Gaithersburg, MD) according to the manufacturer's directions.
The transformed insect cells were assayed for acyltransferase activities using methods known in the art (see Example 8).
Example 6: Plant Transformation
A variety of methods have been developed to insert a DNA sequence of interest into the genome of a plant host to obtain the transcription, or transcription and translation, of the sequence to effect phenotypic changes. Transgenic plants were obtained by Agrobacterium-mediated transformation as described by Radke et al. (Theor. Appl. Genet. (1988) 75:685-694; Plant Cell Reports (1992) 11:499-505). Alternatively, microprojectile bombardment methods, such as those described by Klein et al. (Bio/Technology 10:286-291), may also be used to obtain nuclear transformed plants. Other plant species may be similarly transformed using related techniques.
The plant binary constructs described above were used in plant transformation to direct the expression of the sterol acyltransferases in plant tissues. Binary vector constructs were transformed into Agrobacterium strain EHA101 cells (Hood et al., J. Bacteriol. (1986) 168:1291-1301) by the method of Holsters et al. (Mol. Gen. Genet. (1978) 163:181-187). Transgenic Arabidopsis thaliana plants were obtained by Agrobacterium-mediated transformation as described by Valvekens et al. (Proc. Natl. Acad. Sci. (1988) 85:5536-5540), Bent et al. ((1994), Science 265:1856-1860), and Bechtold et al. ((1993), C. R. Acad. Sci., Life Sciences 316:1194-1199).
Example 7: Plant Assays for Modified Sterol Content/Profile
7A: NMR of T2 Seed
Seed from plants expressing LCAT1 through LCAT4 under the control of the napin promoter was analyzed by NMR. Arabidopsis seeds from transgenic plants were placed directly into wide-mouth MAS NMR sample tubes.

High-resolution spectra were measured at 11.7 T (1H = 500 MHz, 13C = 125 MHz) using Varian NMR Instruments (Palo Alto, CA) Inova™ NMR spectrometers equipped with carbon-observe MAS Nanoprobes™. The 13C spectra were acquired without a field-frequency lock at ambient temperature (approx. 21-22 °C) for 14 hours using the following conditions: spectral width = 29.996 kHz, acquisition time = 2.185 seconds, π/2 pulse (3.8 μs) with no relaxation delay, 1H γB2 = 2.5 kHz with WALTZ decoupling. Data processing conditions were typically: digital resolution = 0.11 Hz, 0.3 to 1.5 Hz line broadening, and time-reversed linear prediction of the first three data points. Chemical shifts were referenced by adding neat tetramethylsilane (TMS) to Arabidopsis seeds and using the resulting referencing parameters for subsequent spectra. The 13C resolution was 2-3 Hz for the narrowest seed resonances. Spectral resolution was independent of MAS spinning speed (1.5-3.5 kHz), and data were typically obtained at a 1.5 kHz spinning speed. Spinning sidebands were approx. 1% of the main resonance. Phytosterol 13C assignments were based on model samples composed of triolein, β-sitosterol and cholesterol oleate. Triacylglycerol 13C assignments were made by comparison with literature assignments or with shifts computed from a 13C NMR database (Advanced Chemical Development, Inc., version 3.50, Toronto, Canada).
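The acquisition parameters above can be cross-checked with simple arithmetic. The sketch below assumes complex (quadrature) detection, so the number of acquired points is roughly spectral width × acquisition time, and assumes that with no relaxation delay each scan lasts about one acquisition time (the π/2 pulse being negligible by comparison). Under those assumptions the 14-hour run corresponds to roughly 23,000 scans, and the stated 0.11 Hz digital resolution implies about four-fold zero filling. These are derived estimates, not values reported in the source.

```python
# Back-of-envelope check of the 13C MAS acquisition parameters.
# Assumptions: complex points = spectral width x acquisition time, and each
# scan takes about one acquisition time (no relaxation delay, short pulse).
SPECTRAL_WIDTH_HZ = 29_996.0
ACQ_TIME_S = 2.185
TOTAL_TIME_S = 14 * 3600          # 14-hour acquisition
TARGET_DIGITAL_RES_HZ = 0.11      # "typical" processed digital resolution

complex_points = SPECTRAL_WIDTH_HZ * ACQ_TIME_S       # about 64K points
raw_resolution_hz = 1.0 / ACQ_TIME_S                  # before zero filling
approx_scans = TOTAL_TIME_S / ACQ_TIME_S              # number of transients
zero_fill_factor = raw_resolution_hz / TARGET_DIGITAL_RES_HZ

if __name__ == "__main__":
    print(round(complex_points))          # 65541
    print(round(raw_resolution_hz, 3))    # 0.458
    print(round(approx_scans))            # 23066
    print(round(zero_fill_factor, 1))     # 4.2
```

A value of about 65,541 complex points is consistent with a 64K (65,536-point) acquisition buffer.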
The results of these analyses are displayed in Figure 2 and show a trend toward an approximately 2-fold increase in phytosterols in the seeds derived from plant line 5 expressing the LCAT4 gene (pCGN9998) under the control of the napin promoter. During the course of this analysis it was also noted that the average oil content of seed from plants expressing the LCAT2 construct (pCGN9983) under the control of the napin promoter was higher than that of controls. This is the first in planta evidence supporting the concept that overexpression of a nucleotide sequence encoding a lecithin:cholesterol acyltransferase-like polypeptide can increase oil content.
7B: HPLC/MS of T2 Seed
Seed oil from T2 plants expressing LCAT1 through LCAT4 under the control of the napin promoter was extracted using a Dionex Accelerated Solvent Extractor (ASE). Seed samples were ground with a mortar and pestle to a fine, homogeneous meal. Clean ground seed was added to an equal amount of diatomaceous earth, and the two were thoroughly mixed until a homogeneous texture was achieved.

The sample was then loaded into the instrument and oil extraction was achieved using hexane under validated laboratory protocols.
Oil from these seed samples was then analyzed by HPLC/MS for free campesterol, stigmasterol, and sitosterol and their fatty acid esters. To an autosampler vial containing approximately 0.1 g of oil was added 0.3 mL CDCl3. One hundred microliters of this solution was added to 900 microliters CHCl3. Five microliters of this diluted sample was then injected into an HPLC/MS with positive-ion atmospheric pressure ionization. The individual components in the oils were separated using two 4.6 x 50 mm C8 Zorbax columns in series and a gradient of acetonitrile and acetonitrile with 40% CHCl3. The sterol concentrations were calculated assuming that each sterol and its fatty acid esters have the same molar response. This was observed to be the case with cholesterol and its esters and was assumed to be the case for campesterol, stigmasterol, and sitosterol. In the present study, the sterol identified as stigmasterol was actually an isomer of this compound.
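Because each sterol and its fatty acid esters are assumed to have the same molar response, peak areas can be compared directly between a free sterol and its esters, and a single response factor cancels out of any ratio. A minimal sketch of that calculation follows; the function names are hypothetical and the peak areas are placeholders, not data from this study.

```python
from typing import Sequence


def esterified_fraction(free_area: float, ester_areas: Sequence[float]) -> float:
    """Fraction of one sterol present as fatty acid esters.

    Assumes the free sterol and each of its esters give the same molar
    response, so peak areas are proportional to moles and the shared
    response factor cancels out of the ratio.
    """
    total_ester = sum(ester_areas)
    total = free_area + total_ester
    if total <= 0:
        raise ValueError("no signal recorded for this sterol")
    return total_ester / total


def nmol_from_area(area: float, area_per_nmol: float) -> float:
    """Convert a peak area to nmol using one shared response factor
    (for example, one determined with cholesterol and its esters)."""
    return area / area_per_nmol


if __name__ == "__main__":
    # Hypothetical peak areas for sitosterol (placeholders, not study data).
    free_sitosterol = 600.0
    sitosterol_esters = [150.0, 200.0, 50.0]
    print(esterified_fraction(free_sitosterol, sitosterol_esters))  # 0.4
```

The ratio form is convenient because it does not depend on the absolute amount of oil injected, only on the equal-response assumption stated in the text.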
The results of these analyses are displayed in Figures 3 and 4 and show sterol ester increases on the order of 50% in the seeds derived from six out of seven T2 plant lines expressing LCAT3 (pCGN9968) under the control of the napin promoter.
Example 8: Baculovirus Insect Cell Culture for Sterol Esterification Activity
Baculovirus expression constructs pCGN9992, pCGN9993, pCGN9994 and pCGN10900 (see Example 4) were transformed and expressed using the BAC-TO-BAC Baculovirus Expression System (Gibco-BRL, Gaithersburg, MD) according to the manufacturer's instructions, except that recombinant viruses were harvested 5 days post-transfection. The supernatant from the transfection mixture was used to generate a virus stock, which in turn was used to infect the Sf9 cells used in the assay.
The transformed cells were assayed for lecithin:sterol acyltransferase activity using the method described herein. Insect cells were centrifuged and the resulting cell pellet was either used immediately or stored at -70 °C for later analysis. Cells were resuspended in Medium A (100 mM Tricine/NaOH, pH 7.8, 10% (w/v) glycerol, 280 mM NaCl with 0.1 nM aprotinin, 1 µM leupeptin, and 100 nM Pefabloc (all from Boehringer Mannheim, Germany)) and lysed by sonication (2 x 10 sec). Cell walls and other debris were pelleted by centrifugation (14,000 x g, 10 min, 4 °C). The supernatant was transferred to a new vial and membranes were pelleted by centrifugation (100,000 x g, Ti 70.1 rotor, 46,000 rpm for 1 hour at 4 °C). Total membranes were resuspended in Medium A. Lecithin:sterol acyltransferase activity was assayed in a 0.1 ml reaction mixture containing 100 mM Tris/HCl, pH 7, 28 mM NaCl, 0.03% Triton X-100, 0.1 mM sitosterol, 20 µM 1,2-[14C]-palmitoyl-phosphatidylcholine (246,420 dpm/nmol), and 0.05-20 mg of membrane protein. After 15 minutes at 30 °C, the reaction was terminated by addition of 0.5 ml of methylene chloride:methanol (4:1, v/v) containing 100 µg cholesterol and cholesterol ester as cold carriers. A portion (0.1 ml) of the bottom organic layer was removed and evaporated under nitrogen gas. The concentrated extract was resuspended in 30 µl of hexane and spotted onto a silica gel-G thin-layer chromatography plate. The plate was developed in hexane:diethyl ether:acetic acid (80:20:1) to the top, then air-dried. Radioactivity was determined by exposure to a Low Energy Phosphor-imaging Screen. Following exposure, the screen was read on a phosphorimager.
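The labelled-substrate quantities in this assay can be worked out directly. The sketch below assumes that 20 µM is the final concentration of labelled phosphatidylcholine (PC) in the 0.1 ml reaction and that 246,420 dpm/nmol is the specific activity per nmol of PC. Converting ester-spot radioactivity into nmol of product also requires knowing how much label a single transferred acyl group carries; the default of 0.5 used below is a hypothetical assumption (equal labelling of both palmitoyl chains), and no correction is made for spotting only 0.1 ml of the organic phase.

```python
# Worked arithmetic for the [14C]-PC substrate in the 0.1 ml assay.
SPECIFIC_ACTIVITY_DPM_PER_NMOL = 246_420  # per nmol of labelled PC (from text)
PC_CONC_UM = 20.0           # uM == nmol per ml (assumed final concentration)
REACTION_VOLUME_ML = 0.1

pc_nmol = PC_CONC_UM * REACTION_VOLUME_ML              # about 2 nmol PC
total_dpm = pc_nmol * SPECIFIC_ACTIVITY_DPM_PER_NMOL   # about 492,840 dpm


def ester_nmol(ester_dpm: float, label_fraction_transferred: float = 0.5) -> float:
    """nmol of sterol ester corresponding to `ester_dpm` counted in the ester spot.

    `label_fraction_transferred` is a hypothetical parameter: the share of one
    PC molecule's label carried by a single transferred acyl group (0.5 if both
    palmitoyl chains are equally labelled). No correction is applied for
    spotting only part of the organic phase.
    """
    if not 0 < label_fraction_transferred <= 1:
        raise ValueError("label fraction must be in (0, 1]")
    return ester_dpm / (SPECIFIC_ACTIVITY_DPM_PER_NMOL * label_fraction_transferred)


if __name__ == "__main__":
    print(round(pc_nmol, 3), round(total_dpm))  # 2.0 492840
    print(ester_nmol(123_210))                  # 1.0 (hypothetical dpm value)
```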
The LCAT4 protein from pCGN10900 in baculovirus membranes showed a radioactive spot in the region of the TLC plate where cholesterol ester migrates, indicating that LCAT4 can catalyze the transfer of an acyl group from lecithin (PC) to sitosterol to make a sitosterol ester.
Example 9: Plant Assay for Modified Lipid Content
NIR (near-infrared) spectral scanning can be used to determine the total oil content of Arabidopsis seeds non-destructively, provided that a spectral calibration curve has been developed and validated for seed oil content. A seed oil spectral calibration curve was developed using seed samples from 85 Arabidopsis plants. Seed was cleaned and scanned using a Foss NIR model 6500 (Foss-NIRSystems, Inc.). Approximately 50 to 100 milligrams of whole seeds per sample were packed in a mini sample ring cup with quartz lens [IH-0307] consisting of a mini-insert [IH-0337] and scanned in reflectance mode to obtain the spectral data. The seed samples were then ground with a mortar and pestle to a fine, homogeneous meal, and the oil content of the ground samples was measured using an accelerated solvent extractor (ASE).
Total oil content was measured on the Dionex Accelerated Solvent Extractor (ASE). Approximately 500 mg of clean ground seed was weighed to the nearest 0.1 mg into a 9 x 9 cm weigh boat. An equal amount of diatomaceous earth was added using a top-loading balance accurate to the nearest 0.01 g. The ground seed sample and the diatomaceous earth were thoroughly mixed until a homogeneous texture was achieved. The sample was loaded onto the instrument and oil extraction was achieved using hexane under validated laboratory protocols. Standard rapeseed samples were obtained from the Community Bureau of Reference (BCR), and the ASE extraction method was validated using these BCR reference standards. A total percent oil recovery of 99% to 100% was achieved. "As-is" oil content was calculated to the nearest 0.01 mass percentage using the formula:
Oil content (%) = 100 × (weight of vial plus extracted oil − initial vial weight) / (sample weight)
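The gravimetric oil-content formula can be expressed as a small function. This is a sketch under the assumption that "sample weight" is the weight of ground seed only (excluding the diatomaceous earth); the function name is hypothetical and the weights in the example are placeholders, not measurements from the study.

```python
def oil_content_percent(vial_plus_oil_g: float, empty_vial_g: float, sample_g: float) -> float:
    """As-is oil content (mass %) from ASE gravimetric weights, rounded to 0.01 %.

    Assumes `sample_g` is the ground-seed weight only, not including the
    diatomaceous earth added as an extraction aid.
    """
    if sample_g <= 0:
        raise ValueError("sample weight must be positive")
    extracted_oil_g = vial_plus_oil_g - empty_vial_g
    return round(100.0 * extracted_oil_g / sample_g, 2)


if __name__ == "__main__":
    # Hypothetical weights in grams for a ~0.5 g ground-seed sample.
    print(oil_content_percent(20.1750, 20.0000, 0.5000))  # 35.0
```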
The analytical data generated by ASE were used to perform the spectral calibrations. NIR calibration equations were generated using the built-in statistical package within the NIRSystems WinISI software, whose spectral calibration module is capable of both calibration and self-validation. Of the 85 samples, 57 were used to generate the total percent oil calibration; the remaining 28 samples, which were not used in building the calibration, served as the validation set. Optimized smoothing, derivative size, and mathematical treatment (modified partial least squares) were used to generate the calibration. Statistical tools such as the correlation coefficient (R), coefficient of determination (R²), standard error of prediction (SEP), and standard error of prediction corrected for bias (SEPC) were used to evaluate the calibration equations.
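The validation statistics named above can be computed from paired reference (ASE) and NIR-predicted values for the 28 validation samples (85 − 57). The definitions below follow one common chemometrics convention for bias, SEP and bias-corrected SEP (SEPC); they are an assumption here and may differ in detail from those implemented in the WinISI software. The function name and example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def validation_stats(reference: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Validation statistics for an NIR calibration (assumed convention):
    bias = mean(predicted - reference)
    SEP  = sqrt(sum((predicted - reference)^2) / n)
    SEPC = sqrt(sum((predicted - reference - bias)^2) / (n - 1))
    R    = Pearson correlation of reference vs. predicted; R2 = R^2
    """
    n = len(reference)
    if n != len(predicted) or n < 3:
        raise ValueError("need at least 3 paired reference/predicted values")
    resid = [p - r for r, p in zip(reference, predicted)]
    bias = sum(resid) / n
    sep = math.sqrt(sum(e * e for e in resid) / n)
    sepc = math.sqrt(sum((e - bias) ** 2 for e in resid) / (n - 1))

    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    ss_r = sum((r - mean_r) ** 2 for r in reference)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    r_corr = cov / math.sqrt(ss_r * ss_p)
    return {"R": r_corr, "R2": r_corr ** 2, "bias": bias, "SEP": sep, "SEPC": sepc}


if __name__ == "__main__":
    # Hypothetical oil-% values (not study data): predictions offset by +1.
    stats = validation_stats([30.0, 32.0, 34.0, 36.0], [31.0, 33.0, 35.0, 37.0])
    print(stats)  # R = 1.0, bias = 1.0, SEP = 1.0, SEPC = 0.0
```

In the example, a constant offset of +1 gives SEP = 1 but SEPC = 0, which illustrates why a bias-corrected error is reported alongside SEP.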
T2 seeds from plants that had been transformed with the LCAT genes were cleaned and scanned using a Foss NIR model 6500 (Foss-NIRSystems, Inc.). Approximately 50 to 100 milligrams of whole seeds per sample were packed in a mini sample ring cup with quartz lens [IH-0307] consisting of a mini-insert [IH-0337] and scanned in reflectance mode to obtain the spectral data. The oil percentage of each seed sample was determined using the seed oil spectral calibration curve detailed above.
The results of these analyses are found in Figure 5 and Table 2 and show that there was a significant increase in the oil level in seed from T2 plants expressing the LCAT2 gene. This increase in oil was seen in plants when LCAT2 was driven by either the 35S constitutive promoter or the seed-specific napin promoter. These results show that overexpression of a nucleic acid sequence encoding a lecithin:cholesterol acyltransferase-like polypeptide can increase seed oil production in plants.




In light of the detailed description of the invention and the examples presented above, it can be appreciated that the several aspects of the invention are achieved.
It is to be understood that the present invention has been described in detail by way of illustration and example in order to acquaint others skilled in the art with the invention, its principles, and its practical application. Particular formulations and processes of the present invention are not limited to the descriptions of the specific embodiments presented, but rather the descriptions and examples should be viewed in terms of the claims that follow and their equivalents. While some of the examples and descriptions above include some conclusions about the way the invention may function, the inventors do not intend to be bound by those conclusions and functions, but put them forth only as possible explanations.
It is to be further understood that the specific embodiments of the present invention as set forth are not intended as being exhaustive or limiting of the invention, and that many alternatives, modifications, and variations will be apparent to those of ordinary skill in the art in light of the foregoing examples and detailed description. Accordingly, this invention is intended to embrace all such alternatives, modifications, and variations that fall within the spirit and scope of the following claims.

SEQUENCE LISTING Monsanto Company PLANT STEROL ACYLTRANSFERASES MTC671S
60/152,493 1999-08-30
80
PatentIn Ver. 2.1
1
440
PRT
1
Met Gly Pro Pro Gly Ser Pro Trp Gln Trp Val Thr Leu Leu Leu Gly
1 5 10 15
Leu Leu Leu Pro Pro Ala Ala Pro Phe Trp Leu Leu Asn Val Leu Phe
20 25 30
Pro Pro His Thr Thr Pro Lys Ala Glu Leu Ser Asn His Thr Arg Pro
35 40 45
Val Ile Leu Val Pro Gly Cys Leu Gly Asn Gln Leu Glu Ala Lys Leu
50 55 60
Asp Lys Pro Asp Val Val Asn Trp Met Cys Tyr Arg Lys Thr Glu Asp
65 70 75 80
Phe Phe Thr Ile Trp Leu Asp Leu Asn Met Phe Leu Pro Leu Gly Val
85 90 95
Asp Cys Trp Ile Asp Asn Thr Arg Val Val Tyr Asn Arg Ser Ser Gly
100 105 110
Leu Val Ser Asn Ala Pro Gly Val Gln Ile Arg Val Pro Gly Phe Gly
115 120 125
Lys Thr Tyr Ser Val Glu Tyr Leu Asp Ser Ser Lys Leu Ala Gly Tyr
130 135 140
Leu His Thr Leu Val Gln Asn Leu Val Asn Asn Gly Tyr Val Arg Asp
145 150 155 160
Glu Thr Val Arg Ala Ala Pro Tyr Asp Trp Arg Leu Glu Pro Gly Gln
165 170 175

Gln Glu Glu Tyr Tyr Arg Lys Leu Ala Gly Leu Val Glu Glu Met His
180 185 190
Ala Ala Tyr Gly Lys Pro Val Phe Leu Ile Gly His Ser Leu Gly Cys
195 200 205
Leu His Leu Leu Tyr Phe Leu Leu Arg Gln Pro Gln Ala Trp Lys Asp
210 215 220
Arg Phe Ile Asp Gly Phe Ile Ser Leu Gly Ala Pro Trp Gly Gly Ser
225 230 235 240
Ile Lys Pro Met Leu Val Leu Ala Ser Gly Asp Asn Gln Gly Ile Pro
245 250 255
Ile Met Ser Ser Ile Lys Leu Lys Glu Glu Gln Arg Ile Thr Thr Thr
260 265 270
Ser Pro Trp Met Phe Pro Ser Arg Met Ala Trp Pro Glu Asp His Val
275 280 285
Phe Ile Ser Thr Pro Ser Phe Asn Tyr Thr Gly Arg Asp Phe Gln Arg
290 295 300
Phe Phe Ala Asp Leu His Phe Glu Glu Gly Trp Tyr Met Trp Leu Gln
305 310 315 320
Ser Arg Asp Leu Leu Ala Gly Leu Pro Ala Pro Gly Val Glu Val Tyr
325 330 335
Cys Leu Tyr Gly Val Gly Leu Pro Thr Pro Arg Thr Tyr Ile Tyr Asp
340 345 350
His Gly Phe Pro Tyr Thr Asp Pro Val Gly Val Leu Tyr Glu Asp Gly
355 360 365
Asp Asp Thr Val Ala Thr Arg Ser Thr Glu Leu Cys Gly Leu Trp Gln
370 375 380
Gly Arg Gln Pro Gln Pro Val His Leu Leu Pro Leu His Gly Ile Gln
385 390 395 400
His Leu Asn Met Val Phe Ser Asn Leu Thr Leu Glu His Ile Asn Ala
405 410 415
Ile Leu Leu Gly Ala Tyr Arg Gln Gly Pro Pro Ala Ser Pro Thr Ala
420 425 430
Ser Pro Glu Pro Pro Pro Pro Glu
435 440

2
1299
DNA
Arabidopsis thaliana
2
atgaaaaaaa tatcttcaca tcattcggta gtcatagcga tactcgttgt ggtgacgatg 60
acctcgatgt gtcaagctgt gggcagcaac gtgtaccctt tgattctggt tccaggaaac 120
ggaggtaacc agctagaggt acggctggac agagaataca agccaagtag tgcctggtgt 180
agcagctggt tatatccgat tcataagaag agtggcggac ggtttaggct atggtccgat 240
gcagcagtgt tattgccccc cttcaccagg tgcctcagcg atcgaatgat gctgtactat 300
gaccctgatc tggatgatta ccaaaacgct cctggtgtcc aaacccgggt tcctcatttc 360
ggttcgacca aatcacttcc acaccccgac cctcgtctcc gagatgccac atcttacatg 420
gaacatttgg cgaaagctct agagaaaaaa tgcgggtatg ttaacgacca aaccatccta 480
ggagctccat atgatttcag gtacggcctg gctgcttcgg gccacccgtc ccgtgtagcc 540
tcacagttcc tacaagaccc caaacaattg gtggaaaaaa ctagcagcga gaacgaagga 600
aagccagtga tactcctctc ccatagccta ggaggacttt tcgtcctcca tttccccaac 660
cgtaccaccc cttcatggcg ccgcaagtac atcaaacact ttgttgcact cgctgcgcca 720
tggggtggga cgacctctca gatgaagaca tttgcttctg gcaacacact cggtgtccct 780
ttagttaacc ctttgctggt cagacggcat cagaggaccc ccgagagtaa ccaatggcta 840
cttccatcta ccaaagtgtt tcacgacaga actaaaccgc ttgtcgtaac tccccaggtt 900
aactacacag cctacgagac ggaccggctc tttgcagaca ttggattccc acaaggagtt 960
gtgccttaca agacaagagt gttgccttta acagaggagc tgatgactcc gggagtgcca 1020
gtcacttgca tatatgggag aggagttgat acaccggagg ttttgatgta tggaaaagga 1080
ggattcgata agcaaccaga gattaagtat ggagatggag atgggacggt taatttggcg 1140
agcttagcag ctttgaaagt cgatagcttg aacaccgtag agattgatgg agtttcgcat 1200
acatctatac ttaaagacga gatcgcactt aaagagatta tgaagcagat ttcaattatt 1260
aattatgaat tagccaatgt taatgccgtc aatgaatga 1299
3
432
PRT
Arabidopsis thaliana
3
Met Lys Lys Ile Ser Ser His Tyr Ser Val Val Ile Ala Ile Leu Val
1 5 10 15
Val Val Thr Met Thr Ser Met Cys Gln Ala Val Gly Ser Asn Val Tyr
20 25 30
Pro Leu Ile Leu Val Pro Gly Asn Gly Gly Asn Gln Leu Glu Val Arg
35 40 45
Leu Asp Arg Glu Tyr Lys Pro Ser Ser Val Trp Cys Ser Ser Trp Leu
50 55 60
Tyr Pro Ile His Lys Lys Ser Gly Gly Trp Phe Arg Leu Trp Phe Asp
65 70 75 80
Ala Ala Val Leu Leu Ser Pro Phe Thr Arg Cys Phe Ser Asp Arg Met
85 90 95
Met Leu Tyr Tyr Asp Pro Asp Leu Asp Asp Tyr Gln Asn Ala Pro Gly
100 105 110

Val Gln Thr Arg Val Pro His Phe Gly Ser Thr Lys Ser Leu Leu Tyr
115 120 125
Leu Asp Pro Arg Leu Arg Asp Ala Thr Ser Tyr Met Glu His Leu Val
130 135 140
Lys Ala Leu Glu Lys Lys Cys Gly Tyr Val Asn Asp Gln Thr Ile Leu
145 150 155 160
Gly Ala Pro Tyr Asp Phe Arg Tyr Gly Leu Ala Ala Ser Gly His Pro
165 170 175
Ser Arg Val Ala Ser Gln Phe Leu Gln Asp Leu Lys Gln Leu Val Glu
180 185 190
Lys Thr Ser Ser Glu Asn Glu Gly Lys Pro Val Ile Leu Leu Ser His
195 200 205
Ser Leu Gly Gly Leu Phe Val Leu His Phe Leu Asn Arg Thr Thr Pro
210 215 220
Ser Trp Arg Arg Lys Tyr Ile Lys His Phe Val Ala Leu Ala Ala Pro
225 230 235 240
Trp Gly Gly Thr Ile Ser Gln Met Lys Thr Phe Ala Ser Gly Asn Thr
245 250 255
Leu Gly Val Pro Leu Val Asn Pro Leu Leu Val Arg Arg His Gln Arg
260 265 270
Thr Ser Glu Ser Asn Gln Trp Leu Leu Pro Ser Thr Lys Val Phe His
275 280 285
Asp Arg Thr Lys Pro Leu Val Val Thr Pro Gln Val Asn Tyr Thr Ala
290 295 300
Tyr Glu Met Asp Arg Phe Phe Ala Asp Ile Gly Phe Ser Gln Gly Val
305 310 315 320
Val Pro Tyr Lys Thr Arg Val Leu Pro Leu Thr Glu Glu Leu Met Thr
325 330 335
Pro Gly Val Pro Val Thr Cys Ile Tyr Gly Arg Gly Val Asp Thr Pro
340 345 350
Glu Val Leu Met Tyr Gly Lys Gly Gly Phe Asp Lys Gln Pro Glu Ile
355 360 365
Lys Tyr Gly Asp Gly Asp Gly Thr Val Asn Leu Ala Ser Leu Ala Ala
370 375 380
Leu Lys Val Asp Ser Leu Asn Thr Val Glu lie Asp Gly Val Ser Hia
335 390 395 400
Thr Ser lie Leu Lys Asp Glu He Ala Leu Ly3 Glu He Met Lys Gin
405 410 415

lie Ser lie lie Asn Tyr Glu Leu Ala Asn Val Asn Ala Val Asn Glu
420 425 430
4
1641
DNA
Arabidopsis thaliana
4
atgggagcga actcgaaatc agtaacggct tccttcaccg tcatcgccgt ttttttcttg 60
atttgcggtg gccgaactgc ggtggaggac gagaccgagt ttcacggcga ctactcgaag 120
ctatcgggta taatcattcc gggatttgcg tcgacgcagc tacgagcgtg gtcgatcctt 180
gactgtccac acactccgtt ggacttcaat ccgctcgacc tcgtatggct agacaccact 240
aagcttcttt ctgctgtcaa ccgctggttt aagtgtatgg tgctagatcc ttataatcaa 300
acagaccacc ccgagtgtaa gccacggcct gacagtggtc tttcagccat cacagaattg 360
gatccaggtt acataacagg tcctctttct actgtctgga aagagtggct taagtggtgt 420
gttgagtttg gtatagaagc aaatgcaatt gtcgctgttc catacgattg gagattgtca 480
ccaaccaaat tggaagagcg tgacctttac tttcacaagc tcaagttgac ctttgaaact 540
gctttaaaac tccgtggcgg cccttctata gtattcgccc attcaatggg taacaatgtc 600
ctcagatact ttccggaatg gctgaggcca gaaattgcac caaaacatta tttgaagtgg 660
cttgatcagc atatccatgc ttatttcgct gctggagctc ctcttcttgg ttctgttgag 720
gcaatcaaat ctaccctctc tggcgtaacg tttggccttc ctgtttctga gggaactgct 780
cggttgttgc ccaattcttt tgcgtcgtca ttgcggctta tgccatttcc aaagaatcgc 840
aagggtgata acacatcctg gacgcatttt tctgggggtg ctgcaaagaa agataagcgc 900
gtataccact gtgatgaaga ggaatatcaa tcaaaatatt ctggctggcc gacaaatatt 960
actaacattg aaattccctc cactagcgtt acagaaacag ctctagtcaa catgaccagc 1020
atggaatgtg gccttcccac ccctttgtct ttcacagccc gtgaactagc agatgggact 1080
cttttcaaag caatagaaga ctatgaccca gatagcaaga ggatgttaca ccagccaaag 1140
aagctgtatc atgatgaccc tgttttcaat cctctgactc cttgggagag accacctata 1200
aaaaatgcat tttgcatata tggtgctcat ctaaagacag aggttggtta ttactttgcc 1260
ccaagtggca aaccttatcc tgataattgg atcatcacgg ataccactta cgaaactgaa 1320
ggttccctcg tgtcaaggtc tggaactgtg gttgatggga acgctggacc tataactggg 1380
gatgagacgg taccctatca tccactctct tggtgcaaga attggctcgg acctaaagtt 1440
aacataacaa tggctcccca gccagaacac gatggaagcg acgtacatgt ggaactaaat 1500
gctgatcatg agcatgggtc agacatcata gctaacatga caaaagcacc aagggttaag 1560
tacataacct tttatgaaga ccctgagagc attccgggga agagaaccgc agtctgggag 1620
cttgataaaa gtgggtatta a 1641
5
546
PRT
Arabidopsis thaliana
5
Met Gly Ala Asn Ser Lys Ser Val Thr Ala Ser Phe Thr Val Ile Ala
1 5 10 15
Val Phe Phe Leu Ile Cys Gly Gly Arg Thr Ala Val Glu Asp Glu Thr
20 25 30
Glu Phe His Gly Asp Tyr Ser Lys Leu Ser Gly Ile Ile Ile Pro Gly
35 40 45
Phe Ala Ser Thr Gln Leu Arg Ala Trp Ser Ile Leu Asp Cys Pro Tyr
50 55 60
Thr Pro Leu Asp Phe Asn Pro Leu Asp Leu Val Trp Leu Asp Thr Thr
65 70 75 80
Lys Leu Leu Ser Ala Val Asn Cys Trp Phe Lys Cys Met Val Leu Asp
85 90 95
Pro Tyr Asn Gln Thr Asp His Pro Glu Cys Lys Ser Arg Pro Asp Ser
100 105 110
Gly Leu Ser Ala Ile Thr Glu Leu Asp Pro Gly Tyr Ile Thr Gly Pro
115 120 125
Leu Ser Thr Val Trp Lys Glu Trp Leu Lys Trp Cys Val Glu Phe Gly
130 135 140
Ile Glu Ala Asn Ala Ile Val Ala Val Pro Tyr Asp Trp Arg Leu Ser
145 150 155 160
Pro Thr Lys Leu Glu Glu Arg Asp Leu Tyr Phe His Lys Leu Lys Leu
165 170 175
Thr Phe Glu Thr Ala Leu Lys Leu Arg Gly Gly Pro Ser Ile Val Phe
180 185 190
Ala His Ser Met Gly Asn Asn Val Phe Arg Tyr Phe Leu Glu Trp Leu
195 200 205
Arg Leu Glu Ile Ala Pro Lys His Tyr Leu Lys Trp Leu Asp Gln His
210 215 220
Ile His Ala Tyr Phe Ala Val Gly Ala Pro Leu Leu Gly Ser Val Glu
225 230 235 240
Ala Ile Lys Ser Thr Leu Ser Gly Val Thr Phe Gly Leu Pro Val Ser
245 250 255
Glu Gly Thr Ala Arg Leu Leu Ser Asn Ser Phe Ala Ser Ser Leu Trp
260 265 270
Leu Met Pro Phe Ser Lys Asn Cys Lys Gly Asp Asn Thr Ser Trp Thr
275 280 285
His Phe Ser Gly Gly Ala Ala Lys Lys Asp Lys Arg Val Tyr His Cys
290 295 300
Asp Glu Glu Glu Tyr Gln Ser Lys Tyr Ser Gly Trp Pro Thr Asn Ile
305 310 315 320
Ile Asn Ile Glu Ile Pro Ser Thr Ser Val Thr Glu Thr Ala Leu Val
325 330 335
Asn Met Thr Ser Met Glu Cys Gly Leu Pro Thr Leu Leu Ser Phe Thr
340 345 350
Ala Arg Glu Leu Ala Asp Gly Thr Leu Phe Lys Ala Ile Glu Asp Tyr
355 360 365
Asp Pro Asp Ser Lys Arg Met Leu His Gln Leu Lys Lys Leu Tyr His
370 375 380
Asp Asp Pro Val Phe Asn Pro Leu Thr Pro Trp Glu Arg Pro Pro Ile
385 390 395 400
Lys Asn Val Phe Cys Ile Tyr Gly Ala His Leu Lys Thr Glu Val Gly
405 410 415
Tyr Tyr Phe Ala Pro Ser Gly Lys Pro Tyr Pro Asp Asn Trp Ile Ile
420 425 430
Thr Asp Ile Ile Tyr Glu Thr Glu Gly Ser Leu Val Ser Arg Ser Gly
435 440 445
Thr Val Val Asp Gly Asn Ala Gly Pro Ile Thr Gly Asp Glu Thr Val
450 455 460
Pro Tyr His Ser Leu Ser Trp Cys Lys Asn Trp Leu Gly Pro Lys Val
465 470 475 480
Asn Ile Thr Met Ala Pro Gln Pro Glu His Asp Gly Ser Asp Val His
485 490 495
Val Glu Leu Asn Val Asp His Glu His Gly Ser Asp Ile Ile Ala Asn
500 505 510
Met Thr Lys Ala Pro Arg Val Lys Tyr Ile Thr Phe Tyr Glu Asp Ser
515 520 525
Glu Ser Ile Pro Gly Lys Arg Thr Ala Val Trp Glu Leu Asp Lys Ser
530 535 540
Gly Tyr
545
6
1608
DNA
Arabidopsis thaliana
6
atgcctctat tactggaaga gatcactaga tcagtagagg ccttgctgaa gctcagaaat 60
cggaatcaag aaccctatgt tgacccgaat ctaaacccgg ttcttctagt tccaggaatc 120
gctggatcaa ttctaaacgc cgttgatcat gagaacggga acgaagaacg tgtttgggtt 180
aggatcttcg gtgctgatca tgagtttcga acaaagatgt ggtctcgatt tgatccttca 240
actggtaaaa cgatatctct tgatccaaaa acgagtattg ttgttcctca agacagagct 300
gggctacatg caattgatgt cttagaccct gatatgattg ttggccgtga gtctgtgtac 360
tatttccatg agatgattgt tgagatgatc ggatggggat ttgaagaagg gaaaaccctt 420
tttggtttcg gttatgattt ccgccaaagc aacagactgc aggaaacgtt ggaccagtct 480
gctaaaaagc tggaaactgt ttataaagcc tcaggagaga agaagattaa tgttattagt 540
cattctatgg gaggattatt ggtgaaatgt ctcatgggtc tgcatagtga tatattcgag 600
aagtatgtac agaattggat tgctattgct gctccatttc gaggtgctcc tggatatatc 660
acatcgactt tattgaatgg aacgtcgttt gtcaatggtt gggaacagaa cttttttgtc 720
tctaagtgga gcatgcatca gctgcttatt gagtgtccat ccatatatga gccgatgtgt 780
tgcccgcact ttaaarggga gctccctccc gtcttagagc tgtggagaga gaaagagagc 840
aatgatggag ttggaacctc tgatgttgtt ctcgagtctt accgtagcct ggagagtctt 900
gaagttttta cgaaacctct ctcgaataat acagctgatt attgtggaga gtcgatcgac 960
cttcctttta actggaagat catggagtgg gctcacaaaa ccaagcaagt attagcttcc 1020
gccaagctgc ctccgaaagt taaattctat aacatatatg ggaccaatct agaaacccct 1080
catagtgttt gccatgggaa tgagaagatg cccgttaaag atctaacgaa tctaagatac 1140
ttccagccga catatatatg cgtggatggt gatggcacag tcccgatgga atctgccatg 1200
gcggatgggc ttgaagcagt agcaagagtt ggagtccctg gtgagcaccg aggaatcctc 1260
aacgatcacc gtgtcttccg aatgctcaaa aaatggctaa atgtaggcga accagacccg 1320
ttctacaacc cagtaaacga ttatgtcatc cttcccacca catatgaatt tgagaattc 1380
catgagaatg gactcgaggt tgctcccgtg aaagaatcgt gggacatcat atcagacgac 1440
aacaatatcg gcacaaccgg gtcaaccgtg aactccatat cagtctctca acccggagat 1500
gatcaaaacc ctcaagctga agctcgtgca accttaaccg tccaaccaca aagcgacggc 1560
agacaacacg tagagctcaa tgctgtaagt gtctctgttg atgcataa 1608
7
535
PRT
Arabidopsis thaliana
7
Met Ser Leu Leu Leu Glu Glu Ile Ile Arg Ser Val Glu Ala Leu Leu
1 5 10 15
Lys Leu Arg Asn Arg Asn Gln Glu Pro Tyr Val Asp Pro Asn Leu Asn
20 25 30
Pro Val Leu Leu Val Pro Gly Ile Ala Gly Ser Ile Leu Asn Ala Val
35 40 45
Asp His Glu Asn Gly Asn Glu Glu Arg Val Trp Val Arg Ile Phe Gly
50 55 60
Ala Asp His Glu Phe Arg Thr Lys Met Trp Ser Arg Phe Asp Pro Ser
65 70 75 80
Thr Gly Lys Thr Ile Ser Leu Asp Pro Lys Thr Ser Ile Val Val Pro
85 90 95
Gln Asp Arg Ala Gly Leu His Ala Ile Asp Val Leu Asp Pro Asp Met
100 105 110
Ile Val Gly Arg Glu Ser Val Tyr Tyr Phe His Glu Met Ile Val Glu
115 120 125
Met Ile Gly Trp Gly Phe Glu Glu Gly Lys Thr Leu Phe Gly Phe Gly
130 135 140
Tyr Asp Phe Arg Gln Ser Asn Arg Leu Gln Glu Thr Leu Asp Gln Phe
145 150 155 160
Ala Lys Lys Leu Glu Thr Val Tyr Lys Ala Ser Gly Glu Lys Lys Ile
165 170 175
Asn Val Ile Ser His Ser Met Gly Gly Leu Leu Val Lys Cys Phe Met
180 185 190
Gly Leu His Ser Asp Ile Phe Glu Lys Tyr Val Gln Asn Trp Ile Ala
195 200 205
Ile Ala Ala Pro Phe Arg Gly Ala Pro Gly Tyr Ile Thr Ser Thr Leu
210 215 220
Leu Asn Gly Met Ser Phe Val Asn Gly Trp Glu Gln Asn Phe Phe Val
225 230 235 240
Ser Lys Trp Ser Met His Gln Leu Leu Ile Glu Cys Pro Ser Ile Tyr
245 250 255
Glu Leu Met Cys Cys Pro Tyr Phe Lys Trp Glu Leu Pro Pro Val Leu
260 265 270
Glu Leu Trp Arg Glu Lys Glu Ser Asn Asp Gly Val Gly Thr Ser Asp
275 280 285
Val Val Leu Glu Ser Tyr Arg Ser Leu Glu Ser Leu Glu Val Phe Thr
290 295 300
Lys Ser Leu Ser Asn Asn Thr Ala Asp Tyr Cys Gly Glu Ser Ile Asp
305 310 315 320
Leu Pro Phe Asn Trp Lys Ile Met Glu Trp Ala His Lys Thr Lys Gln
325 330 335
Val Leu Ala Ser Ala Lys Leu Pro Pro Lys Val Lys Phe Tyr Asn Ile
340 345 350
Tyr Gly Thr Asn Leu Glu Thr Pro His Ser Val Cys Tyr Gly Asn Glu
355 360 365
Lys Met Pro Val Lys Asp Leu Thr Asn Leu Arg Tyr Phe Gln Pro Thr
370 375 380
Tyr Ile Cys Val Asp Gly Asp Gly Thr Val Pro Met Glu Ser Ala Met
385 390 395 400
Ala Asp Gly Leu Glu Ala Val Ala Arg Val Gly Val Pro Gly Glu His
405 410 415
Arg Gly Ile Leu Asn Asp His Arg Val Phe Arg Met Leu Lys Lys Trp
420 425 430
Leu Asn Val Gly Glu Pro Asp Pro Phe Tyr Asn Pro Val Asn Asp Tyr
435 440 445
Val Ile Leu Pro Thr Thr Tyr Glu Phe Glu Lys Phe His Glu Asn Gly
450 455 460
Leu Glu Val Ala Ser Val Lys Glu Ser Trp Asp Ile Ile Ser Asp Asp
465 470 475 480
Asn Asn Ile Gly Thr Thr Gly Ser Thr Val Asn Ser Ile Ser Val Ser
485 490 495
Gln Pro Gly Asp Asp Gln Asn Pro Gln Ala Glu Ala Arg Ala Thr Leu
500 505 510
Thr Val Gln Pro Gln Ser Asp Gly Arg Gln His Val Glu Leu Asn Ala
515 520 525
Val Ser Val Ser Val Asp Ala
530 535
8
1344
DNA
8
atgggctgga ttccgtgtcc gtgctgggga accaacgacg atgaaaacgc cggcgaggtg 60
gcggatcgcg atccggtgcc tccagtacct ggaattggag gctctattct gcattctaag 120
aagaagaatt caaagtccga aactcgggtt tgggtccgaa tatttctagc taaccttgcc 180
ttcaagcaga gcctctggtc tctctataat cccaaaactg gctatacaga gccgttggat 240
gataatattg aagtatcggc ccctgatgat gaccatggac tctatgcaat tgacattcta 300
gatccccctt ggtttgtaaa gctttgtcac ttgacggagg tttatcactt tcacgatatg 360
atagaaacgc tggttggatg cggttataag aaggggacta cattattcgg ttatggttac 420
gatttccgtc aaagcaatag gatcgatcta ctcatactag gtctgaagaa gaagctggaa 480
actgcatata aacgttcagg ggggagaaaa gtcactatca tctcccatcc aacgggagga 540
cttatggctt catgtttcat gtatctccat ccggaggcat tttccaagta tgtaaataaa 600
tggattacaa ttgcaacacc tttccaagga gcaccagggt gcatcaacga ttcaatcttg 660
actggagtgc aatttgtgga agggttagaa agtttcttct ttgtgtcacg ttggacgatg 720
caccaactgt tggtcgaatg cccatctata cacgagatga tggcaaatcc agactttaag 780
tggaaaaagc aaccagagac tcgagtttgg cgtaagaaat ctgaaaacga cgttgatact 840
tctgtagaac tggaatcatt tggcttaatc gagagtattg atctattcaa cgatgcatta 900
aaaaataacg agctaagcta tggtgggaat aaaacagctt tgccctctaa ctctgctatc 960
ctcgactggg ctgctaagac aagagaaatt ctcaacaaag cgcaacttcc tgatggagtg 1020
tccttctata acatatatgg agtgtcactt aatacaccct tcgacgttcg ttatggcaca 1080
gagacttctc cgatagacga tttgtctgaa atatgtcaaa ctatgcctga gtatacatat 1140
gtagatggag atggaactgt ccctgctgaa tcagctgcag ctgctcagtt taaagcagtt 1200
gctagcgtag gagtttcggg tagccaccgc gggcttctcc gtgatgaaag agtgtttgag 1260
ctcattcaac aatggttagg agttgagccc aagaaggcta aacggaagca tttaaggact 1320
cacaaagtag ttgattctgg ttaa 1344
9
447
PRT
Arabidopsis thaliana
9
Met Gly Trp Ile Pro Cys Pro Cys Trp Gly Thr Asn Asp Asp Glu Asn
1 5 10 15
Ala Gly Glu Val Ala Asp Arg Asp Pro Val Leu Leu Val Ser Gly Ile
20 25 30
Gly Gly Ser Ile Leu His Ser Lys Lys Lys Asn Ser Lys Ser Glu Ile
35 40 45
Arg Val Trp Val Arg Ile Phe Leu Ala Asn Leu Ala Phe Lys Gln Ser
50 55 60
Leu Trp Ser Leu Tyr Asn Pro Lys Thr Gly Tyr Thr Glu Pro Leu Asp
65 70 75 80
Asp Asn Ile Glu Val Leu Val Pro Asp Asp Asp His Gly Leu Tyr Ala
85 90 95
Ile Asp Ile Leu Asp Pro Ser Trp Phe Val Lys Leu Cys His Leu Thr
100 105 110
Glu Val Tyr His Phe His Asp Met Ile Glu Met Leu Val Gly Cys Gly
115 120 125
Tyr Lys Lys Gly Thr Thr Leu Phe Gly Tyr Gly Tyr Asp Phe Arg Gln
130 135 140
Ser Asn Arg Ile Asp Leu Leu Ile Leu Gly Leu Lys Lys Lys Leu Glu
145 150 155 160
Thr Ala Tyr Lys Arg Ser Gly Gly Arg Lys Val Thr Ile Ile Ser His
165 170 175
Ser Met Gly Gly Leu Met Val Ser Cys Phe Met Tyr Leu His Pro Glu
180 185 190
Ala Phe Ser Lys Tyr Val Asn Lys Trp Ile Thr Ile Ala Thr Pro Phe
195 200 205
Gln Gly Ala Pro Gly Cys Ile Asn Asp Ser Ile Leu Thr Gly Val Gln
210 215 220
Phe Val Glu Gly Leu Glu Ser Phe Phe Phe Val Ser Arg Trp Thr Met
225 230 235 240
His Gln Leu Leu Val Glu Cys Pro Ser Ile Tyr Glu Met Met Ala Asn
245 250 255
Pro Asp Phe Lys Trp Lys Lys Gln Pro Glu Ile Arg Val Trp Arg Lys
260 265 270
Lys Ser Glu Asn Asp Val Asp Thr Ser Val Glu Leu Glu Ser Phe Gly
275 280 285
Leu Ile Glu Ser Ile Asp Leu Phe Asn Asp Ala Leu Lys Asn Asn Glu
290 295 300
Leu Ser Tyr Gly Gly Asn Lys Ile Ala Leu Pro Phe Asn Phe Ala Ile
305 310 315 320
Leu Asp Trp Ala Ala Lys Thr Arg Glu Ile Leu Asn Lys Ala Gln Leu
325 330 335
Pro Asp Gly Val Ser Phe Tyr Asn Ile Tyr Gly Val Ser Leu Asn Thr
340 345 350
Pro Phe Asp Val Cys Tyr Gly Thr Glu Thr Ser Pro Ile Asp Asp Leu
355 360 365
Ser Glu Ile Cys Gln Thr Met Pro Glu Tyr Thr Tyr Val Asp Gly Asp
370 375 380
Gly Thr Val Pro Ala Glu Ser Ala Ala Ala Ala Gln Phe Lys Ala Val
385 390 395 400
Ala Ser Val Gly Val Ser Gly Ser His Arg Gly Leu Leu Arg Asp Glu
405 410 415
Arg Val Phe Glu Leu Ile Gln Gln Trp Leu Gly Val Glu Pro Lys Lys
420 425 430
Ala Lys Arg Lys His Leu Arg Thr His Lys Val Val Asp Ser Gly
435 440 445
10
3107
DNA
Arabidopsis thaliana
10
cctttttgat ctttcagctc aatgagcttt tcicaatttt ttgggggaac tgaatatgtg 60
aatttcaaag tttccacatc gagtttattc acacgtcttg aatttcgtcc atcctcgttc 120
cgttatccag ctttgaactc ctcccgaccc tgctatggat atattaaaaa aaaagtgttt 180
tgtgggctgc atctttgtta cgatccgcat ctccttcttt cggctcagtg ctcatgcttt 240
tgctatggta gagatgggca atgttattgt tgatggtaac agtggtatag ttgacagtat 300
cctaactaat caattatctc tttgattcag gcctctatgt tgggtggaac acatgtcact 360
tgacaatgaa accgggttgg atccagctgg tattagagct cgagctgtat caggactcgt 420
ggctgccgac tactttgctc ctggccactt tgtctgggca gtgctgattg ctaaccctgc 480
acatattgga tatgaagaga aaaatatgta catggctgca catgactggc ggctttcgct 540
tcagaacaca gaggctcttt tctcatcgtt ctttctatta ttctgttcca tgccacgttt 600
ctttcctcat tacttaaggc ttaaatatgt ttcatgttga attaataggt acgtgaccag 660
actcttagcc gtatgaaaag taatatagag ttgatggtct ctaccaatgg cggaaaaaaa 720
gcagttatag ttccgcattc catgggggtc ttgtattttc tacattttat gaagtgggtt 780
gaggcaccag ctcctctggg tggcgggggt gggccagatt ggtgtgcaaa gtatattaag 840
gcggtgacga acactggtgg accacttctt ggtgttccaa aagccgctgc agggcttttc 900
tctgctgaag caaaggatgt tgcagttgcc aggtattgaa tatctgctta tacttttgat 960
gatcagaacc ttggctctgg aactcaaagc tattctacta aatatcaatt ctaataacat 1020
tgctatatta tcgctgcaac cgacattggt tgattatttt gctgcttatg caactgaaac 1080
tctcttgaga ttagacaaat gatgaattga taattcttac gcactgctcc gtgatgacca 1140
gtttcttagc ttcgacgata acatttgtca tactgtcttt tggagggcat tgaattctgc 1200
tatggaaagc gctggagctc ccatgcttgc attctttacc aattagcatt attctgcttc 1260
tttcaatttt cttgtatatg catctatggt cttttatttc ttcttaatta aagactcgtt 1320
ggagtagttg ctctattagt cgcttggttc cttaatatag aacttcactc tcttcgaaaa 1380
ttgcagagcg attgccccag gattcttaga caccgacata tttagacttc agaccttgca 1440
gcatgtaatg agaatgacac gcacatggga ctcaacaatg tctatgttac cgaagggagg 1500
tgacacaata tggggcgggc ttgattggtc accggagaaa ggccacacct gttgtgggaa 1560
aaagcaaaag aacaacgaaa cttgtggtga agcaggtgaa aacggagttt ccaagaaaag 1620
tcctgttaac tatggaagga tgatatcttt cgggaaagaa gtagcagagg ctgcgccatc 1680
tgagactaac aatattgatt tccgagcaag gacatataaa ccataataaa ccttgtacac 1740
ttcgtgattg tatgatgaat atctgtacat tttatctggt gaagggcgct gtcaaaggtc 1800
agagtatccc aaatcacacc tgtcgcgacg tgcggacaga gtaccatgac atgggaattg 1860
ctgggatcaa agctatcgct gagtataagg tctacactgc tggtgaagct atagatctac 1920
tacactatgt tgctcctaag atgatggcgc gcggtgccgc tcatctctct catgggattg 1980
ctgatgattt ggatgacacc aagtatcaag atcccaaata ctggccaaat ccgttagaga 2040
caaagtaagt gatttctcga ttccaaccgt atccttcgcc ctgacgcatc atcagtcttt 2100
ttgttttcgg tcttgttgga tacggttttc agctcaaagc ttacaaagct gtrcctgagc 2160
ctttctcaaa aaggcctgct cagttatatt gaggtgctaa agttgataca tgtgacccct 2220
gctcataaat cctccgcttg gtttgttctg ctctttcaga ttaccgaatg ctcctgagat 2280
ggaaatctac tcattatacg gagcggggat accaacggaa cgagcatacg tatacaagct 2340
taaccagtct cccgacagtt gcatccccct tcagatattc acttctgctc acgaggagga 2400
cgaagatagc tgtctgaaag caggagttta caatgtggat ggggatgaaa cagcacctgt 2460
cctaagtgcc gggtacatgt gtgcaaaagc gtggcgtggc aagacaagac tcaacccttc 2520
cggaatcaag acttacataa gagaatacaa tcactctccg ccggctaacc tgttggaagg 2580
gcgcgggacg cagagtggtg cccatgttga tatcatggga aactctgctt tgatcgaaga 2640
tatcatgagg gttgccgccg gaggtaacgg gtctgatata ggacatgacc aggtccactc 2700
tggcatattt gaatggtcgg agcgtattga cctgaagctg tgaacatcat gatctcttta 2760
agctgtcctg tcagctcatg cgaatccaat actttgaaag agagatcacc atcaactcac 2820
catcatcgtc atcatcatga tgctcaactc acaaagaagc ccgagaatga tactttggtg 2880
cgaaattctc aatacctcct caatattctt attgaatgta aattatacaa ccctatetaa 2940
tgtctgaacg ataacgcaaa acttgctgcg ccatgcttgt ttgtcttgtc aaaagcatca 3000
atttgtgggt catacgtagt gcagaggatg attcaaattt gcgacaaatt tggtaatcaa 3060
agttaattct gaaaatgcaa caccacatga actatgtcac taaggcc 3107
11
1680
DNA
Arabidopsis thaliana
unsure (694) n=unknown
11
cgcataaggt gttcgagtgt ttgcagcttg agaagttccg agtccaagag acctggagcc 60
aaagatctga accataaaaa tgaccaatca aaatccatta agccaattca aacattcact 120
aaaaatgtta cagttctcac gaatactaac ataacaagtg aaagtaaact taaaaacgct 180
catggaccta acctggcgta acggtatgtc tctgccctca gcagaaagta aattactgac 240
ggccctaggg acacctaaaa aggcgggccc aatgttgacg acggatttga tgtgtttggc 300
acaccaacct ggaccacccc caccgcctcc atcaggaaga ggtgtttcta cccattcaag 360
gaagtgaagg aaatagatag cccccattga atgcggaacc accacaactt tcttaaaccc 420
actggtggca tacattagct cgattttgct cttcagtcta cttaacgact ggccacgcac 480
ctgctcggtt tcaatccaaa aactatagat tagtccaaag ctctacaaca atatgtaatt 540
acatacacta aagtagccaa tcatggaggt cttatagtat atcattatca tcattctcta 600
gaccaccagt gttgtcaacg tgatcataca ggcattaata acgactaatc tgagcacacc 660
tcggtgttat ggaaagagag cctccaatca taangaggcc atgtgaaggt ccttgccttc 720
atatccaatt tttgccaaat cctctatgag aactgcccaa gcaaagcagc atggtgcgaa 780
atagtctgca gccactagtc ctgggactgc tcggacacgg actcccggtg gatcgagacc 840
ggtctcactg cctagagata agtgccccaa ceagcacaat ggcccataaa tcaaatcaca 900
acaattaaac gaccaagtat acacttcaaa ctaattcaga attgagaaaa tcgaaatgcc 960
aaccagaaaa tcatgtaaat caaaaaccgt aacaaccaat atatatatat atattttcca 1020
gaatccatgt taaaaccata accaaaaata tatgaaaatt tagaaatacc aaaataatat 1080
gttaaaactg atattctaaa tttagtaagt tttaaaatgc aatgaaatcg tcattcatgt 1140
tttgaacata aatatattct atagttttgt aggacgattt tctacttcct atatagaaat 1200
caaaacttac ggtttccatt tccaaattcg aatgacattt aaaaacatat cccaaaaatc 1260
acgattaatt attaatttcc taaaaccacc catcattact tagaaaataa tattttcata 1320
aactagctgc aaaacaataa caaaacccaa agaaccatct ccacccacta accaaaacga 1380
aaatccaaag accatccata acaacaacag tataacacta cgtaaagcca attcaagaag 1440
aaaccaagct taaccaatta tacatacccc taaccagacg aaccaaacca atatctgacc 1500
gggtctataa aaatatctcg aaccgaacat aacggtctaa tgtgttacct tctaagaatc 1560
tcggagaagc tagcacccca aagacgttca cgaaagagtc cttcagcgca aggccgacct 1620
tcccaaagct cgagcccgcc ggttacaacc cccggaacaa gaatcaccgg atgaaacgcc 1680
12
264
DNA
Glycine max

unsure (39) n=unknown

unsure (175) n=unknown

unsure (241) n=unknown
12
ccaagaactc gatgattact tcaacactcc tggggttgng acccgggtcc ctcactttgg 60
ttccaccaac tctcttctct catctcaacc ctcgtctcaa gcatatcscc ggatacatgg 120
cacccctggt agattcatta caaaagcttg gctacgctga tggtgagact ctgtntggag 180
ccccttatga ctttagatat ggtctagctg ctgaaggtca cccttcacaa gtgggttcca 240
ngttcctcaa agatctaaag aact 264
13
273
DNA
Glycine max

unsure (12) n=unknown

unsure (33) n=unknown

unsure (252) n=unknown

unsure n=unknown

unsure (272) n=unknown
13
ccaacatctg anaggggcag agtaggtatt tcnacctatt atccatatcg tgatgaagaa 60
ggaacaagaa gagggtctca agattgaggt tgctacactc acagttacag tagttgttgt 120
gacgctgtca ttgctacgca catgtggggc aagcaacctc gaccctttga tcctaatacc 180
aggtaacgga gggaaccaac tagaagcaag gttgaccaat cagtacaagc cctctacntt 240
cacctgcgac cntggtaccc tcccannaag ana 273
14
419
DNA
Glycine max

unsure (99) n=unknown
unsure n=unknown

unsure (352) n=unknown

unsure (392) n=unknown

unsure (405) n=unknown

unsure (418) n=unknown
14
gctgcatatg attggagaac agcacctcag aacactgagg tgagggatca aacactaagt 60
cggataaaaa gcaacacaga acttatggtt gctactaang gtggaaataa ggcagctatt 120
attccacact caatgggggt cttgtacttc ctacatttta tgaaatgggt tgaagcacca 180
gctccaatgg gtggtggggg aggaccagat tggtgctcca aatatataaa ggcagttgta 240
aacatcggtg gaccattttt aggtgttccc aaggctatag cagggctact ctcagctgag 300
gccaaggata ctgctgttgc caggacgata gctccaggat ttttanataa cnatctgttt 360
ccgcattcaa acccttgcaa catgtaatga anatgaaccc gttcnttggg actcaacna 419
15
272
DNA
Glycine max

unsure (1)..(272) n=unknown
15
tganttgatc ntgngaagtn attctgtgta ttanttccat gacatgaccg ttnnagatnc 60
gtaagtgang ggtntgaaga gggaaagacg ccttttggtn ttngatatga ttttcgccaa 120
agcaacaggt tgcaggaaac aatggatcgg ttggctgcna agtcagaatc aanttataat 180
gccgcaggnn ggaagacaat aaacattata nctcattcta tgggcggtct tttccnngan 240
atgtttcntg tgcctgcaaa gcgatatttt ga 272
16
237
DNA
Glycine max

unsure (1)..(237) n=unknown
16
gattttcgcc aaagcaacag gttgcaggaa acaatggatc ggttggctgc aaagttagaa 60
tcaatntata atgcngcagg agggaagana ataaacatta taactcattc tatgggcggt 120
cttttggtga aatgnttcac gtgcctgcaa agcgatattt ttgagaaata tgttaagaat 180
tgggttgcaa ttcgtgcgcc attccagggt gcaccaggaa ccatcaattc naccttt 237
17
244
DNA
Glycine max


unsure (1)..(244) n=unknown
17
gattttcgcc aaagcaacag gttgcaggaa acaatggatc ggtnggctgc aaagttagaa 60
tgcaatttat aatgctgcag gagggaagaa aataaacatt ataactcacc ctacgggcgg 120
tcttttggcg aaatgtttca tgtgcctgca aagcgatacc tttgagaaat atgccaagaa 180
ttgggttgca actcgcgcgc catcccaggg tgcaccagga accatcaact ctacccttct 240
aaat 244
18
263
DNA
Glycine max
18
gacgaaacca aaccgtgggc gaccaagctt gtttactcgg ttgactcatg gcaagaccaa 60
gttcgttgct tcatagaaga ggtcattggc gaaccagtct atcccgtggg caactcacta 120
ggaggactgg ctgcattgca ttccgcggca aacasccctc atccagcgaa aggcgtcgca 180
ttgcttaagc aacacctttt tgggggtttc tgccaaatcc cataaaaagt ccaagactag 240
;gaaaatact tccatgggcc gga 263
19
311
DNA
Zea mays
unsure (1)..(311) n=unknown
19
:ggacgctgg ncatgttcgg agccccctac gacttccgct acgcgccgcc gtcccccggc 60
:agacgtccg aggcgtactc ccgctacttc aaggagctga tggagctggt cgaggccgcg 120
Lgcgagagga cccggaagaa ggccgtcatc ctcggccaca gccccggcgg cacggtcgcg 180
iccgagttcg tccggaacac tccgccggcg tggcggcgcg agcacatcga gcgcctcgtc 240
icggtcgcgc cgacgctccc cggcgggttc ctggagccgg tgcgcaactt cgcgtccggg 300
■cggacatcc t 311
20
1155
DNA
Zea mays
20
cgacccacg cgtccggcca caagaaccct ctcaagccag actggtgcct cggaaagctg 60
gagccgcac tggaagacat gggataccga gacggagaca ccatgttcgg agccccctac 120
acctccgcc acgcgccgcc gtcccccggc cagacgtccg aggtgtactc ccgctacttc 180
aggagccga tggagctggt cgaggccgca agcgagagga cccggaagaa ggccgtcatc 240
tcggccaca gcttcggcgg catggtcgcg ctcgagcccg tccggaacac tccgccggcg 300
tggcggcgcg agcacaccga gcgcctcgtc ctggtcgcgc cgacgctccc cggcgcgttc 360
ctggagccgg tgcgcaacct cgcgtccggg acggacatcc tmtacgcgcc agcgacgacg 420
ccgctggcca cgcgagccat gcgragragc ttcgagagcg ccatcgtgaa cctcccgtcg 480
ccggccgcgt tcgggcgcct gcaggcgccg ctcgtggtca ccagggagcg gaactactcc 540
gcgcccgcgc acgacacgga gcgcctcctc gccgccgtcg gccccggcga ggccgcggag 600
cccttcagga gacgggccgt ccccaagacg ggcagcctcg cggcgccgat ggtgcccatg 660
acgtacatca gcggggtcgg caacaggacg ccgccgcggc tggtgctctg gggcgacgac 720
ttcgacgcgg ccccggaggt ggcggcgcac ggggacggag atggcaagat caattcgatc 780
agcgtcttgg cgtctgagaa ggagatgcgt cggcagccgg agcagaagaa gcsgttcaaa 840
tccatcaaga tcgataaggc ccagcattcc acgatcgtca cggatgattt tgccctgcac 900
agggtcattc aagaaattgt tgaggccaat aatcagaaga ttccatccta aattcctcat 960
gtcatgtatg cattaccgag ctgtgggggc caatagtggg ttgggagttg ggacaccggc 1020
tccgtgctta aaacggtcgt ggtgtggtct caattcaatc gattagttaC ttgctaacgc 1080
caactgcttg cctcaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 1140
aaaaaaaaar gggcg 1155
21
328
DNA
Zea mays
21
gttggaatgc tcttcaactt tcttacgctc atttacattg cttttctctg cggtaaattg 60
ctggcctaaa tgcacgctgc ttgaacccta taatcagata gaccatcccg aatgcaagtc 120
aaggcctgat agtggtcttc tgcaattaca gagccggacc ctggttatac aacaggtcct 180
ctctcttcag tatggaaaga atgggtcaaa tggtgtgtag agtttggcac tgaagctaat 240
gcaattaccg ctgttccgta tgactggaga ctgcccccat caacgctcga ggagagagat 300
ccgtactttc acaattaaac aggatcag 328
22
356
DNA
Zea mays
22
gtctttctgc aattacagag ctggaccctg gttatataac aggtttcagg tcctctctct 60
tcagtatgga aagaatgggt caaatggtgt gtagagtttg gcattgaagc taatgcaatt 120
atcgctgttc cgtatgattg gagactgccc ccatcaatgc ttgaggagag agatctgtac 180
tttcacaaat taaagtttgt aacacttgcc tcaacttgtt atgaagcaac caatgctata 240
catctgttag gatcagtaag agttaacggc ccatgacgga ttcaggttcc tgctcaccaa 300
cagatcccac aagcatacgg tcaccgccaa tgcctgcagt tggacagtac caaccc 356
23
1552
DNA
Zea mays
23
tcgacccacg cgtccgcaga catgatcatt ggtgatgaca ctgtgtacta ctatcatgac 60
atgatagtgg aaatgattaa atggggatat caagaaggaa aaactctctt tggatttggt 120
tacgatttcc gtcaaagcaa caggctctca gagacacttg acagactttc taaaaagctg 180
gagtcagtgt acacagcttc tggtggaaag aagatcaatd tcattactca ttcaatgggg 240
ggattacttg tgaaatgttt catctcactg cacagtgata tatttgaaaa atacgtcaar 300
agttggatcg caattgctgc accattccar ggcgcccctg ggcamataac taccakytig 360
ctgaatggaa tgncttttgc craaggacgg gaaycaagat tcttratctc caaawkgkgt 420
atgcascaat tgccactcga gtgcccatca atctacgagk tgctgscaam ccctaactct 480
ccagtggaga gacatcccac tgctacagat tcggagagag aacttggata mcagtggcaa 540
gaaaagtgcc ccgttagagt cgtatgagcc cgaggaagca acaaagatga ttaaagaggc 600
tctttccagt aatgagatca ttgctgatgg catgcatatt ccggtgcccc ttaatttgga 660
tatattgaat tgggcaaaga aacttatgat cctttatgca gcacaaagcc tccggaatca 720
gtgaaattct acaacattta tgggattgat tatgatactc cacatactgt ctgctacggc 780
agtgaacagc agccggcttc aagtcttagc agcctcctat atgctcaggg aaaatacgcc 840
tatgttgacg gcgacggatc tgttcccgca gaatcagcaa aggctgacgg atttaatgca 900
gtggcaaggg ttggggttgc tcccgaccac cggggaatcg tgcgcagtcg ccgcgcgttc 960
cggatcgtcc agcactggct gcacgccgga gaacctgacc cgttctacga cccgctgagc 1020
gactatgtca tactcccaac acgcttacga aatcgagaag catcgtgaga aacacgggga 1080
tgtcacgtca gtagcggagg actgggagat catctcccct aacgacggca agaccatrrg 1140
gccaggcgag cttcctccta tggtcagcac actgaccacg agccgggaag gcaaggaggg 1200
agcactggaa gaggcgcatg ccaccgtggt cgttcacccg gagaagaagg gacggcagca 1260
tgtgcaagtt agggctgtgg gtgtcagcca tggtggctaa agccgtagga gccacgttgg 1320
ctgcctactc tatccagcag tagcagctat acctctgtgc acgcaccgta aaattggatg 1380
tacatatatg gctatgacct ctgtagggat ctggttttag aagtataaat gggcaccctg 1440
cctgcttgta aatgctcaga accgaaaaca caggccctgt tctttttttt cctttttaaa 1500
aaaaataaaa agatggtaaa ggattccatt aaaaaaaaaa aaaaaaaagg eg 1552
24
227
DNA
Zea mays

unsure (1)..(227) n=unknown
24
ttggttatga tttccgtcaa agcaacaggc cctcagagac acttgacaga ttttctaaaa 60
agctggagtc agtgtacaca gctcccggtg gaaagaagat caatctcatt actcattcaa 120
tggggggatt acttgtgaaa tgtntcatct cactgcacag tgatatatnt gaaaaatatg 180
tcaagagctg gntcgcaatt gcngcaccat tccaaggtgc ccctggg 227
25
1587
DNA
Zea mays

unsure (1)..(1587) n=unknown
25
ggagattgtc gtgccggagg acgaccacgg cctgtttgcc atcgacattc ttgatcctcc 60
ctggtttgta gaactcgacc cacgcgtccg cccaccgtcc gggagattgt cgtgccggag 120
gacgaccacg gcctgtttgc catcgacatt cttgaccctt cctggtttgt agaacttctc 180
catctgtcta tggtgtatca cttccatgat atgattgata tgctcataaa ctgtggatat 240
gagaaaggga ccacactatt tggatatggt tatgattttc gccaaagcaa caggatagac 300
aaagcgatgg ccggttcgag agcaaaactt gagacagctc ataagacctc tggagggaaa 360
aaagttaatc caaccccaca ctctatgggt ggacrgctag tacgctgctt catgcctatg 420
aatcacgatg tattcactaa gtatgtcaac aaatggattt gcatcgcttg tccactccaa 480
ggtgcccccg gatgcaccaa cgactctcta cttactggat tgcaatttgt ttatggtttt 540
gagagcttct ttttcgtatc cagatgggca atgcaccaat tgcttgtcga atgcccatca 600
atccatgaaa tgttaccaaa cccagaattc aagtggaagg aaaaaccaat tattcaggtt 660
tggcgtaaga accctgaaaa ggatggaact gtggagcttg ttcaatatga agcaactgat 720
tgtgtgtcct cgctcgaaga agctttaagg aataatgagc tcacgtataa cggaaagaaa 780
gtagcactac cattcaatat gtcagtcttc aaatgggcca ccaagactcg ccaaatccta 840
gacaacgctg aattaccaga cactgtgagc ttttacaata tatacgggac atcttacgaa 900
actccatacg atgcatgcta cggctcagaa agctccccga ttggagattt gtcagaagtg 960
tgtcacacag tgccggcata cacttatgtg gatggagatt gcacggttcc catagaatcg 1020
gcacgggctg atgggttctc tgcgaaagaa agagttggcg tcaaggcgga ccaccgtggc 1080
ctgctgtccg atgagaacgt attcaagctt ctcaagaaat ggctcggtgt gagcgagaag 1140
aagccagagt ggcgttgcgt gcctaaatcc cactccaaag tgacctaatt gggttgcctg 1200
tagttcttca ggaagactgt tatcttggcc cctcctcctg aagagaagat gaaacaaaac 1260
tccggtgatt gtattgcatg cctgcacgat gtaaatctct gcaagctgca cggaacaagg 1320
gactagtgcc cttgtacgat gtatcattgg caggcatttn cttttgaacc tangggcata 1380
ttcntttgnc cttccactct ggacntagta aagaacatnt gaatcgacct tanttnnaan 1440
nngtctgnnn nnjumimrinii nnnnnnnnnn ruimmnnnim nnnnnnrmnn nnimnnnnnn 1500
nrmnimrmnn niiaaaaaaaa awgkgaagcc gntnntnntt tnaaaagnnt tttnnnaaaa 1550
aaaaaaaaaa aaaaaaaaaa aaaaaaa 1587
26
300
DNA
Zea mays

unsure (1)..(300) n=unknown
26
gacaaagcga tggctggttt gagagcaaaa cttgagacag cccataagac ctctggaggg 60
aaaaaagtta atttaatctc acattctatg ggtggattgc tagtacgctg cttcatgtct 120
atgaatcatg atgtgagttt tcatgttttc tgtgtttttt ttgcttttgc ataaatatcc 180
atgtcaattt cccccacttt ctaggtattc actangtatg tcaacaaatg gatctgcatt 240
gcttgtccat tccaaggtaa cttatgggac acttcaattg tttattanat natggggncc 300
27
1240
DNA
Zea mays
unsure (1)..(1240) n=unknown
27
:cgacccacg cgtccggttc ccagttccca ccgtgtagat ggttctggta taaaatgtat 60
-.gccatattt gtaacacaga ttactatata caggttcgtg atcaaacttt gagcagaata 120
iagascaaca ttgaactcat agwagsgaca aatggtggaa atagggcggt ggiangatccc 180

acnactccat ggggtcnttn attctntgcn ttctacgnaa tggnccgaag ccczcctccg 240
tgggggcagt gggtccgaac tggntgtaga accatataaa gctgtaatga atattggagg 300
acctttccta ggagttccta aggctgttgc tgggcttttt ttcttctaag caaaagacgc 360
tgccggttgc taggtataag taacgattca tccatttaaa gcaaaaggga atagcaaaag 420
aacgaacatt attggatgct cgacaagctt gcggagcttt tgctcccaag ccatctcctg 480
gaccccacaa gtccagggag tgcctgcctc tgatcctcat catcaggaac aggctcaagc 540
atgcaccgac ggtaccgcga ggccatttct atcctgatgc aacaccacgt acttgttgat 600
ggcaaggcca ggactgacaa gacctaccct gctgggttca tggatgtcat ttccatcccc 660
aagacaaacg agaactacag gccgctttcg tcttcaccca atcagggatg aggatgccaa 720
gttcaagctc cacaaggtga ggtctgttca gtttggccag aaagacaccc cctatctgaa 780
cacctacgac gaccgcacca tccgctaccc cgacccgctc atcaaggcca acgacaccat 840
caagaccgat ctggagacca acaagatcat ggacttcacc atgtttgacg tcggcaacgt 900
ggtcacggtg atcggcagga ggaataccgg gcgtgtagga gtgatcaara taagggagaa 960
gcataagggc aactccgaga ccatccacgt gctgctcgra gctttttgct atgtctagct 1020
ttctcctatt tgttgtacag gaaaacatag aatgaaattc aaatttggtg gccacaaaag 1080
tgtggagact tgatttcata taaagttagg cttaacatta gtgcaaacag ttgtatttta 1140
gtttagattt agagtacact atgtatgcgt tgtttgacaa tgcttattta tgatacattg 1200
aatggtactt atttatatta attaattaaa aaaaaaaaaa 1240
28
324
DNA
Zea mays
cgaatgctcc tgacatggaa atattttcca tgtacggagt aggcattcct actgaaaggg 60
catatgtcta taagttggcc ccacaggcag aatgttatat acctttccga attgacacct 120
cggctgaagg cggggaggaa aatagctgct tgaaaggggg tgtttactta gccgatggtg 180
acgaaactgt tccagttctt agtgcgggct acatgtgtgc aaaaggatgg cgtggcaaaa 240
ctcgtttcaa ccctgccggc agcaagactt acgtgagaga atacagccat tcaccaccct 300
ctactctcct ggaaggcagg ggca 324
29
254
DNA
Zea mays
29
gaataaagag caacattgaa ctcatggtag caacaaatgg tggaaatagg gtggtggtga 60
tcccacactc catgggggtc ctctattttt tgcattttat gaaatgggtc gaagcacctc 120
ctcccatggg gggtggcggt ggtccagact ggtgtgagaa gcatattaaa gctgtaatga 180
atattggagg acctttctta ggagttccta aggctgttgc tggccttttc tcatctgaag 240
ccaaagatgt tgcc 254
30
513
DNA
Mus musculus
30
tggaggacaa cgcggggtct gatacgactc actataggga atttggccct cgagcagtag 60
attcggcacg acgggcacga ggactccatc atgttcctca agctttattc ctaccgggat 120
gtcaacctgt ggtgccgcca gcgaagggtc aaggccaaag ctgtctctac agggaagaag 180
gtcagtgggg ctgccgcgag caagctgtga gctatccaga caacctgacc taccgagatc 240
tcgatcactt cacctttgct cccactttgt gttatgaact caactttcct cggccccccc 300
gaatacgaga gcgctttctg ctacgacgag ttcttgagat gctctttttc acccagctcc 360
aagtggggct gatccaacag cggatggtcc ctactatcca gaactccatg gaagcccttt 420
caagagcttc tgcagcttcg gagaccgcga gctctacaga gactggtgga atgctgagtc 480
tgccaccgac ttttggcaga actggaacat ccccgcgg 518
31
299
DNA
Mus musculus
31
ccatgatggc tcaggtccca ctggcctgga tcgtgggccg attcttccaa gcgaactatg 60
gcaatgcagc tgtgtgggcg acactcatca ttgggcaacc ggtggctgcc tcacgtatgt 120
ccacgactac tacgtgctca actacgatgc cccagtgggt catgagctac tgccaaaggc 180
agcccCccct aacctgggcc tggagttctg gaggggttcc tggctgcctg cacactcctc 240
ctagtctggg aggcctctct gccccCatgc gctactcctg ctcttgggga tggcattcg 299
32
1895
DNA
Artificial Sequence

Description of Artificial Sequence: Inferred cDNA sequence

unsure (1)..(1895) n=unknown
32
gcctggtgtg aCggggacag ggagggactt ccccCtaccc agcactggtg ttggctgagg 60
tgggtgccga gtcccagagc ttggcatgga gaccagacag ggctgggtct gcaagcctga 120
ggctgccgcc ctgagctcgg gctgggacgt gcccagaggt gttgggagga tctggggcga 180
gcaccctgtg gccaggacta aaggggctnc acccccctgt ccaCccctcg cagatcttga 240
gcaatgcccg gttatttctg gagaacctca tcaagtatgg catcctggtg gaccccatcc 300
aggtggtctc tctgttcctg aaggatccct atagctggcc cgccccatgc ctggttattg 360
cggccaangt ctttgctgtg gctgcattcc aggttgagaa gcgcctggcg gtgggtgccc 420
tgacggagca ggcgggactg ctgccgcacg tggccaacct ggccaccatt ctgtgcttcc 480
cagcggccgt ggtcttactg gttgagtcta tcactccagt gggctccctg ctggcgctga 540
tggcgcacac catcctcttc ctcaagctct tcccctaccg cgacgtcaac tcatggtgcc 600
gcagggccag ggccaaggct gcctcCgcag ggaagaaggc cagcagtgct gctgccccgc 660
acaccgtgag ctacccggac aatctgaccc accgcgatcc ctactacttc ctcttcgccc 720
ccaccttgtg ctacgagctc aaccttcccc gctctccccg catccggaag cgctttctgc 780
tgcgacggat ccctgagatg ctgttcttca cccagctcca ggtggggctg atccagcagc 840
ggatggtccc caccatccag aactccatga agcccttcaa ggacatggac tacccacgca 900
tcatcgagcg cctcctgaag ctggcggtcc ccaatcacct catctggcCc aCcttcttct 960
accggctctt ccactcctgc ctgaaCgccg Cggctgagct catgcagttt ggagaccggg 1020
agttctaccg ggactggtgg aactccgagt ctgtcaccta cttctggcag aactggaaca 1080
tcccCgtgca caagtggtgc atcagacact tctacaagcc catgctCcga cggggcagca 1140
gcaagCggaC ggccaggaca ggggCgCCcc CggccCcggc cctcttccac gagtacctgg 1200
tgagcgtccc tctgcgaatg ttccgcctct gggcgttcac gggcatgaCg gcccagatcc 1260
cactggcctg gttcgtgggc cgctttttcc agggcaacta tggcaacgca gctgtgtggc 1320
cgccgctcaC catcggacag ccaatagccg Ccctcatgta cgtccacgac tacCacgtgc 1380
tcaactatga ggccccagcg gcagaggcct gagctgcacc tgagggcccg gcttctcact 1440
gccacctcac acccgctgcc agagcccacc tctcctccta ggcctcgagt gctggggatg 1500
ggcctggctg cacagcatcc tcctctggtc ccagggaggc ctctctgccc ctatggggct 1560
ctgtcctgca cccctcaggg atggcgacag caggccagac acagtctgat gccagctggg 1620
agtcttgctg accctgcccc gggtccgagg gtgtcaataa agtgctgtcc agcgacctct 1680
tcagcctgcc aggggcctgg ggcctggtgg ggggtatggc cacacccaca agggcgagcg 1740
ccagagccgt gtggacagct gtcccaggac ctgccgggga gcagcagctc cactgcagca 1800
gggcgggcat ggccggtagg gggagtgcaa ggccaggcag acgcccccat tccccacact 1860
cccctaccta gaaaagctca gctcaggcgt cctct 1895
33
1766
DNA
Artificial Sequence

Description of Artificial Sequence: Inferred cDNA sequence
33
cacgactggg ccgcgacgtg gtgcgggccg aagccatggg cgaccgcgga ggcgcgggaa 60
gctctcggcg tcggaggacc ggctcgcggg tttccatcca gggtggtagt gggcccatgg 120
tagacgaaga ggaggtgcga gacgccgctg tgggccccga cttgggcgcc gggggtgacg 180
ctccggctcc ggctccggtt ccggctccag cccacacccg ggacaaagac cggcagacca 240
gcgtgggcga cggccactgg gagctgaggt gccatcgtct gcaagactct ttgttcagct 300
cagacagcgg tttcagcaat taccgtggta tcctgaattg gtgcgtggtg atgctgatcc 360
tgagtaatgc aaggttattt ttagagaatc ttatcaagta tggcatcctg gCggatccca 420
tccaggtggc gtctctgttt ctgaaggacc cctacagctg gcctgcccca tgcttgatca 480
ttgcatccaa tatctttact gtggctacat ttcagattga gaagcgcctg tcagtgggtg 540
ccctgacaga gcagatgggg ctgctgctac atgtggttaa cctggccaca attatctgct 600
tcccagcagc tgtggcctta ctggttgagt ctatcactcc agtgggttcc ctgtttgctc 660
tggcatcata ctccatcatc ttcctcaagc ttttctccta ccgggatgtc aatctgtggt 720
gccgccagcg aagggtcaag gccaaagctg tgtctgcagg gaagaaggtc agtggggctg 780
ctgcccagaa cactgtaagc tatccggaca acctgaccta ccgagatctc tattactcca 840
tctttgctcc tactttgtgt tatgaactca actttcctcg atccccccga atacgaaagc 900
gctttctgct acggcgggtt cttgagatgc tctttttcac ccagcttcaa gtggggctga 960
tccagcagtg gatggtccct actatccaga actccatgaa gcccttcaag gacatggact 1020
attcacgaac cattgagcgt ctcttaaagc tggcggtccc caaccatctg atacggctca 1080
tcttcttcta ttggcttttc cactcatgtc tcaatgctgt ggcagagctc ctgcagtttg 1140
gagaccgcga gttctacagg gactggCgga atgctgagtc tgtcacctac ttttggcaga 1200
actggaataC ccccgtgcac aagtggtgca tcagacactt ctacaagcct atgctcagac 1260
tgggcagcaa caaacggatg gccaggactg gggtcttttt ggcgtcagcc ttcttccatg 1320
agtacctagC gagcattccc ctgaggatgt tccgcctctg ggcattcaca gccatgatgg 1380
ctcaggtccc actggcctgg attgtgaacc gcttcttcca agggaactat ggcaatgcag 1440
ctgtgtgggc gacactcatc attgggcaac cggtggctgt gctcatgtat gtccacgact 1500
actacgtgct caactatgat gccccagtgg gggcctgagc tactgccaaa ggccagccct 1560
gcctaacctg ggcctggagt tctggagggc ttcctggctg cctgcacact cctcctagtc 1620
tgggaggcct ctctgcccct atggggccta ctcctgctct tggggatggc acctgagtcc 1680
agctggtatg agccagtgct gggagtctgt gctgaccagg ggctgaggat atcaataaag 1740
agctatctaa aaaaaaaaaa aaaaaa 1766

34
PRT
Homo sapiens
34
Arg Arg Ser Leu Leu Asp Glu Leu Leu Glu Val Asp His Ile Arg Thr
1 5 10 15
Ile Tyr His Met Phe Ile Ala Leu Leu Ile Leu Phe Ile Leu Ser Thr
20 25 30
Leu Val Val Asp Tyr Ile Asp Glu Gly Arg Leu Val Leu Glu Phe Ser
35 40 45
Leu Leu Ser Tyr Ala Phe Gly Lys Phe Pro Thr Val Val Trp Thr Trp
50 55 60
Trp Ile Met Phe Leu Ser Thr Phe Ser Val Pro Tyr Phe Leu Phe Gln
65 70 75 80
His Trp Arg Thr Gly Tyr Ser Lys Ser Ser His Pro Leu Ile Arg Ser
85 90 95
Leu Phe His Gly Phe Leu Phe Met Ile Phe Gln Ile Gly Val Leu Gly
100 105 110
Phe Gly Pro Thr Tyr Val Val Leu Ala Tyr Thr Leu Pro Pro Ala Ser
115 120 125
Arg Phe Ile Ile Ile Phe Glu Gln Ile Arg Phe Val Met Lys Ala His
130 135 140
Ser Phe Val Arg Glu Asn Val Pro Arg Val Leu Asn Ser Ala Lys Glu
145 150 155 160
Lys Ser Ser Thr Val Pro Ile Pro Thr Val Asn Gln Tyr Leu Tyr Phe
165 170 175
Leu Phe Ala Pro Thr Leu Ile Tyr Arg Asp Ser Tyr Pro Arg Asn Pro
180 185 190
Thr Val Arg Trp Gly Tyr Val Ala Met Lys Phe Ala Gln Val Phe Gly
195 200 205
Cys Phe Phe Tyr Val Tyr Tyr Ile Phe Glu Arg Leu Cys Ala Pro Leu
210 215 220
Phe Arg Asn Ile Lys Gln Glu Pro Phe Ser Ala Arg Val Leu Val Leu
225 230 235 240
Cys Val Phe Asn Ser Ile Leu Pro Gly Val Leu Ile Leu Phe Leu Thr
245 250 255
Phe Phe Ala Phe Leu His Cys Trp Leu Asn Ala Phe Ala Glu Met Leu
260 265 270
Arg Phe Gly Asp Arg Met Phe Tyr Lys Asp Trp Trp Asn Ser Thr Ser
275 280 285
Tyr Ser Asn Tyr Tyr Arg Thr Trp Asn Val Val Val His Asp Trp Leu
290 295 300
Tyr Tyr Tyr Ala Tyr Lys Asp Phe Leu Trp Phe Phe Ser Lys Arg Phe
305 310 315 320
Lys Ser Ala Ala Met Leu Ala Val Phe Ala Val Ser Ala Val Val His
325 330 335
Glu Tyr Ala Leu Ala Val Cys Leu Ser Phe Phe Tyr Pro Val Leu Phe
340 345 350
Val Leu Phe Met Phe Phe Gly Met Ala Phe Asn Phe Ile Val Asn Asp
355 360 365
Ser Arg Lys Lys Pro Ile Trp Asn Val Leu Met Trp Thr Ser Leu Phe
370 375 380
Leu Gly Asn Gly Val Leu Leu Cys Phe Tyr Ser Gln Glu Trp Tyr Ala
385 390 395 400
Arg Arg His Cys Pro Leu Lys Asn Pro
405
35
409
PRT
Mus musculus
35
Arg Gln Ser Leu Leu Asp Glu Leu Phe Glu Val Asp His Ile Arg Thr
1 5 10 15
Ile Tyr His Met Phe Ile Ala Leu Leu Ile Leu Phe Val Leu Ser Thr
20 25 30
Ile Val Val Asp Tyr Ile Asp Glu Gly Arg Leu Val Leu Glu Phe Asn
35 40 45
Leu Leu Ala Tyr Ala Phe Gly Lys Phe Pro Thr Val Ile Trp Thr Trp
50 55 60
Trp Ala Met Phe Leu Ser Thr Leu Ser Ile Pro Tyr Phe Leu Phe Gln
65 70 75 80
Pro Trp Ala His Gly Tyr Ser Lys Ser Ser His Pro Leu Ile Tyr Ser
85 90 95
Leu Val His Gly Leu Leu Phe Leu Val Phe Gln Leu Gly Val Leu Gly
100 105 110
Phe Val Pro Thr Tyr Val Val Leu Ala Tyr Thr Leu Pro Pro Ala Ser
115 120 125
Arg Phe Ile Leu Ile Leu Glu Gln Ile Arg Leu Ile Met Lys Ala His
130 135 140
Ser Phe Val Arg Glu Asn Ile Pro Arg Val Leu Asn Ala Ala Lys Glu
145 150 155 160
Lys Ser Ser Lys Asp Pro Leu Pro Thr Val Asn Gln Tyr Leu Tyr Phe
165 170 175
Leu Phe Ala Pro Thr Leu Ile Tyr Arg Asp Asn Tyr Pro Arg Thr Pro
180 185 190
Thr Val Arg Trp Gly Tyr Val Ala Met Gln Phe Leu Gln Val Phe Gly
195 200 205
Cys Leu Phe Tyr Val Tyr Tyr Ile Phe Glu Arg Leu Cys Ala Pro Leu
210 215 220
Phe Arg Asn Ile Lys Gln Glu Pro Phe Ser Ala Arg Val Leu Val Leu
225 230 235 240
Cys Val Phe Asn Ser Ile Leu Pro Gly Val Leu Ile Leu Phe Leu Ser
245 250 255
Phe Phe Ala Phe Leu His Cys Trp Leu Asn Ala Phe Ala Glu Met Leu
260 265 270
Arg Phe Gly Asp Arg Met Phe Tyr Lys Asp Trp Trp Asn Ser Thr Ser
275 280 285
Tyr Ser Asn Tyr Tyr Arg Thr Trp Asn Val Val Val His Asp Trp Leu
290 295 300
Tyr Tyr Tyr Val Tyr Lys Asp Leu Leu Trp Phe Phe Ser Lys Arg Phe
305 310 315 320
Lys Ser Ala Ala Met Leu Ala Val Phe Ala Leu Ser Ala Val Val His
325 330 335
Glu Tyr Ala Leu Ala Ile Cys Leu Ser Tyr Phe Tyr Pro Val Leu Phe
340 345 350
Val Leu Phe Met Phe Phe Gly Met Ala Phe Asn Phe Ile Val Asn Asp
355 360 365
Ser Arg Lys Arg Pro Ile Trp Asn Ile Met Val Trp Ala Ser Leu Phe
370 375 380
Leu Gly Tyr Gly Leu Ile Leu Cys Phe Tyr Ser Gln Glu Trp Tyr Ala
385 390 395 400
Arg Gln His Cys Pro Leu Lys Asn Pro
405

36
429
PRT
Saccharomyces cerevisiae
Asp Lys Ala Asp Ala Pro Pro Gly Glu Lys Leu Glu Ser Asn Phe Ser
1 5 10 15
Gly Ile Tyr Val Phe Ala Trp Met Phe Leu Gly Trp Ile Ala Ile Arg
20 25 30
Cys Cys Thr Asp Tyr Tyr Ala Ser Tyr Gly Ser Ala Trp Asn Lys Leu
35 40 45
Glu Ile Val Gln Tyr Met Thr Thr Asp Leu Phe Thr Ile Ala Met Leu
50 55 60
Asp Leu Ala Met Phe Leu Cys Thr Phe Phe Val Val Phe Val His Trp
65 70 75 80
Leu Val Lys Lys Arg Ile Ile Asn Trp Lys Trp Thr Gly Phe Val Ala
85 90 95
Val Ser Ile Phe Glu Leu Ala Phe Ile Pro Val Thr Phe Pro Ile Tyr
100 105 110
Val Tyr Tyr Phe Asp Phe Asn Trp Val Thr Arg Ile Phe Leu Phe Leu
115 120 125
His Ser Val Val Phe Val Met Lys Ser His Ser Phe Ala Phe Tyr Asn
130 135 140
Gly Tyr Leu Trp Asp Ile Lys Gln Glu Leu Glu Tyr Ser Ser Lys Gln
145 150 155 160
Leu Gln Lys Tyr Lys Glu Ser Leu Ser Pro Glu Thr Arg Glu Ile Leu
165 170 175
Gln Lys Ser Cys Asp Phe Cys Leu Phe Glu Leu Asn Tyr Gln Thr Lys
180 185 190
Asp Asn Asp Phe Pro Asn Asn Ile Ser Cys Ser Asn Phe Phe Met Phe
195 200 205
Cys Leu Phe Pro Val Leu Val Tyr Gln Ile Asn Tyr Pro Arg Thr Ser
210 215 220
Arg Ile Arg Trp Arg Tyr Val Leu Glu Lys Val Cys Ala Ile Ile Gly
225 230 235 240
Thr Ile Phe Leu Met Met Val Thr Ala Gln Phe Phe Met His Pro Val
245 250 255
Ala Met Arg Cys Ile Gln Phe His Asn Thr Pro Thr Phe Gly Gly Trp
260 265 270
Ile Pro Ala Thr Gln Glu Trp Phe His Leu Leu Phe Asp Met Ile Pro
275 280 285
Gly Phe Thr Val Leu Tyr Met Leu Thr Phe Tyr Met Ile Trp Asp Ala
290 295 300
Leu Leu Asn Cys Val Ala Glu Leu Thr Arg Phe Ala Asp Arg Tyr Phe
305 310 315 320
Tyr Gly Asp Trp Trp Asn Cys Val Ser Phe Glu Glu Phe Ser Arg Ile
325 330 335
Trp Asn Val Pro Val His Lys Phe Leu Leu Arg His Val Tyr His Ser
340 345 350
Ser Met Gly Ala Leu His Leu Ser Lys Ser Gln Ala Thr Leu Phe Thr
355 360 365
Phe Phe Leu Ser Ala Val Phe His Glu Met Ala Met Phe Ala Ile Phe
370 375 380
Arg Arg Val Arg Gly Tyr Leu Phe Met Phe Gln Leu Ser Gln Phe Val
385 390 395 400
Trp Thr Ala Leu Ser Asn Thr Lys Phe Leu Arg Ala Arg Pro Gln Leu
405 410 415
Ser Asn Val Val Phe Ser Phe Gly Val Cys Ser Gly Pro
420 425
37
432
PRT
37
Glu Thr Val Val Thr Val Glu Thr Thr Ile Ile Ser Ser Asn Phe Ser
1 5 10 15
Gly Leu Tyr Val Ala Phe Trp Met Ala Ile Ala Phe Gly Ala Val Lys
20 25 30
Ala Leu Ile Asp Tyr Tyr Tyr Gln His Asn Gly Ser Phe Lys Asp Ser
35 40 45
Glu Ile Leu Lys Phe Met Thr Thr Asn Leu Phe Thr Val Ala Ser Val
50 55 60
Asp Leu Leu Met Tyr Leu Ser Thr Tyr Phe Val Val Gly Ile Gln Tyr
65 70 75 80
Leu Cys Lys Trp Gly Val Leu Lys Trp Gly Thr Thr Gly Trp Ile Phe
85 90 95
Thr Ser Ile Tyr Glu Phe Leu Phe Val Ile Phe Tyr Met Tyr Leu Thr
100 105 110
Glu Asn Ile Leu Lys Leu His Trp Leu Ser Lys Ile Phe Leu Phe Leu
115 120 125
His Ser Leu Val Leu Leu Met Lys Met His Ser Phe Ala Phe Tyr Asn
130 135 140
Gly Tyr Leu Trp Gly Ile Lys Glu Glu Leu Gln Phe Ser Lys Ser Ala
145 150 155 160
Leu Ala Lys Tyr Lys Asp Ser Ile Asn Asp Pro Lys Val Ile Gly Ala
165 170 175
Leu Glu Lys Ser Cys Glu Phe Cys Ser Phe Glu Leu Ser Ser Gln Ser
180 185 190
Leu Ser Asp Gln Thr Gln Lys Phe Pro Asn Asn Ile Ser Ala Lys Ser
195 200 205
Phe Phe Trp Phe Thr Met Phe Pro Thr Leu Ile Tyr Gln Ile Glu Tyr
210 215 220
Pro Arg Thr Lys Glu Ile Arg Trp Ser Tyr Val Leu Glu Lys Ile Cys
225 230 235 240
Ala Ile Phe Gly Thr Ile Phe Leu Met Met Ile Asp Ala Gln Ile Leu
245 250 255
Met Tyr Pro Val Ala Met Arg Ala Leu Ala Val Arg Asn Ser Glu Trp
260 265 270
Thr Gly Ile Leu Asp Arg Leu Leu Lys Trp Val Gly Leu Leu Val Asp
275 280 285
Ile Val Pro Gly Phe Ile Val Met Tyr Ile Leu Asp Phe Tyr Leu Ile
290 295 300
Trp Asp Ala Ile Leu Asn Cys Val Ala Glu Leu Thr Arg Phe Gly Asp
305 310 315 320
Arg Tyr Phe Tyr Gly Asp Trp Trp Asn Cys Val Ser Trp Ala Asp Phe
325 330 335
Ser Arg Ile Trp Asn Ile Pro Val His Lys Phe Leu Leu Arg His Val
340 345 350
Tyr His Ser Ser Met Ser Ser Phe Lys Leu Asn Lys Ser Gln Ala Thr
355 360 365
Leu Met Thr Phe Phe Leu Ser Ser Val Val His Glu Leu Ala Met Tyr
370 375 380
Val Ile Phe Lys Lys Leu Arg Phe Tyr Leu Phe Phe Phe Gln Met Leu
385 390 395 400
Gln Met Pro Leu Val Ala Leu Thr Asn Thr Lys Phe Met Arg Asn Arg
405 410 415
Thr Ile Ile Gly Asn Val Ile Phe Trp Leu Gly Ile Cys Met Gly Pro
420 425 430
38
1942
DNA
Arabidopsis thaliana
38
ctctcgtgaa tcctttttcc tttcttcttc ttcttctctt cagagaaaac tttgcttctc 60
tttctataag gaaccagaca cgaatcccat tcccaccgat ttcttagctt cttccttcaa 120
tccgctcttt ccctctccat tagattctgt ttcctctttc aatttcttct gcatgctcct 180
cgattctctc tgacgcctct tttctcccga cgctgtttcg tcaaacgctt ttcgaaacgg 240
cgattttgga ttctgctggc gttactacgg tgacggagaa cggtggcgga gagttcgtcg 300
atcttgatag gcttcgtcga cggaaatcga gatcggattc ttctaacgga cttcttccct 360
ctggttccga taataattct ccttcggatg atgttggagc tcccgccgac gttagggatc 420
ggattgattc cgctgttaac gatgacgctc agggaacagc caatttggcc ggagataata 480
acggtggtgg cgataataac ggtggtggaa gaggcggcgg agaaggaaga ggaaacgccg 540
atgctacgct tacgtatcga ccgtcggttc cagctcatcg gagggcgaga gagagtccac 600
ttagccccga cgcaatcttc aaacagagcc atgccggatt attcaacctc tgtgtagcag 660
ttcttattgc tgtaaacagt agactcatca tcgaaaatct tatgaagtat ggttggttga 720
tcagaacgga tttctggttt agttcaagat cgctgcgaga ttggccgctt ttcatgtgtt 780
gtacatcccc ttcgatcttt cctttggctg cctttacggt tgagaaattg gcacttcaga 840
aatacatacc agaacctgct gtcatctttc ttcatattat tatcaccatg acagaggctc 900
tgtatccagc ttacgtcacc ctaaggtgtg attctgcttt tttatcaggt gtcacttcga 960
tgctcctcac ttgcattgtg tggctaaagt tggtttctta tgctcatact agctatgaca 1020
taagacccct agccaatgca gctgataagg ccaatcctga agtctcctac tacgttagct 1080
tgaagagctt ggcatatttc atggtcgctc ccacattgtg ttatcagcca agttatccac 1140
gttctgcatg catacggaag ggttgggtgg ctcgtcaatt tgcaaaactg gtcatatcca 1200
ccggattcat gggatttata atagaacaat atataaatcc tattgtcagg aactcaaagc 1260
atcctttgaa aggcgatctt ctatatgcta ttgaaagagt gttgaagctt tcagttccaa 1320
atttatatgt gtggctctgc atgttctact gcttcttcca cctttggtta aacatatcgg 1380
cagagcttct ctgcttcggg gatcgtgaat tctacaaaga ttggtggaat gcaaaaagtg 1440
tgggagatta ctggagaatg tggaatatgc ctgttcataa atggatggtt cgacatatat 1500
acttcccgtg cttgcgcagc aagataccaa agacactcgc cattatcatt gctttcccag 1560
tctctgcagt ctttcatgag ctatgcatcg cagttccttg tcgtctcttc aagctatggg 1620
cttttcttgg gattatgttt caggtgcctt tggtcttcat cacaaactat ctacaggaaa 1680
ggtttggctc aacggtgggg aacatgatct tctggttcat cttctgcatt ttcggacaac 1740
cgatgtgtgt gcttctttat taccacgacc tgatgaaccg aaaaggatcg atgtcatgaa 1800
acaactgttc aaaaaatgac tttcttcaaa catctatggc ctcgttggat ctccgttgat 1860
gttgtggtgg ttctgatgct aaaacgacaa atagtgttat aaccattgaa gaagaaaaga 1920
caattagagt tgttgtatcg ca 1942
39
520
PRT
Arabidopsis thaliana

39
Met Ala Ile Leu Asp Ser Ala Gly Val Thr Thr Val Thr Glu Asn Gly
1 5 10 15
Gly Gly Glu Phe Val Asp Leu Asp Arg Leu Arg Arg Arg Lys Ser Arg
20 25 30
Ser Asp Ser Ser Asn Gly Leu Leu Leu Ser Gly Ser Asp Asn Asn Ser
35 40 45
Pro Ser Asp Asp Val Gly Ala Pro Ala Asp Val Arg Asp Arg Ile Asp
50 55 60
Ser Val Val Asn Asp Asp Ala Gln Gly Thr Ala Asn Leu Ala Gly Asp
65 70 75 80
Asn Asn Gly Gly Gly Asp Asn Asn Gly Gly Gly Arg Gly Gly Gly Glu
85 90 95
Gly Arg Gly Asn Ala Asp Ala Thr Phe Thr Tyr Arg Pro Ser Val Pro
100 105 110
Ala His Arg Arg Ala Arg Glu Ser Pro Leu Ser Ser Asp Ala Ile Phe
115 120 125
Lys Gln Ser His Ala Gly Leu Phe Asn Leu Cys Val Val Val Leu Ile
130 135 140
Ala Val Asn Ser Arg Leu Ile Ile Glu Asn Leu Met Lys Tyr Gly Trp
145 150 155 160
Leu Ile Arg Thr Asp Phe Trp Phe Ser Ser Arg Ser Leu Arg Asp Trp
165 170 175
Pro Leu Phe Met Cys Cys Ile Ser Leu Ser Ile Phe Pro Leu Ala Ala
180 185 190
Phe Thr Val Glu Lys Leu Val Leu Gln Lys Tyr Ile Ser Glu Pro Val
195 200 205
Val Ile Phe Leu His Ile Ile Ile Thr Met Thr Glu Val Leu Tyr Pro
210 215 220
Val Tyr Val Thr Leu Arg Cys Asp Ser Ala Phe Leu Ser Gly Val Thr
225 230 235 240
Leu Met Leu Leu Thr Cys Ile Val Trp Leu Lys Leu Val Ser Tyr Ala
245 250 255
His Thr Ser Tyr Asp Ile Arg Ser Leu Ala Asn Ala Ala Asp Lys Ala
260 265 270
Asn Pro Glu Val Ser Tyr Tyr Val Ser Leu Lys Ser Leu Ala Tyr Phe
275 280 285
Met Val Ala Pro Thr Leu Cys Tyr Gln Pro Ser Tyr Pro Arg Ser Ala
290 295 300
Cys Ile Arg Lys Gly Trp Val Ala Arg Gln Phe Ala Lys Leu Val Ile
305 310 315 320
Phe Thr Gly Phe Met Gly Phe Ile Ile Glu Gln Tyr Ile Asn Pro Ile
325 330 335
Val Arg Asn Ser Lys His Pro Leu Lys Gly Asp Leu Leu Tyr Ala Ile
340 345 350
Glu Arg Val Leu Lys Leu Ser Val Pro Asn Leu Tyr Val Trp Leu Cys
355 360 365
Met Phe Tyr Cys Phe Phe His Leu Trp Leu Asn Ile Leu Ala Glu Leu
370 375 380
Leu Cys Phe Gly Asp Arg Glu Phe Tyr Lys Asp Trp Trp Asn Ala Lys
385 390 395 400
Ser Val Gly Asp Tyr Trp Arg Met Trp Asn Met Pro Val His Lys Trp
405 410 415
Met Val Arg His Ile Tyr Phe Pro Cys Leu Arg Ser Lys Ile Pro Lys
420 425 430
Thr Leu Ala Ile Ile Ile Ala Phe Leu Val Ser Ala Val Phe His Glu
435 440 445
Leu Cys Ile Ala Val Pro Cys Arg Leu Phe Lys Leu Trp Ala Phe Leu
450 455 460
Gly Ile Met Phe Gln Val Pro Leu Val Phe Ile Thr Asn Tyr Leu Gln
465 470 475 480
Glu Arg Phe Gly Ser Thr Val Gly Asn Met Ile Phe Trp Phe Ile Phe
485 490 495
Cys Ile Phe Gly Gln Pro Met Cys Val Leu Leu Tyr Tyr His Asp Leu
500 505 510
Met Asn Arg Lys Gly Ser Met Ser
515 520
40
29
DNA
Artificial Sequence


Description of Artificial Sequence: Synthetic oligonucleotide primer
40
tgcaaattga cgagcacacc aaccccttc
41 28 DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
41
aaggatgctt tgagttcctg acaatagg
42
1942
DNA
Arabidopsis thaliana
42
ctctcgtgaa tcctttttcc tttcttcttc ttcttctctt cagagaaaac tttgcttctc 60
tttctataag gaaccagaca cgaatcccat tcccaccgat ttcttagctc cttccttcaa 120
tccgctcttt ccctctccat tagattctgt ttcctctttc aatttcttcc gcatgcttct 180
cgattctctc tgacgcctct tctctcccga cgctgtttcg tcaaacgctt ttcgaaatgg 240
cgactttgga ttctgctggc gttactacgg tgacggagaa cggtggcgga gagttcgtcg 300
atcctgatag gcttcgtcga cggaaatcga gatcggattc ttctaacgga cttcttctct 360
ctggttccga taataattct ccttcggatg atgttggagc tcccgccgac gttagggatc 420
ggattgattc cgttgttaac gatgacgctc agggaacagc caatttggcc ggagataata 480
acggtggtgg cgataataac ggtggtggaa gaggcggcgg agaaggaaga ggaaacgccg 540
atgctacgtt tacgtatcga ccgtcggttc cagctcatcg gagggcgaga gagagtccac 600
ctagctccga cgcaatcttc aaacagagcc atgccggatt attcaacctc tgtgtagtag 660
ttcttattgc tgtaaacagt agactcatca tcgaaaatct tatgaagtat ggttggttga 720
tcagaacgga tttctggttt agttcaagat cgctgcgaga ttggccgctt ttcatgtgtt 780
gtatatccct ttcgatcttt cctttggctg cctttacggt tgagaaattg gtacttcaga 840
aatacatatc agaacctgtt gccatctttc ttcatattat taccaccatg acagaggttt 900
tgtatccagt ttacgtcacc ccaaggtgtg attctgcttt tttatcaggt gtcaccttga 960
tgctcctcac ttgcattgtg tggctaaagt tggtttctta tgctcatact agctatgaca 1020
taagatccct agccaatgca gctgataagg ccaatcctga agtctcctac tacgttagct 1080
tgaagagctt ggcatatttc acggtcgctc ccacattgtg ttatcagcca agttatccac 1140
gttctgcatg tatacggaag ggttgggtgg ctcgtcaatt tgcaaaactg gtcatattca 1200
ccggattcat gggatttata atagaacaat atataaatcc tattgtcagg aactcaaagc 1260
atccttcgaa aggcgatctt ctatatgcta ttgaaagagt gttgaagctt tcagttccaa 1320
atttatatgt gtggctctgc acgttctact gcttcttcca cctttggtta aacatattgg 1380
cagagcttct ctgcttcggg gatcgtgaat tctacaaaga ttggtggaat gcaaaaagtg 1440
tgggagatta ctggagaatg tggaatatgc ctgttcataa atggatggtt cgacatatat 1500
acttcccgtg cttgcgcagc aagataccaa agacactcgc cattatcatt gctttcctag 1560
tctctgcagt ctttcatgag ctatgcatcg cagttccttg tcgtctcttc aagctatggg 1620
cttttcttgg gattatgttt caggtgcctt tggtcttcat cacaaactat ctacaggaaa 1680
ggtttggctc aacggtgggg aacatgatct tctggctcat cttctgcatt ttcggacaac 1740
cgatgtgtgt gcttctttat taccacgacc tgatgaaccg aaaaggatcg atgtcatgaa 1800
acaactgttc aaaaaatgac cttctccaaa catctatggc ctcgttggat ctccgttgat 1860
gttgtggtgg ttctgatgct aaaacgacaa atagtgttat aaccattgaa gaagaaaaaa 1920
caattagagt tgttgtatcg ca 1942
43
234
DNA
Glycine max

unsure (1)..(234) n=unknown
43
gtaagcttca agagcttagc atanttcctg gttgccccta ncattatgtt accagccaan 60
ctatcctcgc acaccttata ctcgaaaggg ttggctgttt cgccaacttg tcaactgata 120
atatttacag gagttatggg atttataata gaacaataca ttaatcccat tgtacaaaat 180
tcacagcatc ctctcaaggg aaaccttctt tacgccatcg agagagttct gaag 234
44
267
DNA
Glycine max
44
ctgcttttgt atctggtgtc acgttgatgc tactaacttg cattgtgtgg ttaaaattgg 60
tgccatatgc acatacaaac tatgatatga gagcacttac tgtttcgaac gaasagggag 120
aaacattacc caatactttg atatggagta tccgtacacc gtgaccttca ggagtttggc 180
atacttcatg gttgctccta cattatgcta tcagacaagc tatcctcgca caccttcagt 240
tcgaaagggt tgggtgcttc gtcaact 267
45
275
DNA
Glycine max

unsure (1)..(275) n=unknown
45
gtggaatgcc aaaactgttg aagattattg gaggatgtgg aatatgcctg ttcacaaatg 60
gatgatccgc cacctatatt ttccatgttt aaggcacggt ataccaaagg ccgttgctct 120
tttaattgcc ttcctggttc tgctttattc catgagctgt gcatcgctgc tccttgccca 180
catattcaag tngtgggttt cngnggaatt nagtttcagg tnccttgggt ttcnaccnna 240
attimtnggc naaaaaattc cnngaacccc ggggg 275

46
257
DNA
Glycine max
46
aacggaactg agactccaga gaatatgcca aaatgtatta ataattgtca caacttggaa 60
ggcttttgga aaaactggca tgcttccttc aacaagtggc ttgtgaggta tatatacatt 120
cctcttgggg gatctaagaa aaagctacta aatgtgtggg ttgttttcac acttgttgca 180
atctggcatg atttagagtg gaagcttctt tcatgggcat ggttgacgtg tctattcttc 240
atccctgagt tggtttt 257
47
253
DNA
agaaaatgga acatgcctgt gcataaatgg attgttcgtc atatatattt tccttgcatg 60
cgaaatggta tatcaaagga agttgctgtt tttatatcgt tcttgtttct gctgtacttc 120
atgagttatg tgttgctgtt ccctgccaca tactcaagtt ctgggctttt tttaggaatc 180
atgcttcaga ttcccctcat catattgaca tcatacctca aaaataaatt cagtgacaca 240
atggttggca ata 253
48
254
DNA
tgaagtatgg cttattaata agatctggct cttggtttaa tgctacatca ttgcgagact 60
ggccactgct aatgtgttgc cttagtctac ccatatttcc ccttggtgca tttgcagtcg 120
aaaagttggc attcaacaat ctcattagtg atcctgctac tacctgtttt cacatccttt 180
ttacaacatt tgaaattgta tatccagtgc tcgtgattct taagtgtgat tctgcagttt 240
tatcaggctt tgtg 254
49 262 DNA Zea mays
49
gaagtatggc ttattaataa gatctggctt ttggtttaat gctacatcat tgcgagactg 60
gccactgcta atgtgttgcc ttagtctacc catatttccc cttggtgcat ttgcagtcga 120
aaagttggca ttcaacaatc tcattagtga tcctgctact acctgttttc acatcctttt 180
tacaacattt gaaattgtat atccagtgct cgtgattctt aagtgtgatt ctgcagtttt 240
acaggctttg tgttgatgtt ta 262
50
325
DNA
Zea mays


unsure (1)..(325) n=unknown
50
taatcnaacc tcgntncngg ctcagctgta tnccacgaga tatgtaatgc ggtgccgtgc 60
cacatantca nacctnggca tnncngggat catngctcag ataccgntgg nactcctgac 120
aagatatctc caCgctacgt tcaagcaCgt aatggtgggc aacatgatan ttcggntcnn 180
cagtatagtc ggacagccga tgtimimima tctatactac catgacgtca tgaacaggca 240
ggcccaggca agtagatagt ncggcagaga catgtacttc aacatcganc atcagnagca 300
nacngagcga gcggcangaa ncagc 325
51
519
DNA
Mortierella alpina
unsure (1)..(519) n=unknown
51
gagnnnngna acgtttagcc tnccgtagcc gccaaaatcc aagggncnac cnaccctncg 60
ttanactnaa ttngaaaatn crmncccaac ttnaggnact tnnagncccc ccnacttgac 120
aacggagcac tatatttacc ccgtggtngt tcaacccagc catctcaccc ttgcgagcat 180
tggtgctgct ctCgataccc ttcatgctta actatctcat gatcttttac atcattttcg 240
agtgcatctg csacgccttt gcggaactaa gttgctttgc ggatcgcaac ttttacgagg 300
atcggtggaa ctgcgtcagc tttgatgagt gggcacgcaa atggascaag cctgtgcaac 360
acttcttgct ccgccacgtg tacgacccga gcatccgagt ccttccactt gtccgaaatc 420
caatgccgcn aattgcaaac gttccttccc ggtcgtcaat gcgttcaacg aacctgggtg 480
aagaatgggt ggtgacaacg ttaaagtgcg cccggtatc 519
52
45
DNA
Artificial Sequence

Description of Artificial Sequence: Oligonucleotide primer
52
ggatccgcgg ccgcacaatg aaaaaaatat cttcacatta ttcgg
53
40
DNA
Artificial Sequence


Description of Artificial Sequence: Oligonucleotide primer
53
ggatcccctg caggtcattc attgacggca ttaacattgg
54
44
DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
54
ggatccgcgg ccgcacaatg ggagcgaatt cgaaatcagt aacg
55 40 DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
55
ggatcccctg caggttaata cccactttta tcaagctccc
56
41
DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
56
ggatccgcgg ccgcacaatg tctctattac tggaagagat c
57
41
DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer

57
ggatcccctg caggttatgc atcaacagag acacttacag c
58
41
DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
58
ggatccgcgg ccgcacaatg ggctggattc cgtgtccgtg c
59
38
DNA
Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
59
ggatcccctg caggttaacc agaatcaact actctgtg
60
39
DNA
Artificial Sequence

Description of Artificial Sequence: Oligonucleotide primer
60
ccgacctgca ggaagcttag aaatggcgat tttggattc
61
36
DNA
Artificial Sequence

Description of Artificial Sequence: Oligonucleotide primer
61
ggatccgcgg ccgctcatga catcgatcct tttcgg

62 56
DNA
Artificial Sequence

Description of Artificial Sequence: Annealed oligonucleotide adapter
62
cgcgatttaa atggcgcgcc ctgcaggcgg ccgcctgcag ggcgcgccat ttaaac
63
32
DNA
Artificial Sequence

Description of Artificial Sequence: Ligating oligonucleotide
63
tcgaggatcc gcggccgcaa gcttcctgca gg
64
32
DNA
Artificial Sequence

Description of Artificial Sequence: Ligating oligonucleotide
64
tcgacctgca ggaagcttgc ggccgcggat cc
65
32
DNA
Artificial Sequence
Description of Artificial Sequence: Ligating oligonucleotide
65
tcgacctgca ggaagcttgc ggccgcggat cc

66
32
DNA
Artificial Sequence
Description of Artificial Sequence: Ligating oligonucleotide
66
tcgaggaccc gcggccgcaa gcttcctgca gg
67
36
DNA
Artificial Sequence

Description of Artificial Sequence: Ligating oligonucleotide
67
tcgaggatcc gcggccgcaa gcttcctgca ggagct
68
28
DNA
Artificial Sequence
Description of Artificial Sequence: Ligating oligonucleotide
68
cctgcaggaa gcttgcggcc gcggatcc
69
36
DNA
Artificial Sequence

Description of Artificial Sequence: Ligating oligonucleotide
69
tcgacctgca ggaagcttgc ggccgcggat ccagct

70
28
DNA
Artificial Sequence

Description of Artificial Sequence: Ligating oligonucleotide
70
ggatccgcgg ccgcaagctc cctgcagg
71
39
DNA
Artificial Sequence
Description of Artificial Sequence: Ligating oligonucleotide
71
gatcacctgc aggaagcttg cggccgcgga tccaatgca
72
31
DNA
Artificial Sequence

Description of Artificial Sequence: Ligating oligonucleotide
72
ttggatccgc ggccgcaagc ttcctgcagg t
73
2013
DNA
Arabidopsis thaliana
73
atgcccctta ttcatcggaa aaagccgacg gagaaaccat cgacgccgcc atctgaagag 60
gtggtgcacg atgaggattc gcaaaagaaa ccacacgaat cttccaaatc ccaccataag 120
aaatcgaacg gaggagggaa gtggtcgtgc atcgactctt gttgttggtt cattgggtgt 180
gtgtgtgtaa cctggtggtt tcttctcttc ctttacaacg caatgcctgc gagcttccct 240
cagtatgtaa cggagcgaat cacgggtcct ttgcctgacc cgcccggtgt taagctcaaa 300
aaagaaggtc ttaaggcgaa acatcctgtt gtcttcattc ctgggattgt caccggtggg 360
ctcgagcttt gggaaggcaa acaatgcgct gatggtttat ttagaaaacg tttgtggggt 420
ggaacttttg gtgaagtcta caaaaggcct ctatgttggg tggaacacat gtcacttgac 480
aatgaaactg ggttggatcc agctggtatt agagttcgag ctgtatcagg actcgtggct 540
gctgactact ttgctcctgg ctactttgtc tgggcagtgc tgattgctaa ccttgcacat 600
attggatatg aagagaaaaa tatgtacatg gctgcatatg actggcggct ttcgtttcag 660
aacacagagg tacgtgatca gactcttagc cgcatgaaaa gtaatataga gttgatggct 720
tctaccaacg gtggaaaaaa agcagttaca gttccgcatt ccatgggggt cttgtacctt 780
ctacatccta tgaagtgggt cgaggcacca gctcccctgg gtggcggggg cgggccagat 840
tggtgtgcaa agtatattaa ggcggtgatg aacattggtg gaccatttcc tggtgtccca 900
aaagctgttg cagggctttt ctctgctgaa gcaaaggatg ttgcagttgc cagagcgatt 960
gccccaggat tcttagacac cgacatattt agacttcaga ccttgcagca tgcaatgaga 1020
atgacacgca catgggactc aacaatgtct atgctaccga agggaggtga cacgatatgg 1080
ggcgggcttg attggtcacc ggagaaaggc cacacctgtt gtgggaaaaa gcaaaagaac 1140
aacgaaactt gtggtgaagc aggtgaaaac ggagtttcca agaaaagtcc tgttaactac 1200
ggaaggatga tatcttttgg gaaagaagta gcagaggctg cgccatctga gactaataat 1260
attgattttc gaggtgctgt caaaggtcag agtatcccaa atcacacctg tcgtgacgtg 1320
tggacagagt accatgacat gggaattgct gggatcaaag ctatcgctga gcataaggtc 1380
tacactgctg gtgaagctat agatctacta cattatgttg ctcctaagat gatggcgcgt 1440
ggtgccgctc atttctctta tggaattgct gatgatttgg atgacaccaa gcatcaagat 1500
cccaaatact ggtcaaatcc gttagagaca aaattaccga atgctcctga gatggaaatc 1560
tactcactat acggagtggg gataccaacg gaacgagcat acgtatacaa gcttaaccag 1620
tctcccgaca gttgcatccc ctttcagata ttcacttctg cccacgagga ggacgaagat 1680
agctgtctga aagcaggagt ttacaatgtg gatggggatg aaacagtacc cgtcctaagt 1740
gccgggtaca tgtgtgcaaa agcgtggcgt ggcaagacaa gattcaaccc ttccggaatc 1800
aagacttata taagagaata caatcactct ccgccggcta acctgctgga agggcgcggg 1860
acgcagagtg gtgcccatgt tgatatcatg ggaaactttg ccttgatcga agatatcatg 1920
agggttgccg ccggaggtaa cgggtctgac ataggacatg accaggtcca ctctggcata 1980
tttgaatggt cggagcgtat tgacctgaag ctg 2013
74
671
PRT
Arabidopsis thaliana
74
Met Pro Leu Ile His Arg Lys Lys Pro Thr Glu Lys Pro Ser Thr Pro
1 5 10 15
Pro Ser Glu Glu Val Val His Asp Glu Asp Ser Gln Lys Lys Pro His
20 25 30
Glu Ser Ser Lys Ser His His Lys Lys Ser Asn Gly Gly Gly Lys Trp
35 40 45
Ser Cys Ile Asp Ser Cys Cys Trp Phe Ile Gly Cys Val Cys Val Thr
50 55 60
Trp Trp Phe Leu Leu Phe Leu Tyr Asn Ala Met Pro Ala Ser Phe Pro
65 70 75 80
Gln Tyr Val Thr Glu Arg Ile Thr Gly Pro Leu Pro Asp Pro Pro Gly
85 90 95
Val Lys Leu Lys Lys Glu Gly Leu Lys Ala Lys His Pro Val Val Phe
100 105 110
Ile Pro Gly Ile Val Thr Gly Gly Leu Glu Leu Trp Glu Gly Lys Gln
115 120 125
Cys Ala Asp Gly Leu Phe Arg Lys Arg Leu Trp Gly Gly Thr Phe Gly
130 135 140
Glu Val Tyr Lys Arg Pro Leu Cys Trp Val Glu His Met Ser Leu Asp
145 150 155 160
Asn Glu Thr Gly Leu Asp Pro Ala Gly Ile Arg Val Arg Ala Val Ser
165 170 175
Gly Leu Val Ala Ala Asp Tyr Phe Ala Pro Gly Tyr Phe Val Trp Ala
180 185 190
Val Leu Ile Ala Asn Leu Ala His Ile Gly Tyr Glu Glu Lys Asn Met
195 200 205
Tyr Met Ala Ala Tyr Asp Trp Arg Leu Ser Phe Gln Asn Thr Glu Val
210 215 220
Arg Asp Gln Thr Leu Ser Arg Met Lys Ser Asn Ile Glu Leu Met Val
225 230 235 240
Ser Thr Asn Gly Gly Lys Lys Ala Val Ile Val Pro His Ser Met Gly
245 250 255
Val Leu Tyr Phe Leu His Phe Met Lys Trp Val Glu Ala Pro Ala Pro
260 265 270
Leu Gly Gly Gly Gly Gly Pro Asp Trp Cys Ala Lys Tyr Ile Lys Ala
275 280 285
Val Met Asn Ile Gly Gly Pro Phe Leu Gly Val Pro Lys Ala Val Ala
290 295 300
Gly Leu Phe Ser Ala Glu Ala Lys Asp Val Ala Val Ala Arg Ala Ile
305 310 315 320
Ala Pro Gly Phe Leu Asp Thr Asp Ile Phe Arg Leu Gln Thr Leu Gln
325 330 335
His Val Met Arg Met Thr Arg Thr Trp Asp Ser Thr Met Ser Met Leu
340 345 350
Pro Lys Gly Gly Asp Thr Ile Trp Gly Gly Leu Asp Trp Ser Pro Glu
355 360 365
Lys Gly His Thr Cys Cys Gly Lys Lys Gln Lys Asn Asn Glu Thr Cys
370 375 380
Gly Glu Ala Gly Glu Asn Gly Val Ser Lys Lys Ser Pro Val Asn Tyr
385 390 395 400
Gly Arg Met Ile Ser Phe Gly Lys Glu Val Ala Glu Ala Ala Pro Ser
405 410 415
Glu Ile Asn Asn Ile Asp Phe Arg Gly Ala Val Lys Gly Gln Ser Ile
420 425 430
Pro Asn His Thr Cys Arg Asp Val Trp Thr Glu Tyr His Asp Met Gly
435 440 445
Ile Ala Gly Ile Lys Ala Ile Ala Glu Tyr Lys Val Tyr Thr Ala Gly
450 455 460
Glu Ala Ile Asp Leu Leu His Tyr Val Ala Pro Lys Met Met Ala Arg
465 470 475 480
Gly Ala Ala His Phe Ser Tyr Gly Ile Ala Asp Asp Leu Asp Asp Thr
485 490 495
Lys Tyr Gln Asp Pro Lys Tyr Trp Ser Asn Pro Leu Glu Thr Lys Leu
500 505 510
Pro Asn Ala Pro Glu Met Glu Ile Tyr Ser Leu Tyr Gly Val Gly Ile
515 520 525
Pro Thr Glu Arg Ala Tyr Val Tyr Lys Leu Asn Gln Ser Pro Asp Ser
530 535 540
Cys Ile Pro Phe Gln Ile Phe Thr Ser Ala His Glu Glu Asp Glu Asp
545 550 555 560
Ser Cys Leu Lys Ala Gly Val Tyr Asn Val Asp Gly Asp Glu Thr Val
565 570 575
Pro Val Leu Ser Ala Gly Tyr Met Cys Ala Lys Ala Trp Arg Gly Lys
580 585 590
Thr Arg Phe Asn Pro Ser Gly Ile Lys Thr Tyr Ile Arg Glu Tyr Asn
595 600 605
His Ser Pro Pro Ala Asn Leu Leu Glu Gly Arg Gly Thr Gln Ser Gly
610 615 620
Ala His Val Asp Ile Met Gly Asn Phe Ala Leu Ile Glu Asp Ile Met
625 630 635 640
Arg Val Ala Ala Gly Gly Asn Gly Ser Asp Ile Gly His Asp Gln Val
645 650 655
His Ser Gly Ile Phe Glu Trp Ser Glu Arg Ile Asp Leu Lys Leu
660 665 670
=:210> 75
c211> 1986
!:212> DNA
213> Saccharomyces cerevisiae
:400> 75
itgggcacac tgtttcgaag aaatgtccag aaccaaaaga gtgattctga tgaaaacaat 60
laagggggtt ctgttcataa caagcgagag agcagaaacc acactcatca tcaacaggga 120
ctaggccata agagaagaag gggtaccagt ggcagtgcaa aaagaaatga gcgtggcaaa 180
gacttcgaca ggaaaagaga cgggaacggt agaaaacgtt ggagagatcc cagaagaccg 240
attttcattc ttggtgcatt cttaggtgta cttttgccgt tcagctttgg cgcttaccat 300
gttcacaata gcgatagcga ctcgtctgac aactttgcaa attttgattc acctaaagtg 360
tatttggatg actggaaaga tgctctccca caaggtataa gctcgtttat cgatgacacc 420
caggctggta actactccac atcttcttta gatgatctca gtgaaaattt tgccgttggt 480
aaacaactcc tacgtgatta taatatcgag gccaaacatc ctgttgtaac ggtccctggc 540
gtcatttcta cgggaattga aagctgggga gttattggag acgatgagtg cgatagctct 600
gcgcattttc gtaaacggct gtggggaagt ttttacatgc tgagaacaat ggttatggat 660
aaagtttgtt ggttgaaaca tgtaatgtta gatcctgaaa caggtctgga cccaccgaac 720
ttcacgctac gcgcagcaca gggcttcgaa tcaactgatt actccatcgc agggtactgg 780
atttggaaca aagttttcca aaatctggga gtaattggct atgaacccaa taaaatgacg 840
agcgctgcgt acgattggag gcttgcatat ttagatctag aaagacgcga caggtacttt 900
acgaagccaa aggaacaaac cgaactgttt catcaattga gtggtgaaaa agtttgttta 960
attggacatt ctatgggttc tcagattatc ttccacttta tgaaatgggt cgaggctgaa 1020
ggccctcctt acggcaatgg tggccgtggc tgggctaacg aacacataga ttcattcatt 1080
aatgcagcag ggacgcttct gggcgctcca aaggcagttc cagctctaat tagtggtgaa 1140
acgaaagata ccactcaatt aaatacgtta gccacgtacg gtttggaaaa gttcctctca 1200
agaattgaga gagtaaaaat gctacaaacg tggggtggta taccatcaat gccaccaaag 1260
ggagaagagg tcatttgggg ggatatgaag tcatcttcag aggatgcatt gaataacaac 1320
actgacacat acggcaattt cattcgattt gaaaggaata cgagcgatgc tttcaacaaa 1380
aatttgacaa tgaaagacgc cattaacatg acattatcga tatcacctga atggctccaa 1440
agaagagcac atgagcagta ctcgttcggc tattccaaga atgaagaaga gttaagaaaa 1500
aatgagctac accacaagca ctggtcgaat ccaacggaag taccacttcc agaagctccc 1560
cacatgaaaa tctattgtat atacggggtg aacaacccaa ctgaaagggc atatgtatat 1620
aaggaagagg atgacccctc tgctctgaat ttgaccatcg actacgaaag caagcaacct 1680
gtattcctca ccgaggggga cggaaccgct ccgctcgtgg cgcactcaat gtgtcacaaa 1740
tgggcccagg gtgcttcacc gtacaaccct gccggaatta acgttactat tgtggaaatg 1800
aaacaccagc cagatcgatt tgatatacgt ggtggagcaa aaagcgccga acacgtagac 1860
atcctcggca gcgcggagtt gaacgattac atcttgaaaa ttgcaagcgg taatggcgat 1920
ctcgtcgagc cacgccaatt gtctaatttg agccagtggg tttctcagat gcccttccca 1980
atgtaa 1986
<210> 76
<211> 661
<212> PRT
<213> Saccharomyces cerevisiae
<400> 76
Met Gly Thr Leu Phe Arg Arg Asn Val Gln Asn Gln Lys Ser Asp Ser
1 5 10 15
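Because SEQ ID NO: 75 is the coding sequence for SEQ ID NO: 76, one way to check a transcribed line of the DNA listing is to translate it and compare the result with the protein listing. The Python sketch below is illustrative only: it assumes the standard genetic code, reading frame 1, and input with the listing's trailing position number (here 60) left out; the names `CODON_TABLE` and `translate` are hypothetical.

```python
# Standard genetic code: codons enumerated with bases in the order t, c, a, g
# at each position; '*' marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence in frame 1 into one-letter residues."""
    seq = "".join(cds.lower().split())  # drop the spaces used in the listing
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, len(seq) - 2, 3))


first_line = "atgggcacac tgtttcgaag aaatgtccag aaccaaaaga gtgattctga tgaaaacaat"
print(translate(first_line))  # MGTLFRRNVQNQKSDSDENN
```

The output corresponds to residues 1 to 20 of SEQ ID NO: 76 (Met Gly Thr Leu ... Asp Glu Asn Asn).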
Asp Glu Asn Asn Lys Gly Gly Ser Val His Asn Lys Arg Glu Ser Arg
20 25 30
Asn His Ile His His Gln Gln Gly Leu Gly His Lys Arg Arg Arg Gly
35 40 45
Ile Ser Gly Ser Ala Lys Arg Asn Glu Arg Gly Lys Asp Phe Asp Arg
50 55 60
Lys Arg Asp Gly Asn Gly Arg Lys Arg Trp Arg Asp Ser Arg Arg Leu
65 70 75 80
Ile Phe Ile Leu Gly Ala Phe Leu Gly Val Leu Leu Pro Phe Ser Phe
85 90 95
Gly Ala Tyr His Val His Asn Ser Asp Ser Asp Leu Phe Asp Asn Phe
100 105 110
Val Asn Phe Asp Ser Leu Lys Val Tyr Leu Asp Asp Trp Lys Asp Val
115 120 125
Leu Pro Gln Gly Ile Ser Ser Phe Ile Asp Asp Ile Gln Ala Gly Asn
130 135 140
Tyr Ser Thr Ser Ser Leu Asp Asp Leu Ser Glu Asn Phe Ala Val Gly
145 150 155 160
Lys Gln Leu Leu Arg Asp Tyr Asn Ile Glu Ala Lys His Pro Val Val
165 170 175
Met Val Pro Gly Val Ile Ser Thr Gly Ile Glu Ser Trp Gly Val Ile
180 185 190
Gly Asp Asp Glu Cys Asp Ser Ser Ala His Phe Arg Lys Arg Leu Trp
195 200 205
Gly Ser Phe Tyr Met Leu Arg Thr Met Val Met Asp Lys Val Cys Trp
210 215 220
Leu Lys His Val Met Leu Asp Pro Glu Thr Gly Leu Asp Pro Pro Asn
225 230 235 240
Phe Thr Leu Arg Ala Ala Gln Gly Phe Glu Ser Thr Asp Tyr Phe Ile
245 250 255
Ala Gly Tyr Trp Ile Trp Asn Lys Val Phe Gln Asn Leu Gly Val Ile
260 265 270
Gly Tyr Glu Pro Asn Lys Met Thr Ser Ala Ala Tyr Asp Trp Arg Leu
275 280 285
Ala Tyr Leu Asp Leu Glu Arg Arg Asp Arg Tyr Phe Thr Lys Leu Lys
290 295 300
Glu Gln Ile Glu Leu Phe His Gln Leu Ser Gly Glu Lys Val Cys Leu
305 310 315 320
Ile Gly His Ser Met Gly Ser Gln Ile Ile Phe Tyr Phe Met Lys Trp
325 330 335
Val Glu Ala Glu Gly Pro Leu Tyr Gly Asn Gly Gly Arg Gly Trp Val
340 345 350
Asn Glu His Ile Asp Ser Phe Ile Asn Ala Ala Gly Thr Leu Leu Gly
355 360 365
Ala Pro Lys Ala Val Pro Ala Leu Ile Ser Gly Glu Met Lys Asp Thr
370 375 380
Ile Gln Leu Asn Thr Leu Ala Met Tyr Gly Leu Glu Lys Phe Phe Ser
385 390 395 400
Arg Ile Glu Arg Val Lys Met Leu Gln Thr Trp Gly Gly Ile Pro Ser
405 410 415
Met Leu Pro Lys Gly Glu Glu Val Ile Trp Gly Asp Met Lys Ser Ser
420 425 430
Ser Glu Asp Ala Leu Asn Asn Asn Thr Asp Thr Tyr Gly Asn Phe Ile
435 440 445
Arg Phe Glu Arg Asn Thr Ser Asp Ala Phe Asn Lys Asn Leu Thr Met
450 455 460
Lys Asp Ala Ile Asn Met Thr Leu Ser Ile Ser Pro Glu Trp Leu Gln
465 470 475 480
Arg Arg Val His Glu Gln Tyr Ser Phe Gly Tyr Ser Lys Asn Glu Glu
485 490 495
Glu Leu Arg Lys Asn Glu Leu His His Lys His Trp Ser Asn Pro Met
500 505 510
Glu Val Pro Leu Pro Glu Ala Pro His Met Lys Ile Tyr Cys Ile Tyr
515 520 525
Gly Val Asn Asn Pro Thr Glu Arg Ala Tyr Val Tyr Lys Glu Glu Asp
530 535 540
Asp Ser Ser Ala Leu Asn Leu Thr Ile Asp Tyr Glu Ser Lys Gln Pro
545 550 555 560
Val Phe Leu Thr Glu Gly Asp Gly Thr Val Pro Leu Val Ala His Ser
565 570 575
Met Cys His Lys Trp Ala Gln Gly Ala Ser Pro Tyr Asn Pro Ala Gly
580 585 590
Ile Asn Val Thr Ile Val Glu Met Lys His Gln Pro Asp Arg Phe Asp
595 600 605
Ile Arg Gly Gly Ala Lys Ser Ala Glu His Val Asp Ile Leu Gly Ser
610 615 620
Ala Glu Leu Asn Asp Tyr Ile Leu Lys Ile Ala Ser Gly Asn Gly Asp
625 630 635 640
Leu Val Glu Pro Arg Gln Leu Ser Asn Leu Ser Gln Trp Val Ser Gln
645 650 655
Met Pro Phe Pro Met
660
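The protein listing gives residues as three-letter codes interleaved with lines of position numbers. A minimal Python sketch (the function name `to_one_letter` is hypothetical) converts such text to the one-letter form, skipping the numeric tokens. It deliberately raises on any other unrecognized token, so that transcription errors such as reading Ile as "He" are caught rather than silently dropped.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Convert a three-letter sequence listing to a one-letter string.

    Purely numeric tokens (position-number lines) are skipped; any other
    unrecognized token raises KeyError.
    """
    return "".join(
        THREE_TO_ONE[token] for token in listing.split() if not token.isdigit()
    )


# Residues 1-20 of SEQ ID NO: 76, including the first position-number line.
fragment = (
    "Met Gly Thr Leu Phe Arg Arg Asn Val Gln Asn Gln Lys Ser Asp Ser\n"
    "1 5 10 15\n"
    "Asp Glu Asn Asn"
)
print(to_one_letter(fragment))  # MGTLFRRNVQNQKSDSDENN
```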

<210> 77
<211> 35
<212> DNA
<213> Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
<400> 77
ggacccgcgg ccgcacaatg ccccttattc atcgg
<210> 78
<211> 35
<212> DNA
<213> Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
<400> 78
ggatcccctg caggtcacag cttcaggtca atacg
<210> 79
<211> 37
<212> DNA
<213> Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
<400> 79
ggatccgcgg ccgcacaatg ggcacactcc ttcgaag
<210> 80
<211> 39
<212> DNA
<213> Artificial Sequence

Description of Artificial Sequence: Synthetic oligonucleotide primer
<400> 80
ggatcccctg caggttacac tgggcacact gtttcgaag
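Each primer entry declares its length in the <211> field. The short Python sketch below (helper names are illustrative) strips the listing's spaces, checks the declared lengths of SEQ ID NOs: 77 to 80, and reports GC content; the GC fraction is only a rough descriptor here, not a melting-temperature calculation.

```python
# Primer sequences as listed (SEQ ID NOs: 77-80) with their declared lengths.
PRIMERS = {
    77: ("ggacccgcgg ccgcacaatg ccccttattc atcgg", 35),
    78: ("ggatcccctg caggtcacag cttcaggtca atacg", 35),
    79: ("ggatccgcgg ccgcacaatg ggcacactcc ttcgaag", 37),
    80: ("ggatcccctg caggttacac tgggcacact gtttcgaag", 39),
}


def gc_fraction(seq: str) -> float:
    """Fraction of g and c bases in a lowercase DNA sequence."""
    return sum(base in "gc" for base in seq) / len(seq)


def primer_report(primers: dict) -> dict:
    """Map each SEQ ID NO to (observed length, declared length, GC fraction)."""
    report = {}
    for seq_id, (listed, declared) in primers.items():
        seq = listed.replace(" ", "")
        report[seq_id] = (len(seq), declared, round(gc_fraction(seq), 2))
    return report


for seq_id, (observed, declared, gc) in primer_report(PRIMERS).items():
    status = "ok" if observed == declared else "MISMATCH"
    print(f"SEQ ID NO: {seq_id}  length {observed}/{declared} {status}  GC {gc}")
```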


WE CLAIM:
1. A recombinant nucleic acid construct, for altering the sterol content in host cells and plants, having a coding sequence and a heterologous regulatory sequence operably linked together, wherein said coding sequence comprises a polynucleotide having a sequence selected from the group consisting of SEQ ID NO: 2, 4, 6, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 73, 75 and degenerate variants thereof and encodes a plant sterol: cholesterol acyltransferase-like polypeptide or fragment thereof, said sterol: cholesterol acyltransferase-like polypeptide is a lecithin: cholesterol acyltransferase-like polypeptide or acyl CoA: cholesterol acyltransferase-like polypeptide and said regulatory sequence is a sequence functional in plants.
2. The recombinant nucleic acid construct as claimed in claim 1, wherein said plant lecithin: cholesterol acyltransferase-like polypeptide is from a plant selected from the group consisting of Arabidopsis, soybean and corn.
3. The recombinant nucleic acid construct as claimed in claim 1, wherein the polynucleotide is selected from the group consisting of:

a) a polynucleotide encoding a polypeptide of SEQ ID NO: 3, 5, 7, 9, 74 or 76 or SEQ ID NO: 3, 5, 7, 9, 74 or 76 with at least one conservative amino acid substitution;
b) SEQ ID NO: 2, 4, 6, 8, 73 or 75;
c) a polynucleotide that has at least 70% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
d) a polynucleotide that has at least 80% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
e) a polynucleotide that has at least 90% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
f) a polynucleotide that has at least 95% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
g) a polynucleotide of at least 10 nucleic acids that hybridizes under stringent conditions to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
h) a polynucleotide complementary to a polynucleotide of (a), (b), (c), (d), (e), (f) or (g); and
i) a polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 2, 4, 6, 8, 73 or 75 and encodes a plant lecithin: cholesterol acyltransferase-like polypeptide.
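Claims 3 and 4 recite sequence-identity tiers of 70%, 80%, 90% and 95%. This section does not state how identity is computed, so the Python sketch below assumes one common convention: identical aligned positions divided by the total number of alignment columns, with gaps written as "-". It takes an existing alignment as input and does not perform the alignment itself; other conventions (for example, normalizing by the length of the shorter sequence) can give different percentages. The function names are hypothetical.

```python
IDENTITY_TIERS = (70, 80, 90, 95)  # percentages recited in claims 3 and 4


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns as a percentage of all alignment columns.

    Assumed convention; the alignment (gaps as '-') must be supplied.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(
        a == b and a != "-"
        for a, b in zip(aligned_a.lower(), aligned_b.lower())
    )
    return 100.0 * matches / len(aligned_a)


def tiers_met(aligned_a: str, aligned_b: str) -> list[int]:
    """Identity tiers (in percent) that the aligned pair reaches or exceeds."""
    pid = percent_identity(aligned_a, aligned_b)
    return [tier for tier in IDENTITY_TIERS if pid >= tier]


print(percent_identity("atgggcacac", "atgggcacat"))  # 90.0
print(tiers_met("atgggcacac", "atgggcacat"))  # [70, 80, 90]
```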
4. The recombinant nucleic acid construct as claimed in claim 1, wherein the polynucleotide is of the formula 5' X-(R1)n-(R2)-(R3)n-Y 3', where X is hydrogen, Y is hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of:
a) a polynucleotide encoding a polypeptide of SEQ ID NO: 3, 5, 7, 9, 74 or 76 or SEQ ID NO: 3, 5, 7, 9, 74 or 76 with at least one conservative amino acid substitution;
b) SEQ ID NO: 2, 4, 6, 8, 73 or 75;
c) a polynucleotide that has at least 70% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
d) a polynucleotide that has at least 80% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
e) a polynucleotide that has at least 90% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
f) a polynucleotide that has at least 95% sequence identity to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
g) a polynucleotide of at least 10 nucleic acids that hybridizes under stringent conditions to SEQ ID NO: 2, 4, 6, 8, 73 or 75;
h) a polynucleotide complementary to a polynucleotide of (a), (b), (c), (d), (e), (f) or (g); and
i) a polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 2, 4, 6, 8, 73 or 75 and encodes a plant lecithin: cholesterol acyltransferase-like polypeptide.
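The formula in claim 4 places the core sequence R2 between optional flanks (R1)n and (R3)n, with X and Y denoting terminal groups (hydrogen, or hydrogen or a metal) rather than nucleotides. The Python sketch below models only that structure. It assumes each flank is a run of 0 to 3000 nucleotides and that the two flanks may differ in length; the shared symbol n in the claim could also be read as requiring equal lengths. The names `assemble` and `MAX_FLANK` are hypothetical.

```python
MAX_FLANK = 3000  # upper bound on n recited in claims 4 and 6
NUCLEOTIDES = set("acgt")


def assemble(r1: str, r2: str, r3: str) -> str:
    """Return the 5'->3' sequence R1 + R2 + R3 after checking both flanks.

    X and Y are chemical end groups, not bases, so they do not appear
    in the returned string.
    """
    for name, flank in (("R1", r1), ("R3", r3)):
        if len(flank) > MAX_FLANK:
            raise ValueError(f"{name} is longer than {MAX_FLANK} nucleotides")
        if not set(flank) <= NUCLEOTIDES:
            raise ValueError(f"{name} contains non-nucleotide characters")
    return r1 + r2 + r3


core = "atgggcacac"  # first 10 nt of SEQ ID NO: 75, standing in for R2
print(assemble("gg", core, ""))  # ggatgggcacac
```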

5. The recombinant nucleic acid construct as claimed in claim 1, wherein the polynucleotide is selected from the group consisting of:
a) SEQ ID NO: 42 or a degenerate variant thereof;
b) a polynucleotide having at least 70% sequence identity with SEQ ID NO: 42;
c) a polynucleotide having at least 80% sequence identity with SEQ ID NO: 42;
d) a polynucleotide having at least 90% sequence identity with SEQ ID NO: 42;
e) a polynucleotide having at least 95% sequence identity with SEQ ID NO: 42;
f) a polynucleotide of at least 10 nucleic acids that hybridizes under stringent conditions to SEQ ID NO: 42;
g) a polynucleotide complementary to a polynucleotide of (a), (b), (c), (d), (e), or (f); and
h) a polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 42 and encodes an acyl CoA: cholesterol acyltransferase-like polypeptide.
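Item (g) of claims 5 and 6, like item (h) of claims 3 and 4, covers the polynucleotide complementary to the listed sequences. Assuming "complementary" is read as the opposite strand written 5' to 3' (the reverse complement), a minimal Python sketch is shown below; the function name is hypothetical.

```python
# Watson-Crick base pairing for lowercase and uppercase DNA letters.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("atgggcacac"))  # gtgtgcccat
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.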
6. The recombinant nucleic acid construct as claimed in claim 1, wherein the polynucleotide is of the formula 5' X-(R1)n-(R2)-(R3)n-Y 3', where X is hydrogen, Y is hydrogen or a metal, R1 and R3 are any nucleic acid, n is an integer between 0 and 3000, and R2 is selected from the group consisting of:
a) SEQ ID NO: 42 or degenerate variants thereof;
b) a polynucleotide having at least 70% sequence identity to SEQ ID NO: 42;
c) a polynucleotide having at least 80% sequence identity to SEQ ID NO: 42;
d) a polynucleotide having at least 90% sequence identity to SEQ ID NO: 42;

e) a polynucleotide having at least 95% sequence identity to SEQ ID NO: 42;
f) a polynucleotide of at least 10 nucleic acids that hybridizes under stringent conditions to SEQ ID NO: 42;
g) a polynucleotide complementary to a polynucleotide of (a), (b), (c), (d), (e), or (f); and
h) a polynucleotide that hybridizes under stringent conditions to SEQ ID NO: 42 and encodes an acyl CoA: cholesterol acyltransferase-like polypeptide.
7. The recombinant nucleic acid construct as claimed in claim 1, wherein said lecithin: cholesterol acyltransferase-like polypeptide is a plant lecithin: cholesterol acyltransferase-like polypeptide.
8. The recombinant nucleic acid construct as claimed in claim 1, wherein said acyl CoA: cholesterol acyltransferase-like polypeptide is a plant acyl CoA: cholesterol acyltransferase-like polypeptide.

9. The recombinant nucleic acid construct as claimed in claim 1, comprising a termination sequence as herein described.
10. The recombinant nucleic acid construct as claimed in claim 1, wherein said regulatory sequence comprises a constitutive promoter.
11. The recombinant nucleic acid construct as claimed in claim 1, wherein said regulatory sequence comprises an inducible promoter.
12. The recombinant nucleic acid construct as claimed in claim 1, wherein said regulatory sequence is selected from the group consisting of a tissue specific promoter, a developmentally regulated promoter, an organelle specific promoter, and a seed specific promoter.

13. An isolated polypeptide obtained by culturing a host cell selected from yeast, bacteria, bacteriophage or viruses containing the recombinant nucleic acid as claimed in claim 1, comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 3, SEQ ID NO: 3 with at least one conservative amino acid substitution, SEQ ID NO: 5, SEQ ID NO: 5 with at least one conservative amino acid substitution, SEQ ID NO: 7, SEQ ID NO: 7 with at least one conservative amino acid substitution, SEQ ID NO: 9, SEQ ID NO: 9 with at least one conservative amino acid substitution, SEQ ID NO: 74, SEQ ID NO: 74 with at least one conservative amino acid substitution, SEQ ID NO: 76 and SEQ ID NO: 76 with at least one conservative amino acid substitution.
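Claim 13 covers the listed polypeptides with at least one conservative amino acid substitution, but this section does not define which residues count as interchangeable. The Python sketch below assumes one frequently used grouping (A/S/T, D/E, N/Q, R/K, I/L/M/V, F/Y/W) purely for illustration. It treats a same-length variant as qualifying when it differs from the reference at one or more positions and every difference stays within a group. All names are hypothetical.

```python
# Assumed substitution groups (illustrative; not taken from the patent).
CONSERVATIVE_GROUPS = [
    set("AST"), set("DE"), set("NQ"), set("RK"), set("ILMV"), set("FYW"),
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same assumed group."""
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """1-based positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b
    ]


def is_conservative_variant(reference: str, variant: str) -> bool:
    """At least one substitution, and every substitution stays within a group."""
    diffs = differences(reference, variant)
    return bool(diffs) and all(is_conservative(a, b) for _, a, b in diffs)


reference = "MGTLFRRNVQ"  # residues 1-10 of SEQ ID NO: 76
print(is_conservative_variant(reference, "MGTIFRKNVQ"))  # True (L4I, R7K)
print(is_conservative_variant(reference, "MGTLFRENVQ"))  # False (R7E crosses groups)
```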
14. An isolated polypeptide as claimed in claim 13, which is immunogenic and contains at least 10 consecutive amino acids.
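Claim 14 is directed to an immunogenic polypeptide containing at least 10 consecutive amino acids. Enumerating every 10-residue window of a listed sequence gives the candidate fragments of that minimum length; the Python sketch below (function name hypothetical) does only that and makes no prediction of which windows are immunogenic.

```python
def windows(protein: str, size: int = 10) -> list[str]:
    """All contiguous substrings of length `size`, N- to C-terminal order."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [protein[i:i + size] for i in range(len(protein) - size + 1)]


print(windows("MGTLFRRNVQNQ"))
# ['MGTLFRRNVQ', 'GTLFRRNVQN', 'TLFRRNVQNQ']
```

A sequence of length L has L - 9 such windows, so the 661-residue SEQ ID NO: 76 yields 652; sequences shorter than the window size yield none.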
15. A method of producing a polypeptide as claimed in claim 13, comprising the steps of culturing the host cell under conditions permitting expression of said sterol: cholesterol acyltransferase-like polypeptide and isolating the polypeptide in a known manner.


Patent Number 216458
Indian Patent Application Number IN/PCT/2002/433/CHE
PG Journal Number 13/2008
Publication Date 31-Mar-2008
Grant Date 13-Mar-2008
Date of Filing 21-Mar-2002
Name of Patentee MONSANTO TECHNOLOGY LLC
Applicant Address 800 North Lindbergh Boulevard, St. Louis, MO 63167,
Inventors:
# Inventor's Name Inventor's Address
1 LASSNER, Michael 515 Galveston Drive, Redwood City, CALIFORNIA 94063,
2 VAN EENENNAAM, Alison 856 Burr Street, Davis, CA 95616,
PCT International Classification Number C12N 15/00
PCT International Application Number PCT/US00/23863
PCT International Filing date 2000-08-30
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/152,493 1999-08-30 U.S.A.