Title of Invention

COMPOSITIONS FOR DEGRADING CELLULOSIC MATERIAL

Abstract
The present invention relates to cellulolytic compositions for degrading or converting cellulose-containing material and methods of producing and using the compositions.
Full Text
COMPOSITIONS FOR DEGRADING CELLULOSIC MATERIAL
Statement as to Rights to Inventions Made Under Federally Sponsored Research and Development
This invention was made with Government support under NREL Subcontract No. ZCO-30017-02, Prime Contract DE-AC36-98GO10337 awarded by the Department of Energy. The government has certain rights in this invention.
Reference to a Sequence Listing
This application contains a Sequence Listing in computer readable form. The computer readable form is incorporated herein by reference.
Reference to a Deposit of Biological Material
This application contains a reference to deposits of biological material, which deposits are incorporated herein by reference.
Background of the Invention
Field of the Invention
The present invention relates to cellulolytic protein compositions for degrading or converting cellulose-containing material and methods of producing and using the compositions.
Description of the Related Art
Cellulose is a polymer of the simple sugar glucose covalently bonded by beta-1,4-linkages. Many microorganisms produce enzymes that hydrolyze beta-linked glucans. These enzymes include endoglucanases, cellobiohydrolases, and beta-glucosidases. Endoglucanases digest the cellulose polymer at random locations, opening it to attack by cellobiohydrolases. Cellobiohydrolases sequentially release molecules of cellobiose from the ends of the cellulose polymer. Cellobiose is a water-soluble beta-1,4-linked dimer of glucose. Beta-glucosidases hydrolyze cellobiose to glucose.
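The division of labor among the three enzyme classes can be illustrated with a toy Python model. It is a simplified sketch, not a kinetic model: each cellulose chain is reduced to a count of glucose units, endoglucanase cuts are placed uniformly at random, and each cellobiohydrolase round removes one cellobiose from one end of every chain. The function names and parameters are hypothetical.

```python
import random

def endoglucanase(chains, cuts, rng):
    """Cut random internal beta-1,4 bonds; every cut creates two new chain ends."""
    chains = list(chains)
    for _ in range(cuts):
        cuttable = [i for i, n in enumerate(chains) if n >= 2]
        if not cuttable:
            break
        n = chains.pop(rng.choice(cuttable))
        left = rng.randint(1, n - 1)  # glucose units on the left of the cut bond
        chains += [left, n - left]
    return chains

def cellobiohydrolase(chains):
    """Release one cellobiose (two glucose units) from one end of every chain."""
    remaining, cellobiose = [], 0
    for n in chains:
        if n >= 2:
            cellobiose += 1
            n -= 2
        if n > 0:
            remaining.append(n)
    return remaining, cellobiose

def beta_glucosidase(cellobiose):
    """Hydrolyze each cellobiose dimer into two glucose molecules."""
    return 2 * cellobiose

def simulate(chain_lengths, cuts, cbh_rounds, seed=0):
    rng = random.Random(seed)
    chains = endoglucanase(chain_lengths, cuts, rng)
    cellobiose = 0
    for _ in range(cbh_rounds):
        chains, released = cellobiohydrolase(chains)
        cellobiose += released
    return chains, beta_glucosidase(cellobiose)

# Same number of cellobiohydrolase rounds, with and without prior endoglucanase cuts.
for cuts in (0, 50):
    chains, glucose = simulate([1000, 1000], cuts=cuts, cbh_rounds=5)
    print(cuts, glucose, sum(chains) + glucose)  # the last value is always 2000
```

Running it with and without endoglucanase cuts shows the synergy described above: the cuts create more chain ends for cellobiohydrolase to attack, so the same number of cellobiohydrolase rounds releases more cellobiose, which beta-glucosidase then converts to glucose. Total glucose units are conserved in both cases.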
The conversion of cellulosic feedstocks into ethanol has the advantages of the ready availability of large amounts of feedstock, the desirability of avoiding burning or land filling the materials, and the cleanliness of the ethanol fuel. Wood, agricultural residues, herbaceous crops, and municipal solid wastes have been considered as feedstocks for ethanol production. These materials primarily consist of cellulose, hemicellulose, and lignin. Once the cellulose is converted to glucose, the glucose is easily fermented by yeast into ethanol.
WO 2005/074647 discloses isolated polypeptides having cellulolytic enhancing activity and polynucleotides thereof from Thielavia terrestris. WO 2005/074656 discloses an isolated polypeptide having cellulolytic enhancing activity and a polynucleotide thereof from Thermoascus aurantiacus. U.S. Published Application Serial No. 2007/0077630 discloses an isolated polypeptide having cellulolytic enhancing activity and a polynucleotide thereof from Trichoderma reesei.
It would be an advantage in the art to improve the ability of cellulolytic protein compositions to degrade or convert cellulosic material.
The present invention relates to cellulolytic protein compositions improved in their ability to degrade or convert cellulosic material.
Summary of the Invention
The present invention relates to filamentous fungal host cells, comprising: (a) a first polynucleotide encoding a native or heterologous polypeptide having cellulolytic enhancing activity; (b) a second polynucleotide encoding a native or heterologous beta-glucosidase; and (c) one or more (several) third polynucleotides encoding native or heterologous cellulolytic enzymes selected from the group consisting of a Trichoderma reesei cellobiohydrolase I (CEL7A), a Trichoderma reesei cellobiohydrolase II (CEL6A), and a Trichoderma reesei endoglucanase I (CEL7B), and orthologs or variants thereof.
The present invention also relates to methods of producing a cellulolytic protein composition, comprising: (a) cultivating such filamentous fungal host cells under conditions conducive for production of the cellulolytic protein composition; and (b) recovering the cellulolytic protein composition. The present invention also relates to cellulolytic protein compositions obtained by such methods.
The present invention also relates to cellulolytic protein compositions, comprising: (a) a polypeptide having cellulolytic enhancing activity; (b) a beta-glucosidase; and (c) one or more (several) cellulolytic enzymes selected from the group consisting of a Trichoderma reesei cellobiohydrolase I (CEL7A), a Trichoderma reesei cellobiohydrolase II (CEL6A), and a Trichoderma reesei endoglucanase I (CEL7B), and orthologs or variants thereof.
The present invention also relates to methods for degrading or converting a cellulosic material, comprising: treating the cellulosic material with an effective amount of such a cellulolytic protein composition.

The present invention further relates to methods for producing a fermentation product, comprising: (a) saccharifying a cellulosic material with an effective amount of such a cellulolytic protein composition; (b) fermenting the saccharified cellulosic material of step (a) with one or more (several) fermenting microorganisms to produce the fermentation product; and (c) recovering the fermentation product from the fermentation.
Brief Description of the Figures
Figure 1 shows a restriction map of pMJ04.
Figure 2 shows a restriction map of pCaHj527.
Figure 3 shows a restriction map of pMT2188.
Figure 4 shows a restriction map of pCaHj568.
Figure 5 shows a restriction map of pMJ05.
Figure 6 shows a restriction map of pSMai130.
Figure 7 shows the DNA sequence and amino acid sequence of an Aspergillus oryzae beta-glucosidase native signal sequence (SEQ ID NOs: 91 and 92).
Figure 8 shows the DNA sequence and amino acid sequence of a Humicola insolens endoglucanase V signal sequence (SEQ ID NOs: 95 and 96).
Figure 9 shows a restriction map of pSMai135.
Figure 10 shows a restriction map of pSMai140.
Figure 11 shows a restriction map of pSaMe-F1.
Figure 12 shows a restriction map of pSaMe-FX.
Figure 13 shows a restriction map of pAILo47.
Figure 14 shows a restriction map of pSaMe-FH.
Definitions
Cellulolytic enhancing activity: The term "cellulolytic enhancing activity" is defined herein as a biological activity that enhances the hydrolysis of a cellulose-containing material by proteins having cellulolytic activity. For purposes of the present invention, cellulolytic enhancing activity is determined by measuring the increase in reducing sugars or the increase of the total of cellobiose and glucose from the hydrolysis of a cellulose-containing material by cellulolytic protein under the following conditions: 1-50 mg of total protein/g of cellulose in PCS, wherein the total protein comprises 80-99.5% w/w cellulolytic protein and 0.5-20% w/w protein having cellulolytic enhancing activity, for 1-7 days at 50°C, compared to a control hydrolysis with equal total protein loading without cellulolytic enhancing activity (1-50 mg of cellulolytic protein/g of cellulose in PCS). In a preferred aspect, a mixture of CELLUCLAST® 1.5L (Novozymes A/S, Bagsværd, Denmark) in the presence of 3% Aspergillus oryzae beta-glucosidase (recombinantly produced in Aspergillus oryzae according to WO 02/095014) or 3% Aspergillus fumigatus beta-glucosidase (recombinantly produced in Aspergillus oryzae according to Example 22 of WO 02/095014) of cellulase protein loading is used as a standard of the cellulolytic activity.
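The comparison in this definition can be expressed as a small calculation. The Python sketch below assumes that the enhancement is reported as the percent increase in released sugar relative to the control; the class, field names, and numbers are hypothetical. It checks only the loading ranges stated in the definition (1-50 mg total protein/g cellulose, 0.5-20% w/w protein having cellulolytic enhancing activity) and does not model the 1-7 day, 50°C hydrolysis itself.

```python
from dataclasses import dataclass

@dataclass
class Hydrolysis:
    protein_mg_per_g_cellulose: float  # total protein loading on PCS cellulose
    enhancer_fraction: float           # w/w fraction of total protein with enhancing activity
    sugar_released: float              # reducing sugars, or cellobiose + glucose (any consistent unit)

def loading_is_within_definition(run: Hydrolysis) -> bool:
    """1-50 mg total protein/g cellulose and 0.5-20% w/w enhancer (80-99.5% cellulolytic protein)."""
    return (1 <= run.protein_mg_per_g_cellulose <= 50
            and 0.005 <= run.enhancer_fraction <= 0.20)

def enhancement_percent(test: Hydrolysis, control: Hydrolysis) -> float:
    """Percent increase in released sugar relative to the control hydrolysis."""
    if test.protein_mg_per_g_cellulose != control.protein_mg_per_g_cellulose:
        raise ValueError("control must have equal total protein loading")
    if control.enhancer_fraction != 0:
        raise ValueError("control must lack cellulolytic enhancing activity")
    if not loading_is_within_definition(test):
        raise ValueError("test loading is outside the defined ranges")
    return 100.0 * (test.sugar_released - control.sugar_released) / control.sugar_released

# Hypothetical numbers, for illustration only.
test = Hydrolysis(protein_mg_per_g_cellulose=5.0, enhancer_fraction=0.05, sugar_released=60.0)
control = Hydrolysis(protein_mg_per_g_cellulose=5.0, enhancer_fraction=0.0, sugar_released=50.0)
print(enhancement_percent(test, control))  # 20.0
```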
The polypeptides having cellulolytic enhancing activity have at least 20%, preferably at least 40%, more preferably at least 50%, more preferably at least 60%, more preferably at least 70%, more preferably at least 80%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 100% of the cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 2, 4, 6, 8, 10, 12, or 14.
The polypeptides having cellulolytic enhancing activity enhance the hydrolysis of a cellulose-containing material catalyzed by proteins having cellulolytic activity by reducing the amount of cellulolytic enzyme required to reach the same degree of hydrolysis preferably at least 0.1-fold, more preferably at least 0.2-fold, more preferably at least 0.3-fold, more preferably at least 0.4-fold, more preferably at least 0.5-fold, more preferably at least 1-fold, more preferably at least 3-fold, more preferably at least 4-fold, more preferably at least 5-fold, more preferably at least 10-fold, more preferably at least 20-fold, even more preferably at least 30-fold, most preferably at least 50-fold, and even most preferably at least 100-fold.
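One way to estimate the reduction in enzyme requirement is to interpolate two dose-response curves, with and without the enhancing polypeptide, at the same target conversion. The Python sketch below assumes that conversion rises monotonically with dose, uses linear interpolation between measured points, and reads an "n-fold reduction" as dose_without / dose_with - 1. That last reading is an interpretation, not something the definition states. All data are hypothetical.

```python
def dose_for_target(doses, conversions, target):
    """Linearly interpolate the enzyme dose that reaches `target` conversion.

    Assumes conversion increases monotonically with dose.
    """
    points = sorted(zip(doses, conversions))
    for (d0, c0), (d1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            if c1 == c0:
                return float(d0)
            return d0 + (target - c0) * (d1 - d0) / (c1 - c0)
    raise ValueError("target conversion is outside the measured range")

def fold_reduction(dose_without, dose_with):
    # Interpretation (assumption): "n-fold reduction" = dose_without / dose_with - 1
    return dose_without / dose_with - 1.0

# Hypothetical dose-response data: mg cellulolytic protein/g cellulose vs % conversion.
without = dose_for_target([2, 5, 10, 20], [20, 40, 60, 75], target=60)
with_enhancer = dose_for_target([2, 5, 10], [35, 60, 78], target=60)
print(without, with_enhancer, fold_reduction(without, with_enhancer))  # 10.0 5.0 1.0
```

With these numbers, 60% conversion needs 10 mg/g without the enhancer and 5 mg/g with it, which this sketch reports as a 1-fold reduction.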
Cellulolytic activity: The term "cellulolytic activity" is defined herein as cellulase activity (e.g., endoglucanase(s), cellobiohydrolase(s), beta-glucosidase(s), or combinations thereof) that hydrolyzes a cellulose-containing material. Cellulolytic protein may hydrolyze carboxymethyl cellulose (CMC), thereby decreasing the viscosity of the incubation mixture. The resulting reduction in viscosity may be determined by a vibration viscosimeter (e.g., MIVI 3000 from Sofraser, France). Determination of cellulase activity, measured in terms of Cellulase Viscosity Unit (CEVU), quantifies the amount of catalytic activity present in a sample by measuring the ability of the sample to reduce the viscosity of a solution of carboxymethyl cellulose (CMC). The assay is performed at the temperature and pH suitable for the cellulolytic protein and substrate.
For purposes of the present invention, cellulolytic activity is determined by measuring the increase in hydrolysis of a cellulose-containing material by a cellulolytic composition under the following conditions: 1-50 mg of cellulolytic protein/g of cellulose in PCS for 1-7 days at 50°C compared to a control hydrolysis without addition of cellulolytic protein.
Endoglucanase: The term "endoglucanase" is defined herein as an endo-1,4-(1,3;1,4)-beta-D-glucan 4-glucanohydrolase (E.C. No. 3.2.1.4), which catalyzes endohydrolysis of 1,4-beta-D-glycosidic linkages in cellulose, cellulose derivatives (such as carboxymethyl cellulose and hydroxyethyl cellulose), lichenin, beta-1,4 bonds in mixed beta-1,3 glucans such as cereal beta-D-glucans or xyloglucans, and other plant material containing cellulosic components. For purposes of the present invention, endoglucanase activity is determined using carboxymethyl cellulose (CMC) hydrolysis according to the procedure of Ghose, 1987, Pure and Appl. Chem. 59: 257-268.
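The Ghose (1987) CMC assay reports activity from the enzyme dilution that releases a fixed amount of glucose. The Python sketch below assumes the commonly cited convention of that procedure (0.5 mg glucose released; activity in IU/ml = 0.185 / relative enzyme concentration, where concentration = 1 / dilution factor) and interpolates on a log-concentration scale. Both the constants and the interpolation scheme are assumptions to be checked against the original paper; the function name and readings are hypothetical.

```python
import math

def cmc_units_per_ml(dilutions, glucose_mg, target_mg=0.5, factor=0.185):
    """Estimate CMCase activity (IU/ml) of an undiluted enzyme sample.

    dilutions: dilution factors tested (e.g. 100 means 1:100).
    glucose_mg: glucose released (mg) in each assay tube.
    Interpolates, on a log-concentration scale, the relative enzyme
    concentration that releases target_mg glucose, then returns factor / concentration.
    """
    points = sorted((math.log(1.0 / d), g) for d, g in zip(dilutions, glucose_mg))
    for (x0, g0), (x1, g1) in zip(points, points[1:]):
        if g0 <= target_mg <= g1 and g1 > g0:
            x = x0 + (target_mg - g0) * (x1 - x0) / (g1 - g0)
            return factor / math.exp(x)
    raise ValueError("target glucose release is not bracketed by the data")

# Hypothetical readings for three dilutions of one enzyme sample.
print(cmc_units_per_ml([50, 100, 200], [0.8, 0.5, 0.3]))  # approximately 18.5
```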
Cellobiohydrolase: The term "cellobiohydrolase" is defined herein as a 1,4-beta-D-glucan cellobiohydrolase (E.C. 3.2.1.91), which catalyzes the hydrolysis of 1,4-beta-D-glucosidic linkages in cellulose, cellooligosaccharides, or any beta-1,4-linked glucose-containing polymer, releasing cellobiose from the reducing or non-reducing ends of the chain. For purposes of the present invention, cellobiohydrolase activity is determined according to the procedures described by Lever et al., 1972, Anal. Biochem. 47: 273-279; van Tilbeurgh et al., 1982, FEBS Letters 149: 152-156; and van Tilbeurgh and Claeyssens, 1985, FEBS Letters 187: 283-288. In the present invention, the Lever et al. method was employed to assess hydrolysis of cellulose in corn stover, while the method of van Tilbeurgh et al. was used to determine the cellobiohydrolase activity on a fluorescent disaccharide derivative.
Beta-glucosidase: The term "beta-glucosidase" is defined herein as a beta-D-glucoside glucohydrolase (E.C. 3.2.1.21), which catalyzes the hydrolysis of terminal non-reducing beta-D-glucose residues with the release of beta-D-glucose. For purposes of the present invention, beta-glucosidase activity is determined according to the basic procedure described by Venturi et al., 2002, J. Basic Microbiol. 42: 55-66, except that different conditions were employed as described herein. One unit of beta-glucosidase activity is defined as 1.0 µmole of p-nitrophenol produced per minute at 50°C, pH 5 from 4 mM p-nitrophenyl-beta-D-glucopyranoside as substrate in 100 mM sodium citrate, 0.01% TWEEN.
Family 1, Family 3, Family 5, Family 6, Family 7, Family 9, Family 12, Family 45, Family 61, or Family 74 glycoside hydrolase: The term "Family 1, Family 3, Family 5, Family 6, Family 7, Family 9, Family 12, Family 45, Family 61, or Family 74 glycoside hydrolase" or "Family GH1, Family GH3, Family GH5, Family GH6, Family GH7, Family GH9, Family GH12, Family GH45, Family GH61, or Family GH74" is defined herein as a polypeptide falling into the glycoside hydrolase Family 1, Family 3, Family 5, Family 6, Family 7, Family 9, Family 12, Family 45, Family 61, or Family 74, respectively, according to Henrissat B., 1991, A classification of glycosyl hydrolases based on amino-acid sequence similarities, Biochem. J. 280: 309-316, and Henrissat B., and Bairoch A., 1996, Updating the sequence-based classification of glycosyl hydrolases, Biochem. J. 316: 695-696. Presently, Henrissat lists the GH61 Family as unclassified, indicating that properties such as mechanism, catalytic nucleophile/base, catalytic proton donors, and 3-D structure are not known for polypeptides belonging to this family.
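The beta-glucosidase unit defined above (µmoles of p-nitrophenol produced per minute) can be computed from an absorbance reading of the released p-nitrophenol. The Python sketch below rests on assumptions not taken from this document: absorbance is read at 405 nm after an alkaline stop, the molar absorptivity of p-nitrophenolate is taken as 18.3 per mM per cm, the path length is 1 cm, and the reaction time and volumes are placeholder values. The function name is hypothetical.

```python
def beta_glucosidase_units_per_ml(a405, a405_blank, final_volume_ml, minutes,
                                  enzyme_volume_ml, epsilon_per_mM_cm=18.3, path_cm=1.0):
    """Beta-glucosidase activity (µmol p-nitrophenol per minute per ml enzyme).

    epsilon_per_mM_cm is an ASSUMED molar absorptivity for p-nitrophenolate
    at 405 nm after an alkaline stop; replace it with a value from your own
    p-nitrophenol standard curve.
    """
    pnp_mM = (a405 - a405_blank) / (epsilon_per_mM_cm * path_cm)  # mM = µmol/ml
    pnp_umol = pnp_mM * final_volume_ml
    return pnp_umol / minutes / enzyme_volume_ml

# Hypothetical readings: 20 µl enzyme, 10 min reaction, 1.0 ml read volume.
print(beta_glucosidase_units_per_ml(0.5, 0.134, 1.0, 10.0, 0.02))  # approximately 0.1
```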
Cellulose-containing material: The predominant polysaccharide in the primary cell wall of biomass is cellulose, the second most abundant is hemicellulose, and the third is pectin. The secondary cell wall, produced after the cell has stopped growing, also contains polysaccharides and is strengthened by polymeric lignin covalently cross-linked to hemicellulose. The cellulose-containing material can be any material containing cellulose. Cellulose is generally found, for example, in the stems, leaves, hulls, husks, and cobs of plants or leaves, branches, and wood of trees. The cellulose-containing material can be, but is not limited to, herbaceous material, agricultural residues, forestry residues, municipal solid wastes, waste paper, and pulp and paper mill residues. The cellulose-containing material can be any type of biomass including, but not limited to, wood resources, municipal solid waste, wastepaper, crops, and crop residues (see, for example, Wiselogel et al., 1995, in Handbook on Bioethanol (Charles E. Wyman, editor), pp. 105-118, Taylor & Francis, Washington D.C.; Wyman, 1994, Bioresource Technology 50: 3-16; Lynd, 1990, Applied Biochemistry and Biotechnology 24/25: 695-719; Mosier et al., 1999, Recent Progress in Bioconversion of Lignocellulosics, in Advances in Biochemical Engineering/Biotechnology, T. Scheper, managing editor, Volume 65, pp. 23-40, Springer-Verlag, New York). It is understood herein that the cellulose-containing material is preferably in the form of lignocellulose, e.g., a plant cell wall material containing lignin, cellulose, and hemicellulose in a mixed matrix.
In a preferred aspect, the cellulose-containing material is corn stover. In another preferred aspect, the cellulose-containing material is corn fiber. In another preferred aspect, the cellulose-containing material is corn cobs. In another preferred aspect, the cellulose-containing material is switch grass. In another preferred aspect, the cellulose-containing material is rice straw. In another preferred aspect, the cellulose-containing material is paper and pulp processing waste. In another preferred aspect, the cellulose-containing material is woody or herbaceous plants. In another preferred aspect, the cellulose-containing material is bagasse.
The cellulose-containing material may be used as is or may be subjected to pretreatment, using conventional methods known in the art. For example, physical pretreatment techniques can include various types of milling, irradiation, steaming/steam explosion, and hydrothermolysis; chemical pretreatment techniques can include dilute acid, alkaline, organic solvent, ammonia, sulfur dioxide, carbon dioxide, and pH-controlled hydrothermolysis; and biological pretreatment techniques can involve applying lignin-solubilizing microorganisms (see, for example, Hsu, T.-A., 1996, Pretreatment of biomass, in Handbook on Bioethanol: Production and Utilization, Wyman, C. E., ed., Taylor & Francis, Washington, DC, 179-212; Ghosh, P., and Singh, A., 1993, Physicochemical and biological treatments for enzymatic/microbial conversion of lignocellulosic biomass, Adv. Appl. Microbiol. 39: 295-333; McMillan, J. D., 1994, Pretreating lignocellulosic biomass: a review, in Enzymatic Conversion of Biomass for Fuels Production, Himmel, M. E., Baker, J. O., and Overend, R. P., eds., ACS Symposium Series 566, American Chemical Society, Washington, DC, chapter 15; Gong, C. S., Cao, N. J., Du, J., and Tsao, G. T., 1999, Ethanol production from renewable resources, in Advances in Biochemical Engineering/Biotechnology, Scheper, T., ed., Springer-Verlag Berlin Heidelberg, Germany, 65: 207-241; Olsson, L., and Hahn-Hagerdal, B., 1996, Fermentation of lignocellulosic hydrolysates for ethanol production, Enz. Microb. Tech. 18: 312-331; and Vallander, L., and Eriksson, K.-E. L., 1990, Production of ethanol from lignocellulosic materials: State of the art, Adv. Biochem. Eng./Biotechnol. 42: 63-95).
Pre-treated corn stover: The term "PCS" or "Pre-treated Corn Stover" is defined herein as a cellulose-containing material derived from corn stover by treatment with heat and dilute acid. For purposes of the present invention, PCS is made by the method described herein.
Full-length polypeptide: The term "full-length polypeptide" is defined herein as a precursor form of a polypeptide having biological activity, wherein the precursor contains a signal peptide region and alternatively also a propeptide region, wherein upon secretion from a cell, the signal peptide is cleaved and alternatively also the propeptide is cleaved yielding a polypeptide with biological activity.
Signal peptide: The term "signal peptide" is defined herein as a peptide that is linked in frame to the amino terminus of a polypeptide and that directs the encoded polypeptide into a cell's secretory pathway.
Signal peptide coding sequence: The term "signal peptide coding sequence" is defined herein as a peptide coding region that codes for an amino acid sequence that is linked in frame to the amino terminus of a polypeptide and that directs the encoded polypeptide into a cell's secretory pathway.
Propeptide: The term "propeptide" is defined herein as a peptide linked in frame to the amino terminus of a polypeptide. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally inactive and can be converted to a mature active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. Where both signal peptide and propeptide regions are present at the amino terminus of a polypeptide, the propeptide region is linked in frame to the amino terminus of a polypeptide and the signal peptide region is linked in frame to the amino terminus of the propeptide region.
Propeptide coding sequence: The term "propeptide coding sequence" is defined herein as a peptide coding region that codes for an amino acid sequence linked in frame to the amino terminus of a polypeptide forming a proenzyme or propolypeptide (or a zymogen in some cases).
Catalytic domain: The term "catalytic domain" is defined herein as a structural portion or region of the amino acid sequence of a beta-glucosidase or an endoglucanase that possesses the catalytic activity of the beta-glucosidase or the endoglucanase.
Beta-glucosidase fusion protein: The term "beta-glucosidase fusion protein" is defined herein as a polypeptide that exhibits beta-glucosidase activity and comprises at least both a beta-glucosidase catalytic domain and an endoglucanase catalytic domain.
Components of a beta-glucosidase fusion protein: The term "component of a beta-glucosidase fusion protein" is defined herein as individual (cleaved) fragments of the beta-glucosidase fusion protein, wherein each fragment has beta-glucosidase activity and includes either the endoglucanase and the beta-glucosidase catalytic domains or the beta-glucosidase catalytic domain of the fusion protein. For example, the presence of a cleavage site, e.g., a Kex2 site, between the endoglucanase and beta-glucosidase components of the fusion protein can result in the production of a polypeptide having endoglucanase activity and another polypeptide having beta-glucosidase activity.
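The effect of a Kex2 site in a fusion protein can be sketched as string processing. This simplified Python illustration assumes that Kex2 cleaves on the carboxyl side of Lys-Arg (KR) and Arg-Arg (RR) dipeptides and ignores sequence context and any further trimming of the basic residues; the peptide sequence is a made-up placeholder, not one of the SEQ ID NOs, and the function name is hypothetical.

```python
def kex2_fragments(protein, sites=("KR", "RR")):
    """Split a protein sequence after every dibasic Kex2 site (simplified)."""
    fragments, start, i = [], 0, 0
    while i < len(protein) - 1:
        if protein[i:i + 2] in sites:
            fragments.append(protein[start:i + 2])  # cleavage after the second basic residue
            start = i + 2
            i += 2
        else:
            i += 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments

# Hypothetical fusion: endoglucanase part + KR linker + beta-glucosidase part.
fusion = "QTSCDQW" + "KR" + "DELAYSPPF"
print(kex2_fragments(fusion))  # ['QTSCDQWKR', 'DELAYSPPF']
```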
Cellulose binding domain: The term "cellulose binding domain (CBD)" is defined herein as a portion of the amino acid sequence of an endoglucanase (cellulase) that is involved in the cellulose binding activity of the endoglucanase. Cellulose binding domains generally function by non-covalently binding the endoglucanase to cellulose, a cellulose derivative, or a polysaccharide equivalent thereof. CBDs typically function independently of the catalytic domain.

Beta-glucosidase fusion construct: The term "beta-glucosidase fusion construct" refers to a nucleic acid construct that is composed of different genes or portions thereof in operable linkage. The components include, from the 5' end, a DNA molecule comprising at least an endoglucanase catalytic domain and a DNA molecule comprising at least a beta-glucosidase catalytic domain.
Isolated polypeptide: The term "isolated polypeptide" as used herein refers to a polypeptide that is isolated from a source. In a preferred aspect, the polypeptide is at least 1% pure, preferably at least 5% pure, more preferably at least 10% pure, more preferably at least 20% pure, more preferably at least 40% pure, more preferably at least 60% pure, even more preferably at least 80% pure, and most preferably at least 90% pure, as determined by SDS-PAGE.
Substantially pure polypeptide: The term "substantially pure polypeptide" denotes herein a polypeptide preparation that contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1%, and even most preferably at most 0.5% by weight of other polypeptide material with which it is natively or recombinantly associated. It is, therefore, preferred that the substantially pure polypeptide is at least 92% pure, preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, more preferably at least 98% pure, even more preferably at least 99%, most preferably at least 99.5% pure, and even most preferably 100% pure by weight of the total polypeptide material present in the preparation. The polypeptides of the present invention are preferably in a substantially pure form, i.e., the polypeptide preparation is essentially free of other polypeptide material with which it is natively or recombinantly associated. This can be accomplished, for example, by preparing the polypeptide by well-known recombinant methods or by classical purification methods.
Mature polypeptide: The term "mature polypeptide" is defined herein as a polypeptide having biological activity, e.g., enzyme activity, which is in its final form following translation and any post-translational modifications, such as N-terminal processing, C-terminal truncation, glycosylation, phosphorylation, etc.
Mature polypeptide coding sequence: The term "mature polypeptide coding sequence" is defined herein as a nucleotide sequence that encodes a mature polypeptide having biological activity.
Identity: The relatedness between two amino acid sequences or between two nucleotide sequences is described by the parameter "identity".
For purposes of the present invention, the degree of identity between two amino acid sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, J. Mol. Biol. 48: 443-453) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, Trends in Genetics 16: 276-277), preferably version 3.0.0 or later. The optional parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EBLOSUM62 (EMBOSS version of BLOSUM62) substitution matrix. The output of Needle labeled "longest identity" (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:
(Identical Residues x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
For purposes of the present invention, the degree of identity between two deoxyribonucleotide sequences is determined using the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970, supra) as implemented in the Needle program of the EMBOSS package (EMBOSS: The European Molecular Biology Open Software Suite, Rice et al., 2000, supra), preferably version 3.0.0 or later. The optional parameters used are gap open penalty of 10, gap extension penalty of 0.5, and the EDNAFULL (EMBOSS version of NCBI NUC4.4) substitution matrix. The output of Needle labeled "longest identity" (obtained using the -nobrief option) is used as the percent identity and is calculated as follows:
(Identical Deoxyribonucleotides x 100)/(Length of Alignment - Total Number of Gaps in Alignment)
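Given an alignment already produced by Needle, the percent identity formula above can be applied directly. The Python sketch below assumes the two aligned sequences are supplied as equal-length strings with "-" for gaps, and it counts "Total Number of Gaps in Alignment" as the number of alignment columns containing a gap; it does not perform the Needleman-Wunsch alignment itself. The function name and sequences are hypothetical.

```python
def needle_percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """(identical residues x 100) / (alignment length - gap columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    identical = sum(1 for x, y in columns if x == y and x != gap)
    gap_columns = sum(1 for x, y in columns if gap in (x, y))
    return identical * 100 / (len(columns) - gap_columns)

print(needle_percent_identity("ACGT-A", "ACCTTA"))    # 80.0 (nucleotide example)
print(needle_percent_identity("MKV-LLA", "MKVAL-A"))  # 100.0 (protein example)
```

In the second call every ungapped column matches, so the result is 100% even though each sequence carries a residue the other lacks: gapped columns are excluded from the denominator.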
Homologous sequence: The term "homologous sequence" is defined herein as sequences with an E value (or expectancy score) of less than 0.001 using the blastp (for protein databases) or tblastn (for nucleic acid databases) algorithms with the BLOSUM62 matrix, wordsize 3, gap existence cost 11, gap extension cost 1, no low complexity filtration, and a mature protein sequence as query. See Altschul et al., 1997, Nucleic Acids Res. 25: 3389-3402.
Polypeptide Fragment: The term "polypeptide fragment" is defined herein as a polypeptide having one or more (several) amino acids deleted from the amino and/or carboxyl terminus of the mature polypeptide or a homologous sequence thereof, wherein the fragment has activity as the mature polypeptide thereof.
Subsequence: The term "subsequence" is defined herein as a nucleotide sequence having one or more (several) nucleotides deleted from the 5' and/or 3' end of the mature polypeptide coding sequence or a homologous sequence thereof, wherein the subsequence encodes a polypeptide fragment having activity as the mature polypeptide thereof.
Allelic variant: The term "allelic variant" denotes herein any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in polymorphism within populations. Gene mutations can be silent (no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequences. An allelic variant of a polypeptide is a polypeptide encoded by an allelic variant of a gene.
Isolated polynucleotide: The term "isolated polynucleotide" as used herein refers to a polynucleotide that is isolated from a source. In a preferred aspect, the polynucleotide is at least 1% pure, preferably at least 5% pure, more preferably at least 10% pure, more preferably at least 20% pure, more preferably at least 40% pure, more preferably at least 60% pure, even more preferably at least 80% pure, and most preferably at least 90% pure, as determined by agarose electrophoresis.
Substantially pure polynucleotide: The term "substantially pure polynucleotide" as used herein refers to a polynucleotide preparation free of other extraneous or unwanted nucleotides and in a form suitable for use within genetically engineered protein production systems. Thus, a substantially pure polynucleotide contains at most 10%, preferably at most 8%, more preferably at most 6%, more preferably at most 5%, more preferably at most 4%, more preferably at most 3%, even more preferably at most 2%, most preferably at most 1%, and even most preferably at most 0.5% by weight of other polynucleotide material with which it is natively or recombinantly associated. A substantially pure polynucleotide may, however, include naturally occurring 5' and 3' untranslated regions, such as promoters and terminators. It is preferred that the substantially pure polynucleotide is at least 90% pure, preferably at least 92% pure, more preferably at least 94% pure, more preferably at least 95% pure, more preferably at least 96% pure, more preferably at least 97% pure, even more preferably at least 98% pure, most preferably at least 99%, and even most preferably at least 99.5% pure by weight. The polynucleotides of the present invention are preferably in a substantially pure form, i.e., the polynucleotide preparation is essentially free of other polynucleotide material with which it is natively or recombinantly associated. The polynucleotides may be of genomic, cDNA, RNA, semisynthetic, or synthetic origin, or any combinations thereof.
cDNA: The term "cDNA" is defined herein as a DNA molecule that can be prepared by reverse transcription from a mature, spliced mRNA molecule obtained from a eukaryotic cell. cDNA lacks intron sequences that are usually present in the corresponding genomic DNA. The initial, primary RNA transcript is a precursor to mRNA that is processed through a series of steps before appearing as mature spliced mRNA. These steps include the removal of intron sequences by a process called splicing. cDNA derived from mRNA therefore lacks any intron sequences.
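Splicing, as described above, can be sketched as removing intron intervals from a genomic sequence; the spliced result corresponds to the cDNA sequence. This Python illustration assumes the intron coordinates are already known (0-based, end-exclusive) and uses a made-up sequence whose intron follows the common GT...AG boundary pattern; the function name is hypothetical.

```python
def splice(genomic, introns):
    """Remove intron intervals (0-based, end-exclusive) from a genomic sequence."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        if start < pos or start >= end or end > len(genomic):
            raise ValueError("introns must be non-overlapping intervals inside the sequence")
        exons.append(genomic[pos:start])
        pos = end
    exons.append(genomic[pos:])
    return "".join(exons)

# Made-up gene: exon 1 + intron (GT...AG) + exon 2.
genomic = "ATGGCC" + "GTAAGTTTTAG" + "TTCTAA"
print(splice(genomic, [(6, 17)]))  # ATGGCCTTCTAA
```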
Nucleic acid construct: The term "nucleic acid construct" as used herein refers to a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or which is modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature or which is synthetic. The term nucleic acid construct is synonymous with the term "expression cassette" when the nucleic acid construct contains the control sequences required for expression of a coding sequence of the present invention.
Control sequence: The term "control sequences" is defined herein to include all components necessary for the expression of a polynucleotide encoding a polypeptide of the present invention. Each control sequence may be native or foreign to the nucleotide sequence encoding the polypeptide or native or foreign to each other. Such control sequences include, but are not limited to, a leader, polyadenylation sequence, propeptide sequence, promoter, signal peptide sequence, and transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleotide sequence encoding a polypeptide.
Operably linked: The term "operably linked" denotes herein a configuration in which a control sequence is placed at an appropriate position relative to the coding sequence of the polynucleotide sequence such that the control sequence directs the expression of the coding sequence of a polypeptide.
Coding sequence: When used herein the term "coding sequence" means a nucleotide sequence, which directly specifies the amino acid sequence of its protein product. The boundaries of the coding sequence are generally determined by an open reading frame, which usually begins with the ATG start codon or alternative start codons such as GTG and TTG and ends with a stop codon such as TAA, TAG, and TGA. The coding sequence may be a DNA, cDNA, synthetic, or recombinant nucleotide sequence.
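The open reading frame boundaries described above can be turned into a simple scanner. This Python sketch searches the forward strand only, in all three frames, and by default accepts only ATG as a start codon; alternative starts such as GTG and TTG can be passed in. It does not handle introns, the reverse strand, or nested start codons inside an ORF, and the function name is hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

def find_orfs(seq, starts=("ATG",), min_codons=1):
    """Return (start, end) of forward-strand ORFs, 0-based, end-exclusive, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in starts:
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop codon before the end of the sequence
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))                      # [(2, 14)]
print(find_orfs("GTGTAA"))                                # []
print(find_orfs("GTGTAA", starts=("ATG", "GTG", "TTG")))  # [(0, 6)]
```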
Expression: The term "expression" includes any step involved in the production of the polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
Expression vector: The term "expression vector" is defined herein as a linear or circular DNA molecule that comprises a polynucleotide encoding a polypeptide of the present invention and is operably linked to additional nucleotides that provide for its expression.
Host cell: The term "host cell", as used herein, includes any cell type that is susceptible to transformation, transfection, transduction, and the like with a nucleic acid construct or expression vector comprising a polynucleotide.
Modification: The term "modification" means herein any chemical modification of a mature polypeptide or a homologous sequence thereof, as well as genetic manipulation of the DNA encoding such a polypeptide. The modification can be a substitution, a deletion and/or an insertion of one or more (several) amino acids as well as replacements of one or more (several) amino acid side chains.
Artificial variant: When used herein, the term "artificial variant" means a polypeptide produced by an organism expressing a modified nucleotide sequence of a mature polypeptide coding sequence or a homologous sequence thereof. The modified nucleotide sequence is obtained through human intervention by modification of the polynucleotide sequence or a homologous sequence thereof.
Detailed Description of the Invention
The present invention relates to filamentous fungal host cells, comprising: (a) a first polynucleotide encoding a native or heterologous polypeptide having cellulolytic enhancing activity; (b) a second polynucleotide encoding a native or heterologous beta-glucosidase; and (c) one or more (several) third polynucleotides encoding native or heterologous one or more (several) cellulolytic enzymes selected from the group consisting of a Trichoderma reesei cellobiohydrolase I (CEL7A), a Trichoderma reesei cellobiohydrolase II (CEL6A), and a Trichoderma reesei endoglucanase I (CEL7B), and orthologs or variants thereof.
The present invention also relates to methods of producing a cellulolytic protein composition, comprising: (a) cultivating such filamentous fungal host cells under conditions conducive for production of the cellulolytic protein composition; and (b) recovering the cellulolytic protein composition. The present invention also relates to cellulolytic protein compositions obtained by such methods.
The present invention also relates to cellulolytic protein compositions, comprising: (a) a polypeptide having cellulolytic enhancing activity; (b) a beta-glucosidase; and (c) one or more (several) cellulolytic enzymes selected from the group consisting of a Trichoderma reesei cellobiohydrolase I (CEL7A), a Trichoderma reesei cellobiohydrolase II (CEL6A), and a Trichoderma reesei endoglucanase I (CEL7B), and orthologs or variants thereof.
In a preferred aspect, the filamentous fungal host cell produces a cellulolytic protein composition comprising a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56. In a preferred aspect, the filamentous fungal host cell is Trichoderma reesei, in particular Trichoderma reesei RutC30.
In another preferred aspect, the filamentous fungal host cell produces a cellulolytic protein composition comprising a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56, and further produces one or more (several) enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A) of the mature polypeptide of SEQ ID NO: 58, a Trichoderma reesei endoglucanase V (CEL45A) of the mature polypeptide of SEQ ID NO: 62, and a Trichoderma reesei endoglucanase III (CEL12A) of the mature polypeptide of SEQ ID NO: 60. In another preferred aspect, the filamentous fungal host cell is Trichoderma reesei, in particular Trichoderma reesei RutC30.
In another preferred aspect, the filamentous fungal host cell produces a cellulolytic protein composition comprising a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56, and further produces a Thielavia terrestris cellobiohydrolase of the mature polypeptide of SEQ ID NO: 64. In another preferred aspect, the filamentous fungal host cell is Trichoderma reesei, in particular Trichoderma reesei RutC30.
In another preferred aspect, the filamentous fungal host cell produces a cellulolytic protein composition comprising a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56, and further produces (1) one or more (several) enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A) of the mature polypeptide of SEQ ID NO: 58, a Trichoderma reesei endoglucanase V (CEL45A) of the mature polypeptide of SEQ ID NO: 62, and a Trichoderma reesei endoglucanase III (CEL12A) of the mature polypeptide of SEQ ID NO: 60, and/or (2) a Thielavia terrestris cellobiohydrolase of the mature polypeptide of SEQ ID NO: 64. In another preferred aspect, the filamentous fungal host cell is Trichoderma reesei, in particular Trichoderma reesei RutC30.
In another preferred aspect, the cellulolytic protein composition comprises a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56.
In another preferred aspect, the cellulolytic protein composition comprises a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56 and further comprises one or more (several) enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A) of the mature polypeptide of SEQ ID NO: 58, a Trichoderma reesei endoglucanase V (CEL45A) of the mature polypeptide of SEQ ID NO: 62, and a Trichoderma reesei endoglucanase III (CEL12A) of the mature polypeptide of SEQ ID NO: 60.
In another preferred aspect, the cellulolytic protein composition comprises a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56 and further comprises a Thielavia terrestris cellobiohydrolase of the mature polypeptide of SEQ ID NO: 64.
In another preferred aspect, the cellulolytic protein composition comprises a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56 and further comprises (1) one or more (several) enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A) of the mature polypeptide of SEQ ID NO: 58, a Trichoderma reesei endoglucanase V (CEL45A) of the mature polypeptide of SEQ ID NO: 62, and a Trichoderma reesei endoglucanase III (CEL12A) of the mature polypeptide of SEQ ID NO: 60, and/or (2) a Thielavia terrestris cellobiohydrolase of the mature polypeptide of SEQ ID NO: 64.
Polypeptides Having Cellulolytic Enhancing Activity and Polynucleotides Thereof
Any polypeptide having cellulolytic enhancing activity that is useful in processing cellulose-containing material can be used in the compositions of the present invention. A polynucleotide encoding such a polypeptide having cellulolytic enhancing activity can be used to produce the compositions.
In a first aspect, isolated polypeptides having cellulolytic enhancing activity comprise the following motifs:
[ILMV]-P-X(4,5)-G-X-Y-[ILMV]-X-R-X-[EQ]-X(4)-[HNQ] and [FW]-[TF]-K-[AIV], wherein X is any amino acid, X(4,5) is any amino acid at 4 or 5 contiguous positions, and X(4) is any amino acid at 4 contiguous positions.
The isolated polypeptide comprising the above-noted motifs may further comprise:
H-X(1,2)-G-P-X(3)-[YW]-[AILMV],
[EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV], or
H-X(1,2)-G-P-X(3)-[YW]-[AILMV] and [EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV],
wherein X is any amino acid, X(1,2) is any amino acid at 1 position or 2 contiguous positions, X(3) is any amino acid at 3 contiguous positions, and X(2) is any amino acid at 2 contiguous positions. In the above motifs, the accepted IUPAC single letter amino acid abbreviation is employed.
In a preferred embodiment, the isolated polypeptide having cellulolytic enhancing activity further comprises H-X(1,2)-G-P-X(3)-[YW]-[AILMV]. In another preferred embodiment, the isolated polypeptide having cellulolytic enhancing activity further comprises [EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV]. In another preferred embodiment, the isolated polypeptide having cellulolytic enhancing activity further comprises H-X(1,2)-G-P-X(3)-[YW]-[AILMV] and [EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV].
In a second aspect, isolated polypeptides having cellulolytic enhancing activity comprise the following motif:
[ILMV]-P-x(4,5)-G-x-Y-[ILMV]-x-R-x-[EQ]-x(3)-A-[HNQ], wherein x is any amino acid, x(4,5) is any amino acid at 4 or 5 contiguous positions, and x(3) is any amino acid at 3 contiguous positions. In the above motif, the accepted IUPAC single letter amino acid abbreviation is employed.
In a third aspect, isolated polypeptides having cellulolytic enhancing activity comprise or consist of amino acid sequences that have a degree of identity to the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14 of preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 96%, 97%, 98%, or 99%, which have cellulolytic enhancing activity (hereinafter "homologous polypeptides"). In a preferred aspect, the homologous polypeptides comprise or consist of an amino acid sequence that differs by ten amino acids, preferably by five amino acids, more preferably by four amino acids, even more preferably by three amino acids, most preferably by two amino acids, and even most preferably by one amino acid from the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14.
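To relate the "differs by ten amino acids" language to the percentage thresholds, the following Python sketch counts differing positions and computes percent identity for two sequences that are assumed to be already aligned, equal in length, and free of gaps (substitution-only variants). It is not the alignment-based identity determination a full comparison would require, and the function names are hypothetical.

```python
def count_differences(a: str, b: str) -> int:
    """Number of positions at which two pre-aligned, gap-free sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length, gap-free alignments")
    return sum(1 for x, y in zip(a, b) if x != y)


def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of the (shared) sequence length."""
    return 100.0 * (len(a) - count_differences(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical 208-residue stand-ins; 208 is the length of residues 19 to 226
    # of SEQ ID NO: 8, but these strings are not the real sequence.
    reference = "A" * 208
    variant = "C" * 10 + "A" * 198  # ten substitutions
    print(count_differences(reference, variant), round(percent_identity(reference, variant), 1))
```

For a substitution-only variant of a 208-residue mature polypeptide, ten differences correspond to (208 - 10) / 208, or roughly 95.2% identity.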
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 2 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 2. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 2. In another preferred aspect, the polypeptide comprises amino acids 20 to 326 of SEQ ID NO: 2, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 326 of SEQ ID NO: 2. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 2 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 2. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 2. In another preferred aspect, the polypeptide consists of amino acids 20 to 326 of SEQ ID NO: 2 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 326 of SEQ ID NO: 2.
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 4 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 4. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 4. In another preferred aspect, the polypeptide comprises amino acids 18 to 240 of SEQ ID NO: 4, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 18 to 240 of SEQ ID NO: 4. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 4 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 4. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 4. In another preferred aspect, the polypeptide consists of amino acids 18 to 240 of SEQ ID NO: 4 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 18 to 240 of SEQ ID NO: 4.
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 6 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 6. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 6. In another preferred aspect, the polypeptide comprises amino acids 20 to 258 of SEQ ID NO: 6, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 258 of SEQ ID NO: 6. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 6 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 6. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 6. In another preferred aspect, the polypeptide consists of amino acids 20 to 258 of SEQ ID NO: 6 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 258 of SEQ ID NO: 6.
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 8 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 8. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 8. In another preferred aspect, the polypeptide comprises amino acids 19 to 226 of SEQ ID NO: 8, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 19 to 226 of SEQ ID NO: 8. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 8 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 8. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 8. In another preferred aspect, the polypeptide consists of amino acids 19 to 226 of SEQ ID NO: 8 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 19 to 226 of SEQ ID NO: 8.
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 10 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 10. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 10. In another preferred aspect, the polypeptide comprises amino acids 20 to 304 of SEQ ID NO: 10, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 304 of SEQ ID NO: 10. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 10 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 10. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 10. In another preferred aspect, the polypeptide consists of amino acids 20 to 304 of SEQ ID NO: 10 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 304 of SEQ ID NO: 10.
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 12 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 12. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 12. In another preferred aspect, the polypeptide comprises amino acids 23 to 250 of SEQ ID NO: 12, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 23 to 250 of SEQ ID NO: 12. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 12 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 12. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 12. In another preferred aspect, the polypeptide consists of amino acids 23 to 250 of SEQ ID NO: 12 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 23 to 250 of SEQ ID NO: 12.
A polypeptide having cellulolytic enhancing activity preferably comprises the amino acid sequence of SEQ ID NO: 14 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 14. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 14. In another preferred aspect, the polypeptide comprises amino acids 20 to 249 of SEQ ID NO: 14, or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 249 of SEQ ID NO: 14. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 14 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 14. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 14. In another preferred aspect, the polypeptide consists of amino acids 20 to 249 of SEQ ID NO: 14 or an allelic variant thereof; or a fragment thereof that has cellulolytic enhancing activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 249 of SEQ ID NO: 14.
Preferably, a fragment of the mature polypeptide of SEQ ID NO: 2 contains at least 277 amino acid residues, more preferably at least 287 amino acid residues, and most preferably at least 297 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 4 contains at least 185 amino acid residues, more preferably at least 195 amino acid residues, and most preferably at least 205 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 6 contains at least 200 amino acid residues, more preferably at least 212 amino acid residues, and most preferably at least 224 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 8 contains at least 175 amino acid residues, more preferably at least 185 amino acid residues, and most preferably at least 195 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 10 contains at least 240 amino acid residues, more preferably at least 255 amino acid residues, and most preferably at least 270 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 12 contains at least 175 amino acid residues, more preferably at least 190 amino acid residues, and most preferably at least 205 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 14 contains at least 200 amino acid residues, more preferably at least 210 amino acid residues, and most preferably at least 220 amino acid residues.
Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 1 contains at least 831 nucleotides, more preferably at least 861 nucleotides, and most preferably at least 891 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 3 contains at least 555 nucleotides, more preferably at least 585 nucleotides, and most preferably at least 615 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 5 contains at least 600 nucleotides, more preferably at least 636 nucleotides, and most preferably at least 672 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 7 contains at least 525 nucleotides, more preferably at least 555 nucleotides, and most preferably at least 585 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 9 contains at least 720 nucleotides, more preferably at least 765 nucleotides, and most preferably at least 810 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 11 contains at least 525 nucleotides, more preferably at least 570 nucleotides, and most preferably at least 615 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 13 contains at least 600 nucleotides, more preferably at least 630 nucleotides, and most preferably at least 660 nucleotides.
In a fourth aspect, isolated polypeptides having cellulolytic enhancing activity are encoded by polynucleotides comprising nucleotide sequences that hybridize under at least very low stringency conditions, preferably at least low stringency conditions, more preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, even more preferably at least high stringency conditions, and most preferably at least very high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, (ii) the cDNA sequence contained in the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 11, or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 7, SEQ ID NO: 9, or SEQ ID NO: 13, (iii) a subsequence of (i) or (ii), or (iv) a full-length complementary strand of (i), (ii), or (iii) (J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989, Molecular Cloning, A Laboratory Manual, 2d edition, Cold Spring Harbor, New York). A subsequence of the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13 contains at least 100 contiguous nucleotides or preferably at least 200 contiguous nucleotides. Moreover, the subsequence may encode a polypeptide fragment that has cellulolytic enhancing activity. In a preferred aspect, the mature polypeptide coding sequence is nucleotides 388 to 1332 of SEQ ID NO: 1, nucleotides 98 to 821 of SEQ ID NO: 3, nucleotides 126 to 978 of SEQ ID NO: 5, nucleotides 55 to 678 of SEQ ID NO: 7, nucleotides 58 to 912 of SEQ ID NO: 9, nucleotides 67 to 796 of SEQ ID NO: 11, or nucleotides 77 to 766 of SEQ ID NO: 13.
The nucleotide sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, or a subsequence thereof, as well as the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14, or a fragment thereof, may be used to design a nucleic acid probe to identify and clone DNA encoding polypeptides having cellulolytic enhancing activity from strains of different genera or species. In particular, such probes can be used for hybridization with the genomic or cDNA of the genus or species of interest, following standard Southern blotting procedures, in order to identify and isolate the corresponding gene therein. Such probes can be considerably shorter than the entire sequence, but should be at least 14, preferably at least 25, more preferably at least 35, and most preferably at least 70 nucleotides in length. It is, however, preferred that the nucleic acid probe is at least 100 nucleotides in length. For example, the nucleic acid probe may be at least 200 nucleotides, preferably at least 300 nucleotides, more preferably at least 400 nucleotides, or most preferably at least 500 nucleotides in length. Even longer probes may be used, e.g., nucleic acid probes that are at least 600 nucleotides, preferably at least 700 nucleotides, more preferably at least 800 nucleotides, or most preferably at least 900 nucleotides in length. Both DNA and RNA probes can be used. The probes are typically labeled for detecting the corresponding gene (for example, with ³²P, ³H, ³⁵S, biotin, or avidin). Such probes are encompassed by the present invention.
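Because a probe may be a subsequence of a coding sequence or of its complementary strand, two string operations underlie the simplest form of probe selection: taking contiguous windows of a chosen length and forming the reverse complement. The Python sketch below shows both, enforcing the 14-nucleotide minimum stated above. The helper names are hypothetical, and practical probe design also weighs factors such as hybridization stringency that this sketch ignores.

```python
MIN_PROBE_LENGTH = 14  # shortest probe length given in the text
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complementary strand of a DNA sequence, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def probe_windows(dna: str, length: int, step: int = 1):
    """Yield (offset, probe) pairs for contiguous windows of `length` nucleotides."""
    if length < MIN_PROBE_LENGTH:
        raise ValueError(f"probes should be at least {MIN_PROBE_LENGTH} nucleotides")
    for offset in range(0, len(dna) - length + 1, step):
        yield offset, dna[offset:offset + length]


if __name__ == "__main__":
    template = "ACGT" * 5  # hypothetical 20-nucleotide stand-in, not a listed sequence
    for offset, probe in probe_windows(template, 14, step=3):
        print(offset, probe, reverse_complement(probe))
```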
A genomic DNA or cDNA library prepared from such other strains may, therefore, be screened for DNA that hybridizes with the probes described above and encodes a polypeptide having cellulolytic enhancing activity. Genomic or other DNA from such other strains may be separated by agarose or polyacrylamide gel electrophoresis, or other separation techniques. DNA from the libraries or the separated DNA may be transferred to and immobilized on nitrocellulose or other suitable carrier material. In order to identify a clone or DNA that is homologous with SEQ ID NO: 1, or a subsequence thereof, the carrier material is preferably used in a Southern blot.
For purposes of the present invention, hybridization indicates that the nucleotide sequence hybridizes to a labeled nucleic acid probe corresponding to the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, the cDNA sequence contained in the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 11, or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 7, SEQ ID NO: 9, or SEQ ID NO: 13, its full-length complementary strand, or a subsequence thereof, under very low to very high stringency conditions. Molecules to which the nucleic acid probe hybridizes under these conditions can be detected using, for example, X-ray film.
In a preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 1. In another preferred aspect, the nucleic acid probe is nucleotides 388 to 1332 of SEQ ID NO: 1. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 2, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 1. In another preferred aspect, the nucleic acid probe is the polynucleotide sequence contained in plasmid pEJG120 which is contained in E. coli NRRL B-30699, wherein the polynucleotide sequence thereof encodes a polypeptide having cellulolytic enhancing activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pEJG120 which is contained in E. coli NRRL B-30699.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 3. In another preferred aspect, the nucleic acid probe is nucleotides 98 to 821 of SEQ ID NO: 3. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 4, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 3. In another preferred aspect, the nucleic acid probe is the polynucleotide sequence contained in plasmid pTter61C which is contained in E. coli NRRL B-30813, wherein the polynucleotide sequence thereof encodes a polypeptide having cellulolytic enhancing activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pTter61C which is contained in E. coli NRRL B-30813.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 5. In another preferred aspect, the nucleic acid probe is nucleotides 126 to 978 of SEQ ID NO: 5. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 6, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 5. In another preferred aspect, the nucleic acid probe is the polynucleotide [...]
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 7. In another preferred aspect, the nucleic acid probe is nucleotides 55 to 678 of SEQ ID NO: 7. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 8, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 7. In another preferred aspect, the nucleic acid probe is the polynucleotide sequence contained in plasmid pTter61E which is contained in E. coli NRRL B-30814, wherein the polynucleotide sequence thereof encodes a polypeptide having cellulolytic enhancing activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pTter61E which is contained in E. coli NRRL B-30814.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 9. In another preferred aspect, the nucleic acid probe is nucleotides 58 to 912 of SEQ ID NO: 9. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 10, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 9. In another preferred aspect, the nucleic acid probe is the polynucleotide sequence contained in plasmid pTter61G which is contained in E. coli NRRL B-30811, wherein the polynucleotide sequence thereof encodes a polypeptide having cellulolytic enhancing activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pTter61G which is contained in E. coli NRRL B-30811.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 11. In another preferred aspect, the nucleic acid probe is nucleotides 67 to 796 of SEQ ID NO: 11. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 12, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 11. In another preferred aspect, the nucleic acid probe is the polynucleotide sequence contained in plasmid pDZA2-7 which is contained in E. coli NRRL B-30704, wherein the polynucleotide sequence thereof encodes a polypeptide having cellulolytic enhancing activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pDZA2-7 which is contained in E. coli NRRL B-30704.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 13. In another preferred aspect, the nucleic acid probe is nucleotides 77 to 766 of SEQ ID NO: 13. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 14, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 13. In another preferred aspect, the nucleic acid probe is the polynucleotide sequence contained in plasmid pTr333 which is contained in E. coli NRRL B-30878, wherein the polynucleotide sequence thereof encodes a polypeptide having cellulolytic enhancing activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pTr333 which is contained in E. coli NRRL B-30878.
For long probes of at least 100 nucleotides in length, very low to very high stringency conditions are defined as prehybridization and hybridization at 42°C in 5X SSPE, 0.3% SDS, 200 µg/ml sheared and denatured salmon sperm DNA, and either 25% formamide for very low and low stringencies, 35% formamide for medium and medium-high stringencies, or 50% formamide for high and very high stringencies, following standard Southern blotting procedures for 12 to 24 hours optimally.
For long probes of at least 100 nucleotides in length, the carrier material is finally washed three times each for 15 minutes using 2X SSC, 0.2% SDS preferably at least at 45°C (very low stringency), more preferably at least at 50°C (low stringency), more preferably at least at 55°C (medium stringency), more preferably at least at 60°C (medium-high stringency), even more preferably at least at 65°C (high stringency), and most preferably at least at 70°C (very high stringency).
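The long-probe stringency levels above each pair a formamide concentration with a minimum wash temperature, so they can be held in a small lookup table. The following Python sketch is illustrative only: the names `Stringency`, `LONG_PROBE_CONDITIONS` and `meets` are hypothetical, and treating a higher formamide concentration or a hotter wash as "at least as stringent" is an assumption rather than something the text states.

```python
from typing import NamedTuple


class Stringency(NamedTuple):
    formamide_percent: int  # formamide in the 42 degree C (pre)hybridization buffer
    min_wash_temp_c: int    # minimum temperature of the 3 x 15 min 2X SSC, 0.2% SDS washes


# Values transcribed from the long-probe (>= 100 nt) definitions above.
LONG_PROBE_CONDITIONS = {
    "very low": Stringency(25, 45),
    "low": Stringency(25, 50),
    "medium": Stringency(35, 55),
    "medium-high": Stringency(35, 60),
    "high": Stringency(50, 65),
    "very high": Stringency(50, 70),
}


def meets(level: str, formamide_percent: float, wash_temp_c: float) -> bool:
    """Return True if a protocol is at least as stringent as `level`.

    Assumption (not from the text): more formamide and a hotter wash are
    both treated as at least as stringent as the listed values.
    """
    required = LONG_PROBE_CONDITIONS[level]
    return (formamide_percent >= required.formamide_percent
            and wash_temp_c >= required.min_wash_temp_c)


print(meets("high", 50, 65))       # True
print(meets("very high", 50, 65))  # False: the wash is 5 degrees C too cool
```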
For short probes of about 15 nucleotides to about 70 nucleotides in length, stringency conditions are defined as prehybridization, hybridization, and washing post-hybridization at about 5°C to about 10°C below the calculated Tm using the calculation according to Bolton and McCarthy (1962, Proceedings of the National Academy of Sciences USA 48:1390) in 0.9 M NaCl, 0.09 M Tris-HCl pH 7.6, 6 mM EDTA, 0.5% NP-40, 1X Denhardt's solution, 1 mM sodium pyrophosphate, 1 mM sodium monobasic phosphate, 0.1 mM ATP, and 0.2 mg of yeast RNA per ml following standard Southern blotting procedures for 12 to 24 hours optimally.
For short probes of about 15 nucleotides to about 70 nucleotides in length, the carrier material is washed once in 6X SSC plus 0.1% SDS for 15 minutes and twice each for 15 minutes using 6X SSC at 5°C to 10°C below the calculated Tm.
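For short probes, the hybridization and wash temperatures are set relative to a calculated Tm. The sketch below assumes one commonly quoted salt-adjusted form of this calculation, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%G+C) − 600/N, with [Na+] taken as the 0.9 M NaCl of the buffer. The constants, the function names and the neglect of the other buffer components are assumptions for illustration, not values given in the text.

```python
import math
from typing import Tuple


def short_probe_tm(probe: str, na_molar: float = 0.9) -> float:
    """Estimated Tm (degrees C) of a 15-70 nt DNA probe.

    Assumed formula: Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N
    """
    seq = probe.upper()
    n = len(seq)
    if not 15 <= n <= 70:
        raise ValueError("short-probe conditions cover about 15 to 70 nucleotides")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def working_window(probe: str) -> Tuple[float, float]:
    """(low, high) temperatures 10 and 5 degrees C below the calculated Tm."""
    tm = short_probe_tm(probe)
    return tm - 10.0, tm - 5.0


probe = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% G+C
print(round(short_probe_tm(probe), 2))  # 71.24
low, high = working_window(probe)
print(f"hybridize and wash between {low:.1f} and {high:.1f} degrees C")
```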
In a fifth aspect, isolated polypeptides having cellulolytic enhancing activity are encoded by polynucleotides comprising or consisting of nucleotide sequences that have a degree of identity to the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13 of preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 96%, 97%, 98%, or 99%, which encode an active polypeptide having cellulolytic enhancing activity.
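The identity thresholds recited above form an ordered series of tiers. As an illustration only, the Python sketch below computes percent identity from two sequences that have already been aligned and reports the highest tier met. How the alignment is produced, and the convention of dividing identical positions by the number of gap-free aligned columns, are assumptions here; the helper names are hypothetical.

```python
from typing import Optional

# Preference tiers for the degree of identity listed in the text.
IDENTITY_TIERS = [99, 98, 97, 96, 95, 90, 85, 80, 75, 70, 65, 60]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that are ALREADY aligned.

    Assumed convention: identical positions / columns without a gap ('-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def highest_tier(identity: float) -> Optional[int]:
    """Highest 'at least N%' threshold satisfied, or None below 60%."""
    for tier in IDENTITY_TIERS:
        if identity >= tier:
            return tier
    return None


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
print(highest_tier(90.0))  # 90
```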
In a preferred aspect, the mature polypeptide coding sequence is nucleotides 388 to 1332 of SEQ ID NO: 1, nucleotides 98 to 821 of SEQ ID NO: 3, nucleotides 126 to 978 of SEQ ID NO: 5, nucleotides 55 to 678 of SEQ ID NO: 7, nucleotides 58 to 912 of SEQ ID NO: 9, nucleotides 67 to 796 of SEQ ID NO: 11, or nucleotides 77 to 766 of SEQ ID NO: 13. See polynucleotide section herein.
In a sixth aspect, isolated polypeptides having cellulolytic enhancing activity are artificial variants comprising a substitution, deletion, and/or insertion of one or more (or several) amino acids of the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14; or a homologous sequence thereof. Preferably, amino acid changes are of a minor nature, that is conservative amino acid substitutions or insertions that do not significantly affect the folding and/or activity of the protein; small deletions, typically of one to about 30 amino acids; small amino- or carboxyl-terminal extensions, such as an amino-terminal methionine residue; a small linker peptide of up to about 20-25 residues; or a small extension that facilitates purification by changing net charge or another function, such as a poly-histidine tract, an antigenic epitope or a binding domain.
Examples of conservative substitutions are within the group of basic amino acids (arginine, lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar amino acids (glutamine and asparagine), hydrophobic amino acids (leucine, isoleucine and valine), aromatic amino acids (phenylalanine, tryptophan and tyrosine), and small amino acids (glycine, alanine, serine, threonine and methionine). Amino acid substitutions that do not generally alter specific activity are known in the art and are described, for example, by H. Neurath and R.L. Hill, 1979, In, The Proteins, Academic Press, New York. The most commonly occurring exchanges are Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly.
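The substitution groups and common exchanges listed above can be encoded directly, for example to flag which substitutions in a designed variant fall inside one of the listed groups. The Python sketch below uses one-letter amino acid codes; the function names are hypothetical, and it only restates the lists in the text rather than scoring substitutions by any matrix.

```python
# One-letter-code groups from the text; a substitution within one group is
# classed as conservative.
CONSERVATIVE_GROUPS = {
    "basic": set("RKH"),        # arginine, lysine, histidine
    "acidic": set("ED"),        # glutamic acid, aspartic acid
    "polar": set("QN"),         # glutamine, asparagine
    "hydrophobic": set("LIV"),  # leucine, isoleucine, valine
    "aromatic": set("FWY"),     # phenylalanine, tryptophan, tyrosine
    "small": set("GASTM"),      # glycine, alanine, serine, threonine, methionine
}

# The "most commonly occurring exchanges" cited from Neurath and Hill (1979).
COMMON_EXCHANGES = {
    frozenset(pair)
    for pair in "AS VI DE TS AG AT SN AV SG YF AP KR DN LI LV AE DG".split()
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same listed group."""
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS.values())


def is_common_exchange(original: str, replacement: str) -> bool:
    """True if the pair (in either direction) is one of the listed exchanges."""
    return frozenset((original, replacement)) in COMMON_EXCHANGES


print(is_conservative("K", "R"), is_common_exchange("K", "R"))  # True True
print(is_conservative("A", "P"), is_common_exchange("A", "P"))  # False True
```

Several of the common exchanges (for example Ala/Pro, Asp/Asn and Ala/Glu) cross the group boundaries, so the two checks answer different questions.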
In addition to the 20 standard amino acids, non-standard amino acids (such as 4-hydroxyproline, 6-N-methyl lysine, 2-aminoisobutyric acid, isovaline, and alpha-methyl serine) may be substituted for amino acid residues of a wild-type polypeptide. A limited number of non-conservative amino acids, amino acids that are not encoded by the genetic code, and unnatural amino acids may be substituted for amino acid residues. "Unnatural amino acids" have been modified after protein synthesis, and/or have a chemical structure in their side chain(s) different from that of the standard amino acids. Unnatural amino acids can be chemically synthesized, and preferably, are commercially available, and include pipecolic acid, thiazolidine carboxylic acid, dehydroproline, 3- and 4-methylproline, and 3,3-dimethylproline.
Alternatively, the amino acid changes are of such a nature that the physico-chemical properties of the polypeptides are altered. For example, amino acid changes may improve the thermal stability of the polypeptide, alter the substrate specificity, change the pH optimum, and the like.
Essential amino acids in the parent polypeptide can be identified according to procedures known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, 1989, Science 244: 1081-1085). In the latter technique, single alanine mutations are introduced at every residue in the molecule, and the resultant mutant molecules are tested for biological activity to identify amino acid residues that are critical to the activity of the molecule. See also, Hilton et al., 1996, J. Biol. Chem. 271: 4699-4708. The active site of the enzyme or other biological interaction can also be determined by physical analysis of structure, as determined by such techniques as nuclear magnetic resonance, crystallography, electron diffraction, or photoaffinity labeling, in conjunction with mutation of putative contact site amino acids. See, for example, de Vos et al., 1992, Science 255: 306-312; Smith et al., 1992, J. Mol. Biol. 224: 899-904; Wlodaver et al., 1992, FEBS Lett. 309: 59-64. The identities of essential amino acids can also be inferred from analysis of identities with polypeptides that are related to a polypeptide according to the invention.
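Alanine-scanning mutagenesis, as described above, replaces each residue in turn with alanine. The Python sketch below only enumerates and builds the single-alanine variants in the conventional `X<position>A` notation; the helper names and the choice to skip positions that are already alanine are illustrative assumptions, and the biological activity assay is outside its scope.

```python
from typing import List


def alanine_scan(protein: str, first_position: int = 1) -> List[str]:
    """Labels for every single-alanine mutant, e.g. 'K21A'.

    `first_position` lets numbering follow a mature polypeptide that starts
    part-way into the full-length sequence (e.g. amino acid 20).
    Residues that are already alanine are skipped.
    """
    return [f"{aa}{pos}A"
            for pos, aa in enumerate(protein.upper(), start=first_position)
            if aa != "A"]


def apply_mutation(protein: str, label: str, first_position: int = 1) -> str:
    """Return the sequence carrying one substitution written as 'X<pos>Y'."""
    wild_type, position, mutant = label[0], int(label[1:-1]), label[-1]
    index = position - first_position
    if not 0 <= index < len(protein) or protein[index].upper() != wild_type:
        raise ValueError(f"{label} does not match the sequence")
    return protein[:index] + mutant + protein[index + 1:]


variants = alanine_scan("MKAG")
print(variants)  # ['M1A', 'K2A', 'G4A']
print([apply_mutation("MKAG", v) for v in variants])  # ['AKAG', 'MAAG', 'MKAA']
```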
Single or multiple amino acid substitutions, deletions, and/or insertions can be made and tested using known methods of mutagenesis, recombination, and/or shuffling, followed by a relevant screening procedure, such as those disclosed by Reidhaar-Olson and Sauer, 1988, Science 241: 53-57; Bowie and Sauer, 1989, Proc. Natl. Acad. Sci. USA 86: 2152-2156; WO 95/17413; or WO 95/22626. Other methods that can be used include error-prone PCR, phage display (e.g., Lowman et al., 1991, Biochem. 30: 10832-10837; U.S. Patent No. 5,223,409; WO 92/06204), and region-directed mutagenesis (Derbyshire et al., 1986, Gene 46:145; Ner et al., 1988, DNA 7:127).
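Random mutagenesis methods such as error-prone PCR introduce scattered point changes into a coding sequence before screening. The following Python sketch is a toy model only: it applies uniform, unbiased single-base substitutions at an assumed per-base rate with a seeded generator, and ignores the insertions, deletions and base-specific biases of real polymerases. The function names are hypothetical.

```python
import random
from typing import List


def random_point_mutants(dna: str, rate: float, n_variants: int,
                         seed: int = 0) -> List[str]:
    """Toy error-prone PCR: substitute each base with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so a library can be regenerated
    variants = []
    for _ in range(n_variants):
        bases = list(dna.upper())
        for i, base in enumerate(bases):
            if rng.random() < rate:
                # Replace with one of the other bases, chosen uniformly.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        variants.append("".join(bases))
    return variants


def count_changes(parent: str, variant: str) -> int:
    """Number of positions that differ between equal-length sequences."""
    return sum(p != v for p, v in zip(parent.upper(), variant.upper()))


parent = "ATGAAAGGTCTGACCGCA"  # hypothetical coding fragment
for variant in random_point_mutants(parent, rate=0.05, n_variants=3, seed=1):
    print(variant, count_changes(parent, variant))
```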
Mutagenesis/shuffling methods can be combined with high-throughput, automated screening methods to detect activity of cloned, mutagenized polypeptides expressed by host cells (Ness et al., 1999, Nature Biotechnology 17: 893-896). Mutagenized DNA molecules that encode active polypeptides can be recovered from the host cells and rapidly sequenced using standard methods in the art. These methods allow the rapid determination of the importance of individual amino acid residues in a polypeptide of interest, and can be applied to polypeptides of unknown structure.
The total number of amino acid substitutions, deletions and/or insertions of the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14 is 10, preferably 9, more preferably 8, more preferably 7, more preferably at most 6, more preferably 5, more preferably 4, even more preferably 3, most preferably 2, and even most preferably 1.
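The total number of substitutions, deletions and insertions recited above can be checked with an edit distance. The Python sketch below uses the Levenshtein distance, which gives the minimum number of such single-residue changes needed to turn one sequence into the other; using this minimum as the count, and the helper names, are assumptions for illustration rather than a counting method stated in the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def within_change_limit(mature: str, variant: str, limit: int = 10) -> bool:
    """True if the variant differs from the mature polypeptide by <= limit changes."""
    return edit_distance(mature, variant) <= limit


mature = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical sequence, not a SEQ ID NO
variant = mature.replace("K", "R") + "G"  # one substitution plus one insertion
print(edit_distance(mature, variant), within_change_limit(mature, variant))  # 2 True
```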
A polypeptide having cellulolytic enhancing activity may be obtained from microorganisms of any genus. For purposes of the present invention, the term "obtained from" as used herein in connection with a given source shall mean that the polypeptide encoded by a nucleotide sequence is produced by the source or by a strain in which the nucleotide sequence from the source has been inserted. In a preferred aspect, the polypeptide obtained from a given source is secreted extracellularly.
A polypeptide having cellulolytic enhancing activity may be a bacterial polypeptide. For example, the polypeptide may be a Gram positive bacterial polypeptide such as a Bacillus, Streptococcus, Streptomyces, Staphylococcus, Enterococcus, Lactobacillus, Lactococcus, Clostridium, Geobacillus, or Oceanobacillus polypeptide having cellulolytic enhancing activity, or a Gram negative bacterial polypeptide such as an E. coli, Pseudomonas, Salmonella, Campylobacter, Helicobacter, Flavobacterium, Fusobacterium, Ilyobacter, Neisseria, or Ureaplasma polypeptide having cellulolytic enhancing activity.
In a preferred aspect, the polypeptide is a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, or Bacillus thuringiensis polypeptide having cellulolytic enhancing activity.
In another preferred aspect, the polypeptide is a Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, or Streptococcus equi subsp. zooepidemicus polypeptide having cellulolytic enhancing activity.
In another preferred aspect, the polypeptide is a Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, or Streptomyces lividans polypeptide having cellulolytic enhancing activity.
The polypeptide having cellulolytic enhancing activity may also be a fungal polypeptide, and more preferably a yeast polypeptide such as a Candida, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia polypeptide having cellulolytic enhancing activity; or more preferably a filamentous fungal polypeptide such as an Acremonium, Agaricus, Alternaria, Aspergillus, Aureobasidium, Botryospaeria, Ceriporiopsis, Chaetomidium, Chrysosporium, Claviceps, Cochliobolus, Coprinopsis, Coptotermes, Corynascus, Cryphonectria, Cryptococcus, Diplodia, Exidia, Filibasidium, Fusarium, Gibberella, Holomastigotoides, Humicola, Irpex, Lentinula, Leptospaeria, Magnaporthe, Melanocarpus, Meripilus, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Piromyces, Poitrasia, Pseudoplectania, Pseudotrichonympha, Rhizomucor, Schizophyllum, Scytalidium, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trichoderma, Trichophaea, Verticillium, Volvariella, or Xylaria polypeptide having cellulolytic enhancing activity.
In a preferred aspect, the polypeptide is a Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, or Saccharomyces oviformis polypeptide having cellulolytic enhancing activity.
In another preferred aspect, the polypeptide is an Acremonium cellulolyticus, Aspergillus aculeatus, Aspergillus awamori, Aspergillus fumigatus, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium tropicum, Chrysosporium merdarium, Chrysosporium inops, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium zonatum, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola grisea, Humicola insolens, Humicola lanuginosa, Irpex lacteus, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium funiculosum, Penicillium purpurogenum, Phanerochaete chrysosporium, Thielavia achromatica, Thielavia albomyces, Thielavia albopilosa, Thielavia australeinsis, Thielavia fimeti, Thielavia microspora, Thielavia ovispora, Thielavia peruviana, Thielavia spededonium, Thielavia setosa, Thielavia subthermophila, Thielavia terrestris, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, Trichoderma viride, or Trichophaea saccata polypeptide having cellulolytic enhancing activity.
In a more preferred aspect, the polypeptide is a Thielavia terrestris polypeptide having cellulolytic enhancing activity. In a most preferred embodiment, the polypeptide is a Thielavia terrestris NRRL 8126 polypeptide having cellulolytic enhancing activity, e.g., the mature polypeptide of SEQ ID NO: 2, 4, 6, 8, or 10, or fragments thereof that have cellulolytic enhancing activity.
In another more preferred aspect, the polypeptide is a Thermoascus aurantiacus polypeptide having cellulolytic enhancing activity, e.g., the mature polypeptide of SEQ ID NO: 12.
In another more preferred aspect, the polypeptide is a Trichoderma reesei polypeptide having cellulolytic enhancing activity. In another most preferred aspect, the polypeptide is a Trichoderma reesei RutC30 (ATCC 56765) polypeptide having cellulolytic enhancing activity, e.g., the mature polypeptide of SEQ ID NO: 14, or fragments thereof that have cellulolytic enhancing activity.
It will be understood that for the aforementioned species the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
Strains of these species are readily accessible to the public in a number of culture collections, such as the American Type Culture Collection (ATCC), Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSM), Centraalbureau Voor Schimmelcultures (CBS), and Agricultural Research Service Patent Culture Collection, Northern Regional Research Center (NRRL).
Furthermore, such polypeptides may be identified and obtained from other sources including microorganisms isolated from nature (e.g., soil, composts, water, etc.) using the above-mentioned probes. Techniques for isolating microorganisms from natural habitats are well known in the art. The polynucleotide may then be obtained by similarly screening a genomic or cDNA library of such a microorganism. Once a polynucleotide sequence encoding a polypeptide has been detected with the probe(s), the polynucleotide can be isolated or cloned by utilizing techniques that are well known to those of ordinary skill in the art (see, e.g., Sambrook et al., 1980, supra).
Polypeptides having cellulolytic enhancing activity also include fused polypeptides or cleavable fusion polypeptides in which another polypeptide is fused at the N-terminus or the C-terminus of the polypeptide or fragment thereof having cellulolytic enhancing activity. A fused polypeptide is produced by fusing a nucleotide sequence (or a portion thereof) encoding another polypeptide to a nucleotide sequence (or a portion thereof) encoding a polypeptide having cellulolytic enhancing activity. Techniques for producing fusion polypeptides are known in the art, and include ligating the coding sequences encoding the polypeptides so that they are in frame and that expression of the fused polypeptide is under control of the same promoter(s) and terminator.
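Ligating two coding sequences "so that they are in frame" means that the upstream sequence contributes a whole number of codons and that no stop codon interrupts the fused open reading frame. The Python sketch below checks exactly that for a simple fusion of an upstream to a downstream coding sequence; removing the upstream stop codon, the absence of any linker, and the function names are illustrative assumptions, and promoter and terminator elements are not modelled.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq: str) -> List[str]:
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two coding sequences so translation reads through into the second."""
    up, down = upstream.upper(), downstream.upper()
    if len(up) % 3 or len(down) % 3:
        raise ValueError("each coding sequence must be a whole number of codons")
    if up[-3:] in STOP_CODONS:
        up = up[:-3]  # drop the upstream stop so the ribosome continues
    fused = up + down
    # Only the final codon of the fusion may be a stop codon.
    if any(codon in STOP_CODONS for codon in split_codons(fused)[:-1]):
        raise ValueError("fusion contains an internal stop codon")
    return fused


print(fuse_in_frame("ATGAAATAA", "ATGCCCTAA"))  # ATGAAAATGCCCTAA
```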
For further details on polypeptides having cellulolytic enhancing activity and polynucleotides thereof, see WO 2005/074647, WO 2005/074656, and U.S. Published Application Serial No. 2007/0077630, which are incorporated herein by reference.
Polynucleotides comprising nucleotide sequences that encode polypeptides having cellulolytic enhancing activity can be isolated and utilized to practice the methods of the present invention, as described herein.
The polynucleotides comprise or consist of nucleotide sequences that have a degree of identity to the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13 of preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 96%, 97%, 98%, or 99%, that encode a polypeptide having cellulolytic enhancing activity.
In a preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 1. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pEJG120 that is contained in Escherichia coli NRRL B-30699. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 1. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pEJG120 that is contained in Escherichia coli NRRL B-30699. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 2 or the mature polypeptide thereof, which differ from SEQ ID NO: 1 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 1 that encode fragments of SEQ ID NO: 2 that have cellulolytic enhancing activity.
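Because most amino acids are specified by more than one codon, different nucleotide sequences can encode the same polypeptide, which is the sense of "degeneracy of the genetic code" above. The Python sketch below, assuming the standard genetic code, translates a coding sequence and lists synonymous codons; the function names and the short example sequences are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate in frame 1 until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def synonymous_codons(aa: str) -> List[str]:
    """All codons that encode `aa` ('*' for stop), sorted alphabetically."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


# Two different hypothetical coding sequences for the same tripeptide.
print(translate("ATGAAAGGTTAA"), translate("ATGAAGGGCTGA"))  # MKG MKG
print(synonymous_codons("K"))  # ['AAA', 'AAG']
```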
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 3. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pTter61C that is contained in Escherichia coli NRRL B-30813. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 3. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pTter61C that is contained in Escherichia coli NRRL B-30813. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 4 or the mature polypeptide thereof, which differ from SEQ ID NO: 3 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 3 that encode fragments of SEQ ID NO: 4 that have cellulolytic enhancing activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 5. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pTter61D that is contained in Escherichia coli NRRL B-30812. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 5. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pTter61D that is contained in Escherichia coli NRRL B-30812. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 6 or the mature polypeptide thereof, which differ from SEQ ID NO: 5 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 5 that encode fragments of SEQ ID NO: 6 that have cellulolytic enhancing activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 7. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pTter61E that is contained in Escherichia coli NRRL B-30814. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 7. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pTter61E that is contained in Escherichia coli NRRL B-30814. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 8 or the mature polypeptide thereof, which differ from SEQ ID NO: 7 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 7 that encode fragments of SEQ ID NO: 8 that have cellulolytic enhancing activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 9. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pTter61G that is contained in Escherichia coli NRRL B-30811. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 9. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pTter61G that is contained in Escherichia coli NRRL B-30811. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 10 or the mature polypeptide thereof, which differ from SEQ ID NO: 9 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 9 that encode fragments of SEQ ID NO: 10 that have cellulolytic enhancing activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 11. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pDZA2-7 that is contained in Escherichia coli NRRL B-30704. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 11. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pDZA2-7 that is contained in Escherichia coli NRRL B-30704. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 12 or the mature polypeptide thereof, which differ from SEQ ID NO: 11 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 11 that encode fragments of SEQ ID NO: 12 that have cellulolytic enhancing activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 13. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pTr3337 which is contained in Escherichia coli NRRL B-30878. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 13. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pTr3337 which is contained in Escherichia coli NRRL B-30878. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 14 or the mature polypeptide thereof, which differ from SEQ ID NO: 13 or the mature polypeptide coding sequence thereof by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 13 that encode fragments of SEQ ID NO: 14 that have cellulolytic enhancing activity.
The present invention also relates to mutant polynucleotides comprising at least one mutation in the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, in which the mutant nucleotide sequence encodes the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14. In a preferred aspect, the mature polypeptide is amino acids 20 to 326 of SEQ ID NO: 2, amino acids 18 to 240 of SEQ ID NO: 4, amino acids 20 to 258 of SEQ ID NO: 6, amino acids 19 to 226 of SEQ ID NO: 8, amino acids 20 to 304 of SEQ ID NO: 10, amino acids 23 to 250 of SEQ ID NO: 12, or amino acids 20 to 249 of SEQ ID NO: 14. In another preferred aspect, the mature polypeptide coding sequence is nucleotides 388 to 1332 of SEQ ID NO: 1, nucleotides 98 to 821 of SEQ ID NO: 3, nucleotides 126 to 978 of SEQ ID NO: 5, nucleotides 55 to 678 of SEQ ID NO: 7, nucleotides 58 to 912 of SEQ ID NO: 9, nucleotides 67 to 796 of SEQ ID NO: 11, or nucleotides 77 to 766 of SEQ ID NO: 13.
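The coordinates above use 1-based, inclusive numbering. As an illustrative consistency check, the Python sketch below compares the length of each mature polypeptide coding sequence with three nucleotides per residue of the corresponding mature polypeptide. The coordinates are transcribed from the text; the helper names, and reading a nonzero excess as intervening sequence (e.g., introns) in a genomic entry, are assumptions.

```python
from typing import Dict, Tuple

# Transcribed from the text (1-based, inclusive):
# nucleotide SEQ ID NO -> (nt_start, nt_end, aa_start, aa_end), where the
# amino-acid range refers to the paired polypeptide (nucleotide SEQ ID NO + 1).
MATURE_COORDINATES: Dict[int, Tuple[int, int, int, int]] = {
    1: (388, 1332, 20, 326),
    3: (98, 821, 18, 240),
    5: (126, 978, 20, 258),
    7: (55, 678, 19, 226),
    9: (58, 912, 20, 304),
    11: (67, 796, 23, 250),
    13: (77, 766, 20, 249),
}


def extract(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq, 1-based and inclusive."""
    return seq[start - 1:end]


def extra_nucleotides(seq_id: int) -> int:
    """Nucleotides in the mature coding range beyond 3 per mature residue."""
    nt_start, nt_end, aa_start, aa_end = MATURE_COORDINATES[seq_id]
    return (nt_end - nt_start + 1) - 3 * (aa_end - aa_start + 1)


for seq_id in sorted(MATURE_COORDINATES):
    print(f"SEQ ID NO: {seq_id}: {extra_nucleotides(seq_id)} extra nt")
```

With these coordinates the excess is 0 for SEQ ID NOs: 7, 9 and 13 and nonzero for SEQ ID NOs: 1, 3, 5 and 11.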
As described earlier, the techniques used to isolate or clone a polynucleotide encoding a polypeptide are l
The polynucleotide may also be a polynucleotide comprising or consisting of a nucleotide sequence encoding a polypeptide having cellulolytic enhancing activity that hybridizes under at least very low stringency conditions, preferably at least low stringency conditions, more preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, even more preferably at least high stringency conditions, and most preferably at least very high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, (ii) the cDNA sequence contained in the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 11, or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 7, SEQ ID NO: 9, or SEQ ID NO: 13, or (iii) a full-length complementary strand of (i) or (ii); or allelic variants and subsequences thereof (Sambrook
Beta-Glucosidases and Polynucleotides Thereof
Any polypeptide having beta-glucosidase activity useful in processing cellulose-containing material can be a component of the compositions of the present invention. A polynucleotide encoding such a polypeptide having beta-glucosidase activity can be used to produce the compositions.
In a first aspect, isolated polypeptides having beta-glucosidase activity comprise or consist of amino acid sequences that have a degree of identity to the mature polypeptide of SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, or SEQ ID NO: 28 of preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 96%, 97%, 98%, or 99%, which have beta-glucosidase activity (hereinafter "homologous polypeptides"). In a preferred aspect, the homologous polypeptides comprise or consist of an amino acid sequence that differs by ten amino acids, preferably by five amino acids, more preferably by four amino acids, even more preferably by three amino acids, most preferably by two amino acids, and even most preferably by one amino acid from the mature polypeptide of SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, or SEQ ID NO: 28.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 16 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 16. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 16. In another preferred aspect, the polypeptide comprises amino acids 20 to 861 of SEQ ID NO: 16, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 861 of SEQ ID NO: 16. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 16 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 16. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 16. In another preferred aspect, the polypeptide consists of amino acids 20 to 861 of SEQ ID NO: 16 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 861 of SEQ ID NO: 16.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 18 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 18. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 18. In another preferred aspect, the polypeptide comprises amino acids 20 to 861 of SEQ ID NO: 18, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 861 of SEQ ID NO: 18. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 18 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 18. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 18. In another preferred aspect, the polypeptide consists of amino acids 20 to 861 of SEQ ID NO: 18 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 861 of SEQ ID NO: 18.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 20 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 20. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 20. In another preferred aspect, the polypeptide comprises amino acids 20 to 863 of SEQ ID NO: 20, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 863 of SEQ ID NO: 20. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 20 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 20. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 20. In another preferred aspect, the polypeptide consists of amino acids 20 to 863 of SEQ ID NO: 20 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 863 of SEQ ID NO: 20.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 22 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 22. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 22. In another preferred aspect, the polypeptide comprises amino acids 37 to 878 of SEQ ID NO: 22, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 37 to 878 of SEQ ID NO: 22. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 22 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 22. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 22. In another preferred aspect, the polypeptide consists of amino acids 37 to 878 of SEQ ID NO: 22 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 37 to 878 of SEQ ID NO: 22.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 24 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 24. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 24. In another preferred aspect, the polypeptide comprises amino acids 32 to 744 of SEQ ID NO: 24, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 32 to 744 of SEQ ID NO: 24. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 24 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 24. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 24. In another preferred aspect, the polypeptide consists of amino acids 32 to 744 of SEQ ID NO: 24 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 32 to 744 of SEQ ID NO: 24.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 26 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 26. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 26. In another preferred aspect, the polypeptide comprises amino acids 20 to 860 of SEQ ID NO: 26, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 860 of SEQ ID NO: 26. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 26 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 26. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 26. In another preferred aspect, the polypeptide consists of amino acids 20 to 860 of SEQ ID NO: 26 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 860 of SEQ ID NO: 26.
A polypeptide having beta-glucosidase activity preferably comprises the amino acid sequence of SEQ ID NO: 28 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In a preferred aspect, the polypeptide comprises the amino acid sequence of SEQ ID NO: 28. In another preferred aspect, the polypeptide comprises the mature polypeptide of SEQ ID NO: 28. In another preferred aspect, the polypeptide comprises amino acids 20 to 860 of SEQ ID NO: 28, or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide comprises amino acids 20 to 860 of SEQ ID NO: 28. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 28 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of the amino acid sequence of SEQ ID NO: 28. In another preferred aspect, the polypeptide consists of the mature polypeptide of SEQ ID NO: 28. In another preferred aspect, the polypeptide consists of amino acids 20 to 860 of SEQ ID NO: 28 or an allelic variant thereof; or a fragment thereof that has beta-glucosidase activity. In another preferred aspect, the polypeptide consists of amino acids 20 to 860 of SEQ ID NO: 28.
Preferably, a fragment of the mature polypeptide of SEQ ID NO: 16 contains at least 720 amino acid residues, more preferably at least 760 amino acid residues, and most preferably at least 800 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 18 contains at least 720 amino acid residues, more preferably at least 760 amino acid residues, and most preferably at least 800 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 20 contains at least 770 amino acid residues, more preferably at least 800 amino acid residues, and most preferably at least 830 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 22 contains at least 720 amino acid residues, more preferably at least 760 amino acid residues, and most preferably at least 800 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 24 contains at least 620 amino acid residues, more preferably at least 650 amino acid residues, and most preferably at least 680 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 26 contains at least 720 amino acid residues, more preferably at least 760 amino acid residues, and most preferably at least 800 amino acid residues. Preferably, a fragment of the mature polypeptide of SEQ ID NO: 28 contains at least 720 amino acid residues, more preferably at least 760 amino acid residues, and most preferably at least 800 amino acid residues.
Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 15 contains at least 2160 nucleotides, more preferably at least 2280 nucleotides, and most preferably at least 2400 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 17 contains at least 2160 nucleotides, more preferably at least 2280 nucleotides, and most preferably at least 2400 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 19 contains at least 2310 nucleotides, more preferably at least 2400 nucleotides, and most preferably at least 2490 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 21 contains at least 2160 nucleotides, more preferably at least 2280 nucleotides, and most preferably at least 2400 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 23 contains at least 1860 nucleotides, more preferably at least 1950 nucleotides, and most preferably at least 2040 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 25 contains at least 2160 nucleotides, more preferably at least 2280 nucleotides, and most preferably at least 2400 nucleotides. Preferably, a subsequence of the mature polypeptide coding sequence of SEQ ID NO: 27 contains at least 2160 nucleotides, more preferably at least 2280 nucleotides, and most preferably at least 2400 nucleotides.
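The nucleotide minima for the coding sequences of SEQ ID NO: 15 to 27 are the residue minima for the fragments of SEQ ID NO: 16 to 28 multiplied by three, the number of nucleotides per codon (720 residues correspond to 2160 nucleotides, 770 to 2310, 620 to 1860, and so on), counting coding nucleotides only. A short Python check of that correspondence follows; the dictionary simply restates the figures given above, and pairing each polypeptide SEQ ID NO with the coding SEQ ID NO one lower follows the pairings stated in this section (for example, SEQ ID NO: 15 encodes SEQ ID NO: 16).

```python
CODON_LENGTH = 3  # nucleotides per codon

# Fragment minima (amino acid residues) restated from the text, keyed by the
# SEQ ID NO of the mature polypeptide: (preferred, more preferred, most preferred).
FRAGMENT_MINIMA = {
    16: (720, 760, 800),
    18: (720, 760, 800),
    20: (770, 800, 830),
    22: (720, 760, 800),
    24: (620, 650, 680),
    26: (720, 760, 800),
    28: (720, 760, 800),
}


def subsequence_minima(residues: tuple) -> tuple:
    """Convert residue counts into the matching coding-sequence lengths."""
    return tuple(n * CODON_LENGTH for n in residues)


for polypeptide_id, residues in FRAGMENT_MINIMA.items():
    coding_id = polypeptide_id - 1  # e.g. SEQ ID NO: 15 encodes SEQ ID NO: 16
    print(f"SEQ ID NO: {coding_id} -> {subsequence_minima(residues)}")
```

Each printed tuple reproduces the three nucleotide minima stated for the corresponding coding sequence, for example `SEQ ID NO: 19 -> (2310, 2400, 2490)`.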
In a preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus oryzae beta-glucosidase gene. In a most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus oryzae beta-glucosidase gene comprising the mature polypeptide coding sequence of SEQ ID NO: 15 that encodes the mature polypeptide of SEQ ID NO: 16.
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus oryzae beta-glucosidase mutant gene. In a most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus oryzae beta-glucosidase mutant gene comprising the mature polypeptide coding sequence of SEQ ID NO: 17 that encodes the mature polypeptide of SEQ ID NO: 18.
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus fumigatus beta-glucosidase gene. In a most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus fumigatus beta-glucosidase gene comprising the mature polypeptide coding sequence of SEQ ID NO: 19 that encodes the mature polypeptide of SEQ ID NO: 20.
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Penicillium brasilianum strain IBT 20888 beta-glucosidase gene. In a most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Penicillium brasilianum strain IBT 20888 beta-glucosidase gene comprising the mature polypeptide coding sequence of SEQ ID NO: 21 that encodes the mature polypeptide of SEQ ID NO: 22.
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Trichoderma reesei strain No. QM9414 beta-glucosidase gene. In another most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Trichoderma reesei strain No. QM9414 beta-glucosidase gene comprising the mature polypeptide coding sequence of SEQ ID NO: 23 that encodes the mature polypeptide of SEQ ID NO: 24 (GenBank™ accession no. U09580).
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus niger beta-glucosidase gene. In a most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus niger beta-glucosidase gene comprising the mature polypeptide coding sequence of SEQ ID NO: 25 that encodes the mature polypeptide of SEQ ID NO: 26 (GenBank™ accession no. AJ132386).
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus aculeatus beta-glucosidase gene. In a most preferred embodiment, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from an Aspergillus aculeatus beta-glucosidase gene comprising the mature polypeptide coding sequence of SEQ ID NO: 27 that encodes the mature polypeptide of SEQ ID NO: 28 (EMBL accession no. D64088).
The Aspergillus oryzae polypeptide having beta-glucosidase activity can be obtained according to WO 2002/095014. The Aspergillus oryzae polypeptide variant having beta-glucosidase activity can be obtained according to WO 2004/099228. The Aspergillus fumigatus polypeptide having beta-glucosidase activity can be obtained according to WO 2005/047499. The Penicillium brasilianum polypeptide having beta-glucosidase activity can be obtained according to WO 2007/019442. The Trichoderma reesei strain No. QM9414 polypeptide having beta-glucosidase activity can be obtained according to U.S. Patent No. 6,022,725. The Aspergillus niger polypeptide having beta-glucosidase activity can be obtained according to Dan et al., 2000, J. Biol. Chem. 275: 4973-4980. The Aspergillus aculeatus polypeptide having beta-glucosidase activity can be obtained according to Kawaguchi et al., 1996, Gene 173: 287-288.
In a preferred aspect, the mature polypeptide of SEQ ID NO: 16 is encoded by a polynucleotide contained in the plasmid which is contained in E. coli DSM 14240. In another preferred aspect, the mature polypeptide of SEQ ID NO: 20 is encoded by the polynucleotide contained in plasmid pEJG113 which is contained in E. coli NRRL B-30695. In another preferred aspect, the mature polypeptide of SEQ ID NO: 22 is encoded by a polynucleotide contained in plasmid pKKAB which is contained in E. coli NRRL B-30860.
In a second aspect, isolated polypeptides having beta-glucosidase activity are encoded by polynucleotides comprising or consisting of nucleotide sequences that hybridize under at least very low stringency conditions, preferably at least low stringency conditions, more preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, even more preferably at least high stringency conditions, and most preferably at least very high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, (iii) a subsequence of (i) or (ii), or (iv) a full-length complementary strand of (i), (ii), or (iii) (J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989, supra). A subsequence of the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27 contains at least 100 contiguous nucleotides, or preferably at least 200 contiguous nucleotides. Moreover, the subsequence may encode a polypeptide fragment that has beta-glucosidase activity.
The nucleotide sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, or a subsequence thereof, as well as the amino acid sequence of SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, or SEQ ID NO: 28, or a fragment thereof, may be used to design a nucleic acid probe to identify and clone DNA encoding polypeptides having beta-glucosidase activity from strains of different genera or species, as described supra.
For purposes of the present invention, hybridization indicates that the nucleotide sequence hybridizes to a labeled nucleic acid probe corresponding to the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, its full-length complementary strand, or a subsequence thereof, under very low to very high stringency conditions, as described supra.
In a preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 15. In another preferred aspect, the nucleic acid probe is nucleotides 58 to 2584 of SEQ ID NO: 15. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 16, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 15. In another preferred aspect, the nucleic acid probe is the polynucleotide contained in E. coli DSM 14240, wherein the polynucleotide sequence thereof encodes a polypeptide having beta-glucosidase activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in E. coli DSM 14240.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 17. In another preferred aspect, the nucleic acid probe is nucleotides 58 to 2584 of SEQ ID NO: 17. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 18, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 17.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 19. In another preferred aspect, the nucleic acid probe is nucleotides 73 to 1413 of SEQ ID NO: 19. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 20, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 19. In another preferred aspect, the nucleic acid probe is the polynucleotide contained in plasmid pEJG113 which is contained in E. coli NRRL B-30695, wherein the polynucleotide sequence thereof encodes a polypeptide having beta-glucosidase activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in plasmid pEJG113 which is contained in E. coli NRRL B-30695.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 21. In another preferred aspect, the nucleic acid probe is nucleotides 67 to 1377 of SEQ ID NO: 21. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 22, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 21. In another preferred aspect, the nucleic acid probe is the polynucleotide contained in plasmid pKKAB which is contained in E. coli NRRL B-30860, wherein the polynucleotide sequence thereof encodes a polypeptide having beta-glucosidase activity. In another preferred aspect, the nucleic acid probe is the mature polypeptide coding region contained in E. coli NRRL B-30860.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 23. In another preferred aspect, the nucleic acid probe is nucleotides 88 to 2232 of SEQ ID NO: 23. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 24, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 23.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 25. In another preferred aspect, the nucleic acid probe is nucleotides 59 to 2580 of SEQ ID NO: 25. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 26, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 25.
In another preferred aspect, the nucleic acid probe is the mature polypeptide coding sequence of SEQ ID NO: 27. In another preferred aspect, the nucleic acid probe is nucleotides 59 to 2580 of SEQ ID NO: 27. In another preferred aspect, the nucleic acid probe is a polynucleotide sequence that encodes the polypeptide of SEQ ID NO: 28, or a subsequence thereof. In another preferred aspect, the nucleic acid probe is SEQ ID NO: 27.
For long probes of at least 100 nucleotides in length, very low to very high stringency conditions are as defined herein.
For long probes of at least 100 nucleotides in length, the carrier material is finally washed as defined herein.
For short probes of about 15 nucleotides to about 70 nucleotides in length, stringency conditions are as defined herein.
For short probes of about 15 nucleotides to about 70 nucleotides in length, the carrier material is washed as defined herein.
In a third aspect, the polypeptides having beta-glucosidase activity are encoded by polynucleotides comprising or consisting of nucleotide sequences that have a degree of identity to the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27 of preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 96%, 97%, 98%, or 99%, which encode an active polypeptide.
In a sixth aspect, the polypeptides having beta-glucosidase activity are artificial variants comprising a substitution, deletion, and/or insertion of one or more (or several) amino acids of the mature polypeptide of SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, or SEQ ID NO: 28; or a homologous sequence thereof. Methods for preparing such artificial variants are described supra.
The total number of amino acid substitutions, deletions and/or insertions of the mature polypeptide of SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, or SEQ ID NO: 28 is 10, preferably 9, more preferably 8, more preferably 7, more preferably at most 6, more preferably 5, more preferably 4, even more preferably 3, most preferably 2, and even most preferably 1.
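This section caps the total number of substitutions, deletions, and insertions but does not itself say how that total is counted. One reasonable reading, used here only as an assumption, is the minimum number of single-residue edits needed to turn the parent sequence into the variant, which is the Levenshtein edit distance. A minimal Python sketch follows; the function names and the toy peptides are hypothetical.

```python
def edit_count(parent: str, variant: str) -> int:
    """Minimum number of single-residue substitutions, deletions, and
    insertions turning `parent` into `variant` (Levenshtein distance)."""
    previous = list(range(len(variant) + 1))  # distances from an empty parent prefix
    for i, p in enumerate(parent, start=1):
        current = [i]
        for j, v in enumerate(variant, start=1):
            current.append(min(
                previous[j] + 1,             # delete residue p from the parent
                current[j - 1] + 1,          # insert residue v into the parent
                previous[j - 1] + (p != v),  # substitute p with v (free if equal)
            ))
        previous = current
    return previous[-1]


def within_limit(parent: str, variant: str, limit: int = 10) -> bool:
    """True if the variant is at most `limit` edits away from the parent."""
    return edit_count(parent, variant) <= limit


print(edit_count("MKTLLVA", "MKSLLVA"))   # 1 (one substitution)
print(edit_count("MKTLLVA", "MKTLVA"))    # 1 (one deletion)
print(edit_count("MKTLLVA", "MKTLLGVA"))  # 1 (one insertion)
print(within_limit("MKTLLVA", "MKSLLGVA", limit=2))  # True
```

The dynamic-programming table keeps only two rows, so memory grows with the length of the variant rather than with the product of both lengths, which matters little for a single pair of roughly 860-residue sequences but is the idiomatic form.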
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Family 1 beta-glucosidase gene.
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Family 3 beta-glucosidase gene.
In another preferred aspect, the polypeptide having beta-glucosidase activity is encoded by a polynucleotide obtained from a Family 5 beta-glucosidase gene.
Examples of other beta-glucosidases that can be used as sources for the polynucleotides in the present invention include, but are not limited to, an Aspergillus oryzae beta-glucosidase (WO 02/095014; WO 04/099228); Aspergillus aculeatus beta-glucosidase (Kawaguchi et al., 1996, Gene 173: 287-288); Aspergillus avenaceus beta-glucosidase (GenBank™ accession no. AY943971); Aspergillus fumigatus beta-glucosidase (GenBank™ accession no. XM745234); Aspergillus kawachii beta-glucosidase (GenBank™ accession no. AB003470); Aspergillus niger beta-glucosidase (GenBank™ accession no. AJ132386); Magnaporthe grisea beta-glucosidase (GenBank™ accession no. AY849670); Phanerochaete chrysosporium beta-glucosidase (GenBank™ accession no. AB253327); Talaromyces emersonii beta-glucosidase (GenBank™ accession no. AY072918); and Trichoderma reesei beta-glucosidase (GenBank™ accession nos. U09580, AB003110, AY281374, AY281375, AY281377, AY281378, and AY281379). Variants of beta-glucosidases, such as those described in WO 04/099228, may also be used as sources for the polynucleotides.
Other beta-glucosidases are disclosed in more than 13 of the Glycosyl Hydrolase families using the classification according to Henrissat B., 1991, A classification of glycosyl hydrolases based on amino-acid sequence similarities, Biochem. J. 280: 309-316, and Henrissat B., and Bairoch A., 1996, Updating the sequence-based classification of glycosyl hydrolases, Biochem. J. 316: 695-696.
A polypeptide having beta-glucosidase activity may also be obtained from microorganisms of any genus. For purposes of the present invention, the term "obtained from" is used as defined herein. In a preferred aspect, the polypeptide obtained from a given source is secreted extracellularly.
A polypeptide having beta-glucosidase activity may be a bacterial polypeptide. For example, the polypeptide may be a Gram-positive bacterial polypeptide such as a Bacillus, Streptococcus, Streptomyces, Staphylococcus, Enterococcus, Lactobacillus, Lactococcus, Clostridium, Geobacillus, or Oceanobacillus polypeptide having beta-glucosidase activity, or a Gram-negative bacterial polypeptide such as an E. coli, Pseudomonas, Salmonella, Campylobacter, Helicobacter, Flavobacterium, Fusobacterium, Ilyobacter, Neisseria, or Ureaplasma polypeptide having beta-glucosidase activity.
In a preferred aspect, the polypeptide is a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus firmus, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus pumilus, Bacillus stearothermophilus, Bacillus subtilis, or Bacillus thuringiensis polypeptide having beta-glucosidase activity.
In another preferred aspect, the polypeptide is a Streptococcus equisimilis, Streptococcus pyogenes, Streptococcus uberis, or Streptococcus equi subsp. zooepidemicus polypeptide having beta-glucosidase activity.
In another preferred aspect, the polypeptide is a Streptomyces achromogenes, Streptomyces avermitilis, Streptomyces coelicolor, Streptomyces griseus, or Streptomyces lividans polypeptide having beta-glucosidase activity.
The polypeptide having beta-glucosidase activity may also be a fungal polypeptide, and more preferably a yeast polypeptide such as a Candida, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia polypeptide having beta-glucosidase activity; or more preferably a filamentous fungal polypeptide such as an Acremonium, Agaricus, Alternaria, Aspergillus, Aureobasidium, Botryosphaeria, Ceriporiopsis, Chaetomidium, Chrysosporium, Claviceps, Cochliobolus, Coprinopsis, Coptotermes, Corynascus, Cryphonectria, Cryptococcus, Diplodia, Exidia, Filibasidium, Fusarium, Gibberella, Holomastigotoides, Humicola, Irpex, Lentinula, Leptosphaeria, Magnaporthe, Melanocarpus, Meripilus, Mucor, Myceliophthora, Neocallimastix, Neurospora, Paecilomyces, Penicillium, Phanerochaete, Piromyces, Poitrasia, Pseudoplectania, Pseudotrichonympha, Rhizomucor, Schizophyllum, Scytalidium, Talaromyces, Thermoascus, Thielavia, Tolypocladium, Trichoderma, Trichophaea, Verticillium, Volvariella, or Xylaria polypeptide having beta-glucosidase activity.
In a preferred aspect, the polypeptide is a Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis, or Saccharomyces oviformis polypeptide having beta-glucosidase activity.
In another preferred aspect, the polypeptide is an Acremonium cellulolyticus, Aspergillus aculeatus, Aspergillus awamori, Aspergillus fumigatus, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger, Aspergillus oryzae, Chrysosporium keratinophilum, Chrysosporium lucknowense, Chrysosporium tropicum, Chrysosporium merdarium, Chrysosporium inops, Chrysosporium pannicola, Chrysosporium queenslandicum, Chrysosporium zonatum, Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola grisea, Humicola insolens, Humicola lanuginosa, Irpex lacteus, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium funiculosum, Penicillium purpurogenum, Phanerochaete chrysosporium, Thielavia achromatica, Thielavia albomyces, Thielavia albopilosa, Thielavia australeinsis, Thielavia fimeti, Thielavia microspora, Thielavia ovispora, Thielavia peruviana, Thielavia spededonium, Thielavia setosa, Thielavia subthermophila, Thielavia terrestris, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, Trichoderma viride, or Trichophaea saccata polypeptide having beta-glucosidase activity.
It will be understood that for the aforementioned species the invention encompasses both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs, regardless of the species name by which they are known. Those skilled in the art will readily recognize the identity of appropriate equivalents.
Strains of these species are readily accessible to the public in a number of culture collections, such as the American Type Culture Collection (ATCC), Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSM), Centraalbureau voor Schimmelcultures (CBS), and Agricultural Research Service Patent Culture Collection, Northern Regional Research Center (NRRL).
Furthermore, such polypeptides may be identified and obtained from other sources including microorganisms isolated from nature (e.g., soil, composts, water, etc.) using the above-mentioned probes, as described herein.
Polypeptides having beta-glucosidase activity also include fused polypeptides or cleavable fusion polypeptides in which another polypeptide is fused at the N-terminus or the C-terminus of the polypeptide or fragment thereof having beta-glucosidase activity. A fused polypeptide is produced as described herein.
Polynucleotides comprising or consisting of nucleotide sequences that encode polypeptides having beta-glucosidase activity can be isolated and utilized to practice the methods of the present invention, as described herein.
The polynucleotides comprise or consist of nucleotide sequences that have a degree of identity to the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27 of preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, most preferably at least 95%, and even most preferably at least 96%, 97%, 98%, or 99%, and which encode a polypeptide having beta-glucosidase activity.
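The identity thresholds above presuppose a way of scoring two sequences against each other. The governing definition of "degree of identity" is not reproduced in this section, so the following Python sketch simply assumes one common convention: over an already-computed pairwise alignment, identity is the number of identical columns times 100 divided by the alignment length minus the number of gap columns. The alignment step itself, the function names, and the toy sequences are illustrative assumptions, not the method of the specification.

```python
# Assumed convention (not stated in this section): identity is computed over an
# existing pairwise alignment as identical columns * 100 / (columns without a gap).
GAP = "-"
IDENTITY_TIERS = (60, 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, 99)  # preferred minima, in %


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    gap_columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == GAP or b == GAP:
            gap_columns += 1  # gap columns are excluded from the denominator
        elif a == b:
            identical += 1
    scored = len(aligned_a) - gap_columns
    if scored == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / scored


def highest_tier_met(identity: float):
    """Largest preferred identity threshold (in %) that the value reaches, or None."""
    met = [tier for tier in IDENTITY_TIERS if identity >= tier]
    return met[-1] if met else None


if __name__ == "__main__":
    # Toy alignment: 4 identical columns, 1 gap column, 1 mismatch -> 4/5 = 80%.
    value = percent_identity("ATGC-A", "ATGCTT")
    print(value, highest_tier_met(value))
```

Because the tiers are nested minima, a sequence that meets a higher tier automatically meets every lower one; the helper only reports the highest.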
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 15. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in E. coli DSM 14240. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 15. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 58 to 2584 of SEQ ID NO: 15. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in E. coli DSM 14240. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 16 or the mature polypeptide thereof, which differ from SEQ ID NO: 15 or the mature polypeptide coding sequence thereof by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 15 that encode fragments of SEQ ID NO: 16 that have beta-glucosidase activity.
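Coordinates such as "nucleotides 58 to 2584 of SEQ ID NO: 15" follow the sequence-listing convention of 1-based, inclusive numbering. A minimal Python sketch of pulling such a region out of a sequence string is shown below; the function name is hypothetical and the sequence is a placeholder, not the actual SEQ ID NO: 15.

```python
def extract_region(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, numbered 1-based and inclusive.

    Sequence listings number the first residue as 1 and include both end points,
    whereas Python slices are 0-based and exclude the end, hence seq[start - 1:end].
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"region {start}-{end} lies outside a sequence of length {len(seq)}")
    return seq[start - 1:end]


if __name__ == "__main__":
    # Placeholder sequence only; the real SEQ ID NO: 15 is not reproduced here.
    placeholder = "A" * 3000
    region = extract_region(placeholder, 58, 2584)
    print(len(region))  # 2584 - 58 + 1 = 2527 nucleotides
```

The explicit bounds check guards against the off-by-one errors that arise when 1-based listing coordinates are passed directly to a 0-based slice.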
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 17. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 17. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 58 to 2584 of SEQ ID NO: 17. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 18 or the mature polypeptide thereof, which differ from SEQ ID NO: 17 or the mature polypeptide coding sequence thereof by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 17 that encode fragments of SEQ ID NO: 18 that have beta-glucosidase activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 19. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pEJG113 which is contained in E. coli NRRL B-30695. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 19. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 58 to 2580 of SEQ ID NO: 19. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pEJG113 which is contained in E. coli NRRL B-30695. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 20 or the mature polypeptide thereof, which differ from SEQ ID NO: 19 or the mature polypeptide coding sequence thereof by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 19 that encode fragments of SEQ ID NO: 20 that have beta-glucosidase activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 21. In another more preferred aspect, the nucleotide sequence comprises or consists of the sequence contained in plasmid pKKAB which is contained in E. coli NRRL B-30860. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 21. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 109 to 2751 of SEQ ID NO: 21. In another more preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region contained in plasmid pKKAB which is contained in E. coli NRRL B-30860. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 22 or the mature polypeptide thereof, which differ from SEQ ID NO: 21 or the mature polypeptide coding sequence thereof by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 21 that encode fragments of SEQ ID NO: 22 that have beta-glucosidase activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 23. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 23. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 88 to 2232 of SEQ ID NO: 23. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 24 or the mature polypeptide thereof, which differ from SEQ ID NO: 23 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 23 that encode fragments of SEQ ID NO: 24 that have beta-glucosidase activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 25. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 25. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 59 to 2580 of SEQ ID NO: 25. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 26 or the mature polypeptide thereof, which differ from SEQ ID NO: 25 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 25 that encode fragments of SEQ ID NO: 26 that have beta-glucosidase activity.
In another preferred aspect, the nucleotide sequence comprises or consists of SEQ ID NO: 27. In another preferred aspect, the nucleotide sequence comprises or consists of the mature polypeptide coding region of SEQ ID NO: 27. In another preferred aspect, the nucleotide sequence comprises or consists of nucleotides 59 to 2580 of SEQ ID NO: 27. The present invention also encompasses nucleotide sequences that encode a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 28 or the mature polypeptide thereof, which differ from SEQ ID NO: 27 by virtue of the degeneracy of the genetic code. The present invention also relates to subsequences of SEQ ID NO: 27 that encode fragments of SEQ ID NO: 28 that have beta-glucosidase activity.
The present invention also relates to mutant polynucleotides comprising at least one mutation in the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, in which the mutant nucleotide sequence encodes the mature polypeptide of SEQ ID NO: 16, SEQ ID NO: 18, SEQ ID NO: 20, SEQ ID NO: 22, SEQ ID NO: 24, SEQ ID NO: 26, or SEQ ID NO: 28, respectively.
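Mutant polynucleotides of this kind differ from the reference coding sequence but, because most amino acids are encoded by more than one codon, still encode the same mature polypeptide. The Python sketch below checks that property for substitution mutants. It uses only a hand-picked excerpt of the standard genetic code (a complete table lists all 64 codons), and the function names and toy sequences are illustrative assumptions.

```python
# Hand-picked excerpt of the standard genetic code; a complete table has 64 codons.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "TTT": "F", "TTC": "F",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


def is_silent_mutant(reference: str, mutant: str) -> bool:
    """True if mutant carries at least one substitution yet encodes the same protein."""
    if len(reference) != len(mutant):
        return False  # only substitution mutants are considered in this sketch
    has_mutation = any(r != m for r, m in zip(reference.upper(), mutant.upper()))
    return has_mutation and translate(reference) == translate(mutant)


if __name__ == "__main__":
    reference = "ATGGCTAAAGGTTTTTAA"   # M A K G F stop
    synonymous = "ATGGCCAAGGGATTCTAA"  # different codons, same peptide
    nonsense = "ATGGCTAAAGGTTAATAA"    # TTT (F) -> TAA (stop)
    print(translate(reference),
          is_silent_mutant(reference, synonymous),
          is_silent_mutant(reference, nonsense))
```

In this toy case the synonymous mutant changes five codons but still translates to the same peptide, while a single codon change that introduces a premature stop does not.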
As described earlier, the techniques used to isolate or clone a polynucleotide encoding a polypeptide are known in the art and include isolation from genomic DNA, preparation from cDNA, or a combination thereof.
The polynucleotide may also be a polynucleotide comprising or consisting of a nucleotide sequence encoding a polypeptide having beta-glucosidase activity that hybridizes under at least very low stringency conditions, preferably at least low stringency conditions, more preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, even more preferably at least high stringency conditions, and most preferably at least very high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 19, SEQ ID NO: 21, SEQ ID NO: 23, SEQ ID NO: 25, or SEQ ID NO: 27, (iii) a full-length complementary strand of (i) or (ii); or allelic variants and subsequences thereof (Sambrook …).
In each of the preferred aspects above, the mature polypeptide is amino acids 20 to 861 of SEQ ID NO: 16, amino acids 20 to 861 of SEQ ID NO: 18, amino acids 20 to 863 of SEQ ID NO: 20, amino acids 37 to 878 of SEQ ID NO: 22, amino acids 32 to 744 of SEQ ID NO: 24, amino acids 20 to 860 of SEQ ID NO: 26, or amino acids 20 to 860 of SEQ ID NO: 28, and the mature polypeptide coding sequence is nucleotides 58 to 2584 of SEQ ID NO: 15, nucleotides 58 to 2584 of SEQ ID NO: 17, nucleotides 58 to 2580 of SEQ ID NO: 19, nucleotides 109 to 2751 of SEQ ID NO: 21, nucleotides 88 to 2232 of SEQ ID NO: 23, nucleotides 59 to 2580 of SEQ ID NO: 25, or nucleotides 59 to 2580 of SEQ ID NO: 27.

Beta-Glucosidase Fusion Polypeptides and Polynucleotides Thereof
The beta-glucosidase can also be in the form of a beta-glucosidase fusion protein. A beta-glucosidase fusion polypeptide is produced by fusing a nucleotide sequence (or a portion thereof) encoding a polypeptide having beta-glucosidase activity to a nucleotide sequence (or a portion thereof) encoding a polypeptide having endoglucanase activity and a nucleotide sequence encoding a signal peptide operably linked to the nucleotide sequence (or a portion thereof) encoding the polypeptide having endoglucanase activity. Techniques for producing fusion polypeptides are known in the art and include, for example, ligating the coding sequences encoding the polypeptides so that they are in frame and expression of the fused polypeptide is under control of the same promoter(s) and terminator. Fusion proteins may also be constructed using intein technology, in which fusions are created post-translationally (Cooper et al., 1993, EMBO J. 12: 2575-2583; Dawson et al., 1994, Science 266: 776-779).
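The in-frame requirement can be made concrete with a small Python sketch. It assembles three coding parts in the order recited in claim 3 (signal peptide, then at least the endoglucanase catalytic domain, then the beta-glucosidase) and rejects parts that would shift the reading frame or stop translation early. The function name and toy sequences are hypothetical; promoters, terminators, linkers, and the actual cloning steps are outside the scope of this sketch.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(seq: str) -> List[str]:
    """Split a coding sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(signal_peptide: str, endoglucanase_cd: str, beta_glucosidase: str) -> str:
    """Join three coding parts 5'->3': signal peptide, endoglucanase domain, beta-glucosidase.

    Hypothetical helper: it only checks reading-frame bookkeeping, not regulatory
    elements, linkers, or codon usage.
    """
    parts = {
        "signal peptide": signal_peptide.upper(),
        "endoglucanase catalytic domain": endoglucanase_cd.upper(),
        "beta-glucosidase": beta_glucosidase.upper(),
    }
    for name, part in parts.items():
        if not part or len(part) % 3:
            # A part that is not a whole number of codons shifts the frame downstream.
            raise ValueError(f"{name} is not a whole number of codons")
    if not parts["signal peptide"].startswith("ATG"):
        raise ValueError("signal peptide coding sequence should start with ATG")
    upstream = parts["signal peptide"] + parts["endoglucanase catalytic domain"]
    for index, codon in enumerate(split_codons(upstream), start=1):
        if codon in STOP_CODONS:
            # A stop codon here would end translation before the beta-glucosidase.
            raise ValueError(f"stop codon {codon} at codon {index}, upstream of the beta-glucosidase")
    return upstream + parts["beta-glucosidase"]


if __name__ == "__main__":
    print(fuse_in_frame("ATGAAA", "GCTGGT", "TTTTAA"))
```

Checking every upstream part as whole codons is what "in frame" amounts to at the sequence level: the beta-glucosidase codons are read in the same frame as the signal peptide's start codon.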
A fusion protein having beta-glucosidase activity that comprises at least the catalytic domain of an endoglucanase linked in frame to a signal peptide shows increased secretion compared with the absence of at least the catalytic domain of the endoglucanase. The increase in secretion of the fusion protein having beta-glucosidase activity is at least 25%, preferably at least 50%, more preferably at least 100%, even more preferably at least 150%, even more preferably at least 200%, most preferably at least 500%, and even most preferably at least 1000% compared to the absence of at least the catalytic domain of the endoglucanase.
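The percentages above are relative increases, so an increase of 100% means secretion is doubled and an increase of 1000% means it is eleven-fold that of the reference. A minimal Python sketch of that arithmetic follows; the secretion levels are hypothetical values in any consistent unit, and the function names are illustrative.

```python
SECRETION_TIERS = (25, 50, 100, 150, 200, 500, 1000)  # preferred minimum increases, in %


def percent_increase(fusion_level: float, reference_level: float) -> float:
    """Percent increase of fusion-protein secretion over the reference construct."""
    if reference_level <= 0:
        raise ValueError("reference secretion level must be positive")
    return 100.0 * (fusion_level - reference_level) / reference_level


def highest_tier_met(increase: float):
    """Largest preferred increase (in %) reached by the value, or None."""
    met = [tier for tier in SECRETION_TIERS if increase >= tier]
    return met[-1] if met else None


if __name__ == "__main__":
    # Hypothetical measurements, e.g. beta-glucosidase activity per mL of broth.
    increase = percent_increase(5.0, 2.0)
    print(increase, highest_tier_met(increase))
```

Here a fusion level of 5.0 against a reference of 2.0 gives a 150% increase, which reaches the 150% tier but not the 200% tier.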







Claims What is claimed is:
1. A filamentous fungal host cell, comprising: (a) a first polynucleotide encoding a native or heterologous polypeptide having cellulolytic enhancing activity; (b) a second polynucleotide encoding a native or heterologous beta-glucosidase or a fusion protein thereof; and (c) one or more (several) third polynucleotides encoding native or heterologous cellulolytic enzymes selected from the group consisting of a Trichoderma reesei cellobiohydrolase I (CEL7A), a Trichoderma reesei cellobiohydrolase II (CEL6A), and a Trichoderma reesei endoglucanase I (CEL7B), and orthologs or variants thereof.
2. The filamentous fungal host cell of claim 1, wherein the polypeptide having cellulolytic enhancing activity is selected from the group consisting of:

(a) a polypeptide having cellulolytic enhancing activity comprising [ILMV]-P-X(4,5)-G-X-Y-[ILMV]-X-R-X-[EQ]-X(4)-[HNQ] and [FW]-[TF]-K-[AIV], wherein X is any amino acid, X(4,5) is any amino acid at 4 or 5 contiguous positions, and X(4) is any amino acid at 4 contiguous positions;
(b) a polypeptide having cellulolytic enhancing activity comprising [ILMV]-P-x(4,5)-G-x-Y-[ILMV]-x-R-x-[EQ]-x(3)-A-[HNQ], wherein x is any amino acid, x(4,5) is any amino acid at 4 or 5 contiguous positions, and x(3) is any amino acid at 3 contiguous positions;
wherein the polypeptide having cellulolytic enhancing activity comprising [ILMV]-P-X(4,5)-G-X-Y-[ILMV]-X-R-X-[EQ]-X(4)-[HNQ] and [FW]-[TF]-K-[AIV] optionally further comprises:
H-X(1,2)-G-P-X(3)-[YW]-[AILMV],
[EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV], or
H-X(1,2)-G-P-X(3)-[YW]-[AILMV] and [EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV], wherein X is any amino acid, X(1,2) is any amino acid at 1 position or 2 contiguous positions, X(3) is any amino acid at 3 contiguous positions, and X(2) is any amino acid at 2 contiguous positions;
(c) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14;

(d) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, (ii) the cDNA sequence contained in the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 11, or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 7, SEQ ID NO: 9, or SEQ ID NO: 13, or (iii) a full-length complementary strand of (i) or (ii);
(e) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13;
(f) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14; and
(g) a polypeptide having cellulolytic enhancing activity comprising or consisting of the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14; or a fragment thereof having cellulolytic enhancing activity.

3. The filamentous fungal host cell of claim 1, wherein the beta-glucosidase fusion protein comprises: (a) a first amino acid sequence comprising a signal peptide; (b) a second amino acid sequence comprising at least a catalytic domain of an endoglucanase; and (c) a third amino acid sequence comprising at least a catalytic domain of a beta-glucosidase; wherein the C-terminal end of the first amino acid sequence is linked in frame to the N-terminal end of the second amino acid sequence and the C-terminal end of the second amino acid sequence is linked in frame to the N-terminal end of the third amino acid sequence.
4. The filamentous fungal host cell of claim 3, wherein the beta-glucosidase fusion protein comprises or consists of SEQ ID NO: 103 or SEQ ID NO: 105.

5. The filamentous fungal host cell of claim 1, wherein the heterologous beta-glucosidase is encoded by a polynucleotide obtained from an Aspergillus fumigatus beta-glucosidase gene and/or Penicillium brasilianum beta-glucosidase gene.
6. The filamentous fungal host cell of claim 1, wherein the Trichoderma reesei cellobiohydrolase I (CEL7A) comprises or consists of the mature polypeptide of SEQ ID NO: 52; the Trichoderma reesei cellobiohydrolase II (CEL6A) comprises or consists of the mature polypeptide of SEQ ID NO: 54; and the Trichoderma reesei endoglucanase I (CEL7B) comprises or consists of the mature polypeptide of SEQ ID NO: 56.
7. The filamentous fungal host cell of claim 1, wherein the orthologs or the variants of the Trichoderma reesei cellobiohydrolase I (CEL7A) are selected from the group consisting of:

(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide of SEQ ID NO: 52;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 51, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 51, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 51; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 52; and
wherein the orthologs or the variants of the Trichoderma reesei cellobiohydrolase II (CEL6A) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide of SEQ ID NO: 54;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 53, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 53, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 53; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 54; and
wherein the orthologs or the variants of the Trichoderma reesei endoglucanase I (CEL7B) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 56;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 55, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 55, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 55; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 56.
8. The filamentous fungal host cell of any of claims 1-6, further comprising one or more (several) fourth polynucleotides encoding cellulolytic enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A), a Trichoderma reesei endoglucanase III (CEL12A), and a Trichoderma reesei endoglucanase V (CEL45A), and orthologs or variants thereof.
9. The filamentous fungal host cell of claim 8, wherein the Trichoderma reesei endoglucanase II (CEL5A) comprises or consists of the mature polypeptide of SEQ ID NO: 58; the Trichoderma reesei endoglucanase III (CEL12A) is the mature polypeptide comprising or consisting of SEQ ID NO: 60; and the Trichoderma reesei endoglucanase V (CEL45A) is the mature polypeptide comprising or consisting of SEQ ID NO: 62.
10. The filamentous fungal host cell of claim 8, wherein the orthologs or variants of the Trichoderma reesei endoglucanase II (CEL5A) are selected from the group consisting of:

(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 58;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 57, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 57, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 57; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 58;
wherein the orthologs or variants of the Trichoderma reesei endoglucanase III (CEL12A) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 60;


(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 59, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 59, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 59; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 60; and
wherein the orthologs or variants of the Trichoderma reesei endoglucanase V (CEL45A) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 62;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 61, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 61, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 61; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 62.
11. The filamentous fungal host cell of claim 1, which produces a cellulolytic protein composition comprising a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52, a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54, and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56.
12. The filamentous fungal host cell of claim 10, which further produces one or more (several) enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A) of SEQ ID NO: 58, a Trichoderma reesei endoglucanase V (CEL45A) of SEQ ID NO: 62, and a Trichoderma reesei endoglucanase III (CEL12A) of SEQ ID NO: 60.
13. The filamentous fungal host cell of any of claims 1-12, which further produces a Thielavia terrestris cellobiohydrolase of the mature polypeptide of SEQ ID NO: 64.
14. A method of producing a cellulolytic protein composition, comprising: (a) cultivating the host cell of any of claims 1-13 under conditions conducive for production of the composition; and (b) recovering the composition.
15. A cellulolytic protein composition, comprising: (a) a polypeptide having cellulolytic enhancing activity; (b) a beta-glucosidase or a fusion protein thereof; and (c) one or more (several) cellulolytic enzymes selected from the group consisting of a Trichoderma reesei cellobiohydrolase I (CEL7A), a Trichoderma reesei cellobiohydrolase II (CEL6A), and a Trichoderma reesei endoglucanase I (CEL7B), and orthologs or variants thereof.
16. The cellulolytic protein composition of claim 15, wherein the polypeptide having cellulolytic enhancing activity is selected from the group consisting of:

(a) a polypeptide having cellulolytic enhancing activity comprising [ILMV]-P-X(4,5)-G-X-Y-[ILMV]-X-R-X-[EQ]-X(4)-[HNQ] and [FW]-[TF]-K-[AIV], wherein X is any amino acid, X(4,5) is any amino acid at 4 or 5 contiguous positions, and X(4) is any amino acid at 4 contiguous positions;
(b) a polypeptide having cellulolytic enhancing activity comprising [ILMV]-P-x(4,5)-G-x-Y-[ILMV]-x-R-x-[EQ]-x(3)-A-[HNQ], wherein x is any amino acid, x(4,5) is any amino acid at 4 or 5 contiguous positions, and x(3) is any amino acid at 3 contiguous positions;
wherein the polypeptide having cellulolytic enhancing activity comprising [ILMV]-P-X(4,5)-G-X-Y-[ILMV]-X-R-X-[EQ]-X(4)-[HNQ] and [FW]-[TF]-K-[AIV] further comprises:
H-X(1,2)-G-P-X(3)-[YW]-[AILMV],
[EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV], or
H-X(1,2)-G-P-X(3)-[YW]-[AILMV] and [EQ]-X-Y-X(2)-C-X-[EHQN]-[FILV]-X-[ILV], wherein X is any amino acid, X(1,2) is any amino acid at 1 position or 2 contiguous positions, X(3) is any amino acid at 3 contiguous positions, and X(2) is any amino acid at 2 contiguous positions;
(c) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14;
(d) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13, (ii) the cDNA sequence contained in the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 11, or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 7, SEQ ID NO: 9, or SEQ ID NO: 13, or (iii) a full-length complementary strand of (i) or (ii);
(e) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, or SEQ ID NO: 13;
(f) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14; and
(g) a polypeptide having cellulolytic enhancing activity comprising or consisting of the mature polypeptide of SEQ ID NO: 2, SEQ ID NO: 4, SEQ ID NO: 6, SEQ ID NO: 8, SEQ ID NO: 10, SEQ ID NO: 12, or SEQ ID NO: 14; or a fragment thereof having cellulolytic enhancing activity.
17. The cellulolytic protein composition of claim 15, wherein the beta-glucosidase fusion protein comprises: (a) a first amino acid sequence comprising a signal peptide; (b) a second amino acid sequence comprising at least a catalytic domain of an endoglucanase; and (c) a third amino acid sequence comprising at least a catalytic domain of a beta-glucosidase; wherein the C-terminal end of the first amino acid sequence is linked in frame to the N-terminal end of the second amino acid sequence and the C-terminal end of the second amino acid sequence is linked in frame to the N-terminal end of the third amino acid sequence.
18. The cellulolytic protein composition of claim 17, wherein the beta-glucosidase fusion protein comprises or consists of SEQ ID NO: 103 or SEQ ID NO: 105.
19. The cellulolytic protein composition of claim 15, wherein the heterologous beta-glucosidase protein is derived from Aspergillus fumigatus and/or Penicillium brasilianum.
20. The cellulolytic protein composition of claim 15, wherein the Trichoderma reesei cellobiohydrolase I (CEL7A) comprises or consists of the mature polypeptide of SEQ ID NO: 52; the Trichoderma reesei cellobiohydrolase II (CEL6A) comprises or consists of the mature polypeptide of SEQ ID NO: 54; and the Trichoderma reesei endoglucanase I (CEL7B) comprises or consists of the mature polypeptide of SEQ ID NO: 56.
21. The cellulolytic protein composition of claim 15, wherein the orthologs or the variants of the Trichoderma reesei cellobiohydrolase I (CEL7A) are selected from the group consisting of:

(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide of SEQ ID NO: 52;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 51, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 51, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 51; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 52; and
wherein the orthologs or the variants of the Trichoderma reesei cellobiohydrolase II (CEL6A) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide of SEQ ID NO: 54;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 53, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 53, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 53; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 54; and
wherein the orthologs or the variants of the Trichoderma reesei endoglucanase I (CEL7B) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 56;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 55, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 55, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 55; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 56.
22. The cellulolytic protein composition of any of claims 15-21, further comprising one or more (several) fourth polynucleotides encoding cellulolytic enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A), a Trichoderma reesei endoglucanase III (CEL12A), and a Trichoderma reesei endoglucanase V (CEL45A), and orthologs or variants thereof.
23. The cellulolytic protein composition of claim 22, wherein the Trichoderma reesei endoglucanase II (CEL5A) comprises or consists of the mature polypeptide of SEQ ID NO: 58; the Trichoderma reesei endoglucanase III (CEL12A) comprises or consists of the mature polypeptide of SEQ ID NO: 60; and the Trichoderma reesei endoglucanase V (CEL45A) comprises or consists of the mature polypeptide of SEQ ID NO: 62.
24. The cellulolytic protein composition of claim 22, wherein the orthologs or variants of the Trichoderma reesei endoglucanase II (CEL5A) are selected from the group consisting of:

(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 58;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 57, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 57, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 57; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 58; and
wherein the orthologs or variants of the Trichoderma reesei endoglucanase III (CEL12A) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 60;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 59, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 59, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 59; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 60; and
wherein the orthologs or variants of the Trichoderma reesei endoglucanase V (CEL45A) are selected from the group consisting of:
(a) a polypeptide comprising an amino acid sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity with the mature polypeptide of SEQ ID NO: 62;
(b) a polypeptide encoded by a polynucleotide which hybridizes under preferably at least medium stringency conditions, more preferably at least medium-high stringency conditions, and most preferably at least high stringency conditions with (i) the mature polypeptide coding sequence of SEQ ID NO: 61, (ii) the cDNA sequence contained in or the genomic DNA sequence comprising the mature polypeptide coding sequence of SEQ ID NO: 61, or (iii) a full-length complementary strand of (i) or (ii);
(c) a polypeptide encoded by a polynucleotide comprising a nucleotide sequence having preferably at least 60%, more preferably at least 65%, more preferably at least 70%, more preferably at least 75%, more preferably at least 80%, even more preferably at least 85%, most preferably at least 90%, and even most preferably at least 95% identity to the mature polypeptide coding sequence of SEQ ID NO: 61; and
(d) a variant comprising a substitution, deletion, and/or insertion of one or more (several) amino acids of the mature polypeptide of SEQ ID NO: 62.
25. The cellulolytic protein composition of claim 15, which comprises a polypeptide having cellulolytic enhancing activity of the mature polypeptide of SEQ ID NO: 8; a beta-glucosidase fusion protein of SEQ ID NO: 106; a Trichoderma reesei cellobiohydrolase I (CEL7A) of the mature polypeptide of SEQ ID NO: 52; a Trichoderma reesei cellobiohydrolase II (CEL6A) of the mature polypeptide of SEQ ID NO: 54; and a Trichoderma reesei endoglucanase I (CEL7B) of the mature polypeptide of SEQ ID NO: 56.
26. The cellulolytic protein composition of claim 25, which further comprises one or more (several) enzymes selected from the group consisting of a Trichoderma reesei endoglucanase II (CEL5A) of SEQ ID NO: 58, a Trichoderma reesei endoglucanase V (CEL45A) of SEQ ID NO: 62, and a Trichoderma reesei endoglucanase III (CEL12A) of SEQ ID NO: 60.
27. The cellulolytic protein composition of any of claims 15-26, which further comprises a Thielavia terrestris cellobiohydrolase of the mature polypeptide of SEQ ID NO: 64.
28. A method for degrading or converting a cellulose-containing material, comprising: treating the cellulose-containing material with the cellulolytic protein composition of any of claims 15-27.
29. A method for producing a fermentation product, comprising: (a) saccharifying a cellulose-containing material with the cellulolytic protein composition of any of claims 15-27; (b) fermenting the saccharified cellulose-containing material of step (a) with one or more (several) fermenting microorganisms to produce the fermentation product; and (c) recovering the fermentation product from the fermentation.


Patent Number 277134
Indian Patent Application Number 7441/CHENP/2009
PG Journal Number 47/2016
Publication Date 11-Nov-2016
Grant Date 11-Nov-2016
Date of Filing 18-Dec-2009
Name of Patentee NOVOZYMES, INC
Applicant Address 1445 DREW AVENUE, DAVIS, CA 95618
Inventors:
# Inventor's Name Inventor's Address
1 CHERRY, JOEL 3319 MORRO BAY, DAVIS, CA 95618
2 MCFARLAND, KEITH 5319 COWELL BLVD., DAVIS, CALIFORNIA 95616
3 MERINO, SANDRA 1580 UNION SQUARE ROAD, WEST SACRAMENTO, CA 95691
4 TETER, SARAH 870 MENLO OAKS DRIVE, MENLO PARK, CA 95025
PCT International Classification Number C12N9/42
PCT International Application Number PCT/US08/65417
PCT International Filing date 2008-05-30
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 60/941,251 2007-05-31 U.S.A.