Title of Invention

A PROCESS FOR CONSTRUCTING DNA BASED MOLECULAR MARKER FOR ENABLING SELECTION OF DROUGHT AND DISEASES RESISTANCE GERMPLASM SCREENING

Abstract
This invention relates to a process for constructing DNA-based molecular markers in plants to help identify genes and gene products that play a crucial role in determining how plants respond to a wide range of abiotic and biotic factors.
Full Text
The present invention relates to a process for constructing DNA-based molecular markers in plants, using a bioinformatic method to detect molecular markers for various kinds of stress tolerance traits.
Background
Plants are exposed to various adverse environmental conditions, such as drought, high salt and high or low temperature, and to different kinds of pathogens during their life cycle. The environmental stimuli are commonly known as abiotic stress. Biotic stress, on the other hand, is caused by various pathogens found in the environment.
Plants respond to various kinds of stress by displaying complex, quantitative traits that involve the cumulative effect of several genes. Recognition of any kind of stress activates a response and initiates signal transduction processes, which finally result in spatially and temporally regulated gene expression.
Numerous stress-inducible proteins have been identified, and their corresponding genes have been isolated and sequenced. Regulatory elements of stress-modulated genes have also been deciphered, for example the Abscisic Acid Responsive Element (ABRE).
Recent developments in molecular biology and statistics, along with the application of information technology, have opened the possibility of identifying and using genomic variation and major genes for the improvement of commercially important crops. Marker-based selection can be more effective for characteristics that are expressed late in the plant's development, that are expressed only under certain environmental conditions, or that are affected by few genes.

When it is not possible to distinguish plant materials visually or by simple measurements, molecular markers can sometimes be used. Molecular markers can be used to discern phenotypic traits easily. They are used as probes to mark a nucleus or chromosome. Molecular markers may be applied for a number of purposes, including determining:
- Genetic identity
- Parentage (maternity and paternity)
- Extended kinship
- Differentiation of geographic population
- Differentiation of close relationships
- Phylogenetic relationship of species, family, genera, orders, phyla.
- Differentiation of Populations for various genetic traits like disease resistance, drought tolerance etc.
There are two general types of molecular markers available for use, depending on the plant and the type of assay required:
- Isoenzymes (isozymes)
- DNA-based markers
DNA-BASED MARKERS
DNA is the fundamental molecule of heredity, consisting of a double helix of linked nucleotides. DNA-based molecular markers are small sequences of DNA that are associated with, or "linked" to, regions in a plant's DNA that are responsible for a specific trait (e.g. disease resistance, yield, etc.).

Various kinds of conventional markers are used, such as:
1. Restriction Fragment Length Polymorphism (RFLP): Polymorphisms in the lengths of particular restriction fragments can be used as molecular markers. The DNA molecule is fragmented using a restriction endonuclease. Restriction endonucleases are protein enzymes that recognize specific nucleotide sequences and cleave both strands of the DNA containing those sequences.
2. Random Amplified Polymorphic DNA (RAPD): The complexity of DNA is sufficiently high that, by chance, pairs of sites complementary to single octa- or decanucleotides may lie close enough to one another for amplification.

3. Microsatellites: Polymorphisms in the lengths of tandemly repeated short sequences can be used as molecular markers.
4. Single-Stranded Conformation Polymorphism (SSCP) : Polymorphisms in sequence, as well as in sequence length, can be used as molecular markers. The mobility in gel electrophoresis of double-stranded DNA of a given length is relatively independent of nucleotide sequence. In contrast, the mobility of single strands can vary considerably as a result of only small changes in nucleotide sequence. This fact led to the development of single-stranded conformation polymorphism (SSCP) techniques.
5. Single Nucleotide Polymorphisms: Single nucleotide polymorphisms (SNPs) can be used as molecular markers.
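The RFLP principle in item 1 can be illustrated with a minimal in-silico digest. This is only a sketch: the two allele sequences, the function name `digest`, and the choice of the EcoRI recognition site (GAATTC, cut after the first base) are assumptions made for the example and are not taken from the invention.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return the fragment lengths obtained by cutting seq at every site.

    cut_offset is the number of bases into the site where the cut falls
    (1 for EcoRI, which cuts between G and AATTC on the top strand).
    """
    cuts = []
    start = 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_b carries a single base change (A -> G)
# that destroys the restriction site present in allele_a.
allele_a = "ATGC" * 3 + "GAATTC" + "TTAA" * 4
allele_b = "ATGC" * 3 + "GAGTTC" + "TTAA" * 4

fragments_a = digest(allele_a)
fragments_b = digest(allele_b)
print(fragments_a, fragments_b)
```

The allele with the intact site yields two fragments and the other yields one, which is the length polymorphism that an RFLP assay would detect on a gel.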

However, the conventional methods of developing markers in the laboratory are very tedious.
Summary of the Invention
The objective of the present invention is to correlate the occurrence of motifs (highly conserved amino acid sequences) in various stress-related proteins for molecular marker development.
Another objective is to identify a method for finding new markers from already existing sequences for the various kinds of stress in plants.
A further objective is to classify these markers according to the different kinds of abiotic and biotic stress that plants face.
To achieve the said objects, the present invention relates to a process for constructing DNA-based molecular markers in plants comprising:
- identifying and selecting the gene sequences relating to stress from available databases and literature;
- submitting the selected gene sequence for similarity search to obtain other sequences from the database similar to the selected gene sequence;
- subjecting the sequences obtained from the similarity search to multiple alignment;
- removing redundant sequences, if any, to get a data set of proteins involved in biotic and abiotic stress response;
- picking blocks or motifs from the data set of proteins on the basis of statistical significance;
- subjecting the data set of proteins to Blockmaker to pick the same set of blocks or motifs; and
- analysing the motifs for functionality.
The invention can be used over a broad range of plants and organisms. Such plants include, inter alia, cotton, maize, rice, soybeans, sugar beet, wheat, fruit, vegetables and vines. The major use of the markers will be to identify different varieties of plants that show stress tolerance.
The motifs in the protein sequences are 8 to 18 amino acids in length.
Detailed Description of the Invention with the Accompanying Figures
Figure 1 displays the three motifs of the stress dataset along with the entropy plot, which is a measure of the information content at each position.
Figure 2 shows the motifs mapped onto the mannose-binding lectin.
Table 1 shows the sequence details with their Swissprot codes.
Table 2 shows the details of the evaluation of the first motif.
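The entropy plot mentioned for Figure 1 can be approximated by computing the Shannon entropy of each column of the aligned motif instances. The sketch below is illustrative only: the four aligned instances are hypothetical variants invented for the example, and the exact measure plotted in the figure may differ (for instance, it may be expressed relative to background amino acid frequencies).

```python
import math
from collections import Counter


def column_entropy(instances):
    """Shannon entropy in bits for each column of equal-length aligned strings."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all instances must have the same length")
    entropies = []
    for column in zip(*instances):
        total = len(column)
        counts = Counter(column)
        # Sum of -p * log2(p) over the residues observed in this column.
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical aligned occurrences of an 8-residue motif (not patent data).
instances = ["GPWGGNGG", "GPWGGSGG", "GPWGGNGG", "GPFGGNGG"]
entropies = column_entropy(instances)
for position, h in enumerate(entropies, start=1):
    print(position, round(h, 3))
```

A column with entropy 0 is perfectly conserved in the sample, while higher values mark positions that tolerate substitutions.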
A sequence analysis of stress-related sequences was done as follows:
Stress-related sequences were downloaded from the Swissprot and PIR databases, and a literature study of the sequences was carried out to pick a protein that was well characterized experimentally as being involved in stress.

The salT gene of Oryza sativa was selected for further studies.
Example 1:
The salT protein was submitted for similarity search and around 65 proteins were obtained. Fifteen proteins were selected based on a threshold of 35% similarity, and the set was reduced to 12 after removing redundant sequences. The data set of twelve sequences consisted of proteins involved in various biotic and abiotic stress responses.
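The selection step described above (keep hits at or above a similarity threshold, then remove redundant sequences) can be sketched as follows. The hit names, similarity values and sequences are hypothetical, and treating "redundant" as "exactly identical sequence" is an assumption, since the description does not state how redundancy was defined.

```python
def select_hits(hits, min_similarity=35.0):
    """Return names of hits at or above the threshold, skipping duplicate sequences.

    hits is an iterable of (name, percent_similarity, sequence) tuples.
    """
    seen_sequences = set()
    selected = []
    for name, similarity, sequence in hits:
        if similarity < min_similarity:
            continue  # below the similarity threshold
        if sequence in seen_sequences:
            continue  # redundant: an identical sequence was already kept
        seen_sequences.add(sequence)
        selected.append(name)
    return selected


# Hypothetical similarity-search results (not the patent's data).
hits = [
    ("hitA", 92.0, "MTLVKIGLWGG"),
    ("hitB", 41.5, "MSLIKVGPWGG"),
    ("hitC", 41.5, "MSLIKVGPWGG"),  # identical to hitB, so redundant
    ("hitD", 20.0, "MAAAKLLPRST"),  # below the 35% threshold
]

selected = select_hits(hits)
print(selected)
```

In practice the similarity values would come from the output of the similarity search tool, and a looser redundancy criterion (for example, near-identical sequences) could be substituted for the exact-match test.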
An analysis was conducted to discover potential regions of sequence homology between twelve biotic and abiotic stress-related genes. The homology analysis resulted in 3 non-overlapping motifs that were common to both biotic and abiotic stress-related genes.
A total of 113 new genes were identified. The annotation present for each of the genes supports the hypothesis that they are involved in stress-related response.
Multiple alignment and statistical significance
The sequences used for making the blocks or motifs vary in length, and the motifs do not occur at a specific position in all of these sequences. Moreover, since proteins are made up of only 20 amino acids, a statistical analysis is done to check whether an identified motif has occurred by chance or whether its presence in the sequence is of any significance. The probability of chance occurrence is interpreted as follows:
a. if the probability of occurrence of the pattern by chance is high, the pattern is of no significance;
b. if the probability of occurrence is very low, the pattern is likely to have biological significance.
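The reasoning above can be made concrete with a deliberately simplified calculation. The sketch assumes that all 20 amino acids are equally likely and independent at every position, and that each of the twelve sequences is 150 residues long; both are assumptions for illustration, and the statistics actually used by MACAW are more elaborate than this.

```python
def expected_chance_matches(motif_len, seq_lengths, alphabet_size=20):
    """Expected number of exact chance occurrences of one fixed motif.

    Assumes uniform, independent residues: the probability of an exact match
    at one position is (1 / alphabet_size) ** motif_len, and every window of
    the right width in every sequence is a candidate position.
    """
    p_site = (1.0 / alphabet_size) ** motif_len
    positions = sum(max(n - motif_len + 1, 0) for n in seq_lengths)
    return positions * p_site


# Assumed data set: twelve sequences of 150 residues each (hypothetical lengths).
seq_lengths = [150] * 12

e8 = expected_chance_matches(8, seq_lengths)    # shortest motif length reported
e18 = expected_chance_matches(18, seq_lengths)  # longest motif length reported
print(e8, e18)
```

Under these assumptions even an 8-residue pattern is expected far less than once by chance, so its repeated presence across the data set falls under case (b) rather than case (a).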

The twelve sequences were then subjected to multiple alignment using ClustalW. Three non-overlapping motifs were picked manually by eye. The statistical significance of the blocks of similarity was evaluated using MACAW (Multiple Alignment Construction and Analysis Workbench).
The same data set was submitted to Blockmaker and analysed for the presence of blocks. The same sets of blocks were picked up by the program.
Analysis of motifs using MEME (Multiple Expectation Maximization for Motif Elicitation)
The three strongest motifs in the set of twelve divergent sequences were determined using MEME 2.0.
These motifs were used to generate a Position Specific Scoring Matrix (PSSM) in order to identify further stress-related genes from the public sequence databases. The PSSM from the MEME output was then used to search GenBank and Swissprot 39.4 using MAST (Motif Alignment and Search Tool).
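The idea of building a PSSM from motif occurrences and scanning a sequence with it can be sketched in a few lines. This is not the MEME or MAST algorithm: it is a plain log-odds matrix with a uniform background and a pseudocount, and the motif instances, the target sequence and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(instances, pseudocount=1.0):
    """Build a log-odds PSSM (list of dicts) from equal-length motif instances."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pssm = []
    for pos in range(len(instances[0])):
        column = [s[pos] for s in instances]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_hit(pssm, seq):
    """Return (start, score) of the highest-scoring window, or None if seq is too short.

    seq is assumed to contain only the 20 standard amino acid letters.
    """
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(pssm[i][aa] for i, aa in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical motif occurrences and target sequence (not patent data).
instances = ["GPWGGNGG", "GPWGGSGG", "GPFGGNGG"]
pssm = build_pssm(instances)
hit = best_hit(pssm, "MKTAYGPWGGNGGSAQ")
print(hit)
```

A database search would apply the same scan to every sequence and keep those whose best score exceeds a chosen cutoff; MAST additionally converts scores to p-values and combines evidence across motifs.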
The three motifs map onto functionally important domains. The first motif relates to a common epitope and the third motif maps onto an important N-glycosylation site.
Motif listings:
1 18 VITSLTFKTNKKTYGPFG
2 8 GPWGGNGG
3 16 IVGFFGRSGWYLDAIG

REFERENCES:
1. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22:4673-4680.
2. Schuler, G.D., Altschul, S.F. and Lipman, D.J. (1991) A workbench for multiple alignment construction and analysis. Proteins: Structure, Function and Genetics 9:180-190.
3. http://blocks.fhcrc.org/blocks/blockmkr/make_blocks.html
4. Henikoff, S., Henikoff, J.G., Alford, W.J. and Pietrokovski, S. (1995) Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 163:GC17-26.
5. Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California.
6. http://meme.sdsc.edu/meme/website/mast.html
7. Bailey, T.L. and Gribskov, M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14(1):48-54.
8. Tsuda, M. (1979) Purification and characterisation of a lectin from rice. J. Biochem. 86:1451-1461.
9. Hirano, K., Teraoka, T., Yamanaka, H., Harashima, A., Kunisaki, A., Takashi, H. and Hosokawa, D. (2000) Novel Mannose-Binding Rice Lectin Composed of some Isolectins and its relation to a Stress-Inducible salT Gene. Plant Cell Physiol. 41(3):258-267.


We claim:
1. A process for constructing DNA-based molecular markers in plants comprising:
- identifying and selecting the gene sequences relating to stress from available databases and literature;
- submitting the selected gene sequence for similarity search to obtain other sequences from the database similar to the selected gene sequence;
- subjecting the sequences obtained from the similarity search to multiple alignment;
- removing redundant sequences, if any, to get a data set of proteins involved in biotic and abiotic stress response;
- picking blocks or motifs from the data set of proteins on the basis of statistical significance;
- subjecting the data set of proteins to Blockmaker to pick the same set of blocks or motifs; and
- analysing the motifs for functionality.
2. A process for constructing molecular markers as claimed in claim 1 wherein the gene selected is that of Oryza sativa
3. A process for constructing molecular markers as claimed in claim 1 wherein the databases used are Swissprot and PIR
4. A process for constructing molecular markers as claimed in claim 1 wherein the software used to subject the sequences to multiple alignment is ClustalW

5. A process for constructing molecular markers as claimed in claim 1 wherein the software used to conduct the similarity search is Multiple Alignment Construction and Analysis Workbench (MACAW)
6. A process for constructing molecular markers as claimed in claim 1 wherein the software used for making blocks is Blockmaker
7. A process for constructing molecular markers as claimed in claim 1 wherein the motifs are analyzed using Multiple Expectation Maximization for Motif Elicitation (MEME)
8. A process for constructing molecular markers as claimed in claim 1 wherein the amino acid sequence or the motif in the isolated protein sequences is of length 8 to 18
9. A process for constructing molecular markers as claimed in claim 1 wherein the motif 1 is VITSLTFKTNKKTYGPFG
10. A process for constructing molecular markers as claimed in claim 1 wherein the motif 2 is GPWGGNGG
11. A process for constructing molecular markers as claimed in claim 1 wherein the motif 3 is IVGFFGRSGNYLDAIG
12. A process for constructing molecular markers as claimed in claim 9 wherein the motif 1 relates to a common epitope

13. A process for constructing molecular markers as claimed in claim 11 wherein the motif 3 maps onto an important N-glycosylation site
14. A process for constructing DNA-based molecular markers in plants using bioinformatics methods substantially as herein described with reference to the accompanying drawings.

Patent Number 221403
Indian Patent Application Number 749/CHE/2003
PG Journal Number 37/2008
Publication Date 12-Sep-2008
Grant Date 23-Jun-2008
Date of Filing 18-Sep-2003
Name of Patentee AVESTHA GENGRAINE TECHNOLOGIES PVT. LTD.
Applicant Address SUNBEAM 106 PRENDERGHAST ROAD, SECUNDERABAD -500 003
Inventors:
# Inventor's Name Inventor's Address
1 DR. PATELL VILLOO MORAWALA SUNBEAM 106 PRENDERGHAST ROAD, SECUNDERABAD 500 003,
2 JAGANNATHAN VIDYA 21, 3RD STREET, TATABAD, COIMBATORE 641 012,
PCT International Classification Number C12Q1/68
PCT International Application Number N/A
PCT International Filing date
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 NA