Title of Invention

SYSTEM, METHOD AND MAIL SERVER FOR FILTERING OUT SPAM MAIL

Abstract
Embodiments of the present invention disclose a system and method for filtering out a spam mail. The system includes: a mail client, configured to generate a learning library by performing feature learning for selected mail samples; and a mail server, configured to consolidate the learning library from the mail client with an original feature library of the mail client, generate an up-to-date feature library of the mail client, and filter mails corresponding to the mail client according to the up-to-date feature library of the mail client. A mail server is also disclosed. In the embodiments of the present invention, spam mails may be filtered out by the cooperation of the mail client and the mail server, so as to reduce the usage of network bandwidth, shorten the receiving time of mails, improve the identification probability of spam mails and reduce the misjudging rate of non-spam mails.
Full Text

System, Method and Mail Server for Filtering Out Spam Mail
Field of the Invention
The present invention relates to spam mail processing technologies in the email field, and particularly to a system, method and mail server for filtering out a spam mail.
Background of the Invention
At present, email is increasingly popular among businesses as a means of advertising and promotion. Correspondingly, spam mails are increasingly flooding networks, which not only occupies network bandwidth, but also wastes the processing time and system resources of email users, and thereby hinders basic applications of networks.
An important technology for blocking spam mails is email filtering. At present, emails may be filtered at a mail client or at a mail server. If emails are filtered at the mail client, the emails cannot be filtered until the mail client of each user has received the emails corresponding to the user. In this way, network bandwidth and the system resources of users are heavily occupied, the receiving time of mails is prolonged, and user experience is degraded. Emails may also be filtered at the mail server. However, because the criteria for determining whether an email is a spam mail differ among users, e.g., an advertisement mail may be wanted by some users and may be a spam mail for other users, it is not appropriate to filter emails according to one criterion. If the emails are filtered according to one criterion, then for some users the identification probability of spam mails may be reduced or the misjudging rate of non-spam mails may be increased.
Summary of the Invention
Embodiments of the present invention provide a system and method for filtering out a spam mail, which may not only reduce the usage of network bandwidth, but also guarantee a higher identification probability of spam mails and a lower misjudging rate of non-spam mails.

The embodiments of the present invention also provide a mail server applicable to the above system for filtering out a spam mail, so as to filter out a spam mail by the cooperation of the mail server and the mail client.
A system for filtering out a spam mail includes:
a mail client, configured to generate a learning library by performing feature learning for mail samples selected;
a mail server, configured to consolidate the learning library from the mail client and an original feature library of the mail client, generate an up-to-date feature library of the mail client, and filter mails corresponding to the mail client according to the up-to-date feature library of the mail client.
A mail server includes:
a feature specification module, configured to consolidate a learning library from a mail client and an original feature library of the mail client, and generate an up-to-date feature library of the mail client;
a spam mail filter, configured to filter mails corresponding to the mail client according to the up-to-date feature library of the mail client.
A method for filtering out a spam mail includes:
generating, by a mail client, a learning library by performing feature learning for mail samples selected;
consolidating, by a mail server, the learning library received from the mail client with an original feature library of the mail client, generating an up-to-date feature library of the mail client, and filtering mails corresponding to the mail client according to the up-to-date feature library.
As can be seen, in the embodiments of the present invention, mails are filtered by the cooperation of the mail client and the mail server, so that spam mails may be filtered out at the mail server before the mails arrive at the mail clients of users, which reduces the usage of network bandwidth, shortens the receiving time of mails, and improves the user experience. In addition, since an up-to-date feature library corresponding to each user is set at the mail server, the mail server may filter emails of each user according to the up-to-date feature library corresponding to the user, which may improve the identification probability of spam mails and reduce the misjudging rate of non-spam mails.
Brief Description of the Drawings
Figure 1 is a schematic diagram illustrating the structure of a system for filtering out a spam mail in accordance with an embodiment of the present invention.
Figure 2 is a flowchart illustrating a method for filtering out a spam mail in accordance with an embodiment of the present invention.
Detailed Description of the Invention
The present invention is hereinafter described in detail with reference to the accompanying drawings and embodiments.
Figure 1 is a schematic diagram illustrating the structure of a system for filtering out a spam mail in accordance with an embodiment of the present invention. As shown in Figure 1, the system for filtering out a spam mail includes: a mail learning module 11, a feature specification module 21 and a spam mail filter 22.
The mail learning module 11 is set at a mail client 10 and is used for performing feature learning for mail samples to generate an up-to-date learning library, and sending the up-to-date learning library to a mail server 20.
The mail samples may be spam mails and non-spam mails which are selected automatically by the mail learning module 11, according to algorithms and strategies preset in the mail learning module 11, after the mail learning module 11 periodically scans mails stored in the mail client 10 according to mail management configuration, or which are selected manually by a user. For example, when mail samples are selected manually, the user selects a set of mails and clicks a "learning good mail" or "learning spam mail" button on an operation panel of the mail client 10 to enable the mail learning module 11 to perform the feature learning for the selected mail samples. In general, the mail learning module 11 may select the mail samples periodically according to the mail management configuration, e.g., once per day after the system starts, and perform the feature learning for the mail samples. The mail learning module 11 performs the feature learning for the mail samples to generate a learning library, and sends the generated learning library to the mail server 20. It should be noted that the mail learning module 11 should scan the mails only at the times at which, according to the mail management configuration, the mails are allowed to be scanned.
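By way of a non-limiting illustration only, the feature learning performed by the mail learning module 11 may be sketched as follows. The tokenizer, the function names `tokenize` and `learn`, and the layout of the learning library (per class, a mail count and per-token mail counts) are assumptions made for this sketch and are not prescribed by the embodiments.

```python
import re
from collections import Counter

def tokenize(text):
    """Split mail text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def learn(samples):
    """Build a learning library from (text, is_spam) samples.

    Assumed layout: for each class, the number of mails learned and the
    number of those mails in which each token appeared.
    """
    library = {
        "spam": {"mails": 0, "tokens": Counter()},
        "good": {"mails": 0, "tokens": Counter()},
    }
    for text, is_spam in samples:
        label = "spam" if is_spam else "good"
        library[label]["mails"] += 1
        # Count each token once per mail, so counts are per-mail frequencies.
        library[label]["tokens"].update(set(tokenize(text)))
    return library

samples = [
    ("Cheap watches, buy now", True),      # marked via "learning spam mail"
    ("Meeting agenda for Monday", False),  # marked via "learning good mail"
    ("Buy cheap tickets now", True),
]
learning_library = learn(samples)
```

In this sketch the learning library contains only counts, so it is typically far smaller than the mails from which it was learned.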
The mail management configuration is generated automatically after the installation of the mail client 10 and may be modified by a user. In addition, the mail system administrator may also update the mail management configuration uniformly at the mail server 20 via this system. In this way, a user need not participate in the modification of the mail management configuration, so that extra burden on the user may be avoided. In order to improve the efficiency of the system for filtering out a spam mail and reduce the waste of system resources, previously scanned mails need not be scanned again when the mail samples are selected automatically.
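As a hedged sketch of how previously scanned mails might be skipped during automatic selection, the following assumes that each stored mail carries a unique identifier and that the set of scanned identifiers persists between scans. The function name `select_unscanned` and the dictionary keys are hypothetical.

```python
def select_unscanned(mailbox, scanned_ids):
    """Return mails not seen in earlier scans and record them as scanned.

    `mailbox` is a list of dicts with hypothetical keys "id" and "folder";
    `scanned_ids` is a set that is kept between scans.
    """
    fresh = [mail for mail in mailbox if mail["id"] not in scanned_ids]
    scanned_ids.update(mail["id"] for mail in fresh)
    return fresh

scanned = set()
mailbox = [{"id": 1, "folder": "inbox"}, {"id": 2, "folder": "spam"}]
first = select_unscanned(mailbox, scanned)    # both mails are new
mailbox.append({"id": 3, "folder": "inbox"})
second = select_unscanned(mailbox, scanned)   # only the newly arrived mail
```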
Generally, the mail client 10 generates a new learning library every day, and sends the new learning library to the mail server 20. The mail server 20 returns a response indicating whether the new learning library is received or not. In some cases, the mail server 20 may be unable to receive the learning library from a client successfully for various reasons, e.g., a network failure, so the mail client 10 needs to send the learning library again. Therefore, to perform fault tolerance processing, in the above system for filtering out a spam mail, a learning library memory 12 for storing learning libraries obtained within the latest period of time is set at the mail client 10. The mail client 10 sends a stored learning library to the mail server 20 again after receiving from the mail server 20 a response indicating that the learning library is not received, or when no response indicating that the learning library is received successfully arrives from the mail server 20 within a period of time.
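The fault tolerance processing described above may, purely as an illustration, be sketched as follows. The class `LearningLibraryMemory`, the function `send_with_retry`, the transport callable `send` and the attempt limit are assumptions of this sketch; the embodiments do not specify a particular transport or retry policy.

```python
class LearningLibraryMemory:
    """Keeps recent learning libraries until the server acknowledges them."""

    def __init__(self):
        self.pending = {}  # library id -> learning library

    def store(self, library_id, library):
        self.pending[library_id] = library

    def acknowledge(self, library_id):
        self.pending.pop(library_id, None)


def send_with_retry(memory, library_id, send, max_attempts=3):
    """Send a stored library until `send` reports success.

    `send(library)` is a hypothetical transport callable that returns True
    when the server confirms receipt, returns False on a negative response,
    and raises TimeoutError when no response arrives in time.
    """
    library = memory.pending[library_id]
    for _ in range(max_attempts):
        try:
            received = send(library)
        except TimeoutError:
            received = False  # no response within the waiting period
        if received:
            memory.acknowledge(library_id)
            return True
    return False  # the library stays in memory for a later attempt


# Simulated server: negative response, then no response, then success.
responses = iter([False, TimeoutError(), True])

def flaky_send(library):
    outcome = next(responses)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

memory = LearningLibraryMemory()
memory.store("day-1", {"spam": {}, "good": {}})
delivered = send_with_retry(memory, "day-1", flaky_send)
```

Because the library is removed from memory only after a positive response, a failed transmission leaves it available for a later attempt.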
The feature specification module 21 of the above system for filtering out a spam mail is set at the mail server 20 and is used for receiving the up-to-date learning library from each mail client, consolidating the up-to-date learning library with an original feature library corresponding to the mail client 10 to form an up-to-date feature library of the mail client 10, and storing the up-to-date feature library at the mail server 20.
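One possible way for the feature specification module 21 to consolidate a learning library with an original feature library is to add the corresponding counts, as sketched below. This is an assumption for illustration only: the embodiments do not state the consolidation rule, and the function name `consolidate`, the client key "alice" and the library layout are hypothetical.

```python
from collections import Counter

def consolidate(original, learned):
    """Merge a client's learning library into its feature library.

    Both arguments use the same assumed layout:
    {"spam": {"mails": int, "tokens": Counter}, "good": {...}}.
    The inputs are not modified; a new feature library is returned.
    """
    merged = {}
    for label in ("spam", "good"):
        merged[label] = {
            "mails": original[label]["mails"] + learned[label]["mails"],
            "tokens": Counter(original[label]["tokens"])
                      + Counter(learned[label]["tokens"]),
        }
    return merged

# The server keeps one feature library per mail client.
feature_libraries = {
    "alice": {
        "spam": {"mails": 4, "tokens": Counter({"cheap": 3})},
        "good": {"mails": 6, "tokens": Counter({"meeting": 2})},
    }
}
update = {
    "spam": {"mails": 1, "tokens": Counter({"cheap": 1, "watches": 1})},
    "good": {"mails": 0, "tokens": Counter()},
}
feature_libraries["alice"] = consolidate(feature_libraries["alice"], update)
```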
The spam mail filter 22 is set at the mail server 20 and is used for filtering new mails corresponding to a mail client 10, which are received by the mail server 20, according to the up-to-date feature library corresponding to the mail client 10, so as to filter out spam mails of the mail client 10.
The mail server 20 further includes a mail client inbox 24 for storing normal mails which have passed the spam mail filter 22, where the normal mails wait until the corresponding mail clients receive them from this inbox.
To store spam mails filtered out by the mail server 20, the mail server 20 further includes a spam mail recycle bin 23. In general, each mail client 10 has a directory or a folder labeled as a spam mailbox in the spam mail recycle bin 23 of the mail server 20. A user may check spam mails by WebMail, so as to retrieve mails misjudged by the system.
At present, spam mail filtering methods based on mail content usually adopt a keyword statistic method. In practical applications, the Bayes filtering method is one of the most popular keyword-based mail filtering methods and has the best filtering effect. In the Bayes filtering method, a certain number of known spam mails and non-spam mails are learned to form a Bayes learning library, and whether a mail is a spam mail is determined according to the Bayes formula and the Bayes learning library. The Bayes filtering method is able to learn continuously. In an embodiment of the present invention, the above system for filtering out a spam mail filters emails by use of the Bayes filtering method. Specifically, the mail learning module 11 of the above system generates a Bayes learning library by learning features of emails; the feature specification module 21 generates an up-to-date feature library by use of the Bayes learning library from the mail learning module 11; and the spam mail filter 22 determines whether a mail is a spam mail according to the Bayes formula and the up-to-date feature library.
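As a minimal sketch of how the spam mail filter 22 might apply the Bayes formula to a feature library, the following computes a spam probability from the tokens present in a mail, under the usual naive independence assumption and with add-one smoothing. The tokenizer, the smoothing, the library layout and the function name `spam_probability` are assumptions of this sketch and are not taken from the embodiments.

```python
import math
import re

def tokenize(text):
    """Split mail text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def spam_probability(text, library):
    """Estimate P(spam | tokens) with naive Bayes and add-one smoothing.

    `library` maps "spam"/"good" to {"mails": int, "tokens": {token: count}},
    where count is the number of learned mails containing the token.
    Only tokens present in the mail are considered (a simplification).
    """
    total = library["spam"]["mails"] + library["good"]["mails"]
    log_score = {}
    for label in ("spam", "good"):
        mails = library[label]["mails"]
        # Prior P(label), smoothed so an empty class never yields log(0).
        score = math.log((mails + 1) / (total + 2))
        for token in set(tokenize(text)):
            count = library[label]["tokens"].get(token, 0)
            # Likelihood P(token | label), smoothed the same way.
            score += math.log((count + 1) / (mails + 2))
        log_score[label] = score
    # Bayes formula, normalized over the two classes; the difference is
    # clamped so that math.exp cannot overflow for very long mails.
    diff = log_score["good"] - log_score["spam"]
    diff = max(min(diff, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))

library = {
    "spam": {"mails": 5, "tokens": {"cheap": 4, "watches": 3}},
    "good": {"mails": 5, "tokens": {"meeting": 4, "agenda": 3}},
}
p_spam = spam_probability("Cheap watches", library)
p_good = spam_probability("Meeting agenda", library)
```

With this example library, the first mail is scored as very likely spam and the second as very likely non-spam; a threshold on the returned probability would then decide where the mail is stored.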
The method for filtering emails in accordance with an embodiment of the present invention is hereinafter described in detail by taking a system for filtering out a spam mail using the Bayes filtering method as an example. As shown in Figure 2, the method of the present invention is as follows.
First, a user needs to install and run a mail client 10 on the local computer.
Block 110: After the start of a mail client 10 or according to a preset interval, a mail learning module 11 scans mails in the mailbox to select mail samples automatically.
As described above, the mail samples may also be selected manually by a user.

Block 120: The mail learning module 11 performs Bayes learning for the mail samples.
Block 130: The mail learning module 11 generates an up-to-date Bayes learning library.
Block 140: The mail learning module 11 sends the up-to-date Bayes learning library to a mail server 20 and stores the up-to-date Bayes learning library in a learning library memory 12 of the mail learning module 11.
Block 150: The mail server 20 receives the up-to-date Bayes learning library sent by each mail client 10.
Block 160: A feature specification module 21 consolidates the up-to-date learning library with an original feature library corresponding to the mail client 10 to form an up-to-date feature library of the mail client 10.
Block 170: The mail server 20 receives and classifies new mails to determine which users the new mails belong to.
Block 180: The spam mail filter 22 filters the new mails received by the mail server 20 for each mail client 10 according to the up-to-date feature library of the mail client 10, stores spam mails in the spam mail recycle bin 23 and stores non-spam mails in the mail client inbox 24 of the corresponding user at the server.
In this way, when receiving mails, a mail client 10 receives only non-spam mails.
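Blocks 170 and 180 may be illustrated, without limitation, by the following sketch, in which each incoming mail is routed according to the classifier of its own recipient. The function name `deliver`, the dictionary keys, the threshold and the two example users are hypothetical; each entry of `classifiers` stands in for a spam probability computed from that user's up-to-date feature library.

```python
def deliver(mail, classifiers, inboxes, recycle_bins, threshold=0.5):
    """Route one incoming mail using the recipient's own classifier.

    `classifiers[user]` is a hypothetical callable returning a spam
    probability based on that user's up-to-date feature library.
    """
    user = mail["to"]  # Block 170: determine which user the mail belongs to
    is_spam = classifiers[user](mail["body"]) >= threshold
    # Block 180: spam goes to the recycle bin, other mail to the inbox.
    target = recycle_bins if is_spam else inboxes
    target.setdefault(user, []).append(mail)
    return is_spam

classifiers = {
    "alice": lambda body: 0.9 if "sale" in body else 0.1,  # treats ads as spam
    "bob": lambda body: 0.1,                               # wants ads
}
inboxes, recycle_bins = {}, {}
for recipient in ("alice", "bob"):
    deliver({"to": recipient, "body": "big sale today"},
            classifiers, inboxes, recycle_bins)
```

The same advertisement mail ends up in the recycle bin for one user and in the inbox for the other, which is the per-user behavior that a single shared criterion cannot provide.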
In another embodiment of the present invention, after the mail server 20 stores spam mails in the spam mail recycle bin 23, the spam mail recycle bin 23 automatically generates a list including spam mail feature information such as sender information and mail title, and stores the list in the corresponding mail client inbox 24. In this way, when receiving mails, the user may get information related to spam mails; if there is a misjudged mail, the user may retrieve the misjudged mail by WebMail, so as to improve the filtering ability of the system.
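The generation of such a list may be sketched as follows. The function name `build_spam_list`, the field names "from" and "subject", and the example addresses are assumptions made for illustration.

```python
def build_spam_list(recycle_bin):
    """Summarize filtered mails by sender and title for the user's inbox."""
    return [
        {"from": mail["from"], "subject": mail["subject"]}
        for mail in recycle_bin
    ]

recycle_bin = [
    {"from": "ads@example.com", "subject": "Big sale", "body": "Buy now"},
    {"from": "promo@example.com", "subject": "You won", "body": "Claim prize"},
]
spam_list = build_spam_list(recycle_bin)
```

Only the summary fields are placed in the inbox, so the bodies of the filtered mails are not transferred to the mail client.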

In another embodiment of the present invention, a set of optimization algorithms may be used in the system, so that learning is performed when the system load is low and the Central Processing Unit (CPU) usage is kept low during learning. In this way, learning the mails of a user has only a small impact on the user's system and the burden on the user is reduced, so that the user hardly notices that this system is running.
According to statistics from some large mail systems in practical applications, when mails of a user are filtered according to the Bayes learning library of the user, the identification probability of spam mails exceeds 99.4%, and the misjudging rate of non-spam mails is lower than 0.8%. However, when 100 thousand users share one Bayes learning library, the identification probability of spam mails is under 90%, and the misjudging rate of non-spam mails exceeds 7%. As can be seen, the filtering effect when each user uses the user's own Bayes learning library is better than that when multiple users share the same Bayes learning library.
As can be seen, in the above embodiments, spam mails are filtered out by the cooperation of the mail client 10 and the mail server 20. Each user has the user's own learning library, such as a Bayes learning library, spam mails need not be received at the mail client 10, and the mail samples of each user are learned, so as to reduce the usage of network bandwidth, shorten the receiving time of mails and improve user experience. The system and method for filtering out a spam mail provided by the embodiments of the present invention may not only improve the identification probability of spam mails but also reduce the misjudging rate of non-spam mails.
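The embodiments do not disclose the optimization algorithms themselves. Merely as one hypothetical approach, learning may be split into batches and paused while the system is busy, as sketched below. The function name `learn_when_idle`, the load threshold, the pause length and the callables `current_load` and `learn_batch` are all assumptions of this sketch.

```python
import time

def learn_when_idle(batches, learn_batch, current_load, max_load=0.5,
                    pause=1.0, sleep=time.sleep):
    """Learn batch by batch, pausing whenever the system is busy.

    `current_load()` is a hypothetical callable returning a load figure
    between 0 and 1; `learn_batch(batch)` performs the actual learning.
    """
    for batch in batches:
        while current_load() > max_load:
            sleep(pause)  # yield the CPU until the load drops
        learn_batch(batch)

# Simulated run: the system is busy once, then idle.
loads = iter([0.9, 0.2, 0.1])
learned, pauses = [], []
learn_when_idle([["m1"], ["m2"]], learned.append, lambda: next(loads),
                sleep=pauses.append)
```

In the simulated run the first batch is delayed by one pause and the second is learned immediately.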
The foregoing are only preferred embodiments of the present invention and are not intended to limit the protection scope thereof. For those skilled in the art, there may be various modifications and changes to the present invention. Any modification, equivalent substitution, and improvement without departing from the spirit and principle of the present invention should be covered in the protection scope of the present invention.

Claims
What is claimed is:
1. A system for filtering out a spam mail, comprising:
a mail client, configured to generate a learning library by performing feature learning for mail samples selected;
a mail server, configured to consolidate the learning library from the mail client and an original feature library of the mail client, generate an up-to-date feature library of the mail client, and filter mails corresponding to the mail client according to the up-to-date feature library of the mail client.
2. The system of Claim 1, wherein the mail client comprises:
a first module, configured to generate the learning library by performing the feature learning for the mail samples selected;
a second module, configured to send the learning library generated by the first module to the mail server.
3. The system of Claim 1, wherein the mail server comprises:
a feature specification module, configured to consolidate the learning library from the mail client and the original feature library of the mail client, and generate the up-to-date feature library of the mail client;
a spam mail filter, configured to filter the mails corresponding to the mail client according to the up-to-date feature library of the mail client.
4. The system of Claim 3, wherein the mail server further comprises: means, configured to store a spam mail which is filtered out by the mail server.
5. The system of Claim 2, wherein the mail client further comprises: means, configured to store the learning library generated by the mail client;
the second module sends the learning library to the mail server again when the mail server does not receive the learning library.
6. A mail server comprising:
a feature specification module, configured to consolidate a learning library from a mail client and an original feature library of the mail client, and generate an up-to-date feature library of the mail client;
a spam mail filter, configured to filter mails corresponding to the mail client according to the up-to-date feature library of the mail client.
7. The mail server of Claim 6, further comprising:
a spam mail recycle bin, configured to store spam mails filtered out by the spam mail filter.
8. The mail server of Claim 6, wherein the learning library is generated by the mail client by performing feature learning for mail samples selected.
9. A method for filtering out a spam mail, comprising:
generating, by a mail client, a learning library by performing feature learning for mail samples selected;
consolidating, by a mail server, the learning library received from the mail client with an original feature library of the mail client, generating an up-to-date feature library of the mail client, and filtering mails corresponding to the mail client according to the up-to-date feature library.
10. The method of Claim 9, wherein the mail samples comprise spam mails and non-spam mails filtered out by the mail client by scanning mails stored at the mail client periodically according to mail management configuration.
11. The method of Claim 9, wherein the mail samples comprise spam mails and non-spam mails filtered out manually by a user at the mail client.
12. The method of Claim 10, wherein scanning the mails stored at the mail client periodically comprises:
scanning the mails which are not scanned.
13. The method of Claim 9, wherein generating the learning library by performing the feature learning for the mail samples selected comprises:
learning the mail samples by use of a Bayes filtering method, and generating a Bayes learning library.

14. The method of Claim 9, further comprising:
storing, by the mail server, spam mails filtered out in a spam mail recycle bin at the mail server, and storing non-spam mails in an inbox corresponding to the mail client at the mail server.
15. The method of Claim 14, further comprising:
generating, by the mail server, a list containing spam mail feature information, and storing the list in the inbox corresponding to the mail client.
16. The method of Claim 9, further comprising:
storing, by the mail client, the learning library, and sending the learning library to the mail server again when the mail server does not receive the learning library.


Documents:

http://ipindiaonline.gov.in/patentsearch/GrantedSearch/viewdoc.aspx?id=I6cLMql//cGf7zGpHYQJ+w==&loc=egcICQiyoj82NGgGrC5ChA==


Patent Number: 279292
Indian Patent Application Number: 1248/CHENP/2008
PG Journal Number: 03/2017
Publication Date: 20-Jan-2017
Grant Date: 17-Jan-2017
Date of Filing: 13-Mar-2008
Name of Patentee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Applicant Address: 4/F, EAST 2 BLOCK, SEG PARK, ZHENXING ROAD, FUTIAN DISTRICT, SHENZHEN, GUANGDONG, 518044, CHINA
Inventors:
# | Inventor's Name | Inventor's Address
1 | XU, JIAJIAN | 4/F, EAST 2 BLOCK, SEG PARK, ZHENXING ROAD, FUTIAN DISTRICT, SHENZHEN, GUANGDONG, 518044, CHINA
2 | LI, GUANG | 4/F, EAST 2 BLOCK, SEG PARK, ZHENXING ROAD, FUTIAN DISTRICT, SHENZHEN, GUANGDONG, 518044, CHINA
3 | KE, JUNYAN | 4/F, EAST 2 BLOCK, SEG PARK, ZHENXING ROAD, FUTIAN DISTRICT, SHENZHEN, GUANGDONG, 518044, CHINA
4 | FENG, XIAOYONG | 4/F, EAST 2 BLOCK, SEG PARK, ZHENXING ROAD, FUTIAN DISTRICT, SHENZHEN, GUANGDONG, 518044, CHINA
PCT International Classification Number: H04L12/58
PCT International Application Number: PCT/CN06/02546
PCT International Filing date: 2006-09-27
PCT Conventions:
# | PCT Application Number | Date of Convention | Priority Country
1 | 200510037520.0 | 2005-09-27 | China