Title of Invention | A METHOD FOR SERVING ADVERTISEMENTS BASED ON CONTENT |
---|---|
Abstract | Advertisers are permitted to put targeted ads on any page on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content. |
Full Text | SERVING ADVERTISEMENTS BASED ON CONTENT § 0. RELATED APPLICATION Benefit is claimed, under 35 U.S.C. § 119(e)(1) and 35 U.S.C. § 120, to the filing dates of: (i) U.S. Provisional Application Serial No. 60/413,536, entitled "METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS", filed on September 24, 2002 and listing Jeffrey A. Dean, Georges R. Harik and Paul Buchheit as inventors; and (ii) U.S. Patent Application Serial No. 10/314,427, entitled "METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS", filed on December 6, 2002 and listing Jeffrey A. Dean, Georges R. Harik and Paul Buchheit as inventors, for any inventions disclosed in the manner provided by 35 U.S.C. § 112, ¶ 1. The provisional application and utility application are expressly incorporated herein by reference. § 1. BACKGROUND OF THE INVENTION § 1.1 FIELD OF THE INVENTION The present invention concerns advertising. In particular, the present invention concerns expanding the opportunities for advertisers to target their ads. § 1.2 RELATED ART Advertising using traditional media, such as television, radio, newspapers and magazines, is well known. Unfortunately, even when armed with demographic studies and entirely reasonable assumptions about the typical audience of various media outlets, advertisers recognize that much of their ad budget is simply wasted. Moreover, it is very difficult to identify and eliminate such waste. Recently, advertising over more interactive media has become popular. For example, as the number of people using the Internet has exploded, advertisers have come to appreciate media and services offered over the Internet as a potentially powerful way to advertise. Advertisers have developed several strategies in an attempt to maximize the value of such advertising. 
In one strategy, advertisers use popular presences or means for providing interactive media or services (referred to as "Web sites" in the specification without loss of generality) as conduits to reach a large audience. Using this first approach, an advertiser may place ads on the home page of the New York Times Web site, or the USA Today Web site, for example. In another strategy, an advertiser may attempt to target its ads to narrower niche audiences, thereby increasing the likelihood of a positive response by the audience. For example, an agency promoting tourism in the Costa Rican rainforest might place ads on the ecotourism-travel subdirectory of the Yahoo Web site. An advertiser will normally determine such targeting manually. Regardless of the strategy, Web site-based ads (also referred to as "Web ads") are typically presented to their advertising audience in the form of "banner ads", i.e., a rectangular box that includes graphic components. When a member of the advertising audience (referred to as a "viewer" or "user" in the specification without loss of generality) selects one of these banner ads by clicking on it, embedded hypertext links typically direct the viewer to the advertiser's Web site. This process, wherein the viewer selects an ad, is commonly referred to as a "click-through" ("click-through" is intended to cover any user selection). The ratio of the number of click-throughs to the number of impressions of the ad (i.e., the number of times an ad is displayed) is commonly referred to as the "click-through rate" of the ad. A "conversion" is said to occur when a user consummates a transaction related to a previously served ad. What constitutes a conversion may vary from case to case and can be determined in a variety of ways. For example, it may be the case that a conversion occurs when a user clicks on an ad, is referred to the advertiser's web page, and consummates a purchase there before leaving that web page. 
Alternatively, a conversion may be defined as a user being shown an ad, and making a purchase on the advertiser's web page within a predetermined time (e.g., seven days). Many other definitions of what constitutes a conversion are possible. The ratio of the number of conversions to the number of impressions of the ad (i.e., the number of times an ad is displayed) is commonly referred to as the "conversion rate". If a conversion is defined to be able to occur within a predetermined time of the serving of an ad, one possible definition of the conversion rate might only consider ads that have been served more than the predetermined time in the past. Despite the initial promise of Web site-based advertisement, there remain several problems with existing approaches. Although advertisers are able to reach a large audience, they are frequently dissatisfied with the return on their advertisement investment. Some have attempted to improve ad performance by tracking the online habits of users, but this approach has led to privacy concerns. Similarly, the hosts of Web sites on which the ads are presented (referred to as "Web site hosts" or "ad consumers") have the challenge of maximizing ad revenue without impairing their users' experience. Some Web site hosts have chosen to place advertising revenues over the interests of users. One such Web site is "Overture.com", which hosts a so-called "search engine" service returning advertisements masquerading as "search results" in response to user queries. The Overture.com Web site permits advertisers to pay to position an ad for their Web site (or a target Web site) higher up on the list of purported search results. If such a scheme is implemented so that the advertiser pays only if a user clicks on the ad (i.e., cost-per-click), the advertiser lacks incentive to target its ads effectively, since a poorly targeted ad will not be clicked and therefore will not require payment. 
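The click-through rate and conversion rate defined above are simple ratios; the subtle point is the predetermined conversion window. The Python sketch below is illustrative only: the `Impression` record and the function names are assumptions, the seven-day default is taken from the example above, and it implements the one possible definition that counts only impressions served more than the window in the past.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Impression:
    """One serving of an ad (hypothetical record layout)."""
    served_at: datetime
    clicked: bool = False
    converted_at: Optional[datetime] = None  # time of a recorded conversion, if any


def click_through_rate(impressions: List[Impression]) -> float:
    """Number of click-throughs divided by number of impressions."""
    if not impressions:
        return 0.0
    return sum(imp.clicked for imp in impressions) / len(impressions)


def conversion_rate(impressions: List[Impression], now: datetime,
                    window: timedelta = timedelta(days=7)) -> float:
    """Conversions divided by impressions, where a conversion must occur within
    `window` of serving and only impressions served more than `window` before
    `now` are counted (so each counted impression had the full window)."""
    mature = [imp for imp in impressions if now - imp.served_at > window]
    if not mature:
        return 0.0
    conversions = sum(
        1 for imp in mature
        if imp.converted_at is not None
        and imp.converted_at - imp.served_at <= window
    )
    return conversions / len(mature)


# Example: three "mature" impressions and one served too recently.
now = datetime(2002, 9, 24)
impressions = [
    Impression(now - timedelta(days=10), clicked=True,
               converted_at=now - timedelta(days=8)),
    Impression(now - timedelta(days=9), clicked=True),
    Impression(now - timedelta(days=8)),
    Impression(now - timedelta(days=1), clicked=True),  # excluded from conversion rate
]
print(click_through_rate(impressions))    # 0.75
print(conversion_rate(impressions, now))  # 1/3
```

Note that the recent impression still counts toward the click-through rate; only the conversion rate waits for the window to elapse.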
Consequently, high cost-per-click ads show up near or at the top, but do not necessarily translate into real revenue for the ad publisher because viewers don't click on them. Furthermore, ads that viewers would click on are further down the list, or not on the list at all, and so relevancy of ads is compromised. Search engines, such as Google for example, have enabled advertisers to target their ads so that they will be rendered in conjunction with a search results page responsive to a query that is, presumably, relevant to the ad. Although search results pages afford advertisers a great opportunity to target their ads to a more receptive audience, search results pages are merely a fraction of page views of the World Wide Web. Some have attempted to manually map Web pages to one or more categories based on a category taxonomy. Such manual classification of Web pages has numerous disadvantages. First, manual classification can be time consuming, expensive, and prone to inconsistent application due to the subjectivity of different classifiers. Moreover, given the sheer number of Web pages and the fact that content changes so often, manual classification on a wide scale is impractical. Thus, it would be useful to allow advertisers to put targeted ads on any page on the web (or some other document of any media type) rather than just search results pages. Such a scheme should avoid manual classification and its inherent, often insurmountable disadvantages. § 2. SUMMARY OF THE INVENTION The present invention allows advertisers to put targeted ads on any page on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content. § 3. BRIEF DESCRIPTION OF THE DRAWINGS Figure 1 is a high-level diagram showing parties or entities that can interact with an advertising system. 
Figure 2 is a bubble chart of an exemplary advertising environment in which, or with which, the present invention may operate. Figure 3 illustrates an environment in which advertisers can target their ads on search results pages generated by a search engine and/or documents served by content servers. Figure 4 is a bubble chart of exemplary content-relevant ad serving operations and information used or generated by such operations, consistent with the present invention. Figure 5 is a bubble chart of exemplary content-relevant ad serving operations, document information gathering operations, and information used or generated by such operations, consistent with the present invention. Figure 6 is a flow diagram of an exemplary method that may be used to get document information as a part of content-relevant ad serving operations in a manner consistent with principles of the invention. Figure 7 is a flow diagram of an exemplary method that may be used to effect targeted document information retrieval in a manner consistent with principles of the invention. Figure 8 is a flow diagram of an exemplary method that may be used to effect real-time document information retrieval in a manner consistent with principles of the invention. Figures 9A-9C illustrate parts of a Web page and various locations of script for extracting content of the Web page. Figure 10 is a flow diagram of an exemplary method that may be used to determine root document location in a manner consistent with principles of the present invention. Figure 11 is a high-level block diagram of apparatus that may be used to effect at least some of the various operations that may be performed and store at least some of the information that may be used and/or generated consistent with principles of the present invention. Figures 12 and 13 are messaging diagrams illustrating alternative ways to combine content-relevant ads with a document. § 4. 
DETAILED DESCRIPTION The present invention may involve novel methods, apparatus, message formats and/or data structures for allowing advertisers to put targeted, content-relevant ads on any page on the web (or some other document of any media type). The following description is presented to enable one skilled in the art to make and use the invention, and is provided in the context of particular applications and their requirements. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth below may be applied to other embodiments and applications. Thus, the present invention is not intended to be limited to the embodiments shown and the inventors regard their invention as any patentable subject matter described. In the following, environments in which, or with which, the present invention may operate are described in § 4.1. Then, exemplary embodiments of the present invention are described in § 4.2. Examples of operations are provided in § 4.3. Finally, some conclusions regarding the present invention are set forth in § 4.4. § 4.1 ENVIRONMENTS IN WHICH, OR WITH WHICH, THE PRESENT INVENTION MAY OPERATE § 4.1.1 EXEMPLARY ADVERTISING ENVIRONMENT Figure 1 is a high-level diagram of an advertising environment. The environment may include an ad entry, maintenance and delivery system 120. Advertisers 110 may directly, or indirectly, enter, maintain, and track ad information in the system 120. The ads may be in the form of graphical ads such as so-called banner ads, text only ads, image ads, audio ads, video ads, ads combining one or more of any of such components, etc. The ads may also include embedded information, such as a link, meta information, and/or machine executable instructions. Ad consumers 130 may submit requests for ads to, accept ads responsive to their request from, and provide usage information to, the system 120. 
Although not shown, other entities may provide usage information (e.g., whether or not a conversion or click-through related to the ad occurred) to the system 120. This usage information may include measured or observed user behavior related to ads that have been served. One example of an ad consumer 130 is a general content server that receives requests for content (e.g., articles, discussion threads, music, video, graphics, search results, web page listings, etc.), and retrieves the requested content in response to, or otherwise services, the request. The content server may submit a request for ads to the system 120. Such an ad request may include a number of ads desired. The ad request may also include content request information. This information may include the content itself (e.g., page), a category corresponding to the content or the content request (e.g., arts, business, computers, arts-movies, arts-music, etc.), part or all of the content request, content age, content type (e.g., text, graphics, video, audio, mixed media, etc.), geolocation information, etc. The content server may combine the requested content with one or more of the advertisements provided by the system 120. This combined information including the content and advertisement(s) is then forwarded towards the end user that requested the content, for presentation to the viewer. Finally, the content server may transmit information about the ads and how, when, and/or where the ads are to be rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the system 120. Alternatively, or in addition, such information may be provided back to the system 120 by some other means. Another example of an ad consumer 130 is a search engine. A search engine may receive queries for search results. In response, the search engine may retrieve relevant search results (e.g., from an index of Web pages). 
An exemplary search engine is described in the article S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Search Engine," Seventh International World Wide Web Conference, Brisbane, Australia and in U.S. Patent No. 6,285,999 (both incorporated herein by reference). Such search results may include, for example, lists of Web page titles, snippets of text extracted from those Web pages, and hypertext links to those Web pages, and may be grouped into a predetermined number of (e.g., ten) search results. The search engine may submit a request for ads to the system 120. The request may include a number of ads desired. This number may depend on the search results, the amount of screen or page space occupied by the search results, the size and shape of the ads, etc. In one embodiment, the number of desired ads will be from one to ten, and preferably from three to five. The request for ads may also include the query (as entered or parsed), information based on the query (such as geolocation information, whether the query came from an affiliate and an identifier of such an affiliate), and/or information associated with, or based on, the search results. Such information may include, for example, identifiers related to the search results (e.g., document identifiers or "docIDs"), scores related to the search results (e.g., information retrieval ("IR") scores such as dot products of feature vectors corresponding to a query and a document, Page Rank scores, and/or combinations of IR scores and Page Rank scores), snippets of text extracted from identified documents (e.g., Web pages), full text of identified documents, feature vectors of identified documents, etc. The search engine may combine the search results with one or more of the advertisements provided by the system 120. This combined information including the search results and advertisement(s) is then forwarded towards the user that requested the content, for presentation to the user. 
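The IR score mentioned above, a dot product of query and document feature vectors, can be sketched in Python with sparse vectors stored as dictionaries that map a term to a weight. The text does not say how IR and Page Rank scores are combined, so `combined_score` and its `alpha` weight are hypothetical; the term weights in the example are made up for illustration.

```python
from typing import Dict

FeatureVector = Dict[str, float]  # term -> weight (sparse representation)


def ir_score(query_vec: FeatureVector, doc_vec: FeatureVector) -> float:
    """IR score as the dot product of query and document feature vectors;
    terms missing from the document contribute zero."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())


def combined_score(ir: float, pagerank: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of an IR score and a Page Rank score.
    The text only says the two may be combined, not how."""
    return alpha * ir + (1.0 - alpha) * pagerank


query = {"rainforest": 1.0, "tours": 1.0}
doc = {"rainforest": 0.6, "tours": 0.3, "costa": 0.5}
ir = ir_score(query, doc)                    # 0.6 + 0.3, about 0.9
print(ir, combined_score(ir, pagerank=0.1))  # about 0.9 and about 0.5
```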
Preferably, the search results are maintained as distinct from the ads, so as not to confuse the user between paid advertisements and presumably neutral search results. Finally, the search engine may transmit information about the ad and when, where, and/or how the ad was to be rendered (e.g., position, click-through or not, impression time, impression date, size, conversion or not, etc.) back to the system 120. Alternatively, or in addition, such information may be provided back to the system 120 by some other means. As can be appreciated from the foregoing, an ad entry, maintenance and delivery system(s) 120 may serve ad consumers 130 such as content servers and search engines. As discussed in § 1.2 above, the serving of ads targeted to the search results page generated by a search engine is known. The present invention further permits the serving of ads targeted to documents served by content servers. For example, referring to the exemplary environment of Figure 3, a network or inter-network 360 may include an ad server 320 serving targeted ads in response to requests from a search engine 332 with ad spots for sale. Suppose that the inter-network 360 is the World Wide Web. The search engine 332 crawls much or all of the content 350. Some 334 of this content 350 will include ad spots (also referred to as "inventory") available. More specifically, one or more content servers 336 may include one or more documents 340. Documents may include content, embedded information such as meta information and machine executable instructions, and ad spots available. Note that ads inserted into ad spots in a document can vary each time the document is served. Alternatively, ads inserted into ad spots can have a static association with a given document. As will be described in more detail below, an ad server may use the results of a separate crawl of some or all of the content with ad spots available 334. 
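The ad requests described in § 4.1.1, whether sent by a content server or by a search engine, can be pictured as a single message type. The Python sketch below is a hypothetical layout: the `AdRequest` class and its field names are illustrative assumptions, not a format defined by the text, and the one-to-ten check mirrors only the one embodiment mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdRequest:
    """Hypothetical ad request message; field names are illustrative."""
    num_ads: int                                  # number of ads desired
    # Fields a content server might send
    content: Optional[str] = None                 # the content itself (e.g., page text)
    category: Optional[str] = None                # e.g., "arts-movies"
    content_age_days: Optional[int] = None
    content_type: Optional[str] = None            # e.g., "text", "video", "mixed media"
    geolocation: Optional[str] = None
    # Fields a search engine might send
    query: Optional[str] = None
    affiliate_id: Optional[str] = None
    doc_ids: List[str] = field(default_factory=list)  # docIDs of the search results

    def __post_init__(self) -> None:
        # Mirrors the one embodiment that requests from one to ten ads.
        if not 1 <= self.num_ads <= 10:
            raise ValueError("num_ads must be between 1 and 10")


page_request = AdRequest(num_ads=3, content="Costa Rican rainforest tours",
                         category="travel", content_type="text")
search_request = AdRequest(num_ads=4, query="costa rica ecotourism",
                           doc_ids=["doc-17", "doc-42"])
```

Keeping both kinds of request in one type lets the ad serving operations accept either source through the same interface, with unused fields left empty.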
§ 4.1.2 EXEMPLARY AD ENTRY, MAINTENANCE AND DELIVERY ENVIRONMENT Figure 2 illustrates an exemplary ad system 120', consistent with principles of the present invention. The exemplary ad system 120' may include an inventory system 210 and may store ad information 205 and usage information 245. The exemplary system 120' may support ad information entry and management operations 215, campaign (e.g., targeting) assistance operations 220, accounting and billing operations 225, ad serving operations 230, relevancy determination operations 235, optimization operations 240, relative presentation attribute assignment (e.g., position ordering) operations 250, fraud detection operations 255, and results interface operations 260. Advertisers 110 may interface with the system 120' via the ad information entry and management operations 215 as indicated by interface 216. Ad consumers 130 may interface with the system 120' via the ad serving operations 230 as indicated by interface 231. Ad consumers 130 and/or other entities (not shown) may also interface with the system 120' via results interface operations 260 as indicated by interface 261. An advertising program may include information concerning accounts, campaigns, creatives, targeting, etc. The term "account" relates to information for a given advertiser (e.g., a unique email address, a password, billing information, etc.). A "campaign" or "ad campaign" refers to one or more groups of one or more advertisements, and may include a start date, an end date, budget information, geo-targeting information, syndication information, etc. For example, Honda may have one advertising campaign for its automotive line, and a separate advertising campaign for its motorcycle line. The campaign for its automotive line may have one or more ad groups, each containing one or more ads. Each ad group may include a set of keywords, and a maximum cost bid (cost per click-through, cost per conversion, etc.). 
Alternatively, or in addition, each ad group may include an average cost bid (e.g., average cost per click-through, average cost per conversion, etc.). Therefore, a single maximum cost bid and/or a single average cost bid may be associated with one or more keywords. As stated, each ad group may have one or more ads or "creatives" (i.e., ad content that is ultimately rendered to an end user). Naturally, the ad information 205 may include more or less information, and may be organized in a number of different ways. The ad information 205 can be entered and managed via the ad information entry and management operations 215. Campaign (e.g., targeting) assistance operations 220 can be employed to help advertisers 110 generate effective ad campaigns. For example, the campaign assistance operations 220 can use information provided by the inventory system 210, which, in the context of advertising for use with a search engine, may track all possible ad impressions, ad impressions already reserved, and ad impressions available for given keywords. The ad serving operations 230 may service requests for ads from ad consumers 130. The ad serving operations 230 may use relevancy determination operations 235 to determine candidate ads for a given request. The ad serving operations 230 may then use optimization operations 240 to select a final set of one or more of the candidate ads. Finally, the ad serving operations 230 may use relative presentation attribute assignment operations 250 to order the presentation of the ads to be returned. The fraud detection operations 255 can be used to reduce fraudulent use of the advertising system (e.g., by advertisers), such as through the use of stolen credit cards. 
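The serving path just described (relevancy determination 235, then optimization 240, then relative presentation attribute assignment 250) can be sketched in Python over the campaign, ad group and creative hierarchy. This is a minimal sketch under stated assumptions: keyword overlap stands in for relevancy determination, selection and ordering by maximum cost-per-click bid stand in for optimization and position assignment (the text leaves both criteria open), and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Creative:
    ad_id: str
    text: str


@dataclass
class AdGroup:
    keywords: Set[str]
    max_cpc: float                                 # maximum cost-per-click bid
    creatives: List[Creative] = field(default_factory=list)


@dataclass
class Campaign:
    name: str
    ad_groups: List[AdGroup] = field(default_factory=list)


def determine_candidates(campaigns: List[Campaign], terms: Set[str]) -> List[AdGroup]:
    """Relevancy determination (assumed): groups sharing a keyword with the request."""
    return [g for c in campaigns for g in c.ad_groups
            if g.keywords & terms and g.creatives]


def optimize(candidates: List[AdGroup], max_ads: int) -> List[AdGroup]:
    """Optimization (assumed criterion): keep the max_ads highest bids."""
    return sorted(candidates, key=lambda g: g.max_cpc, reverse=True)[:max_ads]


def assign_positions(selected: List[AdGroup]) -> List[Tuple[int, str]]:
    """Relative presentation attribute assignment: 1-based positions in order."""
    return [(pos, g.creatives[0].ad_id) for pos, g in enumerate(selected, start=1)]


def serve_ads(campaigns: List[Campaign], terms: Set[str],
              max_ads: int) -> List[Tuple[int, str]]:
    return assign_positions(optimize(determine_candidates(campaigns, terms), max_ads))


autos = Campaign("automotive", [
    AdGroup({"sedan", "car"}, 0.50, [Creative("sedan-1", "Test drive a sedan")]),
    AdGroup({"hybrid", "car"}, 0.70, [Creative("hybrid-1", "Hybrid cars")]),
])
motorcycles = Campaign("motorcycle", [
    AdGroup({"motorcycle", "bike"}, 0.80, [Creative("moto-1", "New motorcycles")]),
])
print(serve_ads([autos, motorcycles], {"car", "motorcycle"}, max_ads=2))
# [(1, 'moto-1'), (2, 'hybrid-1')]
```

Splitting the three stages into separate functions keeps each replaceable; a real optimizer might, for example, weigh bids by expected click-through rate rather than by bid alone.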
Finally, the results interface operations 260 may be used to accept result information (from the ad consumers 130 or some other entity) about an ad actually served, such as whether or not click-through occurred, whether or not conversion occurred (e.g., whether the sale of an advertised item or service was initiated or consummated within a predetermined time from the rendering of the ad), etc. Such results information may be accepted at interface 261 and may include information to identify the ad and time the ad was served as well as the associated result. § 4.1.3 DEFINITIONS Online ads, such as those used in the exemplary systems described above with reference to Figures 1 and 2, or any other system, may have various features. Such features may be specified by an application and/or an advertiser. These features are referred to as "ad features" below. For example, in the case of a text ad, ad features may include a title line, ad text, executable code, an embedded link, etc. In the case of an image ad, ad features may additionally include images, etc. Depending on the type of online ad, ad features may include one or more of the following: text, a link, an audio file, a video file, an image file, executable code, embedded information, etc. When an online ad is served, one or more parameters may be used to describe how, when, and/or where the ad was served. These parameters are referred to as "serving parameters" below. Serving parameters may include, for example, one or more of the following: features of (including information on) a page on which the ad is served (including one or more topics or concepts determined to be associated with the page, information or content located on or within the page, information about the page such as the host of the page (e.g. AOL, Yahoo, etc.), the importance of the page as measured by e.g. 
traffic, freshness, quantity and quality of links to or from the page etc., the location of the page within a directory structure, etc.), a search query or search results associated with the serving of the ad, a user characteristic (e.g., their geographic location, the language they use, the type of browser used, previous page views, previous behavior), a host or affiliate site (e.g., America Online, Google, Yahoo) that initiated the request that the ad is served in response to, an absolute position of the ad on the page on which it is served, a position (spatial or temporal) of the ad relative to other ads served, an absolute size of the ad, a size of the ad relative to other ads, a color of the ad, a number of other ads served, types of other ads served, time of day served, time of week served, time of year served, etc. Naturally, there are other serving parameters that may be used in the context of the invention. Although serving parameters may be extrinsic to ad features, they may be associated with an ad as conditions or constraints. When used as serving conditions or constraints, such serving parameters are referred to simply as "serving constraints". For example, in some systems, an advertiser may be able to specify that its ad is only to be served on weekdays, no lower than a certain position, only to users in a certain location, etc. As another example, in some systems, an advertiser may specify that its ad is to be served only if a page or search query includes certain keywords or phrases. "Ad information" may include any combination of ad features, ad serving constraints, information derivable from ad features or ad serving constraints (referred to as "ad derived information"), and/or information related to the ad (referred to as "ad related information"), as well as extensions of such information (e.g., information derived from ad related information). 
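Serving constraints such as those in the examples above (weekdays only, no lower than a certain position, certain user locations, required keywords) can be checked against the serving parameters of a candidate impression. In this hypothetical Python sketch the class and function names are illustrative, and treating the keyword constraint as satisfied when any one listed keyword appears is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ServingConstraints:
    """Hypothetical advertiser-specified serving constraints."""
    weekdays_only: bool = False
    lowest_position: Optional[int] = None         # e.g., 3 allows positions 1-3
    locations: Optional[FrozenSet[str]] = None    # allowed user locations
    keywords: Optional[FrozenSet[str]] = None     # page/query must contain one (assumed)


@dataclass(frozen=True)
class ServingParameters:
    """A few of the serving parameters listed above."""
    when: datetime
    position: int
    user_location: str
    terms: FrozenSet[str]                         # terms of the page or search query


def satisfies(c: ServingConstraints, p: ServingParameters) -> bool:
    """True if serving under parameters p violates none of constraints c."""
    if c.weekdays_only and p.when.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    if c.lowest_position is not None and p.position > c.lowest_position:
        return False
    if c.locations is not None and p.user_location not in c.locations:
        return False
    if c.keywords is not None and not (c.keywords & p.terms):
        return False
    return True


rule = ServingConstraints(weekdays_only=True, lowest_position=3,
                          locations=frozenset({"US"}), keywords=frozenset({"car"}))
tuesday = ServingParameters(datetime(2002, 9, 24), 2, "US", frozenset({"car", "review"}))
saturday = ServingParameters(datetime(2002, 9, 28), 2, "US", frozenset({"car"}))
print(satisfies(rule, tuesday), satisfies(rule, saturday))  # True False
```

An unset constraint (`None` or `False`) imposes no restriction, so an ad with default `ServingConstraints()` may be served anywhere.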
A "document" is to be broadly interpreted to include any machine-readable and machine-storable work product. A document may be a file, a combination of files, one or more files with embedded links to other files, etc.; the files may be of any type, such as text, audio, image, video, etc. Parts of a document to be rendered to an end user can be thought of as "content" of the document. Ad spots in the document may be defined by embedded information or instructions. In the context of the Internet, a common document is a Web page. Web pages often include content and may include embedded information (such as meta information, hyperlinks, etc.) and/or embedded instructions (such as JavaScript, etc.). In many cases, a document has a unique, addressable, storage location and can therefore be uniquely identified by this addressable location. A uniform resource locator (URL) is a unique address used to access information on the Internet. "Document information" may include any information included in the document, information derivable from information included in the document (referred to as "document derived information"), and/or information related to the document (referred to as "document related information"), as well as extensions of such information (e.g., information derived from related information). An example of document derived information is a classification based on textual content of a document. Examples of document related information include document information from other documents with links to the instant document, as well as document information from other documents to which the instant document links. Content from a document may be rendered on a "content rendering application or device". Examples of content rendering applications include an Internet browser (e.g., Explorer or Netscape), a media player (e.g., an MP3 player, a RealNetworks streaming audio file player, etc.), a viewer (e.g., an Adobe Acrobat PDF reader), etc. 
Various exemplary embodiments of the invention are now described. Figure 4 is a bubble diagram of operations that may be performed and information that may be used or generated, in a manner consistent with the principles of the present invention. Content-relevant ad serving operations 410 may include relevance information extraction/generation operations 412, ad-document relevance information comparison operations 414 and ad(s)-document association operations 416. Responsive to a request 420, or some other trigger event or condition, the content-relevant ad serving operations 410 can extract and/or generate document relevance information 434 and ad relevance information 444. (See operations 412.) Alternatively, such relevance information may have been extracted and/or generated, or otherwise provided, before receipt of the request 420. That is, as indicated by the dotted arrows in Figure 4, ad information and/or document information may be preprocessed to determine ad relevance information 444 and/or document relevance information 434. Exemplary techniques for extracting and/or generating document relevance information 434 and ad relevance information 444 are described in § 4.2.2 below. Then, the content-relevant ad serving operations 410 can compare document relevance information 434 for a given document 432 (e.g., a document identified in request 420) to ad relevance information 444 for one or more ads 442. (See operations 414.) Exemplary techniques for determining the relevance of ads to a document are described in § 4.2.3 below. As a result of such comparisons, the content-relevant ad serving operations 410 can generate associations of a document (e.g., via a document identifier or a request identifier associated with a document) with one or more ads (e.g., via the ad itself or an ad identifier). (See operations 416.) One such association 450 is shown. Exemplary techniques for associating one or more ads with a document are described in § 4.2.3 below. 
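Operations 412, 414 and 416 can be illustrated end to end. The text defers the actual techniques to §§ 4.2.2 and 4.2.3, so this Python sketch substitutes a common stand-in: bag-of-words term-frequency vectors as relevance information, cosine similarity as the comparison, and a threshold plus a top-N cut as the association. The function names, the threshold, and the example URL are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def relevance_info(text: str) -> Counter:
    """Extraction/generation (cf. 412): a term-frequency vector (stand-in technique)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Comparison (cf. 414): cosine similarity of two relevance vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def associate(doc_id: str, doc_text: str, ads: Dict[str, str],
              max_ads: int = 3, threshold: float = 0.1) -> Tuple[str, List[str]]:
    """Association (cf. 416): link the document to up to max_ads ads whose
    similarity is at least threshold, most similar first."""
    doc_vec = relevance_info(doc_text)
    scored = [(cosine(doc_vec, relevance_info(text)), ad_id)
              for ad_id, text in ads.items()]
    ranked = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return doc_id, [ad_id for _, ad_id in ranked[:max_ads]]


ads = {
    "eco": "Costa Rica rainforest ecotourism tours",
    "cars": "Discount sedan lease deals",
}
print(associate("http://www.example.com/costa-rica",
                "Guided ecotourism trips into the Costa Rican rainforest", ads))
# ('http://www.example.com/costa-rica', ['eco'])
```

For brevity the sketch recomputes ad vectors on every call; in the arrangement of Figure 4 the ad relevance information 444 could instead be precomputed and read from stored data 440.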
The content-relevant ad serving operations 410 may use stored data 430 which includes a document identifier 432 (such as a URL for a Web page document, for example) and document relevance information 434. As indicated by the arrow 460, document relevance information 434 may be, or may have been, generated based on document information. Exemplary techniques for gathering document information are described in § 4.2.1 below. The content-relevant ad serving operations 410 may also use stored data 440 which includes a number of entries, each entry including an ad identifier 442 and ad relevance information 444. As indicated by the arrow 470, ad relevance information 444 may be, or may have been, generated based on ad information. Ultimately, one or more ads determined to be relevant to a document may be combined with the document to be served. Exemplary techniques for combining the one or more content-relevant ads with the document are described in § 4.2.4 below. § 4.2.1 INCREASING INVENTORY OF AD SPOTS - OBTAINING DOCUMENTS AND EXTRACTING AND/OR GENERATING RELEVANCE INFORMATION Referring to Figure 4, recall that document relevance information 434 is determined from document information. Various ways of obtaining document information are described in this section. Although many of the following examples are described in the context of Web page documents identified by a URL, the present invention is not limited to these examples. There are many ways to obtain the document information (e.g., Web page contents). First, for example, document information may be provided by a third party, such as a Web site host or ad consumer. Such provided document information may include the content (information) located within the document, or other information (e.g. a URL) that allows such information to be obtained. Second, document information (e.g. 
Web page contents) may be obtained during an ad request; for example, an end user's content rendering application (e.g., a browser) may be instructed to send document information (e.g., Web page contents) during an ad request, or the document information may be fetched, for example, as part of content-relevant ad serving operations 410. Third, document information (e.g., Web page contents) may be pre-fetched (i.e., obtained before a specific request) for future content-relevant ad targeting. Moreover, other methods exist for obtaining document information, such as the methods disclosed in U.S. Patent Application Serial No. 10/113,796 titled "METHOD AND APPARATUS FOR INCREASING EFFICIENCY FOR ELECTRONIC DOCUMENT DELIVERY TO USERS" filed March 29, 2002, U.S. Patent Application Serial No. 09/734,886 titled "HYPERTEXT BROWSER ASSISTANT" filed December 13, 2000, and U.S. Patent Application Serial No. 09/734,901 titled "SYSTEMS AND METHODS FOR PERFORMING IN-CONTEXT SEARCHING" filed December 13, 2000, each of which is herein incorporated by reference. Figure 5 is a bubble diagram of an exemplary embodiment 500 of operations that may be performed and information that may be used or generated when obtaining documents for increasing ad inventory, in a manner consistent with the principles of the present invention. Content-relevant ad serving operations 510 serve requests for document information (or ad information) and may include document information request distribution and reply combination operations 515. (Note that ad information, or ad relevance information, as well as operations such as relevance information extraction/generation operations 412, ad-document relevance information comparison operations 414 and ad(s)-document association operations 416 are not shown in Figure 5 to simplify the Figure.) These operations 515 may be used if multiple sources of available (pre-fetched) document information 520 (or ad information) are to be considered. 
Sources of document information may include one or more of cached document information 530, a larger set of "untargeted" document information 540, and a smaller set of "targeted" document information 550. Generally, a crawl (or some other manner of retrieval) of targeted documents will be "deeper" (e.g., crawl further down into the hierarchical Web pages of a Website) than an untargeted crawl, which may only perform a shallow crawl of a given Web site. As indicated by the arrows at the left margin of Figure 5, requests for document (or ad) information are advanced down the double-arrow lines in the Figure, and replies responsive to such requests are advanced up the double-arrow lines. Documents with static or relatively static information can be fetched in advance (pre-fetched), although they may also be fetched in real time, for example on demand in response to a request. On the other hand, it may be preferable to fetch documents with dynamic information in real time, responsive to a request. § 4.2.1.1 PRE-FETCHING DOCUMENTS The cached document information 530 may include document information for recently and/or frequently requested documents. The larger set of "untargeted" document information 540 may have been built, and may be updated, using a search engine crawler 560. An exemplary search engine crawler 560 is described in U.S. Patent No. 6,285,999, which is incorporated herein by reference. Although information about a large set of documents is available, information about a particular needed document might not be.
In this case, in a so-called non-blocking implementation of the present invention (where the content-relevant ad request serving operations do not wait to get document information if it has not been previously obtained and presently stored), a request for ads for a document without available document information might be provided with so-called "house ads" (ads for the ad server itself, ads shown for free, or some other ads that don't generate revenue). Alternatively, if ad revenue is based on a user action (e.g., a click-through or a conversion), the request might be provided with random ads or generally well-performing ads. (Note that if random ads or generally well-performing ads are served in such an untargeted way, their performance statistics, if any, should not be affected.) Alternatively, when a request for ads for a document without normally available document information is received, it may be desirable to make a "best guess" to estimate the document information. Such an estimate might be made by, for example, examining the document's location within a directory structure and using information from the directory (categories) or from other documents in the same, similar, higher (broader) or lower (narrower) classification. One could also examine a log of search queries that generated search results including the document, or that generated traffic to the document, and from those search queries discern alternative documents related to the document in question. It is also possible, in such a situation, to contact the Web site host of the document, which then provides the information. The smaller set of "targeted" document information 550 may be obtained and maintained in one or more of a number of ways. For example, targeted document information retrieval (e.g., crawling) operations 580 may be used to crawl particular content provider Websites, such as partner Websites 588. Some or all of the partner Websites may have been entered via content provider input interface operations 585.
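The difference between a "deeper" targeted crawl of a partner Website and a shallow untargeted crawl can be sketched as a breadth-first traversal with a depth limit. In this Python sketch, the `links` dictionary is a hypothetical stand-in for fetching a page and extracting its same-site links. The function name `crawl` and the depth values in the example are illustrative only; the text gives no specific crawl depths.

```python
from collections import deque
from typing import Dict, List, Set


def crawl(start_url: str, links: Dict[str, List[str]], max_depth: int) -> List[str]:
    """Breadth-first crawl from start_url, following links at most max_depth
    hops down. `links` maps a page to the same-site pages it links to and
    stands in (hypothetically) for real fetching and link extraction."""
    seen: Set[str] = {start_url}
    order: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # this page's document information would be stored
        if depth == max_depth:
            continue  # do not descend further below this page
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"]}
shallow = crawl("/", site, max_depth=1)  # untargeted-style crawl
deep = crawl("/", site, max_depth=3)     # targeted-style crawl
```

In this example, the shallow crawl stops at the top two levels (`/`, `/a`, `/b`). The deep crawl continues down the hierarchy to `/a/1/x`.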
Alternatively, or in addition, a content provider, such as a Web publisher, can itself provide document information (e.g., Web pages or URLs of newly added Web pages) 550 directly via content provider input interface operations 585. A self-service syndication method can allow content providers, such as publishers, to sign up to put content-relevant ads on their Website through a fast, easy and standard process. One specific example of such a self-service syndication method may support one or more of the following: (i) The publisher goes to a login page/new user page. (ii) The publisher clicks on "new user". (iii) The publisher fills out its name, who it wants the check written to, the address where it wants the check sent, the site domain, contact information, and other account details (social security number or tax ID number, password with email login, etc.). This information may be reviewed against a standard checklist to ensure that the entered Website is a real Website. (iv) The entered Website may then be approved or denied. (v) Email may be sent to the publisher. (vi) If approved, the publisher may be instructed to accept a service agreement and click on a link that takes it to a login page. (vii) Once logged in, the publisher can download a piece of code, with a unique identifier, for horizontal (486 x 60) or vertical (660 x 120) ads. In one embodiment, unique pieces of code are provided for different ad servers. (viii) The publisher may then put the code in its ad server. Other self-service features may support the following: (i) The publisher can log into its account to see how much money it has earned. Reports may include date, page views, revenue earned, etc. (ii) The publisher may be given the option to list URLs it wants to block for ads. (iii) The publisher may be paid periodically (e.g., each month) for the ads shown on its Website, possibly subject to the ad being selected and/or a conversion. (iv) The publisher should have a way to change its contact information.
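The option for a publisher to list URLs it wants to block for ads implies a filtering step before ads are combined with the publisher's pages. The following is a minimal Python sketch. It assumes that each candidate ad carries a landing URL, and that a blocked URL excludes every ad whose landing page is on the same host and under the same path prefix. The text defines neither of these rules. The name `filter_blocked` is hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def filter_blocked(candidate_ads: Dict[str, str], blocked_urls: Iterable[str]) -> List[str]:
    """Return the ids of ads whose landing URL is not covered by any URL the
    publisher blocked. candidate_ads maps ad id -> landing URL (a hypothetical
    representation). A blocked URL with no path blocks its whole host."""
    blocked = [urlsplit(u) for u in blocked_urls]
    kept = []
    for ad_id, landing in candidate_ads.items():
        parts = urlsplit(landing)
        is_blocked = any(
            parts.netloc.lower() == b.netloc.lower() and parts.path.startswith(b.path)
            for b in blocked
        )
        if not is_blocked:
            kept.append(ad_id)
    return kept


ads = {
    "a1": "https://competitor.example/shoes",
    "a2": "https://other.example/",
    "a3": "https://partner.example/shoes/red",
    "a4": "https://partner.example/hats",
}
kept = filter_blocked(ads, ["https://competitor.example", "https://partner.example/shoes"])
```

Prefix matching on the path is deliberately crude. For example, blocking `/shoes` would also block `/shoes-sale`. A stricter rule might compare whole path segments instead.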
It is desirable to allow a content-relevant ad server administrator to: (i) see where a specific publisher is showing ads; (ii) generate a revenue report per publisher or for all publishers for any timeframe; (iii) mark a publisher as fraudulent; and (iv) mark who was paid. Figure 6 is a flow diagram of an exemplary method 600 that may be used to get document information as a part of content-relevant ad serving operations in a manner consistent with principles of the invention. The document identifier (e.g., URL) is accepted. (Block 610) It is then determined whether the document relevance information is available. (Decision block 620) If the document relevance information is available (referred to as a "hit"), the ad serving processing continues using the document relevance information. If, on the other hand, the document relevance information is not available, it is determined whether document information is available (e.g., in the cache 530, the main repository 540, and/or the CRAS repository 550). (Block 630) If so, document relevance information is extracted and/or generated using the document information (Block 640) and the ad serving processing continues. If not (referred to below as a "miss"), it may be determined whether or not the content provider (e.g., a partner) has documents that can be easily retrieved (e.g., crawled). (Block 640) In the context of Web sites, a Web site may be considered difficult to crawl if (a) the content is dynamically assembled, (b) the content frequently changes or is frequently refreshed (e.g., news or stocks), and/or (c) the Web site has many alternatives (e.g., people finders). If the content provider is harder to crawl, and it has properly embedded script or links in its content, executable instructions (e.g., Javascript) may be used to get document information (Block 645) before the method 600 continues at block 640.
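The decision flow of method 600 runs from accepting a document identifier (block 610) through the blocking and non-blocking branches (blocks 650 to 675) described next. It can be sketched in Python as follows. Every element of this sketch is an assumption made for illustration:

- the `RelevanceLookup` class and the `is_hard_to_crawl` helper;
- plain dictionaries standing in for the cache 530 and the repositories 540 and 550;
- a toy word-set "relevance extraction";
- a `fetch` callback standing in for real-time retrieval.

A return value of `None` tells the caller to fall back to alternative ad serving, such as house ads.

```python
from typing import Callable, Dict, List, Optional


def is_hard_to_crawl(dynamic: bool, changes_often: bool, many_alternatives: bool) -> bool:
    # Criteria (a) to (c) from the text; any one makes a site "difficult to crawl".
    return dynamic or changes_often or many_alternatives


class RelevanceLookup:
    """Hypothetical sketch of method 600 (Figure 6)."""

    def __init__(self, sources: List[Dict[str, str]], blocking: bool,
                 fetch: Callable[[str], str]) -> None:
        self.relevance: Dict[str, List[str]] = {}  # document relevance information
        self.sources = sources                      # e.g. cache 530, repository 540, CRAS repository 550
        self.blocking = blocking                    # blocking vs. non-blocking ad serving
        self.fetch = fetch                          # stand-in for immediate retrieval (block 660)
        self.unfilled: List[str] = []               # log of unfilled requests 570

    @staticmethod
    def extract(doc_info: str) -> List[str]:
        # Toy relevance extraction: the sorted set of lowercased words.
        return sorted(set(doc_info.lower().split()))

    def get(self, url: str, hard_to_crawl: bool = False,
            script_payload: Optional[str] = None) -> Optional[List[str]]:
        if url in self.relevance:                   # decision block 620: a "hit"
            return self.relevance[url]
        # Block 630: look for stored document information in each source in turn.
        doc_info = next((src[url] for src in self.sources if url in src), None)
        if doc_info is None:                        # a "miss"
            if hard_to_crawl and script_payload is not None:
                doc_info = script_payload           # block 645: content sent by embedded script
            # Assumption: a hard-to-crawl site without script payload falls through below.
            elif self.blocking:
                doc_info = self.fetch(url)          # block 660: retrieve immediately
            else:
                self.unfilled.append(url)           # block 670: log for later retrieval
                return None                         # block 675: caller serves alternative ads
        rel = self.extract(doc_info)                # block 640: extract/generate relevance info
        self.relevance[url] = rel
        return rel


lookup = RelevanceLookup([{"http://a.example/p": "Hiking Boots boots"}],
                         blocking=False, fetch=lambda u: "")
```

In this example, a non-blocking lookup for `http://a.example/p` finds stored document information and returns `["boots", "hiking"]`. A lookup for an unknown URL returns `None` and appends the URL to `unfilled`, which plays the role of the log of unfilled requests 570.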
If the content provider is easier to crawl, it is determined whether the content-relevant ad server is configured to use blocking or non-blocking ad serving. (Decision block 650) If the type is blocking, the document information is retrieved immediately (Block 660) and the method 600 continues at block 640. If, on the other hand, the type is non-blocking, the document identifier (e.g., URL) is stored (e.g., to a log of unfilled requests 570) for later retrieval. (Block 670) Alternative ad serving may then be performed. (Block 675) Note that, if the document relevance information is not available, a "best guess" may also be used, as disclosed previously. Referring again to Figure 5, the targeted document information retrieval (e.g., crawling) operations 580 may then process the log(s) of unfilled request(s) 570 (and identifiers, such as URLs, of (partner) content provider Web sites provided by an external source, such as a (partner) content provider Web site) and retrieve related document information into the CRAS repository 550 for future use. The targeted crawling operations 580 may also be used to pre-crawl Web pages for a given Website to "pre-warm" the CRAS repository 550. This helps to ensure that content-relevant ads will be available. Figure 7 is a flow diagram of an exemplary method 700 that may be used to effect targeted document information retrieval in a manner consistent with principles of the invention. In response to some trigger event 710, the document identifiers are accepted. (Block 730) For each document identifier (Loop 730-750), document information for the identified document is retrieved.
(Block 740) In the case of Web page documents identified by URLs, the URLs of such Web pages may include information that varies across sessions and is used to distinguish different sessions on the same Web page. Such additional information, such as sessionids, shopperids, etc., is often appended to the URL. However, when stripped of this additional information, a given URL will address the same Web page content. If session information were not removed from a URL, stored document information associated with the URL without the session information might not be found using the URL with the session information as a key. That is, even though the Web page content (or some other document information) is already available, it might be considered unavailable due to the session information in the URL. Document identifier (URL) rewrite operations 595 may be provided to strip such session information from URLs and make them canonical, so that they can serve as keys to store and look up document information in the repositories 540, 550 and the cache 530. The targeted document information retrieval operations 580 may work in cooperation and conjunction with the search engine crawler 560 (which may complete a crawl of the Web less frequently). For example, in one embodiment, it may be desired to have the targeted document information retrieval operations 580 be a Web crawler that works with a small number of Web pages per day (e.g., …). There is often content that cannot be crawled. Dynamic Web pages, such as those generated using a search engine, are one such example. Other examples include pages generated by filling in forms, personalized pages, pages that require a login and password, etc. Real-time document information extraction operations 590 may be used to extract the contents of such Web pages, as well as of Web pages that have not been pre-fetched but whose contents are needed.
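The URL rewrite operations 595 can be sketched with Python's standard `urllib.parse` module. The set of session parameter names below is a hypothetical example; the text only mentions sessionids and shopperids generically. A deployed system would need a per-site or learned list. The sketch also lowercases the scheme and host and drops any fragment. These are common canonicalization steps, but they are assumptions rather than something the text states. The function name `canonicalize` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that only identify a session.
SESSION_PARAMS = {"sessionid", "sid", "shopperid", "jsessionid", "phpsessid"}


def canonicalize(url: str) -> str:
    """Strip session-identifying query parameters and normalize the URL so
    that all session variants of a page map to one lookup key."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(query), ""))


# Document information stored under the canonical key...
store = {canonicalize("http://shop.example.com/item?id=42"): "Trail running shoes"}
# ...is found again from a URL carrying session information.
hit = store.get(canonicalize("HTTP://Shop.Example.com/item?id=42&sessionid=abc123"))
```

Using the canonical form both when storing and when looking up is what keeps the cache 530 and the repositories 540, 550 from treating each session variant as a separate, unavailable document.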
In one embodiment, the document information (e.g., contents) is extracted using embedded instructions (e.g., Javascript) included in a document. More specifically, the embedded instructions (e.g., Javascript) may send some or all of the document information (e.g., content) to the content-relevant ad serving operations 410 to get one or more targeted ads for the dynamic document. "Interesting" document information to be extracted from a Web page could include meta tags, headers, titles, etc. The content extraction and fetching occur in real time. In one embodiment of the invention, Javascript is used in the context of a "proxy" that extracts meta tags, headers, titles, etc., from any Web page it is available on. A target page could include the following Javascript as embedded instructions: |
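The following sketch is for illustration only and is written in Python rather than the embedded Javascript the text refers to. It shows how the "interesting" document information named above (title, headers and meta tags) might be pulled out of a page's HTML on the ad serving side, using the standard library `html.parser` module. Several choices here are assumptions:

- the class and function names;
- treating `h1` to `h6` as "headers";
- keeping only meta tags that have a `name` attribute.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class InterestingInfoParser(HTMLParser):
    """Collects a page's title, h1-h6 header text and named meta tags."""

    HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headers: List[str] = []
        self.meta: Dict[str, str] = {}
        self._capturing: Optional[str] = None  # tag whose text is being collected
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name, content = a.get("name"), a.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content
        elif tag == "title" or tag in self.HEADERS:
            self._capturing = tag
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self._capturing:
            text = " ".join("".join(self._chunks).split())  # collapse whitespace
            if tag == "title":
                self.title = text
            else:
                self.headers.append(text)
            self._capturing = None


def extract_interesting(html: str) -> Dict[str, object]:
    parser = InterestingInfoParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headers": parser.headers, "meta": parser.meta}


page = ('<html><head><title> Trail Running Shoes </title>'
        '<meta name="Keywords" content="running, shoes"><meta charset="utf-8"></head>'
        '<body><h1>Best trail shoes</h1><p>Some text</p>'
        '<h2>Waterproof <em>models</em></h2></body></html>')
info = extract_interesting(page)
```

For the sample page, the result holds the title "Trail Running Shoes", the headers "Best trail shoes" and "Waterproof models", and the keywords meta tag. This compact summary could then serve as the document information from which relevance information is generated.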
---|
Patent Number | 222680 |
---|---|
Indian Patent Application Number | 693/CHENP/2005 |
PG Journal Number | 47/2008 |
Publication Date | 21-Nov-2008 |
Grant Date | 20-Aug-2008 |
Date of Filing | 21-Apr-2005 |
Name of Patentee | GOOGLE, INC |
Applicant Address | 1600 AMPHITHEATRE PARKWAY, MOUNTAIN VIEW, CA 94043, |
Inventors:
PCT International Classification Number | G06F 17/30 |
PCT International Application Number | PCT/US03/30233 |
PCT International Filing date | 2003-09-24 |
PCT Conventions: