Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!newsfeed.sgi.net!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!newsfeed.mathworks.com!newsfeed1.earthlink.net!nntp.earthlink.net!posted-from-earthlink!not-for-mail
From: infobasic@earthlink.net
Newsgroups: comp.infosystems.search
Subject: Web and Internet Search Engine FAQ
Date: Tue, 04 Jan 2000 17:49:13 GMT
X-ELN-Insert-Date: Tue Jan 4 09:55:09 2000
X-Newsreader: Forte Free Agent 1.21/32.243
Organization: Infobasic Inc
X-Posted-Path-Was: not-for-mail
Lines: 653
NNTP-Posting-Host: ip68.pittsburgh5.pa.pub-ip.psi.net
X-ELN-Date: 4 Jan 2000 17:49:41 GMT
Message-ID: <38723269.85582609@news.earthlink.net>
Xref: senator-bedfellow.mit.edu comp.infosystems.search:2244

Web and Internet Search Engine FAQ
(WISE FAQ (copyright) 1997-1998-1999-2000)

Copyright 1997-1998-1999-2000 Ken Bogucki krb@infobasic.com

WISE FAQ (c) Ver. 4.1 Jan. 2000

==============================

A Windows 95/98 Search Engine Help File and Tutorial is available at
http://infobasic.com

Infobasic has set up a new and greatly expanded list of search and
search engine resources at http://infobasic.com/sengine/index.shtml

Please feel free to submit appropriate URLs to the Infobasic Directory.

For a list of General Search Engines go to:
http://infobasic.com/se-gen.html

For a list of Geo-Specific Search Engines go to:
http://infobasic.com/se-geo.html

For a list of Meta Search Engines go to:
http://infobasic.com/se-meta.html

==============================

All email queries, complaints or corrections should be addressed to
krb@infobasic.com

COPYRIGHT

This FAQ is copyrighted material. The copyright is owned by the author
of this FAQ, Ken Bogucki krb@infobasic.com

This FAQ may not be reproduced or distributed, in whole or in part, for
commercial purposes without the express written permission of the
author.
This FAQ may be used for non-commercial purposes as long as the author
is notified in advance, the entire FAQ is used without alterations
(except for formatting purposes) and the copyright notice & warranty
notice remain intact.

WARRANTY

This FAQ is an AS-IS document.

When necessary, double brackets [] are used in this FAQ for clarity.
These brackets are not part of any search expression. Their only
purpose is to separate the search words, expressions and results from
the surrounding text.

CONTENTS
****

1A      Alta Vista  http://www.altavista.digital.com
1A.1      Alta Vista Simple Searches
1A.2      Alta Vista Complex Searches
1A.3      Restricting A Simple and Complex Search
1A.4      Sorting Results by Ranking
1A.4.1      Simple Search Ranking
1A.4.2      Complex Search Ranking
1A.5      Misc. Information about Alta Vista

1B      Excite  http://www.excite.com
1B.1      Excite Concept Based Queries
1B.2      Excite Advanced Queries
1B.3      Excite Exact Match Queries

1C      Lycos  http://www.lycos.com
1C.1      Lycos Simple Searches
1C.2      Lycos Complex Searches

1D      Infoseek  http://www.infoseek.com
1D.1      Infoseek Simple Searches
1D.2      Infoseek Complex Searches

1E      Web Crawler  http://www.webcrawler.com
1E.1      Basic Searches
1E.2      Using Logical Word Operators

1F      Yahoo  http://www.yahoo.com
1F.1      Yahoo Menu/Simple Searches
1F.2      Yahoo Complex Searches

2.0     Quick Reference Card
2.1       Alta Vista
2.2       Excite
2.3       Lycos
2.4       Web Crawler
2.5       Yahoo
2.6       Infoseek

****

1A.0  ALTA VISTA SEARCH ENGINE  http://www.altavista.digital.com

Alta Vista is one of the more complex search engines. It may seem
intimidating; however, for those with a serious interest or pressing
need to find information, Alta Vista may be the place to go.

Like other search engines, Alta Vista has simple and complex searches.
It also contains several other options that allow the user to optimize
their time and efforts.
These include ordering your search results by ranking (not necessarily
confined to the original search criteria) and restricting the search to
certain types and locations of Web pages.

1A.1  ALTA VISTA SIMPLE SEARCHES

apples peaches "orange juice" : documents in which "apples",
    "peaches" or the phrase "orange juice" appears.

+apples +pears -"orange juice" : documents in which both "apples" and
    "pears" appear but the phrase "orange juice" does not.

Wildcard Operator "*"

app* : all documents that contain words such as "apples", "applets"
    or "appraise". It will not find "applications" or "applicable",
    because the "*" notation can only represent a max. of 5
    characters.

The above operators can be used in any combination. For example:

+oranges -app* : documents that contain the word "oranges" but not
    words such as "apples", "apply" or "applets".

1A.2  ALTA VISTA COMPLEX SEARCHES

There are two ways to construct an Alta Vista complex search. You can
use either Logical Word Expressions or Logical Symbol Expressions in
the search request. Alta Vista will interpret both types of logical
expressions the same way.

WORD EXPRESSION    is the same as    SYMBOL EXPRESSION
----------------------------------------------------
a AND b            is the same as    a & b
a OR b             is the same as    a | b
a NOT b            is the same as    a ! b
a NEAR b           is the same as    a ~ b

SPECIAL NOTE: Logical word and symbol expressions are precise search
tools. The search expression [apple AND peach] will find "apple" and
"peach" but not "apples" and "peaches".

In Alta Vista, the complex search page contains an editing window
3 lines by 70 characters. This window allows you to view and edit the
entire complex search expression at one glance.

AND
    apple AND orange : sites that contain the word "apple" as well as
    the word "orange"; however, this expression will not display those
    sites that have "apples" and "oranges" in the same document.
    (See Special Note above)

OR
    apple OR orange : sites that contain either the word "apple" or
    the word "orange".

NOT
    apples NOT oranges : sites that contain the word "apples" but not
    the word "oranges".

NEAR
    apple NEAR juice : will generate a list of pages where the word
    "juice" is within ten words of the word "apple". Note, the Alta
    Vista NEAR operator uses a default 10 word range.

1A.3  RESTRICTING A SIMPLE AND COMPLEX SEARCH

This is a method of confining the Web search to certain pages or sites
that meet specific criteria. [partial list]

anchor:click-here : only search pages that contain the phrase
    "click-here" in the text of a hyperlink.

applet: : only search pages that have the specified Java class applet
    in the applet tag of the Web page.

domain:ie : only search pages that originate in the domain .ie
    (Ireland), or any of the other country codes and the miscellaneous
    standard codes, .com, .org, .mil, etc.

host:xyz.com : only search those pages that reside at the host name
    xyz.com.

image:apples.jpg : search those sites that contain the image tag
    "apples.jpg".

link:xyz.com : search those sites with a link to xyz.com. If you have
    a Web page and are curious about how many other pages carry a link
    to your page, then run this search: link:www.yourhomepage.com.

title:"Apples and Oranges" : search those pages that have "Apples and
    Oranges" in the title of the Web page.

1A.4  SORTING RESULTS BY RANKING

Ranking is simply a way to sort the results of your search. For
example, if you use a complex search for "apples" and "oranges", you
can instruct Alta Vista to sort the results so that those sites with
the most references to "apples" appear first in the result list.
Simple searches are sorted automatically by Alta Vista.

1A.4.1  Simple Search Ranking

Alta Vista automatically uses a formula to sort the results of a simple
query. Results are ranked according to the following criteria:

1. results score highest if the search criteria are met in the first
   few words of a document
2. query words and phrases are found close to each other in a document
3. query words or phrases appear more than once in a Web document.

1A.4.2  Complex Search Ranking

On the complex search page, there is a separate window for ranking.
After establishing the search expression, go to the ranking window and
insert the words that will be used to sort the result list (these need
not be the same words you used in the search expression).

For example, if your search expression is [apples & oranges], you may
then use the ranking window and include the word "California". The
search will produce all those documents that contain the word "apples"
and the word "oranges" in the same document. With the ranking example
above, Alta Vista will then sort the result list so that all documents
that have a reference to "California" appear first in the list. More
than one word or phrase may be used in the ranking window.

****

1B  EXCITE SEARCH ENGINE  http://www.excite.com

Excite uses several methods for finding the requested information. One
is a concept based query, another is an advanced based query and the
last is an exact match query.

NOTE: Excite provides its own relevancy rating. The user cannot
directly change or alter this rating.

Excite uses " " marks to indicate a phrase search. For example,
"apple butter" will find those sites where the phrase --apple butter--
can be found but not those sites that list only the word apple.

1B.1  CONCEPT BASED QUERIES

A concept based query utilizes the relationship between words and ideas
to find matches. For example, in a concept based search the keyword
"fruit" will yield "fruit", but also "apples", "oranges", etc. Concept
based queries rely on the user requesting information in the form of
one or more keywords.

1B.2  ADVANCED BASED QUERIES

In an advanced based query the operators "+" and "-" are used.

+apples +oranges : documents that have the word "apples" and the word
    "oranges" on the same page.

-apples +oranges : documents that have the word "oranges" but not the
    word "apples".

+apple -pears -tarts : documents that have the word "apple" but not
    the words "pears" or "tarts". This query will not return "apple
    tarts" but will return "apple turnovers".

1B.3  EXACT MATCH QUERIES

Exact match queries use Logical Word Expressions to find documents.
The logical word operators are: AND, OR, AND NOT plus ().

Using the logical word operators will turn off Excite's concept based
search. A keyword search for "fruit" will then instruct Excite to
search only for those sites that contain the word "fruit". Excite will
not display sites that contain only related words like "apples",
"oranges", etc.

apples AND oranges : sites that contain both the words "apples" and
    "oranges" in the same document.

apples OR oranges : sites that contain either the word "apples" or the
    word "oranges".

apples AND NOT oranges : sites that contain the word "apples" but not
    the word "oranges".

() is an organizational operator. For example,
[apples AND NOT (oranges OR peaches)] will produce sites that contain
the word "apples" but not the words "oranges" or "peaches".

****

1C  LYCOS SEARCH ENGINE  http://www.lycos.com

Lycos has two search levels, simple and complex. In the case of Lycos,
the complex search function is menu driven and not difficult to use;
however, because of its menu interface, this Lycos search is somewhat
more restrictive than other search engines.

1C.1  STANDARD SEARCH (Simple)

Standard searches do not use Logical Word Operators.

apples oranges peaches : will yield sites in which all three words
    appear.

[ - ] This is a restrictive operator.

apples oranges -berries : all documents in which "apples" and
    "oranges" appear but not those pages where "berries" appears.
    If "apples", "oranges" and "berries" appear in the same document,
    this document will not appear in the search results.

[ $ ] This is a wildcard operator.

app$ : will yield all pages in which words such as "apples",
    "applications" or "applets" appear.

[ . ] This is a delimiting tag. Searching for "apple" will yield
"apples" and "apple"; however, if the search were "apple." then only
those documents with the word "apple" will be returned and not those
pages with the word "apples".

1C.2  CUSTOM SEARCHES (Pro Search)

Complex searches are done through a menu interface. All of this is
fairly intuitive, so only a very brief explanation is required here:
everything that appears on the complex search page has a corresponding
on-screen example and explanation.

****

1D  INFOSEEK  http://www.infoseek.com

Infoseek has two search options, simple and complex. Both search
options provide only limited query syntax. Infoseek has no way to rank
search results. However, Infoseek is fast and is more than suitable
for quick search needs. The site is low graphics and works well with
text browsers.

1D.1  INFOSEEK SIMPLE SEARCHES

Infoseek's simple searches use a combination of commas, plus and minus
signs, quotes (to make phrase searches) and caps.

apples oranges : will find pages with either "apples" or "oranges".

+apples oranges : will return pages that contain "apples"; pages that
    contain "oranges" as well are acceptable. Those pages, however,
    will receive a lower ranking.

"apple juice" : will display those pages where the words "apple" and
    "juice" appear next to each other.

Caps are used to indicate proper names and a case sensitive search:

Johnny Appleseed : will find only pages with the name "Johnny
    Appleseed".

Johnny,Appleseed : will find pages with either name. Note: commas are
    only used to separate names.

apples -grapes : will find pages with "apples" but not with the word
    "grapes".
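
The simple-search rules above can be pictured with a small Python
model. This is only an illustrative sketch under stated assumptions,
not Infoseek's actual implementation: the function names (parse_query,
contains, matches) are made up for this example, matching is done on
lower-cased whole words, and the capitalization and comma rules for
proper names are not modelled.

```python
import re


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms."""
    required, excluded, optional = [], [], []
    # A token is an optional +/- sign followed by a quoted phrase or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def contains(term, words):
    """True if the term's words appear next to each other, in order."""
    parts = term.split()
    n = len(parts)
    return any(words[i:i + n] == parts for i in range(len(words) - n + 1))


def matches(query, document):
    """Toy model of the +, - and phrase rules described in section 1D.1."""
    required, excluded, optional = parse_query(query)
    words = re.findall(r"\w+", document.lower())
    if any(contains(term, words) for term in excluded):
        return False
    if not all(contains(term, words) for term in required):
        return False
    if required:
        # Optional terms only affect ranking, which is not modelled here.
        return True
    # With no "+" terms, any one of the plain terms is enough.
    return any(contains(term, words) for term in optional)
```

For example, matches('apples -grapes', 'apples and grapes') is False
because the excluded word is present, while matches('"apple juice"',
'a glass of apple juice') is True because the two words are adjacent.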

1D.2  INFOSEEK COMPLEX SEARCHES

There are only a few additional symbols that distinguish a complex
query from a simple query.

The pipe symbol [ | ] is used to construct a search within a set of
search results.

fruit | apple | juice : will find pages that refer to "fruit", then
    search out those pages within that result that contain the word
    "apple". Finally, the last group of results will be searched for
    any pages that contain the word "juice".

title:fruit : will find any pages where the word "fruit" appears in
    the title of the Web page.

url:www.orange.com : will find those sites that contain the address
    "www.orange.com". The search expression [ url:fruit ] will find
    those sites that have the word "fruit" in the URL, for example,
    "www.fruit.com".

link:www.juice.com : will find those sites that are linked to the
    specified URL.

site:xyz.com : will bring up all the sites located at the specified
    address.

****

1E  WEBCRAWLER  http://www.webcrawler.com

One of the better Web search engines is WebCrawler, simply because of
its flexibility.

1E.1  BASIC SEARCHES

apples oranges pineapples : will provide information on those
    documents that contain any of the words "apples", "oranges" or
    "pineapples". This is a simple search expression.

1E.2  USING LOGICAL WORD OPERATORS

AND
    apples AND oranges : will provide information on documents where
    both the words "apples" and "oranges" appear.

OR
    apples OR oranges : will display information on pages that contain
    either of the two search words. This is similar to the Basic
    Search example except that this search employs specific logical
    word operators. The first search could also be run as:
    apples OR oranges OR pineapples.

NOT
    fruit NOT apples : displays information about "fruit" but not
    those pages that reference "apples".

NEAR
    cheese NEAR/15 wine : will display those pages in which the word
    "cheese" is within 15 words of the word "wine". Note, you can
    specify any number of words in the NEAR operator: NEAR/20, NEAR/5,
    etc.

ADJ
    world ADJ war : will display Web pages that contain the word
    "world" immediately followed by the word "war".

" "
    Quotes have the same effect as the ADJ command above: "world war"
    will provide the same results as: world ADJ war.

()
    Parentheses are used to organize complex search expressions. For
    example:

    (wine NEAR/10 cheese) AND apples

    or

    "California wine" AND prices NOT (white OR rose)

****

1F  YAHOO  http://www.yahoo.com

Yahoo is one of the most intuitive search engines to use. There are
two ways to search Yahoo: one is a very simple, menu driven search and
the second uses logical word operators. However, this second search
option is also a menu driven search.

1F.1  MENU/BASIC SEARCHES

The menu interface is easy to use and understand. Simply select the
type of material you want to search (WEB, Usenet, etc.) and how the
search should be conducted. Select how the results should be displayed
(20, 10, 40 per page) and click the search button.

1F.2  MENU/ADVANCED SEARCHES

[ + ] apples +oranges : those sites that have "apples" as well as
    "oranges" in the same document.

[ - ] apples -oranges : those sites that have "apples" but not those
    sites that have "oranges".

[ t: ] A restriction operator that will confine the search to Web page
    titles. For example, t:apples will restrict the search to pages
    with the word "apples" in the title of the page. It will not
    search a page if the page title is "Oranges".

    The correct usage of the "t:" operator in a search expression is
    [ +t:oranges +apples ]. This expression will yield documents that
    have the word "apples" in the Web page and the word "oranges" in
    the Web page title. The expression [ +apples t:oranges ] is
    incorrect. The "t:" operator must immediately precede the search
    word.

[ u: ] A restrictive operator. Confines the search for the keywords to
    certain URLs. For example, [ u:xyz ] will restrict the search to
    URLs that have an "xyz" in the URL address.
    The "u:" operator follows the same rules listed for the "t:"
    operator.

[" "] Phrase combining operator: "orange juice", "apple juice", etc.

[ * ] Wildcard search. For example, "pea*" will return "pears",
    "peas", etc.

2.0  REFERENCE CARD

NOTE: This reference card is designed on the assumption that you have
a basic understanding of the search expressions and criteria covered in
prior sections of this FAQ. The double brackets [] in the reference
card are not part of the query syntax.

****

2.1  ALTA VISTA  http://www.altavista.digital.com

[apples "orange juice"]     "apples" or the phrase "orange juice"
[+apples -"orange juice"]   "apples" & not the phrase "orange juice"
[app* (wildcard)]           "apples", "applets", "appraise"
                            (wildcard in Alta Vista requires Min. of
                            three letters before the wildcard and will
                            return from 0-5 characters Max.)

Complex Searches (Can use either logical word or symbol expressions)

AND or &, OR or |, NOT or !, NEAR or ~

[apple AND orange]          "apple" & the word "orange"
[apple OR orange]           "apple" or the word "orange"
[apples NOT oranges]        "apples" but not the word "oranges"
[apple NEAR juice]          "juice" within ten words of "apple"

RESTRICTING A SIMPLE AND COMPLEX SEARCH

[anchor:click-here]         pages with "click-here" in the hyperlink.
[applet:]                   pages with the Java class in the applet tag
[domain:xyz]                pages in the domain "xyz"
[host:xyz.com]              sites at the host name xyz.com.
[image:a.jpg]               sites with an image tag, "a.jpg".
[link:xyz.com]              sites with a link to xyz.com.
[text:orange]               sites with "orange" in the visible text
[title:"A, B and C"]        sites with "A, B and C" in the title.

RANKING

Simple searches: The ranking is automatic.
Complex searches: Enter any word or groups of words in the ranking
window. Alta Vista will sort the results based on these words.
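
The wildcard and NEAR entries in the Alta Vista card above can be
mimicked offline with a few lines of Python. This is a rough sketch
for experimenting with the rules, not Alta Vista's own code: the
helper names (wildcard_matches, near) are invented for this example,
and "within ten words" is assumed to mean that the two word positions
differ by at most ten.

```python
import re


def wildcard_matches(pattern, word, min_prefix=3, max_extra=5):
    """Apply an 'app*' style pattern: at least min_prefix letters
    before the '*', which then stands for 0 to max_extra characters."""
    if not pattern.endswith("*"):
        raise ValueError("pattern must end with '*'")
    prefix = pattern[:-1]
    if len(prefix) < min_prefix:
        raise ValueError("too few letters before the wildcard")
    regex = re.compile(re.escape(prefix) + r"\w{0,%d}" % max_extra,
                       re.IGNORECASE)
    return regex.fullmatch(word) is not None


def near(text, first, second, window=10):
    """True if 'first' and 'second' occur within 'window' words of each
    other (assumed meaning: word positions differ by at most window)."""
    words = re.findall(r"\w+", text.lower())
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= window for i in pos_first for j in pos_second)
```

With these helpers, wildcard_matches("app*", "appraise") is True while
wildcard_matches("app*", "applications") is False, which agrees with
the examples given in section 1A.1.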

****

2.2  EXCITE  http://www.excite.com

Concept Based Search

[+apples +pears]            "apples" and "pears"
[-apples +peach]            "peach" but not "apples"
[+apples -pears -berries]   "apples" but not "pears" or "berries"

Exact match queries use Logical Word Expressions to find Web documents.
The Logical Word Operators are: AND, OR, AND NOT. Using logical word
expressions will turn off Excite's concept based option. Precise
searches require the use of Logical Word Operators.

[apples AND peaches]        pages with "apples" and "peaches"
[apples OR peaches]         pages with either "apples" or "peaches"
[apples AND NOT peaches]    pages with "apples" but not with "peaches"

****

2.3  LYCOS  http://www.lycos.com

STANDARD SEARCH

Standard searches do not use logical word operators.

[apples oranges peaches]    pages where any of the words appear
[apples +berries]           "apples" and "berries"
[apples -berries]           "apples" but not "berries"
[app$ (wildcard)]           "apples", "applets", etc.
[apple.]                    "apple" but not the word "apples"

CUSTOM SEARCHES

Complex searches are done through an intuitive menu interface.

****

2.4  WEBCRAWLER  http://www.webcrawler.com

[apples oranges] or
[apples OR oranges]         pages that contain any of the words.
[apples AND oranges]        "apples" and "oranges"
[fruit NOT apples]          "fruit" but not "apples"
[cheese NEAR/(x) wine]      "wine" is within "x" words of "cheese"
[world ADJ war]             "world" & "war" are next to each other
[".." phrase searches]      "us army", "jack and jill went up the hill"
[(..)]                      used to organize search expressions

****

2.5  Yahoo  http://www.yahoo.com

Advanced Options:

[apples +oranges]           "apples" as well as "oranges"
[apples -oranges]           "apples" but not with "oranges".
[t:]                        confines the search to certain Web titles.
[u:]                        confines the search to certain URLs.
[" "]                       phrase operator: "orange juice",
                            "apple juice", etc.
[pea* (wildcard)]           "pears", "peas", "peaches", etc.

****

2.6  Infoseek  http://www.infoseek.com

Simple Searches

[apples oranges]            either "apples" or "oranges".
[+apples oranges]           "apples", pages with "oranges" are ranked
                            lower.
["apple juice"]             "apple" and "juice" appear next to each
                            other.

Caps are used to indicate proper names and a case sensitive search:

[Johnny Appleseed]          will find the name "Johnny Appleseed".
[Johnny,Appleseed]          will find either name. Note: commas are
                            only used to separate names.
[apples -grapes]            "apples" but not "grapes".

Complex Searches

[fruit | apple | juice]     will find "fruit" then search results for
                            "apple" then search those results for
                            "juice".
[title:fruit]               "fruit" in the title of the page.
[url:www.orange.com]        sites with address "www.orange.com".
[url:fruit]                 sites with "fruit" in the URL,
                            "www.fruit.com" or "www.fruitandnuts.com".
[link:www.juice.com]        will find sites linked to the specified URL
[site:xyz.com]              will find all sites at the specified
                            address.

****

Contact Information

Corrections, additions or comments can be sent to:

Ken Bogucki
krb@infobasic.com
http://www.infobasic.com/

END WISE FAQ (c)
=========================