You know the basic concept of a search engine. Type a word or phrase into a search box and click a button. Wait a few seconds, and references to thousands (or hundreds of thousands) of pages will appear. Then all you have to do is click through those results to find what you want. But what exactly is a search engine, beyond this general concept of “seek and ye shall find”?
It’s a little complicated. On the back end, a search engine is a piece of software that uses algorithms to find and collect information about web pages. The information collected usually includes keywords or phrases that may indicate what the page as a whole is about, the URL of the page, the code that makes up the page, and the links into and out of the page. That information is then indexed and stored in a database. On the front end, the software has a user interface where users enter a search term (a word or phrase) in an attempt to find specific information. When the user clicks a search button, an algorithm examines the information stored in the back-end database and retrieves links to web pages that appear to match the search term the user entered.
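To make the back-end/front-end split concrete, here is a minimal sketch of the simplest form such an index can take: an inverted index that maps each keyword to the pages that contain it. Everything here is an illustrative assumption. The `pages` dictionary, the example.com URLs, and the helper names `tokenize`, `build_index`, and `search` are hypothetical, and real search engines store far more than keywords and rank their results rather than just matching them.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase keywords (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Back end: map each keyword to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Front end: return the URLs that contain every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


# Hypothetical pages standing in for crawled web content.
pages = {
    "https://example.com/coffee": "How to brew coffee at home",
    "https://example.com/tea": "How to brew green tea",
    "https://example.com/beans": "Buying coffee beans",
}
index = build_index(pages)
print(search(index, "brew coffee"))  # ['https://example.com/coffee']
print(search(index, "coffee"))  # ['https://example.com/beans', 'https://example.com/coffee']
```

Requiring every query term to appear (an AND search) is just one simple design choice; it keeps the example short but ignores ranking entirely.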
The process of collecting information about web pages is performed by an agent called a crawler, spider, or robot. The crawler works its way across the Web by following links from page to page, visiting every URL it can reach that isn’t blocked from it, and collects keywords and phrases on each page, which are then included in the database that powers a search engine. The number of sites on the Web exceeded 100 million some time ago and is increasing by more than 1.5 million sites each month. Cataloging all of them is like your brain cataloging every single word you read, so that when you need to know something, you think of that word and every reference to it comes to mind. In a word . . . overwhelming.
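The crawling loop itself can be sketched as a breadth-first traversal of links. This is a toy model under stated assumptions, not how any particular search engine’s crawler works: the in-memory `WEB` dictionary is a hypothetical stand-in for fetching pages over HTTP, and the `blocked` set is a hypothetical stand-in for the rules a site publishes in its robots.txt file.

```python
import re
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
WEB: dict[str, tuple[str, list[str]]] = {
    "https://a.example/": ("Welcome to site A", ["https://a.example/about", "https://b.example/"]),
    "https://a.example/about": ("About site A", ["https://a.example/"]),
    "https://b.example/": ("Site B sells coffee", ["https://b.example/private"]),
    "https://b.example/private": ("Private notes", []),
}


def crawl(
    seeds: list[str],
    web: dict[str, tuple[str, list[str]]],
    blocked: set[str],
) -> dict[str, set[str]]:
    """Breadth-first crawl: visit each reachable, unblocked URL once and
    record the set of keywords found on that page."""
    collected: dict[str, set[str]] = {}
    queue = deque(seeds)
    seen = set(seeds)  # URLs already queued, so no page is visited twice
    while queue:
        url = queue.popleft()
        if url in blocked or url not in web:
            continue  # skip pages the crawler may not (or cannot) fetch
        text, links = web[url]
        collected[url] = set(re.findall(r"[a-z0-9]+", text.lower()))
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


result = crawl(["https://a.example/"], WEB, blocked={"https://b.example/private"})
print(sorted(result))  # ['https://a.example/', 'https://a.example/about', 'https://b.example/']
```

Note that the crawler only ever reaches pages linked from somewhere it has already been, which is why “every URL” in practice means every URL reachable from the starting points.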