CS 222, John Paxton
October 21, 1999
Come in squad leader. chshhhhk. All drones in position for advance sir... Over. chshhhhk. No sign of movement on server 126.96.36.1995. Nothing but civilian chatter bots and a harmless spider indexer, sir. chshhhhk. Keep your scanners active men... stay low on the bandwidth and watch for suspicious data chshhhhk.
In an effort to increase the power of the Internet, is it possible to take it too far? A relatively new concept in the Internet world is the use of small programs (web bots) that operate for long periods of time, traversing the web or performing other tasks that benefit their designer. Non-web-based bots of all different sorts have slowly been creeping into existence on home computers around the world since the early 90s. Primitive versions, however, have existed for decades, such as the Massachusetts Institute of Technology's Eliza, born in 1966. After the advent of network scripting tools such as Perl, PHP, and the CGI interface, and the introduction of large-scale, network-accessible SQL databases such as Oracle, bot evolution exploded. Today, with the power and size of the Internet, the bot's potential to benefit the public has increased dramatically. By examining the characteristics, current uses, and taxonomy of today's bots, one might catch a glimpse into the future of the Internet.
A bot rarely travels between computers; rather, it stays fixed on one server and reaches its tentacles out into the net. The first bots to emerge from the Internet's primordial soup were the web indexing bots (often called spiders). Webcrawler, Lycos, and Hotbot are a few of the more popular spiders whose sole purpose is resource discovery. An indexing bot is programmed to start with a page and recursively follow all links, which in turn lead to more pages with more links. On its way, it stores selected pieces of information in a massive searchable database. The database becomes a gold mine when it is released to an information-hungry population seeking the pertinent points of interest while skipping over the "junk". Some indexing bots are designed to gather statistics on server presence, while others function as an easy way to check for broken links. Some of the more sophisticated spider programs are adding features that allow bots to sift through information based on relevance. A few designers are allowing specific spiders to live or die based on the usefulness of the retrieved data, endowing their bots with an evolutionary aspect.
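The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration, not the code of any spider named in this paper: the function names, the fetch parameter, and the example.test pages are all assumptions, and a small in-memory "web" stands in for real HTTP requests. A queue replaces literal recursion, which visits the same pages without risking a deep call stack.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Follow links from start_url and return {url: html} for pages retrieved.

    fetch is a caller-supplied function (hypothetical here) that maps a URL
    to its HTML text, or returns None when the page cannot be retrieved.
    """
    index = {}
    seen = {start_url}            # history check: never queue a URL twice
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # broken link, nothing to index
            continue
        index[url] = html         # a real spider would store selected fields
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Made-up pages used in place of the real network.
PAGES = {
    "http://example.test/": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.test/b.html": '<a href="missing.html">broken</a>',
}

index = crawl("http://example.test/", PAGES.get)
print(sorted(index))
```

The link back to the home page is skipped because it is already in the seen set, and the broken link is fetched once and then dropped.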
Retailers are quick to notice that a web bot's relentless work ethic can do more than simple indexing. Internet commerce has enticed many bot developers to invest their time in producing bots that open new doors in the net shopping industries. A shopping bot will scour the Internet for bargains in competing products and return the lowest prices to the consumer. For consumers who hate being outbid at an auction, an auction bot proves useful in ensuring that no one slips in for the steal just before closing time.
Unlike the tame-mannered commerce bot, a chatter bot is known for its quasi-personality characteristics. A chatter bot takes the top-down approach to artificial intelligence, meaning that instead of learning from the beginning like a newborn human, it is given a large database of information on its first birthday. Its "intelligence" lies in the ability to recognize keywords and create appropriate responses from the database. Until recently, chatter bots were mainly accessible only on the PCs on which they were installed. An Internet version, however, can talk with hundreds of users at one time. One use is placing a chatter bot at the front end of a search engine to allow users to ask questions instead of typing in a query. Similar bots are used at book and media sites to let customers see what a bot recommends based on their preferences, past purchases, and the preferences of similar customers. Perhaps the most common platform for a chatter bot is a MUD or MOO. These are multi-user role-playing games in which a person takes on an identity and tries to accumulate more wealth and power than his or her opponents. Because of the nature of these games, many players have written chatter bots that have personalities of their own. Some bots follow the player around to protect them, while others are written to perform specific tasks. For example, a thief bot would relentlessly try to steal items from other players. More sophisticated MUD/MOO bots have the ability to invent their own personality as the game progresses and to learn from their mistakes. Along with MUDs and MOOs, chat clients such as IRC, MIRC, Hotline, and others have embraced the chatter bot, incorporating it into their culture as well.
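The keyword-and-database idea behind a chatter bot can be illustrated with a very small sketch. The keywords, the canned responses, and the reply function below are invented for this example and are far simpler than any real chatter bot; they only show the lookup step.

```python
import re

# A tiny hand-written "database": keyword -> canned response (all made up).
RESPONSES = {
    "hello": "Hello there. What would you like to talk about?",
    "price": "I can look up prices. Which item do you mean?",
    "book": "Do you prefer fiction or non-fiction?",
}
DEFAULT = "Tell me more."


def reply(message):
    """Return the response for the first known keyword in the message."""
    words = re.findall(r"[a-z']+", message.lower())
    for word in words:
        if word in RESPONSES:
            return RESPONSES[word]
    return DEFAULT   # no keyword recognized


print(reply("Hello, bot!"))
print(reply("What is the PRICE of this?"))
print(reply("The weather is nice."))
```

A larger database and a way to combine several keywords would make the replies less mechanical, but the lookup step stays the same.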
So many Internet-thriving bots exist today in different forms that counting them is no longer possible. Over a dozen books have been dedicated to the writing of web bots. Internet newsgroups on Perl, PHP, C, SQL, and similar topics are loaded with questions regarding the building and ethics of the critters. Ethics? Of course. A web bot can resemble a demon or an angel depending on who its creator is. Because of a bot's speed, dedication, and ease of access to sites, intentional and unintentional misuse can cause serious problems. If a bot is located on a fast enough server, it may sift through another server in rapid-fire succession, making numerous requests per second. The result is usually a network slowdown for other users and possibly a server shutdown.
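One way a bot author might avoid the rapid-fire problem is to enforce a minimum pause between requests to the same server. The sketch below is an assumption about how that could be done, not a technique taken from any source cited in this paper; the Throttle class and its parameters are hypothetical names. The clock and sleep functions can be swapped out so the logic can be tried without actually waiting.

```python
import time


class Throttle:
    """Enforces a minimum number of seconds between requests to one host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}    # host -> time of the most recent request

    def wait(self, host):
        """Pause if the host was contacted too recently; return seconds slept."""
        waited = 0.0
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.min_interval - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self.last_request[host] = self.clock()
        return waited


# Usage with a simulated clock, so the example finishes instantly.
fake_now = [0.0]


def fake_clock():
    return fake_now[0]


def fake_sleep(seconds):
    fake_now[0] += seconds


throttle = Throttle(2.0, clock=fake_clock, sleep=fake_sleep)
print(throttle.wait("example.test"))   # first request: no pause
print(throttle.wait("example.test"))   # immediate second request: pauses
print(throttle.wait("other.test"))     # a different host is not delayed
```

A bot would call wait with the host name just before each request, so a fast server running the bot cannot flood a slow one.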
Around the time web-based bots were first being introduced, no rules or ethics were laid down because their effects were not known. A standard was needed to limit a bot's speed and tentacle length. Today there is a FAQ called "A Standard for Robot Exclusion", hosted and revised by Webcrawler, one of the first bots ever made. It is located at http://info.webcrawler.com/mak/projects/robots/robots.html. The FAQ outlines a method that encourages servers to place a file called robots.txt in the root directory, containing information about which files a bot can and can't use. Whether this file is heeded or ignored, however, is at the mercy of the programmer. Adhering to the stipulations in robots.txt may cost extra time and coding, but it may benefit the bot by preventing the indexing of redundant and unnecessary files that turn up when searching deep into a tree. At the top of the FAQ, a statement reads: "It [the document] is not enforced by anybody, and there is no guarantee that all current and future robots will use it". A less popular standard than the robots.txt method is placing a metatag on strategic pages in the form:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX">
In this case, NOFOLLOW means don't follow any link on this page, and NOINDEX means don't index this page. Perl has yet another exclusion method in norobots.pl. Which method emerges a winner, if any, is yet to be seen.
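Both exclusion methods described above can be checked from a bot with Python's standard library. The sketch below is an illustration only: the sample robots.txt rules, the "ExampleBot" agent name, and the example.test URLs are made up, and the rules are parsed from a string rather than downloaded from a server.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; a real bot would download it from the server root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """True if the robots.txt rules permit this (hypothetical) agent."""
    return rules.can_fetch(agent, url)


class RobotsMeta(HTMLParser):
    """Collects directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)      # attribute names arrive lowercased
        if (fields.get("name") or "").lower() == "robots":
            for part in (fields.get("content") or "").split(","):
                if part.strip():
                    self.directives.add(part.strip().upper())


def meta_directives(html):
    """Return the set of ROBOTS directives found in a page."""
    parser = RobotsMeta()
    parser.feed(html)
    return parser.directives


print(allowed("http://example.test/index.html"))
print(allowed("http://example.test/cgi-bin/search"))
print(meta_directives('<META NAME="ROBOTS" CONTENT="NOFOLLOW">'))
```

A well-behaved bot would call allowed before requesting a page, and would consult meta_directives after fetching it to decide whether to index the page or follow its links.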
There will always be programmers with malicious intent. For example, in one MOO (called "Pt. MOOt"), a user created numerous Barney Bots that wandered around the virtual city singing "I Love You, You Love Me" and spawned new Barney characters faster than they could be killed, driving the inhabitants crazy in the process. There is nothing to stop a programmer from creating a bot that continually clicks on a link to purposely knock over the server. Web bots have been used to continually spam a newsgroup, individual, or guest book, causing memory overloads. An infiltrating bot may be used to gather confidential information about a competing company. To combat each type of attack, a corresponding bot has been created to look for patterns of rapid succession and repel the attacks, and as always, more potent bad-guy bots are being developed to sneak around the defenses.
Although malicious intent presents problems, accidental misuse remains an equally difficult dilemma. In a popular mailing list devoted to web bot research, a spam bot began responding to its own messages, producing an endless sequence of messages. Web pages and their links can be complicated enough to frustrate even the best programmer. A great deal of costly history checking occurs because links can link back to their parent directories, spawn an infinite number of new windows, or turn up a 404. New programmers must continually seek scarce help from the few existing experts in the field to keep from falling into the many snares that eat web bots for lunch. Bot developers are urged to keep releases to a minimum by experts in the field such as Martijn Koster (developer of the "Standard for Robot Exclusion"), who asks: "Reconsider: Do you really need a new robot?" It is clear that too many robots introduced too early may cause serious problems, and may hurt the cause more than help it by consuming precious bandwidth.
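Part of the history-checking cost mentioned above comes from the same page being reachable under several different link spellings. A minimal sketch of one way to handle this, assuming that reducing each link to an absolute URL without its fragment is enough, follows; the function names and the example.test addresses are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def canonical(base, href):
    """Resolve href against the page it appeared on and drop any #fragment."""
    absolute = urljoin(base, href)
    return urldefrag(absolute).url


seen = set()


def first_visit(base, href):
    """Record the link; return False if the same page was already recorded."""
    key = canonical(base, href)
    if key in seen:
        return False
    seen.add(key)
    return True


page = "http://example.test/docs/page.html"
print(first_visit(page, "../index.html#top"))   # new page
print(first_visit(page, "/index.html"))         # same page, different spelling
print(first_visit(page, "other.html"))          # new page
```

Real sites need more than this (for instance, treating differences in host-name capitalization or trailing slashes as the same page), so the sketch covers only the parent-directory and fragment cases.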
Today, the concept of a bot on the Internet is quite simple, with only a few hundred to a few thousand lines of code per bot. Interaction between bots is almost non-existent except in the MUD/MOO environment. Fortunately for the bot industry, the potential uses for bigger and better bots are growing at nearly the rate of the web explosion. As the quantity of information grows, bandwidth increases, bot programming techniques improve, and advances in artificial intelligence open doors, web bots will grow smarter, faster, and more numerous. The Internet is a cluttered environment, and individual web bots will either add to the clutter or subtract from it depending on their design. The Internet reflects humanity and will never be "perfect" in any sense. Most likely bots will battle each other just as we do, not because they choose to, but because we will order them to do so. Although it is not a popular theory today, some theorists venture to say that if or when an artificial life form with the complexity and characteristics of human life is completed, the Internet would be a suitable environment for its growth and learning. After all, with billions of people posting billions of documents, the Internet contains an enormous amount of information about who and what our race is. Perhaps a two-dimensional text world nearly as complex as our own may resemble us so closely that it would be equally habitable.