User-agent: * Keywordspy.com keywordspy.com
Disallow: /stats/
Disallow: /etc/
Disallow: /mail/
# urllist.txt # 2/09 Disallow User-agent: keywordspy.com
# 9/06
Disallow: /_private/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_pvt/ # b/c new hostgator.com
# Disallow: /images/ removed 8/20/05 b/c of http://validator.w3.org/checklink?uri=...
# Disallow: # not required for = /images/cc/
# Disallow: # = /images/catapics/ # is not required
# Must have 2 carriage returns to properly terminate file, otherwise robot can
# ignore robots.txt file.
# 12/6/00 = moved Disallow: /cc/
#
# Everything you wanted to know about robots.txt at
# http://www.robotstxt.org/orig.html
# "Any number of agent id(s) can be placed on the User-Agent line so long as
# they are separated by white space (WS), but the User-Agent line must have at
# least one agent id."
#
# 1/31/01 changed Disallow: /whis/ to protect emof for his-wayh.org
# 11/05 removed b/c this is robots.txt for ccpcfl.org >> 3/4/02 added 'Disallow: /scripts/'
# 7/31/02 was User-agent: * ip3000.com cyveillance.com mts.net omniseek.com
# 4/7/05 added Disallow: /cc-dwld/
#
# The real answer is that /robots.txt is not
# intended for access control, so don't try to use it as such. Think of it as a
# "No Entry" sign, not a locked door. If you have files on your web site that
# you don't want unauthorized people to access, then configure your server to do
# authentication, and configure appropriate authorization. Basic Authentication has
# been around since the early days of the web (and in e.g. Apache on UNIX is trivial
# to configure), and if you're really serious, SSL is commonplace in web servers.
# http://www.robotstxt.org/wc/faq.html#nosecurity