How to write robots.txt: learning from the robots.txt files of Google and Baidu

What is robots.txt?

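robots.txt is the first file a search engine looks at when it visits a website. When a search spider arrives at a site, it first checks whether robots.txt exists in the site's root directory. If the file exists, the crawler uses its contents to decide which parts of the site it may visit. If the file does not exist, every search spider can access all pages on the site that are not password-protected.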
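
The check described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration: the robots.txt content, the crawler name `ExampleBot`, and the `example.com` URLs are made up for the example and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(text):
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Case 1: the file exists, so its rules decide what may be crawled.
parser = make_parser(ROBOTS_TXT)
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/public/page.html")

# Case 2: an empty rule set, which this parser treats like "no restrictions".
empty_parser = make_parser("")
unrestricted = empty_parser.can_fetch("ExampleBot", "https://example.com/anything")

print(blocked, allowed, unrestricted)
```

A real crawler would point the parser at the live file with `set_url()` and `read()` instead of passing text in directly.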

How to set up robots.txt

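Studying the robots.txt files of Baidu and Google is a good way to learn. As the saying goes, without comparison there is no harm. Baidu…… (omitted, I won't go on). Still, the comparison taught me quite a lot.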

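First, Baidu recognizes very few spiders: it names a short list of crawlers and, in the final `User-agent: *` group, disallows everything for all others.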

Address: https://www.baidu.com/robots.txt

User-agent: Baiduspider

Disallow: /baidu

Disallow: /s?

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Googlebot

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: MSNBot

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Baiduspider-image

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: YoudaoBot

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sogou web spider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sogou inst spider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sogou spider2

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sogou blog

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sogou News Spider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sogou Orion spider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: ChinasoSpider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: Sosospider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: yisouspider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: EasouSpider

Disallow: /baidu

Disallow: /s?

Disallow: /shifen/

Disallow: /homepage/

Disallow: /cpro

Disallow: /ulink?

Disallow: /link?

Disallow: /home/news/data/

User-agent: *

Disallow: /
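
The pattern in Baidu's file (a few named crawlers with their own rules, then a catch-all group that blocks everyone else) can be checked with a small sketch. The rules below are a shortened excerpt of the listing above; the crawler name `SomeOtherBot` and the tested URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the listing above: one named crawler plus the catch-all.
# Groups are written without blank lines between directives, because this
# parser treats a blank line as the end of a group.
ALLOWLIST_ROBOTS = """\
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ALLOWLIST_ROBOTS.splitlines())

# The named crawler is only blocked from the listed paths.
named_home = parser.can_fetch("Baiduspider", "https://www.baidu.com/index.html")
named_search = parser.can_fetch("Baiduspider", "https://www.baidu.com/s?wd=test")

# Any crawler that is not named falls into the "*" group and is blocked everywhere.
other_home = parser.can_fetch("SomeOtherBot", "https://www.baidu.com/index.html")

print(named_home, named_search, other_home)
```

For your own site, this allow-list style only makes sense if you really want to turn away every crawler you have not named.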

Google's configuration is complex and very long, and Google's sitemap.xml is frighteningly large.

Address: https://www.google.com/robots.txt

User-agent: *

Disallow: /search

Allow: /search/about

Allow: /search/static

Allow: /search/howsearchworks

Disallow: /sdch

Disallow: /groups

Disallow: /index.html?

Disallow: /?

Allow: /?hl=

Disallow: /?hl=*&

Allow: /?hl=*&gws_rd=ssl$

Disallow: /?hl=*&*&gws_rd=ssl

Allow: /?gws_rd=ssl$

Allow: /?pt1=true$

Disallow: /imgres

Disallow: /u/

Disallow: /preferences

Disallow: /setprefs

Disallow: /default

Disallow: /m?

Disallow: /m/

Allow: /m/finance

Disallow: /wml?

Disallow: /wml/?

Disallow: /wml/search?

Disallow: /xhtml?

Disallow: /xhtml/?

Disallow: /xhtml/search?

Disallow: /xml?

Disallow: /imode?

Disallow: /imode/?

Disallow: /imode/search?

Disallow: /jsky?

Disallow: /jsky/?

Disallow: /jsky/search?

Disallow: /pda?

Disallow: /pda/?

Disallow: /pda/search?

Disallow: /sprint_xhtml

Disallow: /sprint_wml

Disallow: /pqa

Disallow: /palm

Disallow: /gwt/

Disallow: /purchases

Disallow: /local?

Disallow: /local_url

Disallow: /shihui?

Disallow: /shihui/

Disallow: /products?

Disallow: /product_

Disallow: /products_

Disallow: /products;

Disallow: /print

Disallow: /books/

Disallow: /bkshp?*q=*

Disallow: /books?*q=*

Disallow: /books?*output=*

Disallow: /books?*pg=*

Disallow: /books?*jtp=*

Disallow: /books?*jscmd=*

Disallow: /books?*buy=*

Disallow: /books?*zoom=*

Allow: /books?*q=related:*

Allow: /books?*q=editions:*

Allow: /books?*q=subject:*

Allow: /books/about

Allow: /booksrightsholders

Allow: /books?*zoom=1*

Allow: /books?*zoom=5*

Allow: /books/content?*zoom=1*

Allow: /books/content?*zoom=5*

Disallow: /ebooks/

Disallow: /ebooks?*q=*

Disallow: /ebooks?*output=*

Disallow: /ebooks?*pg=*

Disallow: /ebooks?*jscmd=*

Disallow: /ebooks?*buy=*

Disallow: /ebooks?*zoom=*

Allow: /ebooks?*q=related:*

Allow: /ebooks?*q=editions:*

Allow: /ebooks?*q=subject:*

Allow: /ebooks?*zoom=1*

Allow: /ebooks?*zoom=5*

Disallow: /patents?

Disallow: /patents/download/

Disallow: /patents/pdf/

Disallow: /patents/related/

Disallow: /scholar

Disallow: /citations?

Allow: /citations?user=

Disallow: /citations?*cstart=

Allow: /citations?view_op=new_profile

Allow: /citations?view_op=top_venues

Allow: /scholar_share

Disallow: /s?

Allow: /maps?*output=classic*

Allow: /maps?*file=

Allow: /maps/api/js?

Allow: /maps/d/

Disallow: /maps?

Disallow: /mapstt?

Disallow: /mapslt?

Disallow: /maps/stk/

Disallow: /maps/br?

Disallow: /mapabcpoi?

Disallow: /maphp?

Disallow: /mapprint?

Disallow: /maps/api/js/

Disallow: /maps/api/staticmap?

Disallow: /maps/api/streetview

Disallow: /mld?

Disallow: /staticmap?

Disallow: /maps/preview

Disallow: /maps/place

Disallow: /help/maps/streetview/partners/welcome/

Disallow: /help/maps/indoormaps/partners/

Disallow: /lochp?

Disallow: /center

Disallow: /ie?

Disallow: /blogsearch/

Disallow: /blogsearch_feeds

Disallow: /advanced_blog_search

Disallow: /uds/

Disallow: /chart?

Disallow: /transit?

Disallow: /extern_js/

Disallow: /xjs/

Disallow: /calendar/feeds/

Disallow: /calendar/ical/

Disallow: /cl2/feeds/

Disallow: /cl2/ical/

Disallow: /coop/directory

Disallow: /coop/manage

Disallow: /trends?

Disallow: /trends/music?

Disallow: /trends/hottrends?

Disallow: /trends/viz?

Disallow: /trends/embed.js?

Disallow: /trends/fetchComponent?

Disallow: /trends/beta

Disallow: /trends/topics

Disallow: /musica

Disallow: /musicad

Disallow: /musicas

Disallow: /musicl

Disallow: /musics

Disallow: /musicsearch

Disallow: /musicsp

Disallow: /musiclp

Disallow: /urchin_test/

Disallow: /movies?

Disallow: /wapsearch?

Allow: /safebrowsing/diagnostic

Allow: /safebrowsing/report_badware/

Allow: /safebrowsing/report_error/

Allow: /safebrowsing/report_phish/

Disallow: /reviews/search?

Disallow: /orkut/albums

Disallow: /cbk

Allow: /cbk?output=tile&cb_client=maps_sv

Disallow: /maps/api/js/AuthenticationService.Authenticate

Disallow: /maps/api/js/QuotaService.RecordEvent

Disallow: /recharge/dashboard/car

Disallow: /recharge/dashboard/static/

Disallow: /profiles/me

Allow: /profiles

Disallow: /s2/profiles/me

Allow: /s2/profiles

Allow: /s2/oz

Allow: /s2/photos

Allow: /s2/search/social

Allow: /s2/static

Disallow: /s2

Disallow: /transconsole/portal/

Disallow: /gcc/

Disallow: /aclk

Disallow: /cse?

Disallow: /cse/home

Disallow: /cse/panel

Disallow: /cse/manage

Disallow: /tbproxy/

Disallow: /imesync/

Disallow: /shenghuo/search?

Disallow: /support/forum/search?

Disallow: /reviews/polls/

Disallow: /hosted/images/

Disallow: /ppob/?

Disallow: /ppob?

Disallow: /accounts/ClientLogin

Disallow: /accounts/ClientAuth

Disallow: /accounts/o8

Allow: /accounts/o8/id

Disallow: /topicsearch?q=

Disallow: /xfx7/

Disallow: /squared/api

Disallow: /squared/search

Disallow: /squared/table

Disallow: /qnasearch?

Disallow: /app/updates

Disallow: /sidewiki/entry/

Disallow: /quality_form?

Disallow: /labs/popgadget/search

Disallow: /buzz/post

Disallow: /compressiontest/

Disallow: /analytics/feeds/

Disallow: /analytics/partners/comments/

Disallow: /analytics/portal/

Disallow: /analytics/uploads/

Allow: /alerts/manage

Allow: /alerts/remove

Disallow: /alerts/

Allow: /alerts/$

Disallow: /ads/search?

Disallow: /ads/plan/action_plan?

Disallow: /ads/plan/api/

Disallow: /ads/hotels/partners

Disallow: /phone/compare/?

Disallow: /travel/clk

Disallow: /hotelfinder/rpc

Disallow: /hotels/rpc

Disallow: /flights/rpc

Disallow: /async/flights/

Disallow: /commercesearch/services/

Disallow: /evaluation/

Disallow: /chrome/browser/mobile/tour

Disallow: /compare/*/apply*

Disallow: /forms/perks/

Disallow: /shopping/suppliers/search

Disallow: /ct/

Disallow: /edu/cs4hs/

Disallow: /trustedstores/s/

Disallow: /trustedstores/tm2

Disallow: /trustedstores/verify

Disallow: /adwords/proposal

Disallow: /shopping/product/

Disallow: /shopping/seller

Disallow: /shopping/reviewer

Disallow: /about/careers/applications/

Disallow: /landing/signout.html

Disallow: /webmasters/sitemaps/ping?

Disallow: /ping?

Disallow: /gallery/

Disallow: /landing/now/ontap/

Allow: /searchhistory/

Allow: /maps/reserve

Allow: /maps/reserve/partners

Disallow: /maps/reserve/api/

Disallow: /maps/reserve/search

Disallow: /maps/reserve/bookings

Disallow: /maps/reserve/settings

Disallow: /maps/reserve/manage

Disallow: /maps/reserve/payment

Disallow: /maps/reserve/receipt

Disallow: /maps/reserve/sellersignup

Disallow: /maps/reserve/payments

Disallow: /maps/reserve/feedback

Disallow: /maps/reserve/terms

Disallow: /maps/reserve/m/

Disallow: /maps/reserve/b/

Disallow: /maps/reserve/partner-dashboard

Disallow: /about/views/

Disallow: /intl/*/about/views/

Disallow: /local/dining/

Disallow: /local/place/reviews/

Disallow: /local/place/rap/

Disallow: /local/tab/

Disallow: /travel/hotels/

Allow: /finance

Disallow: /finance?*q=*

# Certain social media sites are whitelisted to allow crawlers to access page markup when links to google.com/imgres* are shared. To learn more, please contact images-robots-whitelist@google.com.

User-agent: Twitterbot

Allow: /imgres

User-agent: facebookexternalhit

Allow: /imgres

Sitemap: http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml

Sitemap: https://www.google.com/sitemap.xml
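
Google's file mixes `Allow` and `Disallow` lines and uses `*` and `$` in paths. Below is a minimal sketch of how such rules can be evaluated, assuming the precedence described in RFC 9309: the longest matching rule wins, and `Allow` wins a tie. It is not Google's implementation, the helper names `rule_to_regex` and `is_allowed` are invented for this example, and the rule list is a small excerpt of the listing above. As far as I know, Python's built-in `urllib.robotparser` does plain prefix matching and does not handle these wildcards, which is why the matcher is written by hand here.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any sequence of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if path may be crawled under the given (allow, pattern) rules."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for allow, pattern in rules:
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            # Longest pattern wins; on equal length, Allow beats Disallow.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


# Excerpt of the "User-agent: *" group above: (is_allow_rule, path_pattern).
RULES = [
    (False, "/search"),
    (True, "/search/about"),
    (False, "/?"),
    (True, "/?hl="),
    (False, "/?hl=*&"),
    (True, "/?hl=*&gws_rd=ssl$"),
]

for path in ["/search?q=x", "/search/about", "/?hl=en",
             "/?hl=en&q=x", "/?hl=en&gws_rd=ssl"]:
    print(path, is_allowed(RULES, path))
```

This shows why the order of lines in the file matters less than their specificity: `/search/about` is allowed even though `Disallow: /search` appears first, because the `Allow` pattern is longer.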
