
An Analysis of Amazon's robots.txt

2018-05-14  弹弹弹弹走于思琦

1. The robots protocol

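The Robots protocol (also called the crawler protocol or robot protocol) is formally known as the Robots Exclusion Protocol. A website uses it to tell search engines which pages may be crawled and which may not. robots.txt is a plain-text file that can be created and edited with any common text editor. It is an agreement, not a command. It is the first file a search engine checks when it visits a site, and it tells the spider which files on the server may be viewed.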

Source: Baidu Baike, entry on the robots protocol
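As a rough illustration of how a well-behaved crawler consults these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The two Disallow lines are taken from the Amazon listing in the next section; the host `www.example.com` and the product path are placeholders I made up, and the rules are parsed from a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Two rules of the kind shown in the listing; parsing a list of lines keeps
# the example offline (no request is sent to any server).
RULES = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
""".splitlines()


def build_parser(lines):
    """Return a RobotFileParser loaded with the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = build_parser(RULES)

# www.example.com and /dp/0123456789 are placeholders, not real Amazon URLs.
cart_allowed = parser.can_fetch("*", "https://www.example.com/cart")
product_allowed = parser.can_fetch("*", "https://www.example.com/dp/0123456789")
print(cart_allowed, product_allowed)
```

A crawler would normally run this check before every request and skip any URL for which `can_fetch` returns `False`.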

2. Amazon's robots.txt file

Amazon robots.txt

User-agent: *    # applies to all crawlers


Disallow: /buycar                         

Disallow: /cart

Disallow: /checkout

Disallow: /class

Disallow: /com

Disallow: /common

Disallow: /css

Disallow: /dll

Disallow: /doc

# Blocks crawling of paths that begin with /buycar, /cart, /checkout, /class, /com, /common, /css, /dll, or /doc
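A Disallow value is matched as a prefix of the URL path, not as an exact directory name. The sketch below shows that with plain string matching; the helper name `matching_prefixes` and the sample paths are my own, chosen only for illustration.

```python
# Prefixes taken from the Disallow lines above.
DISALLOWED_PREFIXES = [
    "/buycar", "/cart", "/checkout", "/class", "/com",
    "/common", "/css", "/dll", "/doc",
]


def matching_prefixes(path, prefixes=DISALLOWED_PREFIXES):
    """Return every disallowed prefix that the given URL path starts with."""
    return [p for p in prefixes if path.startswith(p)]


print(matching_prefixes("/cart/view"))      # hypothetical path under /cart
print(matching_prefixes("/common/x.css"))   # matched by both /com and /common
print(matching_prefixes("/dp/0123456789"))  # no match, so crawling is allowed
```

Because /com is itself a prefix of /common, the separate /common line adds nothing under prefix matching.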


Disallow: /dp/e-mail-friend/

Disallow: /dp/manual-submit/

Disallow: /dp/product-availability/

Disallow: /dp/rate-this-item/

Disallow: /dp/shipping/

Disallow: /dp/twister-update/

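# Blocks crawling of the listed subdirectories of dp: e-mail-friend, manual-submit, product-availability, rate-this-item, shipping, and twister-update (these look like pages for rating a product, submitting information, and so on)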


Disallow: /gp/aws/ssop

Disallow: /gp/cart

Disallow: /gp/css/homepage.html

Disallow: /gp/customer-reviews/common/du

Disallow: /gp/flex

Disallow: /gp/gfix

Disallow: /gp/history

Disallow: /gp/item-dispatch

Disallow: /gp/music/clipserve

Disallow: /gp/music/wma-pop-up

Disallow: /gp/offer-listing

Disallow: /gp/product/e-mail-friend

Disallow: /gp/product/product-availability

Disallow: /gp/product/rate-this-item

Disallow: /gp/recsradio

Disallow: /gp/slredirect

Disallow: /gp/twitter/

Disallow: /gp/vote

Disallow: /gp/voting/

Disallow: /gp/yourstore

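# Blocks crawling of the listed paths under gp (customer reviews, browsing history, and the product pages' rating, e-mail, and share-to-Twitter functions, among others)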


Disallow: /inc

Disallow: /js

Disallow: /lib

# Blocks crawling of the inc, js, and lib directories


Disallow: /mn/bookLookInsideApp

Disallow: /mn/checkInitApp

Disallow: /mn/checkoutAlertMsgApp

Disallow: /mn/checkoutredirectApp

Disallow: /mn/giftCardApp

Disallow: /mn/loginApplication

Disallow: /mn/loyaltyApp

Disallow: /mn/orderAddrApp

Disallow: /mn/orderCfmApp

Disallow: /mn/orderDetailApp

Disallow: /mn/orderFailApp

Disallow: /mn/orderHistoryApp

Disallow: /mn/orderModifyApp

Disallow: /mn/orderSummaryApp

Disallow: /mn/paymentRedriveApp

Disallow: /mn/recommendReviewApp

Disallow: /mn/releaseReviewApp

Disallow: /mn/reviewVoteApplication

Disallow: /mn/selectPaymentMethodApp

Disallow: /mn/selectShippingOpptionApplication

Disallow: /mn/shipmentTraceApp

Disallow: /mn/shoppingCartApplication

Disallow: /mn/tellFriend

Disallow: /mn/thankYouApplication

Disallow: /mn/virtualAccountApp

Disallow: /mn/yourAccountApp

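# Blocks crawling of the listed paths under mn (signing in to and out of accounts, payment method selection, order details, failed orders, order history, order summary, shipping option selection, shipment tracking, and so on)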


Disallow: /paper

Disallow: /xml

Disallow: /youraccount

Disallow: /ap/signin

Disallow: /gp/registry/wishlist/

Disallow: /wishlist/

# Blocks crawling of the user account, sign-in, and wishlist paths


Allow: /wishlist/universal*

Allow: /wishlist/vendor-button*

Allow: /wishlist/get-button*

# Allows crawling of the listed paths under wishlist


Disallow: /gp/wishlist/

Allow: /gp/wishlist/universal*

Allow: /gp/wishlist/vendor-button*

Allow: /gp/wishlist/ipad-install*

# Blocks everything under gp/wishlist except the three listed paths
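How a Disallow line and a more specific Allow line interact depends on the parser. Under the longest-match rule described in RFC 9309, the rule with the longest matching path wins and Allow wins a tie. The sketch below applies that rule to the two wishlist groups only. It treats a trailing `*` as a plain prefix match, which is a simplification I am assuming is good enough for these particular lines, and the sample paths are hypothetical.

```python
# Rules copied from the two wishlist groups above, as (directive, path) pairs.
RULES = [
    ("Disallow", "/wishlist/"),
    ("Allow", "/wishlist/universal*"),
    ("Allow", "/wishlist/vendor-button*"),
    ("Allow", "/wishlist/get-button*"),
    ("Disallow", "/gp/wishlist/"),
    ("Allow", "/gp/wishlist/universal*"),
    ("Allow", "/gp/wishlist/vendor-button*"),
    ("Allow", "/gp/wishlist/ipad-install*"),
]


def is_allowed(path, rules=RULES):
    """Apply longest-match precedence; Allow wins when lengths tie."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        prefix = pattern.rstrip("*")  # trailing * behaves like a prefix match
        if not path.startswith(prefix):
            continue
        is_allow = directive == "Allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


# Hypothetical paths used only to exercise the rules.
print(is_allowed("/wishlist/ls/ABC"))
print(is_allowed("/wishlist/universal/add"))
print(is_allowed("/gp/wishlist/get-button"))
```

A parser that simply takes the first matching line would behave differently here, since each Disallow line comes before its Allow exceptions. Note also that get-button is allowed under /wishlist/ but not under /gp/wishlist/.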


Disallow: /registry/wishlist/

Disallow:/gp/help/contact-us/general-questions.html*?type&email&skip=true

Disallow:/gp/help/customer/accessibility?ie=UTF8&initialIssue=forgotpw&skip=true

Disallow: /gp/registry/search.html

Disallow: /gp/orc/rml/

Disallow: /gp/digital/fiona/manage

Disallow: /gp/entity-alert/external

Disallow: /gp/customer-reviews/dynamic/sims-box

Disallow: /review/dynamic/sims-box

Disallow: /gp/redirect.html

Disallow: /gp/customer-media/upload/

Disallow: /gp/customer-media/actions/delete/

Disallow: /gp/customer-media/actions/edit-caption/

Disallow: /gp/dmusic/

Disallow: /registry

Disallow: /*/wishlist

Disallow: /gp/registry

Disallow: /gp/aag

Disallow: /gp/socialmedia/giveaways

Disallow: /gp/aw/so.html

Disallow: /gp/pdp/profile/

# Blocks crawling of the paths listed above


Disallow: /gp/help/customer/display.html*nodeId=200843370

Disallow: /gp/help/customer/display.html*nodeId=200877580

Disallow: /gp/help/customer/display.html*nodeId=200877590

Disallow: /gp/help/customer/display.html*nodeId=200879080

Disallow: /gp/help/customer/display.html*nodeId=200879100

Disallow: /gp/help/customer/display.html*nodeId=200879120

Disallow: /gp/help/customer/display.html*nodeId=200879160

Disallow: /gp/help/customer/display.html*nodeId=200879140

Disallow: /gp/help/customer/display.html*nodeId=200877610

Disallow: /gp/help/customer/display.html*nodeId=200878960

Disallow: /gp/help/customer/display.html*nodeId=200878980

Disallow: /gp/help/customer/display.html*nodeId=200879000

Disallow: /gp/help/customer/display.html*nodeId=200879040

Disallow: /gp/help/customer/display.html*nodeId=200879020

Disallow: /gp/help/customer/display.html*nodeId=200877630

Disallow: /gp/help/customer/display.html*nodeId=200879200

Disallow: /gp/help/customer/display.html*nodeId=200879220

Disallow: /gp/help/customer/display.html*nodeId=200879240

Disallow: /gp/help/customer/display.html*nodeId=200879280

Disallow: /gp/help/customer/display.html*nodeId=200879260

Disallow: /gp/help/customer/display.html*nodeId=200877650

Disallow: /gp/help/customer/display.html*nodeId=200879320

Disallow: /gp/help/customer/display.html*nodeId=200879340

Disallow: /gp/help/customer/display.html*nodeId=200879360

Disallow: /gp/help/customer/display.html*nodeId=200879400

Disallow: /gp/help/customer/display.html*nodeId=200879380

Disallow: /gp/help/customer/display.html*nodeId=200877560

Disallow: /gp/help/customer/display.html*nodeId=200843460

Disallow: /gp/help/customer/display.html*nodeId=200843440

Disallow: /gp/help/customer/display.html*nodeId=200899270

Disallow: /gp/help/customer/display.html*nodeId=200879440

Disallow: /gp/help/customer/display.html*nodeId=200899330

Disallow: /gp/help/customer/display.html*nodeId=200899350

Disallow: /gp/help/customer/display.html*nodeId=200899390

Disallow: /gp/help/customer/display.html*nodeId=200899410

Disallow: /gp/help/customer/display.html*nodeId=200899430

Disallow: /gp/help/customer/display.html*nodeId=200899220

Disallow: /gp/help/customer/display.html*nodeId=200899450

Disallow: /gp/help/customer/display.html*nodeId=200899670

Disallow: /gp/help/customer/display.html*nodeId=200899530

Disallow: /gp/help/customer/display.html*nodeId=200899470

Disallow: /gp/help/customer/display.html*nodeId=200899550

Disallow: /gp/help/customer/display.html*nodeId=200899570

Disallow: /gp/help/customer/display.html*nodeId=200899510

Disallow: /gp/help/customer/display.html*nodeId=200899610

Disallow: /gp/help/customer/display.html*nodeId=200899630

Disallow: /gp/help/customer/display.html*nodeId=200899650

Disallow: /gp/help/customer/display.html*nodeId=200879180

Disallow: /gp/help/customer/display.html*nodeId=200879060

Disallow: /gp/help/customer/display.html*nodeId=200879300

Disallow: /gp/help/customer/display.html*nodeId=200879420

Disallow: /gp/help/customer/display.html*nodeId=200899290

Disallow: /gp/help/customer/display.html*nodeId=200899310

Disallow: /gp/help/customer/display.html*nodeId=200843380

Disallow: /gp/help/customer/display.html*nodeId=200843420

Disallow: /gp/help/customer/display.html*nodeId=200899230

Disallow: /gp/help/customer/display.html*nodeId=200899250

Disallow: /gp/help//display.html*nodeId=200899370

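# Blocks crawling of the listed pages under gp/help (these look like automated answers to specific questions asked when contacting Amazon customer service)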
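The `*` in the middle of these rules is a wildcard that can span the `?` and any other query parameters in front of `nodeId`. One way to sketch that matching is to translate the rule into a regular expression. The function names and test URLs below are my own, and `$` handling is included only because it is the other special character commonly defined for robots.txt paths.

```python
import re


def pattern_to_regex(pattern):
    """Compile a robots.txt path pattern: * matches any run of characters,
    a trailing $ anchors the end, everything else is literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


# First rule of the group above.
RULE = "/gp/help/customer/display.html*nodeId=200843370"
rule_re = pattern_to_regex(RULE)


def is_blocked(path_and_query):
    """True if the URL path plus query string matches RULE from its start."""
    return rule_re.match(path_and_query) is not None


# Hypothetical URLs: the extra ie=UTF8 parameter is covered by the wildcard.
print(is_blocked("/gp/help/customer/display.html?nodeId=200843370"))
print(is_blocked("/gp/help/customer/display.html?ie=UTF8&nodeId=200843370"))
print(is_blocked("/gp/help/customer/display.html?nodeId=200000000"))
```

Only the first two URLs match, so a help page with a nodeId that is not listed remains crawlable.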


Disallow: /reviews/iframe

Disallow:/gp/help/reports/infringement/jquery/handle-notice-submit.html

Disallow: /gp/help/customer/handler/handle-email-submit.html

Disallow: /ss/customer-reviews/lighthouse/

Disallow: /gp/aw/ol/

# Blocks crawling of the paths listed above


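Amazon's robots.txt is quite detailed and blocks crawling of a large number of customer- and product-related paths. The only explicit Allow rules in this file cover a few paths under wishlist. My personal guess is that these crawlable files are used, by way of the browser, to push related product information to users and steer them toward the site.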
