Python爬虫准备:认识urllib/urllib2与requ

2018-05-02  本文已影响47人  部落大圣

[urllib2中的urlopen()使用方法及实例]
http://www.cnblogs.com/langdashu/p/4963053.html

数据传输GET和POST的区别
1.Post传输数据时,不需要在URL中显示出来,而Get方法要在URL中显示。
2.Post传输的数据量大,可以达到2M,而Get方法由于受到URL长度的限制,只能传递大约1024字节.
3.Post顾名思义,就是为了将数据传送到服务器段,Get就是为了从服务器段取得数据.而Get之所以也能传送数据,只是用来设计告诉服务器,你到底需要什么样的数据.Post的信息作为http请求的内容,而Get是在Http头部传输的。

Get方式传递数据

# coding:utf-8
import urllib2
import urllib

values = {}
values['username'] = '186######26'
values['password'] = '######'
data = urllib.urlencode(values)  # 注意转换格式
url = 'https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001'
getUrl = url+'?'+data  # get传输时要在url中显示
request = urllib2.Request(getUrl)
response = urllib2.urlopen(request)

# print response.read()
print(getUrl)
输出结果
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001?username=186######26&password=######

Post方式传递数据

# coding:utf-8
import urllib2
import urllib

values = {}
values['username'] = '18699940926'
values['password'] = 'shl5880423'
data = urllib.urlencode(values)
url = 'https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001'
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
#print response.read()
print(request)
print(data)
输出结果
··································
<!DOCTYPE HTML>
<!--[if lte IE 7 ]>
<!--
Content-Type: multipart/related; boundary="_"
--_
Content-Location:logo
Content-Transfer-Encoding:base64

iVBORw0KGgoAAAANSUhEUgAAANcAAAAeCAMAAACIRHv7AAAAYFBMVEWq1LBksW/3xpDQ6fVhs9v8/fz61KuKxJLF4srj8eT96NJHolQ8odObzaM4m0bm8/aJxuXz+fTX69qv2O1Xq2P+9Op6vITy+fzM5dC73b8xl0D/+/gkls0mkzculj3/////+nwbAAAAIHRSTlP/////////////////////////////////////////AFxcG+0AAAVlSURBVHja1Zhvm6osEMahJCUCw4jUjsP3/5bPDH+S07q1Z1+c5+y9pYhi8+MeRq9lut0UCyivWtoJOWquGDYcdvwMMQObckH0UoGTtvNqBtLUDzNw2YsfwcXVpnywBnHmmTtAGYUf6oDZ2B/BJezWnxXBZuM4eA96BOUh6Wdwhc8kEKhTMHGYJmDINVCHAS1+CFfLPyhWBw06KJAeGIt+lY4Pt3BuDF/QpNz4F7mEgw/i8RS0wsw9+sXILyNyR3iSBujCG5Xr/jKXUZVM5prA29nYh19igtaCsRvxjn+Hy3bDn3F5UYknLjEC75Gm+KXsCF5ix//H1S063Snt5UqhK7GKK1TKXBOVdgDnil+tgpk+wz/A1ZF7s36EvFTir7kYkJCj+MUhafxHuPjiV671zu+4xtbPYNr24RdLHV6+4LoJ8Z7LincAQrznmhZn33FxWSlykYQBF8K6vqiD17/eac596wrX7dycTqfmkoI6Ho/3kHQ9Hq+Fy2qnXGtDUX9pmuaMuoRwxzF0+eGwO1CjKMNkJS5pZhneccFcCQpXQC5hk18DmGeu0UFS5rqcfiU1ezo87HYIkxB3u2Pm6hSQyiK9nX8VNci12+3CHQeSDuKZy/uFa5W4rFumUHF5XVRzPctRtgzMgFHMkV9aghmZnkExln9voungrXcmcV0otsvljHSn/hOuscMxzhGYjVg0FacGfY5cAmkI64CG0ZBaBEN5yBMXjyn5qm6EcXIAasoyAHyiODuIQq4OuSwYnQtJyqEBm3GpibS+9hjhJcbaYIy3ba52Bj4IQVOiaWiDVHtB2Vj8IiLMX3HE5iuuQYeXYnlFm+xCPwNMJXDClQYG4ppBG4NUJs20VRhlXTcwxHNedick3ORC+fg7LYDDxjlaS7oVv3Z51P1zrqnzi+6yXnIRQpePEbHPD3ds+yABeuQSCphFRmetyFnoRMXVY4i3kHShILe5eBojAYyNWUhLsfarDBI4/BOuOuk09fMPmnXkEuvk87S4ygrSuFF3hnuHX/uoG8LRTKxciSXrhil5C8cNruR1oBuBzGNqv9ZVdf+ca4h+OdOhJDrBuUOSBbfrh7GQ80I9fpC6Sm+HGx6Iy4PHswXazmBszXXOaVgScb/pV7m1VcR1zivyt/V1D+/8ik3c21BJzU952E0oJIDYolxpqdEJ8mSWaGU7INOgQYnVryHbWnNd3nONK9csA1327FdZVPd3XLqu80Eu/olrhk0ZEYYZHAIYHo8nmHvx8GsCJPxzv7qw2i3osr5w/bFfQ01Ci0w/NLz4v42yFAelH6dnjuKDASZgg0vwLa6+5jo8c400vOa6PPwSr/3qVOYKapHrSbdU6ohLDnJDgxRBjN6G0Xd2GvEoMCaFbvMK6dZyyIC49lXdoNpY143r7pnL05SJqhyevuoXX1xsRovWk90qExceC9+SfDyfx5m4MJHWyT9TkIWLon3mGqjqVI+8G7a+6NekbeKSVi2bwS8mfJOrfix3BgxyJRhR7CIfaFEVt36vGwOO0MRPvmasLb9evM8zzvFoC6wr71HfE1VQ3Q+4822M90ap1ON+j41zyT5xRbxDrhs0wlqpTX4+90Sz7/F9+YTfRrz0q+Zq01sgW56qYDrXJq7vGwbx1d+y5EN/ojCbuBVpxrPuhYsKkII4KCds1j4g2+2rfsllcV0M/nfDNOfcxbLxfa4gHaCMto910zcpyNNZlClHUaSFK0sxkYMnsGRzg3Dv/cr+eF/eW55Oo5QO3+ciWea9lkQ4jjaFuT83zfnSh6z78XA4XqlxvaINchjHiWk2VtHszzgg5uR+3wdxvV5DVmpu/z/Khm11eD7rP4RXfNLv+S0LAAAAAElFTkSuQmCC
<![endif]-->
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>登录豆瓣</title>
<style type="text/css">
/* Reset */
body,div,dl,dt,dd,ul,ol,li,h1,h2,h3,h4,h5,h6,pre,form,fieldset,input,textarea,p,blockquote,th,td { margin:0; padding:0; }
table { border-collapse:collapse; border-spacing:0; }
fieldset,img { border:0; }
address,caption,cite,code,dfn,em,strong,th,var { font-style:normal; font-weight:normal; }
ol,ul { list-style:none; }
caption,th { text-align:left; }
h1,h2,h3,h4,h5,h6 { font-size:100%; font-weight:normal; }
q:before,q:after { content:''; }
abbr,acronym { border:0; }

/* Font,  Link & Container */
body { font:12px/1.6 arial,helvetica,sans-serif; }
a:link { color:#369;text-decoration:none; }
a:visited { color:#669;text-decoration:none; }
a:hover { color:#fff;text-decoration:none;background:#039; }
a:active { color:#fff;text-decoration:none;background:#f93; }
button { cursor:pointer;line-height:1.2; }
.mod { width:100%; }
.hd:after, .bd:after, .ft:after, .mod:after {content:'\0020';display:block;clear:both;height:0; }
.error-tip { margin-left:10px; }
.error-tip, .error { color:#fe2617; }

/* Layout */
.wrapper { width:950px;margin:0 auto; }
#header { padding-top:30px; }
#content { min-height:400px;*height:400px; }
#header, #content { margin-bottom:40px; }
#header, #content, #footer { width:100%;overflow:hidden; }
.article { float:left;width:590px; }
.aside { float:right;width:310px;color:#666; }
.aside li { padding-bottom: 1em; }

.narrow-layout .wrapper { width:90%; }
.narrow-layout h1 { padding-bottom:10px; }
.narrow-layout #header { padding-top:10px;margin-bottom:20px; }
.narrow-layout .article, .narrow-layout .aside { width:auto;float:none;margin-bottom:20px; }
.narrow-layout .aside li { padding:0;margin-bottom:10px; }
.narrow-layout .fright { display:block;float:none; }

/* header */
.logo { float:left; width:215px;  height:30px; overflow:hidden; line-height:10em; }
a.logo:link,
a.logo:visited,
a.logo:hover,
a.logo:active { background:transparent url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAANcAAAAeCAMAAACIRHv7AAAAYFBMVEWq1LBksW/3xpDQ6fVhs9v8/fz61KuKxJLF4srj8eT96NJHolQ8odObzaM4m0bm8/aJxuXz+fTX69qv2O1Xq2P+9Op6vITy+fzM5dC73b8xl0D/+/gkls0mkzculj3/////+nwbAAAAIHRSTlP/////////////////////////////////////////AFxcG+0AAAVlSURBVHja1Zhvm6osEMahJCUCw4jUjsP3/5bPDH+S07q1Z1+c5+y9pYhi8+MeRq9lut0UCyivWtoJOWquGDYcdvwMMQObckH0UoGTtvNqBtLUDzNw2YsfwcXVpnywBnHmmTtAGYUf6oDZ2B/BJezWnxXBZuM4eA96BOUh6Wdwhc8kEKhTMHGYJmDINVCHAS1+CFfLPyhWBw06KJAeGIt+lY4Pt3BuDF/QpNz4F7mEgw/i8RS0wsw9+sXILyNyR3iSBujCG5Xr/jKXUZVM5prA29nYh19igtaCsRvxjn+Hy3bDn3F5UYknLjEC75Gm+KXsCF5ix//H1S063Snt5UqhK7GKK1TKXBOVdgDnil+tgpk+wz/A1ZF7s36EvFTir7kYkJCj+MUhafxHuPjiV671zu+4xtbPYNr24RdLHV6+4LoJ8Z7LincAQrznmhZn33FxWSlykYQBF8K6vqiD17/eac596wrX7dycTqfmkoI6Ho/3kHQ9Hq+Fy2qnXGtDUX9pmuaMuoRwxzF0+eGwO1CjKMNkJS5pZhneccFcCQpXQC5hk18DmGeu0UFS5rqcfiU1ezo87HYIkxB3u2Pm6hSQyiK9nX8VNci12+3CHQeSDuKZy/uFa5W4rFumUHF5XVRzPctRtgzMgFHMkV9aghmZnkExln9voungrXcmcV0otsvljHSn/hOuscMxzhGYjVg0FacGfY5cAmkI64CG0ZBaBEN5yBMXjyn5qm6EcXIAasoyAHyiODuIQq4OuSwYnQtJyqEBm3GpibS+9hjhJcbaYIy3ba52Bj4IQVOiaWiDVHtB2Vj8IiLMX3HE5iuuQYeXYnlFm+xCPwNMJXDClQYG4ppBG4NUJs20VRhlXTcwxHNedick3ORC+fg7LYDDxjlaS7oVv3Z51P1zrqnzi+6yXnIRQpePEbHPD3ds+yABeuQSCphFRmetyFnoRMXVY4i3kHShILe5eBojAYyNWUhLsfarDBI4/BOuOuk09fMPmnXkEuvk87S4ygrSuFF3hnuHX/uoG8LRTKxciSXrhil5C8cNruR1oBuBzGNqv9ZVdf+ca4h+OdOhJDrBuUOSBbfrh7GQ80I9fpC6Sm+HGx6Iy4PHswXazmBszXXOaVgScb/pV7m1VcR1zivyt/V1D+/8ik3c21BJzU952E0oJIDYolxpqdEJ8mSWaGU7INOgQYnVryHbWnNd3nONK9csA1327FdZVPd3XLqu80Eu/olrhk0ZEYYZHAIYHo8nmHvx8GsCJPxzv7qw2i3osr5w/bFfQ01Ci0w/NLz4v42yFAelH6dnjuKDASZgg0vwLa6+5jo8c400vOa6PPwSr/3qVOYKapHrSbdU6ohLDnJDgxRBjN6G0Xd2GvEoMCaFbvMK6dZyyIC49lXdoNpY143r7pnL05SJqhyevuoXX1xsRovWk90qExceC9+SfDyfx5m4MJHWyT9TkIWLon3mGqjqVI+8G7a+6NekbeKSVi2bwS8mfJOrfix3BgxyJRhR7CIfaFEVt36vGwOO0MRPvmasLb9evM8zzvFoC6wr71HfE1VQ3Q+4822M90ap1ON+j41zyT5xRbxDrhs0wlqpTX4+90Sz7/F9+YTfRrz0q+Zq01sgW56qYDrXJq7vGwbx1d+y5EN/ojCbuBVpxrPuhYsKkII4KCds1j4g2+2rfsllcV0M/nfDNOfcxbLxfa4gHaCMto910zcpyNNZlClHUaSFK0sxkYMnsGRzg3Dv/cr+eF/eW55Oo5QO3+ciWea9lkQ4jjaFuT83zfnSh6z78XA4XqlxvaINchjHiWk2VtHszzgg5uR+3wdxvV5DVmpu/z/Khm11eD7rP4RXfNLv+S0LAAAAAElFTkSuQmCC) no-repeat;
*background-image:url(mhtml:https://accounts.douban.com/login!logo); }
h1 { color:#494949;display:block;font-size:25px;font-weight:bold;line-height:1.1;margin:0;padding:0 0 30px;word-wrap:break-word; }

/* form */
.item { clear:both;margin:0 0 15px;zoom:1; }
label { display: inline-block; float:left; margin-right: 15px; text-align: right; width: 60px; font-size: 14px; line-height: 30px; vertical-align: baseline }
.remember { cursor: pointer; font-size: 12px; display: inline; width: auto; text-align: left; float: none; margin: 0; color: #666 }
.item-captcha input,
.basic-input { width: 200px; padding: 5px; height: 18px; font-size: 14px;vertical-align:middle; -moz-border-radius: 3px; -webkit-border-radius: 3px; border-radius: 3px; border: 1px solid #c9c9c9 }
.basic-input.small {width:100px;}
.item-captcha input:focus,
.basic-input:focus { border: 1px solid #a9a9a9 }
.item-captcha input { width:100px; }
.item-captcha .pl { color:#666; }
.btn-submit { cursor: pointer;color: #ffffff;background: #3fa156; border: 1px solid #528641; font-size: 14px; font-weight: bold; padding:6px 26px; border-radius: 3px; -moz-border-radius: 3px; -webkit-border-radius: 3px; *width: 100px;*height:30px; }
.btn-submit:hover { background-color:#4fca6c;border-color:#6aad54; }
.btn-submit:active { background-color:#3fa156;border-color:#528641; }
#item-error { padding-left:75px; }
.item-captcha img { max-width:70%; }
body { -webkit-text-size-adjust: none;-webkit-touch-callout: none;-webkit-tap-highlight-color: transparent; }
/* 3rd login*/
.item-3rd { padding:5px 0;width:200px;margin:20px 0 0 75px;border-top:1px solid #eee;border-bottom:1px solid #eee; }
.item-3rd label { width:auto;margin:0;font-size:12px;color:#999;line-height:1.5; }
.item-3rd img { margin:0 5px;vertical-align:middle; }
.item-3rd a:hover { background-color:transparent; }
.item-3rd a:active { background-color:transparent; }
/* sms login */
.item.extra{float:left;}
#post-code-button {float:none;padding-left:200px;width:87px;text-align:right;margin:5px 0;}
#post-code {float:left;width:200px;}
.item-right {text-align:right;width:287px;}
</style>


<style type="text/css">
#footer { color:#999;padding-top:6px;border-top: 1px dashed #ddd; }
.fright { float:right; }
.icp { float:left; }
</style>

<script type="text/javascript" src="https://img3.doubanio.com/f/accounts/c5268df4c1f0bada95cb3d2b80089a50b494b5ee/js/lib/jquery.min.js"></script>
<script>
 function changeWindowSize(){var e=document.documentElement,n=document.getElementById("header").offsetHeight+document.getElementById("content").offsetHeight+document.getElementById("side-nav").offsetHeight;e.offsetWidth<=500||e.offsetHeight<=n?(changeWindowSize.changed||(window.resizeTo(500,n),changeWindowSize.changed=!0),e.className="narrow-layout",resizeEvent(!0)):(e.className="",resizeEvent(!1))}function resizeEvent(e){return e?void(window.onresize=function(){var e;return function(){e&&window.clearTimeout(e),e=window.setTimeout(changeWindowSize,100)}}()):void(window.onresize=null)}
 function set_cookie(e,t,n,o){var i,r,c=new Date;c.setTime(c.getTime()+24*(t||30)*60*60*1e3),i="; expires="+c.toGMTString();for(r in e)document.cookie=r+"="+e[r]+i+"; domain="+(n||"douban.com")+"; path="+(o||"/")}function get_cookie(e){var t,n,o=e+"=",i=document.cookie.split(";");for(t=0;t<i.length;t++){for(n=i[t];" "==n.charAt(0);)n=n.substring(1,n.length);if(0===n.indexOf(o))return n.substring(o.length,n.length).replace(/\"/g,"")}return null}
</script>
</head>
<body onload="changeWindowSize()">
<div class="wrapper">
  <div id="header">
      <a href="" class="logo">登录豆瓣</a>
  </div>

<div id="content">
  <h1>登录豆瓣</h1>
  <div class="article">
      

<form id="lzform" name="lzform" method="post" onsubmit="return validateForm(this);" action="https://accounts.douban.com/login">
  <div style="display:none;">
    <img src="https://www.douban.com/pics/blank.gif" onerror="document.lzform.action='https://accounts.douban.com/login'"/>
  </div>
  <input name="source" type="hidden" value="index_nav"/>
    <input name="redir" type="hidden" value="https://www.douban.com/"/>
    <div id="item-error">
      <p class="error">帐号不能为空</p>
    </div>
  <div class="item-right">
    <a href="?redir=https://www.douban.com/&amp;source=index_nav&amp;login_type=sms">手机验证码登录</a>
  </div>
  <div class="item">
    <label>帐号</label>
    <input id="email" name="form_email" type="text" class="basic-input"
           maxlength="60" value="邮箱/手机号/用户名" tabindex="1"/>
  </div>
  <div class="item">
    <label>密码</label>
    <input id="password" name="form_password" type="password" class="basic-input" maxlength="20" tabindex="2"/>
  </div>
  <!-- xFdX6TNE8gM | 117.146.230.154 -->
  
  <div class="item">
    <label>&nbsp;</label>
    <p class="remember">
      <input type="checkbox" id="remember" name="remember" tabindex="4"/>
      <label for="remember" class="remember">下次自动登录</label>
      | <a href="https://accounts.douban.com/resetpassword">忘记密码了</a>
    </p>
  </div>
  <div class="item">
    <label>&nbsp;</label>
    <input type="submit" value="登录" name="login" class="btn-submit" tabindex="5"/>
  </div>
  




<div class="item item-3rd">
    <label>
    第三方登录:
    </label>
    <a target="_top" href="https://www.douban.com/accounts/connect/wechat/?from=index_nav&amp;redir=https%3A//www.douban.com/" class="item-wechat"><img src="https://img3.doubanio.com/f/accounts/1b6cc3ca91f78cf47f41eafa91fbcd4918ae239c/pics/connect_wechat.png" title="微信"></a>
    <a target="_top" href="https://www.douban.com/accounts/connect/sina_weibo/?from=index_nav&amp;redir=https%3A//www.douban.com/&amp;fallback=" class="item-weibo"><img src="https://img3.doubanio.com/f/accounts/e2f1d8c0ede93408b46cbbab4e613fb29ba94e35/pics/connect_sina_weibo.png" title="新浪微博"></a>
</div>

</form>

  </div>
  <ul id="side-nav" class="aside">
    <li>&gt;&nbsp;还没有豆瓣帐号?<a rel="nofollow" href="https://accounts.douban.com/register">立即注册</a></li>
    <li>&gt;&nbsp;<a href="https://www.douban.com/mobile/">点击下载豆瓣移动应用</a></li>
  </ul>
</div>
<div id="footer">


<span id="icp" class="fleft gray-link">
    &copy; 2005-2018 douban.com, all rights reserved
</span>

<span class="fright">
    <a href="https://www.douban.com/about">关于豆瓣</a>
    · <a href="https://www.douban.com/jobs">在豆瓣工作</a>
    · <a href="https://www.douban.com/about?topic=contactus">联系我们</a>
    · <a href="https://www.douban.com/about?policy=disclaimer">免责声明</a>
    
    · <a href="https://www.douban.com/help/">帮助中心</a>
    · <a href="https://developers.douban.com/" target="_blank">开发者</a>
    · <a href="https://www.douban.com/mobile/">移动应用</a>
    · <a href="https://www.douban.com/partner/">豆瓣广告</a>
</span>



<script type="text/javascript">
function report_ps(r){
    $.get("https://www.douban.com/accounts/misc/ps", {ps:r});
    set_cookie({ps:'y'});
}
</script>

<img src="https://www.douban.com/pics/blank.gif" style="display:none;" onload="report_ps(true)" onerror="report_ps(false)" />



</div>

<script>
function trim(e){return e.replace(/^(\s|\u00A0)+/,"").replace(/(\s|\u00A0)+$/,"")}function validateForm(e){var r=0,t=e.elements["captcha-solution"],l=e.elements.form_email,n=e.elements.form_password,a=document.getElementById("item-error");if(a&&(a.style.display="none"),t){var o=trim(t.value);""===o?(displayError(t,"请输入验证码"),r=1):o.length<4?(displayError(t,"请输入正确的验证码"),r=1):clearError(t)}if(l){var i=trim(l.value);""===i||"邮箱/手机号/用户名"===i?(displayError(l,"请输入正确的邮箱/手机号/用户名"),r=1):clearError(l)}return n&&(""===n.value?(displayError(n,"请输入密码"),r=1):n&&clearError(n)),!r}function displayError(e,r){var t=document.getElementById(e.name+"_err");t||(t=document.createElement("span"),t.id=e.name+"_err",t.className="error-tip",e.parentNode.appendChild(t)),t.style.display="inline",t.innerHTML=r}function clearError(e){var r=document.getElementById(e.name+"_err");r&&(r.style.display="none")}!function(e){var r=function(r){return e.getElementById(r)},t="邮箱/手机号/用户名",l=r("email"),n=r("password"),a=r("captcha_field");l.onfocus=function(){this.value==t&&(this.value="",this.style.color="#000")},l.onblur=function(){this.value||(this.value=t,this.style.color="#ccc")},l.value==t?l.style.color="#ccc":""===n.value?n.focus():a&&a.focus()}(document);
</script>

<!-- COLLECTED JS -->


</div>
</body>
</html>

如何获取http头部编码格式信息,利用info():返回一个httplib.HTTPMessage 对象,表示远程服务器返回的头信息(header)

from urllib2 import urlopen
doc = urlopen("http://www.baidu.com")
print doc.info()
print doc.info().getheader('Content-Type')

输出的结果
···························
Transfer-Encoding: chunked
Bdpagetype: 1
Bdqid: 0xad9de3e700024e01
Cache-Control: private
Content-Type: text/html; charset=utf-8
Cxy_all: baidu+ddb991b06b5ef88b2a906ae2f393f374
Date: Wed, 02 May 2018 09:29:39 GMT
Expires: Wed, 02 May 2018 09:29:05 GMT
Keep-Alive: timeout=38
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Server: BWS/1.1
Set-Cookie: BAIDUID=727D13931FCB93DA64AF865355E9B838:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BIDUPSID=727D13931FCB93DA64AF865355E9B838; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1525253379; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: BD_HOME=0; path=/
Set-Cookie: H_PS_PSSID=1467_21110_26307_20927; path=/; domain=.baidu.com
Vary: Accept-Encoding
X-Powered-By: HPHP
X-Ua-Compatible: IE=Edge,chrome=1

text/html; charset=utf-8

requests

安装第三方库requests

响应与编码

# coding:utf-8
import requests

url = 'http://www.baidu.com'
r = requests.get(url)  # 尝试获取网页
print type(r)
print r.status_code  # 响应状态码
print r.encoding  # 编码值
print r.content   # 找到编码
print r.cookies   # 浏览器缓存
..............................................
<class 'requests.models.Response'>
200
ISO-8859-1
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

Get请求方式

values = {'user': 'aaa', 'id': '123' }
r = requests.get(url, values)  # Get请求方式
print r.url
............................
http://www.baidu.com/?user=aaa&id=123

Post请求方式

values = {'user': 'aaa', 'id': '123'}
r = requests.post(url, values) # Post请求方式
print r.url
得到
............................
http://www.baidu.com/

请求headers处理

user_agent = {'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4295.400 QQBrowser/9.7.12661.400'}
header = {'User-Agent': user_agent}
url = 'http://www.baidu.com/'
r = requests.get(url, headers=header)
g = requests.get
print r.content

响应码code与响应头headers处理

url = 'http://www.baidu.com'
r = requests.get(url)

if r.status_code == requests.codes.ok:  # Requests 内置的状态码查询对象
    print r.status_code
    print r.headers
    print r.headers.get('content-type')  # 推荐用这种get方式获取头部字段
else:
    r.raise_for_status() # 如果发送错误请求,我们通过raise_for_status()来抛出异常

得到
..........................................................
200
{'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Server': 'bfe/1.0.8.18', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:36 GMT', 'Connection': 'Keep-Alive', 'Pragma': 'no-cache', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Date': 'Wed, 02 May 2018 17:26:58 GMT', 'Content-Type': 'text/html'}
text/html


cookie处理

url = 'https://www.zhihu.com/'
r = requests.get(url)
print r.cookies
print r.cookies.keys()
得到
...................................
<RequestsCookieJar[<Cookie aliyungf_tc=AQAAACJnB3p3Gg4AmuaSdUe1UU/RKylo for www.zhihu.com/>]>
['aliyungf_tc']
重定向与历史消息

处理重定向(网址重新定向)只是需要设置一下allow_redirects字段即可,将allow_redirectsy设置为True则是允许重定向的,设置为False则禁止重定向的。

url = 'http://www.baidu.com'
r = requests.get(url, allow_redirects=True)
print r.url
print r.status_code
print r.history
得到
...................................................
http://www.baidu.com/
200
[]
上一篇下一篇

猜你喜欢

热点阅读