HOWTO Fetch Internet Resources Using The urllib Package



Release 3.9.1

Guido van Rossum

and the Python development team

16, 2020

Python Software Foundation

Email: docs@python.org

Contents

1 Introduction
2 Fetching URLs
    2.1 Data
    2.2 Headers
3 Handling Exceptions
    3.1 URLError
    3.2 HTTPError
    3.3 Wrapping it Up
4 info and geturl
5 Openers and Handlers
6 Basic Authentication
7 Proxies
8 Sockets and Layers
9 Footnotes

Author: Michael Foord

Note: A French translation of an earlier revision of this HOWTO is available at urllib2 - Le Manuel manquant.


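1 Introduction

Related Articles

You may also find the following article on fetching web resources with Python useful:

• Basic Authentication: a tutorial on Basic Authentication, with examples in Python.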

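urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function, which can fetch URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations such as basic authentication, cookies, and proxies. These are provided by objects called handlers and openers.

urllib.request supports fetching URLs for many "URL schemes" (identified by the string before the ":" in the URL; for example, "ftp" is the URL scheme of "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most common case, HTTP.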
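As a small illustration of what "URL scheme" means here, the sketch below uses urllib.parse.urlsplit to pull out the part of each URL before the ":". The sample URLs (including the file: path) are only illustrative assumptions; nothing is fetched over the network.

```python
from urllib.parse import urlsplit

# Illustrative URLs only; none of them is actually opened.
urls = [
    "ftp://python.org/",
    "http://python.org/",
    "file:///tmp/example.txt",
]

# The scheme is the string before the first ":" in the URL.
schemes = [urlsplit(url).scheme for url in urls]

for url, scheme in zip(urls, schemes):
    print(f"{url} uses the {scheme!r} scheme")
```

urllib.request picks the protocol to use from this scheme, which is why the same urlopen call can serve FTP, HTTP and local files.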

For straightforward situations urlopen is very easy to use. But as soon as you encounter errors or non-trivial cases when opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol. The most comprehensive and authoritative reference to HTTP is RFC 2616. This is a technical document and not intended to be easy to read. This HOWTO aims to illustrate using urllib, with enough detail about HTTP to help you through. It is not intended to replace the urllib.request docs, but is supplementary to them.

2 Fetching URLs

The simplest way to use urllib.request is as follows:

    import urllib.request

    with urllib.request.urlopen('http://python.org/') as response:
        html = response.read()

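If you wish to retrieve a resource via URL and store it in a temporary location, you can do so with the shutil.copyfileobj() and tempfile.NamedTemporaryFile() functions: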

    import shutil
    import tempfile
    import urllib.request

    with urllib.request.urlopen('http://python.org/') as response:
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            shutil.copyfileobj(response, tmp_file)

    with open(tmp_file.name) as html:
        pass
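urlopen is not limited to 'http:' URLs. As a minimal sketch that runs without network access, the example below writes a small local file and reads it back through a 'file:' URL. The file name sample.txt and its contents are assumptions made up for the illustration.

```python
import pathlib
import tempfile
import urllib.request

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical local file, created only for this demonstration.
    path = pathlib.Path(tmp_dir) / "sample.txt"
    path.write_bytes(b"hello from a local file")

    # as_uri() turns the absolute path into a 'file:' URL.
    file_url = path.as_uri()

    # The response is used exactly as in the HTTP examples above.
    with urllib.request.urlopen(file_url) as response:
        body = response.read()

print(body)
```

As with HTTP, read() returns bytes, so the caller decides how to decode them.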

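Many uses of urllib will be that simple (note that instead of an 'http:' URL we could have used a URL starting with 'ftp:', 'file:', etc.). However, it is the purpose of this tutorial to explain the more complicated cases, concentrating on HTTP.

HTTP is based on requests and responses: the client makes requests and servers send responses. urllib.request mirrors this with a Request object which represents the HTTP request you are making. In its simplest form you create a Request object that specifies the URL you want to fetch. Calling urlopen with this Request object returns a response object for the URL requested. This response is a file-like object, which means you can, for example, call .read() on the response: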

    import urllib.request

    req = urllib.request.Request('http://www.voidspace.org.uk')
    with urllib.request.urlopen(req) as response:
        the_page = response.read()
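Creating a Request object does not contact the server; only urlopen does. The following sketch inspects a Request built from the same URL as above, using attributes documented for urllib.request.Request, to show what the object records before anything is sent.

```python
import urllib.request

# Building the Request performs no network I/O.
req = urllib.request.Request('http://www.voidspace.org.uk')

print(req.full_url)      # the URL passed to the constructor
print(req.type)          # the URL scheme
print(req.host)          # the host part of the URL
print(req.get_method())  # the HTTP method that would be used
print(req.data)          # None, because no data was supplied
```

Because no data argument was given, the request would be sent as a GET.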

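Note that urllib.request makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like so:

    req = urllib.request.Request('ftp://example.com/')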

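In the case of HTTP, there are two extra things that Request objects allow you to do. First, you can pass data to be sent to the server. Second, you can pass extra information ("metadata") about the data or about the request itself to the server; this information is sent as HTTP "headers". Let's look at each of these in turn.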

2.1 Data

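Sometimes you want to send data to a URL (often the URL will refer to a CGI (Common Gateway Interface) script or other web application). With HTTP, this is often done using what is known as a POST request. This is often what your browser does when you submit an HTML form that you filled in on the web. Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way and then passed to the Request object as the data argument. The encoding is done using a function from the urllib.parse library.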

    import urllib.parse
    import urllib.request

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}

    data = urllib.parse.urlencode(values)
    data = data.encode('ascii')  # data should be bytes
    req = urllib.request.Request(url, data)
    with urllib.request.urlopen(req) as response:
        the_page = response.read()

Note that other encodings are sometimes required (e.g. for file upload from HTML forms - see HTML Specification, Form Submission for more details).
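To make the standard form encoding concrete, the sketch below shows what urllib.parse.urlencode produces for the values used above and why the follow-up encode('ascii') step is safe. The 'city' entry is a hypothetical extra value, added only to show how non-ASCII characters are handled.

```python
import urllib.parse

values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# urlencode joins key=value pairs with "&" and replaces spaces with "+".
encoded = urllib.parse.urlencode(values)
print(encoded)

# The result is a str; Request expects bytes for its data argument.
data = encoded.encode('ascii')
print(type(data).__name__)

# Non-ASCII characters are percent-encoded (as UTF-8 by default),
# so the encoded string itself contains only ASCII characters.
non_ascii = urllib.parse.urlencode({'city': 'Zürich'})  # hypothetical value
print(non_ascii)

# parse_qs reverses the encoding; each key maps to a list of values.
decoded = urllib.parse.parse_qs(encoded)
print(decoded)
```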

If you do not pass the data argument, urllib uses a GET request. One way in which GET and POST requests differ is that POST requests often have "side-effects": they change the state of the system in some way (for example by placing an order with the website for a hundredweight of tinned spam to be delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to always cause side-effects, and GET requests never to cause side-effects, nothing prevents a GET request from having side-effects, nor a POST request from having no side-effects.

Data can also be passed in an HTTP GET request by encoding it in the URL itself.

    >>> import urllib.request
    >>> import urllib.parse
    >>> data = {}
    >>> data['name'] = 'Somebody Here'
    >>> data['location'] = 'Northampton'
    >>> data['language'] = 'Python'
    >>> url_values = urllib.parse.urlencode(data)
    >>> print(url_values)  # The order may differ from below.
    name=Somebody+Here&language=Python&location=Northampton
    >>> url = 'http://www.example.com/example.cgi'
    >>> full_url = url + '?' + url_values
    >>> data = urllib.request.urlopen(full_url)

Notice that the full URL is created by adding a ? to the URL, followed by the encoded values.
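The difference between the two ways of sending data can be checked without touching the network. The sketch below builds one Request with the values in the URL and one with the same values as the data argument, then asks each which HTTP method it would use. The shortened values dictionary is an assumption chosen for brevity.

```python
import urllib.parse
import urllib.request

url = 'http://www.example.com/example.cgi'
values = {'name': 'Somebody Here', 'language': 'Python'}
query = urllib.parse.urlencode(values)

# GET: the encoded values travel in the URL, after a "?".
get_req = urllib.request.Request(url + '?' + query)

# POST: the same values travel in the request body, as bytes.
post_req = urllib.request.Request(url, data=query.encode('ascii'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)

# The query string can be split off the GET URL and decoded again.
parts = urllib.parse.urlsplit(get_req.full_url)
recovered = urllib.parse.parse_qs(parts.query)
print(recovered)
```

Neither Request is opened here, so the example only shows how urllib decides between GET and POST, not how a server would respond.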
