Pitfalls of sending POST requests with Scrapy in Python

Sending POST requests with requests

First, let's see how convenient it is to send a POST request with requests.

The simple API of Requests means that every HTTP request type is obvious. For example, this is how you send an HTTP POST request:

    >>> import requests
    >>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})

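The data argument accepts a dict, and it also accepts a tuple of (key, value) pairs, which lets the same key appear more than once: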

    >>> payload = (('key1', 'value1'), ('key1', 'value2'))
    >>> r = requests.post('http://httpbin.org/post', data=payload)
    >>> print(r.text)
    {
      ...
      "form": {
        "key1": [
          "value1",
          "value2"
        ]
      },
      ...
    }
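To see why the tuple form produces a list under "key1", here is a minimal sketch that form-encodes the same payload with only the standard library. It approximates what ends up in the request body and how a server might parse it; it is not the requests implementation itself.

```python
from urllib.parse import parse_qs, urlencode

# Same payload as above: the key 'key1' appears twice.
payload = (('key1', 'value1'), ('key1', 'value2'))

# Form-encode the pairs; a repeated key is simply repeated in the body.
body = urlencode(payload)
print(body)

# A server-side parser groups repeated keys into a list, which matches
# the "key1": ["value1", "value2"] shape in the httpbin response.
form = parse_qs(body)
print(form)
```

A plain dict cannot hold the same key twice, which is the practical reason to reach for the tuple form.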

Sending JSON looks like this:

    >>> import json
    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, data=json.dumps(payload))

New in version 2.4.2:

    >>> url = 'https://api.github.com/some/endpoint'
    >>> payload = {'some': 'data'}
    >>> r = requests.post(url, json=payload)

In other words, you don't need to transform the parameters yourself. You only choose between data= and json=, and requests takes care of the rest.
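The choice between data= and json= mostly decides how the body is encoded and which Content-Type accompanies it. The sketch below imitates that decision with the standard library only. The helper name sketch_post_body is hypothetical and the logic is a simplification of what requests does, not its actual code.

```python
import json
from urllib.parse import urlencode


def sketch_post_body(data=None, json_payload=None):
    """Return (content_type, body) roughly as requests would choose them.

    Hypothetical helper for illustration only; not part of requests.
    """
    if data is None and json_payload is not None:
        # json=payload: serialized for you, header set to JSON.
        return 'application/json', json.dumps(json_payload)
    if isinstance(data, (str, bytes)):
        # data=json.dumps(payload): sent as-is, no Content-Type is added.
        return None, data
    if data is not None:
        # data=dict or data=sequence of pairs: form-encoded.
        return 'application/x-www-form-urlencoded', urlencode(data, doseq=True)
    return None, None


payload = {'some': 'data'}
print(sketch_post_body(data=payload))
print(sketch_post_body(data=json.dumps(payload)))
print(sketch_post_body(json_payload=payload))
```

The second and third calls produce the same body text; the difference is only whether a JSON Content-Type header comes with it.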

Sending POST requests with Scrapy

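The source code shows that Scrapy sends GET requests by default. When we need to send a request that carries parameters, or to log in, we need a POST request. Take the following as an example: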

    import json

    import scrapy


    class LaGou(scrapy.Spider):  # no crawl rules are used, so a plain Spider is enough
        name = 'myspider'

        def start_requests(self):
            yield scrapy.FormRequest(
                url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
                formdata={
                    'first': 'true',  # a bool True is not accepted here, although requests accepts it
                    'pn': '1',  # an int 1 is not accepted here, although requests accepts it
                    'kd': 'python',
                },  # formdata plays the role of data= in requests; keys and values must be strings
                callback=self.parse,
            )

        def parse(self, response):
            # The endpoint returns JSON; drill down to the list of job postings.
            datas = json.loads(response.body.decode())['content']['positionResult']['result']
            for data in datas:
                print(data['companyFullName'] + str(data['positionId']))
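Since formdata only takes strings, one option is to convert the parameters before building the request. The following is a minimal sketch; to_formdata is a hypothetical helper, not part of Scrapy, and the lowercase 'true'/'false' spelling is an assumption based on the 'first': 'true' value used in the spider above.

```python
def to_formdata(params):
    """Convert values to the strings that formdata expects.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    formdata = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # str(True) would give 'True'; the spider above sends lowercase 'true'.
            formdata[key] = 'true' if value else 'false'
        else:
            formdata[key] = str(value)
    return formdata


params = {'first': True, 'pn': 1, 'kd': 'python'}
formdata = to_formdata(params)
print(formdata)
```

The resulting dict could then be passed as formdata=formdata when constructing the FormRequest.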

The approach recommended by the official documentation: Using FormRequest to send data via HTTP POST

    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]

This uses FormRequest and passes the parameters through formdata, which is again a dict.

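But here comes the nasty pitfall. I spent a whole afternoon on it today: no matter how I sent the request this way, something went wrong, and the response was never the data I wanted.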

    return scrapy.FormRequest(url, formdata=payload)

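After searching online for a long time, I finally found an approach that works: send the request with scrapy.Request, and the data comes back correctly.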

    return scrapy.Request(url, body=json.dumps(payload), method='POST',
                          headers={'Content-Type': 'application/json'})

Reference: Send Post Request in Scrapy

    my_data = {'field1': 'value1', 'field2': 'value2'}
    request = scrapy.Request(url, method='POST',
                             body=json.dumps(my_data),
                             headers={'Content-Type': 'application/json'})
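One plausible explanation for the difference (an assumption; the cause is not stated here) is that the endpoint expects a JSON body, while formdata produces a url-encoded one. The sketch below compares the two bodies for the same payload using only the standard library. parses_as_json is a hypothetical stand-in for a server that accepts only JSON.

```python
import json
from urllib.parse import urlencode

my_data = {'field1': 'value1', 'field2': 'value2'}

# Roughly what a url-encoded form body looks like for this payload.
form_body = urlencode(my_data)
# The body produced by json.dumps, as in the scrapy.Request call above.
json_body = json.dumps(my_data)

print(form_body)
print(json_body)


def parses_as_json(body):
    """Return True if a JSON-only server could decode this body."""
    try:
        json.loads(body)
        return True
    except ValueError:
        return False


print(parses_as_json(form_body))
print(parses_as_json(json_body))
```

If the server only decodes JSON, the form-encoded body cannot be read at all, which would be consistent with getting back unexpected data.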

Differences between FormRequest and Request

In the documentation, there is hardly any visible difference:

The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

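That is, FormRequest adds one new argument, formdata, which accepts a dict or an iterable of tuples containing form data and converts it into the request body. FormRequest also inherits from Request: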

    class FormRequest(Request):

        def __init__(self, *args, **kwargs):
            formdata = kwargs.pop('formdata', None)
            if formdata and kwargs.get('method') is None:
                kwargs['method'] = 'POST'

            super(FormRequest, self).__init__(*args, **kwargs)

            if formdata:
                items = formdata.items() if isinstance(formdata, dict) else formdata
                querystr = _urlencode(items, self.encoding)
                if self.method == 'POST':
                    self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                    self._set_body(querystr)
                else:
                    self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

        ###

    def _urlencode(seq, enc):
        values = [(to_bytes(k, enc), to_bytes(v, enc))
                  for k, vs in seq
                  for v in (vs if is_listlike(vs) else [vs])]
        return urlencode(values, doseq=1)
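The structure of _urlencode can be tried out without Scrapy. In the sketch below, to_bytes_sketch and urlencode_sketch are hypothetical stand-ins written for illustration: the first accepts only str or bytes, which to my knowledge is how Scrapy's to_bytes behaves, and the second uses a simple list/tuple check in place of is_listlike.

```python
from urllib.parse import urlencode


def to_bytes_sketch(value, encoding='utf-8'):
    """Stand-in for Scrapy's to_bytes: accept only str or bytes."""
    if isinstance(value, bytes):
        return value
    if not isinstance(value, str):
        raise TypeError(f'expected str or bytes, got {type(value).__name__}')
    return value.encode(encoding)


def urlencode_sketch(seq, encoding='utf-8'):
    """Mirror the shape of _urlencode above using the stand-ins."""
    values = [(to_bytes_sketch(k, encoding), to_bytes_sketch(v, encoding))
              for k, vs in seq
              for v in (vs if isinstance(vs, (list, tuple)) else [vs])]
    return urlencode(values, doseq=True)


# String values are url-encoded into a query-string style body.
print(urlencode_sketch({'key': 'value', 'k': 'v'}.items()))
# A list value is expanded into one pair per element.
print(urlencode_sketch({'key1': ['value1', 'value2']}.items()))

# A non-string value is rejected before anything is encoded.
error = None
try:
    urlencode_sketch({'pn': 1}.items())
except TypeError as exc:
    error = str(exc)
print(error)
```

Under these assumptions, the last call illustrates why formdata values such as an int 1 or a bool True have to be written as strings.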

In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v', and the method defaults to POST. Now let's look at Request:

    class Request(object_ref):

        def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                     cookies=None, meta=None, encoding='utf-8', priority=0,
                     dont_filter=False, errback=None, flags=None):

            self._encoding = encoding  # this one has to be set first
            self.method = str(method).upper()

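The default method is GET, but that doesn't really matter: you can still send a POST request by passing method='POST'. This reminds me of the request function in requests, which is the base function that the other request helpers build on.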

    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
        :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How many seconds to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
                the server's TLS certificate, or a string, in which case it must be a path
                to a CA bundle to use. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'http://httpbin.org/get')
          <Response [200]>
        """

        # By using the 'with' statement we are sure the session is closed, thus we
        # avoid leaving sockets open which can trigger a ResourceWarning in some
        # cases, and look like a memory leak in others.
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)

Original link: https://zhangslob.github.io/2018/08/24/使用scrapy发送post请求的坑/
