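I've recently been learning about web crawlers and ran into a few small pitfalls while setting up pyspider. Fortunately others had been down this road before me, and a quick search on Baidu turned up a good fix. I'm using Anaconda, which ships with Python 3.6.5.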

Install pyspider:

(base) C:\WINDOWS\system32>pip install pyspider
Requirement already satisfied: pyspider in c:\programdata\anaconda3\lib\site-packages (0.3.10)
Requirement already satisfied: tornado<=4.5.3,>=3.2 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (4.5.3)
Requirement already satisfied: tblib>=1.3.0 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (1.3.2)
Requirement already satisfied: six>=1.5.0 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (1.11.0)
Requirement already satisfied: requests>=2.2 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (2.18.4)
Requirement already satisfied: Flask>=0.10 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (1.0.2)
Requirement already satisfied: lxml in c:\programdata\anaconda3\lib\site-packages (from pyspider) (4.2.1)
Requirement already satisfied: pycurl in c:\programdata\anaconda3\lib\site-packages (from pyspider) (7.43.0.2)
Requirement already satisfied: pyquery in c:\programdata\anaconda3\lib\site-packages (from pyspider) (1.4.0)
Requirement already satisfied: click>=3.3 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (6.7)
Requirement already satisfied: cssselect>=0.9 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (1.0.3)
Requirement already satisfied: Flask-Login>=0.2.11 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (0.4.1)
Requirement already satisfied: u-msgpack-python>=1.6 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (2.5.2)
Requirement already satisfied: Jinja2>=2.7 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (2.10)
Requirement already satisfied: chardet>=2.2 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (3.0.4)
Requirement already satisfied: wsgidav>=2.0.0 in c:\programdata\anaconda3\lib\site-packages (from pyspider) (3.0.0)
Requirement already satisfied: idna<2.7,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.2->pyspider) (2.6)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.2->pyspider) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.2->pyspider) (2019.6.16)
Requirement already satisfied: itsdangerous>=0.24 in c:\programdata\anaconda3\lib\site-packages (from Flask>=0.10->pyspider) (0.24)
Requirement already satisfied: Werkzeug>=0.14 in c:\programdata\anaconda3\lib\site-packages (from Flask>=0.10->pyspider) (0.14.1)
Requirement already satisfied: MarkupSafe>=0.23 in c:\programdata\anaconda3\lib\site-packages (from Jinja2>=2.7->pyspider) (1.0)
Requirement already satisfied: PyYAML in c:\programdata\anaconda3\lib\site-packages (from wsgidav>=2.0.0->pyspider) (3.12)
Requirement already satisfied: defusedxml in c:\programdata\anaconda3\lib\site-packages (from wsgidav>=2.0.0->pyspider) (0.6.0)
Requirement already satisfied: jsmin in c:\programdata\anaconda3\lib\site-packages (from wsgidav>=2.0.0->pyspider) (2.2.2)
WARNING: You are using pip version 19.1.1, however version 19.2.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

Upgrade pip:

(base) C:\WINDOWS\system32>python -m pip install --upgrade pip
Collecting pip
  Downloading https://files.pythonhosted.org/packages/8d/07/f7d7ced2f97ca3098c16565efbe6b15fafcba53e8d9bdb431e09140514b0/pip-19.2.2-py2.py3-none-any.whl (1.4MB)
     |████████████████████████████████| 1.4MB 22kB/s
Installing collected packages: pip
  Found existing installation: pip 19.1.1
    Uninstalling pip-19.1.1:
      Successfully uninstalled pip-19.1.1
Successfully installed pip-19.2.2

Once everything is installed, run pyspider or pyspider all on the command line to check that the setup works.

The setup has plenty of pitfalls, though, and I stepped right into one:

(base) C:\WINDOWS\system32>pyspider
c:\programdata\anaconda3\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
phantomjs fetcher running on port 25555
[I 190823 15:02:38 result_worker:49] result_worker starting...
[I 190823 15:02:38 processor:211] processor starting...
[I 190823 15:02:38 scheduler:647] scheduler starting...
[I 190823 15:02:38 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:02:39 tornado_fetcher:638] fetcher starting...
[I 190823 15:02:39 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 190823 15:02:39 run:420] phantomjs exited.
[I 190823 15:02:39 app:84] webui exiting...
[I 190823 15:02:39 tornado_fetcher:671] fetcher exiting...
[I 190823 15:02:39 scheduler:663] scheduler exiting...
[I 190823 15:02:39 processor:229] processor exiting...
[I 190823 15:02:40 result_worker:66] result_worker exiting...
Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\Scripts\pyspider.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\lib\site-packages\pyspider\run.py", line 754, in main
    cli()
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 1043, in invoke
    return Command.invoke(self, ctx)
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\pyspider\run.py", line 165, in cli
    ctx.invoke(all)
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\pyspider\run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "c:\programdata\anaconda3\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\programdata\anaconda3\lib\site-packages\pyspider\run.py", line 384, in webui
    app.run(host=host, port=port)
  File "c:\programdata\anaconda3\lib\site-packages\pyspider\webui\app.py", line 59, in run
    from .webdav import dav_app
  File "c:\programdata\anaconda3\lib\site-packages\pyspider\webui\webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "c:\programdata\anaconda3\lib\site-packages\wsgidav\wsgidav_app.py", line 135, in __init__
    _check_config(config)
  File "c:\programdata\anaconda3\lib\site-packages\wsgidav\wsgidav_app.py", line 119, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.

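Luckily I found a solution online. The error is caused by the new WsgiDAV 3.x release (described as a pre-release in the fix I found; 3.0.0 is what was installed here), which rejects the 'domaincontroller' option that pyspider 0.3.10 still passes. Downgrading is enough to fix it, so replace wsgidav with version 2.4.1: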

(base) C:\WINDOWS\system32>python -m pip install wsgidav==2.4.1
Collecting wsgidav==2.4.1
  Downloading https://files.pythonhosted.org/packages/95/e8/88e25c17ff671f7fad21fe16cdc435c33c4befe35203bd47c05366af362a/WsgiDAV-2.4.1-py2.py3-none-any.whl (186kB)
     |████████████████████████████████| 194kB 7.0kB/s
Requirement already satisfied: jsmin in c:\programdata\anaconda3\lib\site-packages (from wsgidav==2.4.1) (2.2.2)
Requirement already satisfied: defusedxml in c:\programdata\anaconda3\lib\site-packages (from wsgidav==2.4.1) (0.6.0)
Requirement already satisfied: PyYAML in c:\programdata\anaconda3\lib\site-packages (from wsgidav==2.4.1) (3.12)
Installing collected packages: wsgidav
  Found existing installation: WsgiDAV 3.0.0
    Uninstalling WsgiDAV-3.0.0:
      Successfully uninstalled WsgiDAV-3.0.0
Successfully installed wsgidav-2.4.1
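
To confirm which WsgiDAV version ended up installed before starting pyspider again, a small check like the one below can help. This is only a sketch: the helper names (`installed_version`, `major_version`, `wsgidav_needs_downgrade`) are made up for illustration, and the "3.x or newer needs a downgrade" rule is an assumption taken from the fix described above for pyspider 0.3.10, not something pyspider itself checks.

```python
import re


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Older interpreters (such as the 3.6.5 used here) fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading integer of a version string such as '3.0.0'."""
    match = re.match(r"\s*(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    return int(match.group(1))


def wsgidav_needs_downgrade(version_string):
    """Assumption from this post: WsgiDAV 3.x and later break pyspider 0.3.10."""
    return major_version(version_string) >= 3


if __name__ == "__main__":
    found = installed_version("wsgidav")
    if found is None:
        print("wsgidav is not installed")
    elif wsgidav_needs_downgrade(found):
        print("wsgidav %s found; try: python -m pip install wsgidav==2.4.1" % found)
    else:
        print("wsgidav %s found; no downgrade needed" % found)
```

Run it with the same interpreter that pyspider uses (here, the Anaconda base environment), otherwise it may report on a different site-packages directory.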

After that, the problem was solved and pyspider started normally:

(base) C:\WINDOWS\system32>pyspider
c:\programdata\anaconda3\lib\site-packages\pyspider\libs\utils.py:196: FutureWarning: timeout is not supported on your platform.
  warnings.warn("timeout is not supported on your platform.", FutureWarning)
phantomjs fetcher running on port 25555
[I 190823 15:18:12 result_worker:49] result_worker starting...
[I 190823 15:18:12 processor:211] processor starting...
[I 190823 15:18:12 scheduler:647] scheduler starting...
[I 190823 15:18:12 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:18:12 tornado_fetcher:638] fetcher starting...
[I 190823 15:18:12 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 190823 15:18:12 app:76] webui running on 0.0.0.0:5000
[I 190823 15:19:12 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:20:12 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:21:12 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:23:36 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:24:36 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:25:36 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:26:36 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:27:36 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:28:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:29:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:30:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:31:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:32:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:33:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:34:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:35:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:36:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 190823 15:37:37 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0

Then open 127.0.0.1:5000 or localhost:5000 in a browser.
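
If the page does not load, it can be useful to check whether anything is listening on the webui port at all. The snippet below is a minimal sketch that simply attempts a TCP connection. The function name `is_port_open` is hypothetical, and the host and port are taken from the "webui running on 0.0.0.0:5000" log line above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 5000 is the webui port shown in the pyspider log output.
    print("webui reachable:", is_port_open("127.0.0.1", 5000))
```

A successful connection only shows that something accepts connections on that port; it does not prove that the listener is pyspider.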

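Title: Pitfalls I hit with pyspider
Author: 豆果
Published: 2019-08-23 15:44
Last updated: 2019-08-23 15:44
Licensed under a Creative Commons license. Please keep the original link and author when reposting.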