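1 JupyterHub on Kubernetes: Deployment and Usage Guide
2 Installing JupyterHub
2.1 Installation and configuration under virtualenv
2.1.1 Installing jupyterhub
Install jupyterhub inside a virtual environment that is already set up. First make sure the Python version in use is Python 3; the official documentation states that only Python 3 is supported.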
pip install jupyterhub
When installing with pip you also need to install nodejs/npm:
sudo apt-get install npm nodejs-legacy
To run the notebook server locally, you also need to install the jupyter notebook package:
pip install --upgrade notebook
2.1.2 Configuring jupyterhub
jupyterhub --generate-config
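This generates the configuration file jupyterhub_config.py in the current directory.
Note: to generate the file in a particular directory, run the command above from that directory.
Once the file has been generated, adjust it as needed, for example:
# IP address and port
c.JupyterHub.ip = 'IP address'
c.JupyterHub.port = port_number
After configuring, starting jupyterhub may fail with the following error: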
[E 2019-08-12 17:12:53.624 JupyterHub proxy:658] Failed to find proxy ['configurable-http-proxy']
    The proxy can be installed with `npm install -g configurable-http-proxy`.To install `npm`, install nodejs which includes `npm`.If you see an `EACCES` error or permissions error, refer to the `npm` documentation on How To Prevent Permissions Errors.
[C 2019-08-12 17:12:53.624 JupyterHub app:2349] Failed to start proxy
    Traceback (most recent call last):
      File "/home/gpyz/venv/xgb/lib/python3.5/site-packages/jupyterhub/app.py", line 2347, in start
        await self.proxy.start()
      File "/home/gpyz/venv/xgb/lib/python3.5/site-packages/jupyterhub/proxy.py", line 650, in start
        cmd, env=env, start_new_session=True, shell=shell
      File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
        restore_signals, start_new_session)
      File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
        raise child_exception_type(errno_num, err_msg)
    FileNotFoundError: [Errno 2] No such file or directory: 'configurable-http-proxy'
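In that case, do the following:
# Following the jupyterhub instructions on GitHub, install under /opt/nodejs
npm install -g configurable-http-proxy
Note: the location does not have to be under /opt. Run whereis nodejs to find where your nodejs is installed; mine is under '/usr/local/lib/nodejs/node-v10.15.0/bin'.
c.JupyterHub.proxy_cmd = ['/opt/nodejs/bin/configurable-http-proxy',]
PS: if the node path is already in your PATH environment variable, this setting can be skipped.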
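The lookup described above (PATH first, then a known nodejs bin directory) can be sketched in Python. This is only an illustration: the function name find_proxy_command and the extra_dirs parameter are hypothetical, and the result is meant to be pasted into the proxy_cmd setting by hand.

```python
import shutil


def find_proxy_command(extra_dirs=()):
    """Return the full path of configurable-http-proxy, or None if not found.

    extra_dirs is a hypothetical list of additional bin directories to search,
    for example the bin folder of a manually unpacked nodejs.
    """
    # First look on PATH, the same way a shell would.
    found = shutil.which("configurable-http-proxy")
    if found:
        return found
    # Then try each extra directory in turn.
    for directory in extra_dirs:
        found = shutil.which("configurable-http-proxy", path=str(directory))
        if found:
            return found
    return None
```

For example, find_proxy_command(["/opt/nodejs/bin"]) returns the path to put into proxy_cmd, or None when the proxy still has to be installed with npm.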
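# Set the default directory after logging in to jupyterhub. This path is relative to the home directory: each user lands in the notebook folder under their own home. If a user has no such folder, startup may fail.
c.Spawner.notebook_dir = '~/notebook'
# This form sets an absolute path: every user who logs in to jupyterhub lands in '/home/ubuntu/notebook/'.
c.Spawner.notebook_dir = '/home/ubuntu/notebook'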
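Since a missing notebook folder can make a user's server fail to start, the folder can be created ahead of time. The sketch below is a hypothetical helper (ensure_notebook_dir is not part of jupyterhub) and assumes that a leading "~" stands for the user's home directory.

```python
import os


def ensure_notebook_dir(home, notebook_dir="~/notebook"):
    """Create the notebook directory for one user if it is missing.

    home is that user's home directory; a leading "~" in notebook_dir
    is replaced by it, any other path is used unchanged.
    """
    if notebook_dir.startswith("~"):
        path = os.path.join(home, notebook_dir[1:].lstrip("/"))
    else:
        path = notebook_dir
    # exist_ok=True makes the call safe to repeat.
    os.makedirs(path, exist_ok=True)
    return path
```

If this is run as root, the new folder belongs to root, so ownership still has to be handed to the user (for example with chown) before that user can write notebooks into it.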
2.1.3 Creating multiple users
Besides the current root user, you can create other accounts as ordinary users, exactly as you would add any user on Linux:
adduser user1
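Follow the system prompts to set the password and user details.
Then add the corresponding settings to the configuration file:
1. Add an ordinary user
c.Authenticator.whitelist = {'user1'}
2. Add an administrator
c.Authenticator.admin_users = {'ubuntu'}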
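Names listed in the configuration only work if a matching Linux account exists. A minimal check, assuming a Unix host, could look like this; missing_system_users and its lookup parameter are hypothetical names used for illustration.

```python
import pwd


def missing_system_users(usernames, lookup=pwd.getpwnam):
    """Return the names that have no matching system account, sorted.

    lookup defaults to pwd.getpwnam, which raises KeyError for unknown users;
    it is a parameter only so the check can be tried without real accounts.
    """
    missing = []
    for name in usernames:
        try:
            lookup(name)
        except KeyError:
            missing.append(name)
    return sorted(missing)
```

Calling missing_system_users({'user1', 'ubuntu'}) before starting jupyterhub lists any account that still has to be created with adduser. Newer JupyterHub releases rename whitelist to allowed_users; the check applies to either setting.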
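2.1.4 Creating a virtual environment as a non-root user and adding it to jupyterhub
Step 1: As the current user, create a local virtual environment with virtualenv.
Step 2: Install ipykernel in that virtual environment: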
pip install ipykernel
Step 3: After ipykernel is installed, run the following command:
python -m ipykernel install --user --name [environment name] --display-name [display name]
For example:
python -m ipykernel install --user --name env1 --display-name env1
Then restart the current user's jupyterhub server and the environment becomes visible.
Sample:
(env1) [ly@deepq venv]$ python -m ipykernel install --user --name env1 --display-name env1
Installed kernelspec env1 in /home/ly/.local/share/jupyter/kernels/env1
List the environments currently available to the non-root user:
[liyu@localhost virtualenv]$ jupyter kernelspec list
Available kernels:
  tensorflow2.3    /home/liyu/.local/share/jupyter/kernels/tensorflow2.3
  tf1.0            /home/liyu/.local/share/jupyter/kernels/tf1.0
  venv             /home/liyu/.local/share/jupyter/kernels/venv
  python3          /usr/local/share/jupyter/kernels/python3
2.1.5 Creating several virtual environments as root and adding them to jupyterhub
Step 1: As root, create several virtual environments with virtualenv.
For example:
As root, use virtualenv to create virtual environments named tensorflow1.0 and tensorflow2.4:
(env) [root@localhost jupyterhub]# tree -L 2
.
├── env
│   ├── bin
│   ├── lib
│   ├── lib64
│   └── pyvenv.cfg
├── tensorflow1.0
│   ├── bin
│   ├── lib
│   ├── lib64
│   └── pyvenv.cfg
└── tensorflow2.4
    ├── bin
    ├── lib
    ├── lib64
    └── pyvenv.cfg
Step 2: Install ipykernel in each of these virtual environments:
pip install ipykernel
Step 3: After ipykernel is installed, run the following command:
python -m ipykernel install --name [environment name] --display-name [display name]
Note: the difference from the non-root case is that the --user option is omitted.
For example:
python -m ipykernel install --name tensorflow1.0 --display-name tensorflow1.0
python -m ipykernel install --name tensorflow2.4 --display-name tensorflow2.4
Note:
The jupyterhub service does not need to be restarted.
Listing the shared environments
The command is:
jupyter kernelspec list
For example:
List the environments created in step 3:
(env) [root@localhost jupyterhub]# jupyter kernelspec list
Available kernels:
  python3          /opt/jupyterhub/env/share/jupyter/kernels/python3
  tensorflow1.0    /usr/local/share/jupyter/kernels/tensorflow1.0
  tensorflow2.4    /usr/local/share/jupyter/kernels/tensorflow2.4
After leaving (deactivate) the current virtual environment, the listing differs as follows:
[root@localhost virtualenv]# jupyter kernelspec list
Available kernels:
  python3          /usr/local/share/jupyter/kernels/python3
  tensorflow1.0    /usr/local/share/jupyter/kernels/tensorflow1.0
  tensorflow2.4    /usr/local/share/jupyter/kernels/tensorflow2.4
List the environments currently available to a non-root user:
[liyu@localhost virtualenv]$ jupyter kernelspec list
Available kernels:
  tensorflow2.3    /home/liyu/.local/share/jupyter/kernels/tensorflow2.3
  tf1.0            /home/liyu/.local/share/jupyter/kernels/tf1.0
  venv             /home/liyu/.local/share/jupyter/kernels/venv
  python3          /usr/local/share/jupyter/kernels/python3
  tensorflow1.0    /usr/local/share/jupyter/kernels/tensorflow1.0
  tensorflow2.4    /usr/local/share/jupyter/kernels/tensorflow2.4
Compared with the listing in the non-root section (2.1.4), two more environments are now available: the ones installed by root.
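To see at a glance which kernels are private to a user and which are shared, the text printed by jupyter kernelspec list can be parsed. The sketch below is a hypothetical helper that assumes the two-column layout shown in the listings above.

```python
def parse_kernelspec_list(text):
    """Map kernel names to directories from `jupyter kernelspec list` output."""
    kernels = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and the "Available kernels:" header.
        if len(parts) != 2 or line.strip().endswith(":"):
            continue
        name, path = parts
        kernels[name] = path.strip()
    return kernels


def split_by_scope(kernels, home):
    """Separate kernels installed under home from those installed elsewhere."""
    prefix = home.rstrip("/") + "/"
    user = {n: p for n, p in kernels.items() if p.startswith(prefix)}
    shared = {n: p for n, p in kernels.items() if n not in user}
    return user, shared
```

Applied to the last listing with home set to /home/liyu, the kernels under .local are reported as per-user and the ones under /usr/local/share as shared.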
2.2 Using JupyterLab in JupyterHub
1. Install jupyterlab
pip install jupyterlab
2. Edit jupyterhub_config.py
In jupyterhub_config.py, find
c.Spawner.default_url=''
and change it to:
c.Spawner.default_url='/lab'
Restart jupyterhub and the change takes effect.
2.3 jupyterhub plugins
2.3.1 jupyterlab-execute-time
The jupyterlab-execute-time plugin records the start time and run duration of each cell in jupyter lab:
jupyter labextension install jupyterlab-execute-time
2.3.2 jupyterlab-drawio
jupyterlab-drawio is a plugin for drawing flowcharts, mind maps and other diagrams with drawio inside the jupyter lab interface:
jupyter labextension install jupyterlab-drawio
2.3.3 jupyterlab-spreadsheet
jupyterlab-spreadsheet lets you view spreadsheet files in jupyter lab, including Excel files with multiple worksheets:
jupyter labextension install jupyterlab-spreadsheet
2.3.4 jupyterlab-system-monitor
jupyterlab-system-monitor adds a resource monitor widget to the jupyter lab interface, so CPU and memory usage can be watched in real time while working:
pip install nbresuse
jupyter labextension install jupyterlab-topbar-extension jupyterlab-system-monitor
2.4 jupyterhub startup configuration
2.4.1 Running in the background with nohup
Note: jupyterhub apparently has to be run as root; when another user runs it, some users' servers may fail to start. To run it through sudo rather than as root, see "Using sudo to run JupyterHub without root privileges".
nohup jupyterhub > jupyterhub.log &
2.4.2 Starting at boot
Service configuration:
sudo cat /etc/systemd/system/jupyterhub.service
[Unit]
Description=Jupyterhub
After=syslog.target network.target

[Service]
User=root
ExecStart=/home/xyq/.jupyter/run_hub

[Install]
WantedBy=multi-user.target
Start the service:
sudo systemctl enable jupyterhub    # start at boot
sudo systemctl daemon-reload        # reload the unit files
sudo systemctl start jupyterhub     # start now
sudo journalctl -u jupyterhub       # view the log
2.4.3 Putting jupyterhub behind a proxy
For convenient and secure access, the following shows an nginx reverse proxy configuration:
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    resolver 192.168.2.1 114.114.114.114;
    set $backend "https://jp.abcgogo.com:12443";
    server_name jp.abcgogo.com;
    ssl_certificate /usr/syno/etc/certificate/ReverseProxy/faea4d05-2458-4ffc-acfe-e4b48e5a04f9/fullchain.pem;
    ssl_certificate_key /usr/syno/etc/certificate/ReverseProxy/faea4d05-2458-4ffc-acfe-e4b48e5a04f9/privkey.pem;
    add_header Strict-Transport-Security "max-age=15768000; includeSubdomains; preload" always;
    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_intercept_errors on;
        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 120s;
        proxy_next_upstream error;
        proxy_pass $backend;
    }
}
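3 Installing and configuring jupyterhub under Docker
3.1 Pull a clean ubuntu image
3.2 Enter the ubuntu container and install the required software. Only the most basic needs are covered here; install anything else yourself.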
python3 vim
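For installing and using applications under Docker, see the relevant parts of the "Docker tutorial" and "Upgrading python3.5 to python3.6" articles.
3.3 Install a python3-based virtual environment
For virtual environment usage, see the virtualenv tutorial.
You can also install with Anaconda; reportedly an Anaconda installation does not run into the problems below.
3.4 Once that is set up, enter the virtual environment, then install jupyterhub with pip and generate its configuration file, as in section 2.1.
Errors come up along the way (see the Errors section below); the approach that works is:
Do not use the npm installed through apt to install configurable-http-proxy.
Install nodejs directly from the source package on the official site (see Installation there). For convenience I put the source package in the conf folder of the virtual environment.
Then add the environment variables to ~/.bashrc: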
export NODEJS_HOME=/usr/local/lib/nodejs/node-v10.15.0/bin
export PATH=$NODEJS_HOME:$PATH
export NODEJS_HOME
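Here node-v10.15.0 is the directory extracted from the downloaded archive.
The nodejs installed this way is v10.15.0. According to the official site the package bundles npm; the bundled npm here is version 6.6.0.
This is why, as mentioned earlier, my nodejs directory differs from the default installation path.
At this point you can also use the npm bundled with this node to install configurable-http-proxy; once it is installed, no proxy setting is needed in jupyterhub_config.py.
3.5 Run jupyterhub from the directory that holds the configuration file. Nothing else changes, and the setup is complete.
Running under Docker is much like running outside Docker. The main question is how to reach the jupyterhub inside the container; a simple way is given below.
- 1. Map a host port to the container port. This is set when the container is started, for example: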
docker run -p 18000:8000 -it [IMAGE ID]
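jupyterhub listens on port 8000 by default, so host port 18000 is mapped straight to port 8000 of the container.
After startup, enter the container; the only settings needed in jupyterhub_config.py are c.Authenticator.admin_users and c.Authenticator.whitelist.
PS: users added through the web interface must already exist as ubuntu system users.
Then run jupyterhub in the background with nohup jupyterhub &, and detach with Ctrl+P followed by Ctrl+Q, which leaves the container running. From a browser on another machine, open ip:18000 and log in with the corresponding username and password to use jupyterhub.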
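Before opening a browser, it can help to confirm that the mapped port is reachable from the other machine. A minimal sketch using only the standard library follows; port_is_open is a hypothetical helper name, and the host and port in the usage note are the example values from the docker run command above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_is_open("ip", 18000) with ip replaced by the address of the Docker host. A False result points at the port mapping or a firewall rather than at jupyterhub itself.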
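Docker image
A clean, preconfigured Docker image that contains only jupyterhub is provided here. If errors occur during installation, pull it and compare.
Errors
Some people who install with sudo apt-get install npm nodejs-legacy hit the problem below even after setting the configurable-http-proxy path in jupyterhub_config.py, or when running configurable-http-proxy -h. The main cause is probably that the installed nodejs version is too old.
After installing with sudo apt-get install npm, npm -v reports version 3.5.2.
After installing with sudo apt-get install nodejs-legacy, node -v reports version v4.2.6.
Running npm install -g configurable-http-proxy prints the following: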
/usr/local/bin/configurable-http-proxy -> /usr/local/lib/node_modules/configurable-http-proxy/bin/configurable-http-proxy
/usr/local/lib
`-- configurable-http-proxy@4.0.1
  +-- commander@2.19.0
  +-- http-proxy@1.17.0
  | +-- eventemitter3@3.1.0
  | +-- follow-redirects@1.7.0
  | | `-- debug@3.2.6
  | `-- requires-port@1.0.0
  +-- lynx@0.2.0
  | +-- mersenne@0.0.4
  | `-- statsd-parser@0.0.4
  +-- strftime@0.10.0
  `-- winston@3.1.0
    +-- async@2.6.2
    | `-- lodash@4.17.11
    +-- diagnostics@1.1.1
    | +-- colorspace@1.1.1
    | | +-- color@3.0.0
    | | | +-- color-convert@1.9.3
    | | | | `-- color-name@1.1.3
    | | | `-- color-string@1.5.3
    | | |   `-- simple-swizzle@0.2.2
    | | |     `-- is-arrayish@0.3.2
    | | `-- text-hex@1.0.0
    | +-- enabled@1.0.2
    | | `-- env-variable@0.0.5
    | `-- kuler@1.0.1
    |   `-- colornames@1.1.1
    +-- is-stream@1.1.0
    +-- logform@1.10.0
    | +-- colors@1.3.3
    | +-- fast-safe-stringify@2.0.6
    | +-- fecha@2.3.3
    | `-- ms@2.1.1
    +-- one-time@0.0.4
    +-- readable-stream@2.3.6
    | +-- core-util-is@1.0.2
    | +-- inherits@2.0.3
    | +-- isarray@1.0.0
    | +-- process-nextick-args@2.0.0
    | +-- safe-buffer@5.1.2
    | +-- string_decoder@1.1.1
    | `-- util-deprecate@1.0.2
    +-- stack-trace@0.0.10
    +-- triple-beam@1.3.0
    `-- winston-transport@4.3.0
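The whole installation of configurable-http-proxy printed no errors, but checking it with configurable-http-proxy -h produced the following error:
/usr/local/lib/node_modules/configurable-http-proxy/node_modules/winston/lib/winston.js:11
const { warn } = require('./winston/common');
      ^
SyntaxError: Unexpected token {
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:374:25)
    at Object.Module._extensions..js (module.js:417:10)
    at Module.load (module.js:344:32)
    at Function.Module._load (module.js:301:12)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/usr/local/lib/node_modules/configurable-http-proxy/bin/configurable-http-proxy:14:13)
    at Module._compile (module.js:410:26)
    at Object.Module._extensions..js (module.js:417:10)
All the installation steps above were correct and reported no errors. The project's GitHub issues contain similar reports, where people suggest that the npm version may be too old or mismatched. The node release list on the official site showed that node was already past v10, whereas the locally installed node was only v4. The likely cause is that the installed version is too old and incompatible; the trace fits this, because it fails on destructuring syntax (const { warn } = ...) that such an old node does not accept.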
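A version check like the one done by hand above can be scripted. The sketch below parses the text printed by node -v and compares it with a minimum; the default minimum of 6.0.0 is an assumption chosen only because the destructuring syntax in the trace needs a newer node than v4, and the real requirement of a given configurable-http-proxy release should be taken from its own documentation.

```python
import re


def parse_node_version(text):
    """Turn the output of `node -v` (for example "v4.2.6") into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def node_is_new_enough(text, minimum=(6, 0, 0)):
    """Compare a `node -v` string with a minimum version (assumed, see above)."""
    return parse_node_version(text) >= minimum
```

With the versions mentioned in this article, "v4.2.6" fails the check and "v10.15.0" passes it.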
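I then tried to upgrade npm and node, but npm install npm -g still could not bring them to the latest version. The final solution was to download the source package from the official site and set the environment variables, as described in the steps above.
Later, unhappy that the upgrade route had not solved the problem, I used whereis nodejs to find the installation paths: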
nodejs: /usr/bin/nodejs /usr/lib/nodejs /usr/include/nodejs /usr/share/nodejs /usr/share/man/man1/nodejs.1.gz
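Then I located the configurable-http-proxy installation directory, /usr/local/lib/node_modules/configurable-http-proxy, added it to jupyterhub_config.py, and ran jupyterhub.
The earlier error disappeared, but a new one appeared:
[I 2019-03-07 14:54:04.979 JupyterHub proxy:567] Starting proxy @ http://:8000
[C 2019-03-07 14:54:04.980 JupyterHub app:1867] Failed to start proxy
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 1865, in start
        await self.proxy.start()
      File "/usr/local/lib/python3.6/dist-packages/jupyterhub/proxy.py", line 571, in start
        self.proxy_process = Popen(cmd, env=env, start_new_session=True, shell=shell)
      File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
        restore_signals, start_new_session)
      File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    PermissionError: [Errno 13] Permission denied: '/usr/local/lib/node_modules/configurable-http-proxy'
A permission problem, which was baffling because every step had been run as root. Checking the node_modules directory showed that its owner was nobody, which was just as baffling. I changed the directory permissions to 777, but rerunning jupyterhub gave the same error, so this problem was never solved that way. (Note: the path in the error message is the package directory itself, not the bin/configurable-http-proxy script inside it; trying to execute a directory also reports Permission denied, so the configured path may have been the actual cause.) In the end I went back to the approach described above: after replacing nodejs with the one downloaded from the official site, neither problem appeared and jupyterhub started under Docker. Any further settings are up to you.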
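The traceback shows Popen being handed the path as the command to execute, so the path has to be an executable file. A small check, written here as a hypothetical helper, distinguishes the common cases before jupyterhub is started.

```python
import os


def check_proxy_path(path):
    """Describe whether a path can be used as the proxy command."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        # A directory cannot be executed, whatever its permission bits are.
        return "directory"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"
```

A result of "directory" for the configured path means the setting should point at the script inside the package (its bin folder) instead.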
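PS: when I first learned nodejs I also used npm for package management, but because npm could never be updated to the latest version, I later installed from the source package instead.
PS: some say the npm registry is hosted abroad, so the npm upgrade may simply have failed and caused this chain of errors, although a failed upgrade should at least print a message. When time permits, it is worth setting the Taobao mirror and trying again.
jupyterhub user login failure
After logging in successfully as one user and logging out, logging in as another user fails.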
[I 2020-12-03 10:35:28.350 JupyterHub login:43] User logged out: ly
[I 2020-12-03 10:35:28.368 JupyterHub log:181] 302 GET /hub/logout -> /hub/login (@::ffff:192.168.11.51) 20.69ms
[I 2020-12-03 10:35:28.387 JupyterHub log:181] 200 GET /hub/login (@::ffff:192.168.11.51) 5.69ms
[W 2020-12-03 10:35:34.858 JupyterHub auth:1032] PAM Authentication failed (cxh@::ffff:192.168.11.51): [PAM Error 7] Authentication failure
[W 2020-12-03 10:35:34.859 JupyterHub base:762] Failed login for cxh
[I 2020-12-03 10:35:34.861 JupyterHub log:181] 200 POST /hub/login?next= (@::ffff:192.168.11.51) 1971.52ms
[I 2020-12-03 10:35:38.887 JupyterHub log:181] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@::ffff:192.168.11.51) 1.16ms
[I 2020-12-03 10:35:38.900 JupyterHub log:181] 200 GET /hub/login?next=%2Fhub%2F (@::ffff:192.168.11.51) 1.21ms
[I 2020-12-03 10:35:39.618 JupyterHub log:181] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@::ffff:192.168.11.51) 0.89ms
[I 2020-12-03 10:35:40.267 JupyterHub log:181] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@::ffff:192.168.11.51) 0.53ms
[W 2020-12-03 10:35:49.040 JupyterHub auth:1032] PAM Authentication failed (cxh@::ffff:192.168.11.51): [PAM Error 7] Authentication failure
[W 2020-12-03 10:35:49.041 JupyterHub base:762] Failed login for cxh
Solution:
Add the following line to the configuration file jupyterhub_config.py:
c.PAMAuthenticator.open_sessions = False
References:
PAM Authentication failed (adaragso@x.x.x.x): [PAM Error 3] Error in service module #2235
Setting up a jupyterhub environment on CentOS
Errors when using tensorflow-gpu under jupyterhub
[update 2019-09-19] A strange problem appeared while using jupyter: testing tensorflow GPU access from a terminal worked fine, but the same code in jupyter could not detect the GPU.
The detection code is:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  # print the available devices, including CPU and GPU
Output under jupyterhub:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15259160786169634382
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 8968781236993845730
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 9349831921872187387
physical_device_desc: "device: XLA_CPU device"
]
Output of the same code run in a terminal:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8983197487004247094
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17880524038454437849
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 6972433931048861409
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7967745639
locality {
  bus_id: 1
  links {
  }
}
incarnation: 11302034368239340091
physical_device_desc: "device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
Comparing the terminal output with the jupyterhub output shows that the GPU entry is missing under jupyterhub, so a program that places an operation on the GPU fails.
The program is:
import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
with tf.device('/gpu:0'):
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(tf.global_variables_initializer())
print(sess.run(c))
Part of the error message:
InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <ipython-input-1-4db82431c7bb>:7) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
  [[MatMul]]
Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 b (defined at <ipython-input-1-4db82431c7bb>:5)
 a (defined at <ipython-input-1-4db82431c7bb>:4)
The log printed by the jupyterhub process shows the following, yet the same program runs without any problem in a terminal.
2019-09-19 17:15:38.137139: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-09-19 17:15:38.137612: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ea6990 executing computations on platform Host. Devices:
2019-09-19 17:15:38.137626: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-09-19 17:15:38.137759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-19 17:15:38.138151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2019-09-19 17:15:38.138236: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-19 17:15:38.138289: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-09-19 17:15:38.138336: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-09-19 17:15:38.138384: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-09-19 17:15:38.138424: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-09-19 17:15:38.138467: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-09-19 17:15:38.140342: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-19 17:15:38.140355: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-09-19 17:15:38.140367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-19 17:15:38.140373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-09-19 17:15:38.140377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
[I 2019-09-19 17:17:29.796 SingleUserNotebookApp log:158] 200 GET /user/jerry/api/contents/jerry/workshop/projects/jupyter_projects/Untitled.ipynb?content=0&_=1568884528992 (jerry@10.15.176.199) 3.05ms
[I 2019-09-19 17:17:29.821 SingleUserNotebookApp handlers:164] Saving file at /jerry/workshop/projects/jupyter_projects/Untitled.ipynb
[I 2019-09-19 17:17:29.986 SingleUserNotebookApp log:158] 200 PUT /user/jerry/api/contents/jerry/workshop/projects/jupyter_projects/Untitled.ipynb (jerry@10.15.176.199) 169.95ms
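According to the relevant documentation, jupyterhub needs the following setting. Add this line to the jupyterhub configuration file:
c.Spawner.env_keep.append('LD_LIBRARY_PATH')  # This was the pitfall we hit: with the GPU build of tensorflow, LD_LIBRARY_PATH has to be passed through to the servers jupyterhub starts so that the GPU build of tensorflow works correctly.
After restarting, both code snippets above run without errors and give the same results as in the terminal.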
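To confirm from inside a notebook that the variable really reaches the kernel, the directories in LD_LIBRARY_PATH can be searched for the libraries named in the log. This is a rough sketch with a hypothetical helper name; it looks only at LD_LIBRARY_PATH, whereas the dynamic loader also consults other locations, so a library reported as missing here may still be loadable.

```python
import os


def find_missing_libraries(names, ld_library_path):
    """Return the library file names not found in any LD_LIBRARY_PATH directory."""
    # Empty entries (for example from a trailing colon) are ignored.
    directories = [d for d in ld_library_path.split(os.pathsep) if d]
    missing = []
    for name in names:
        if not any(os.path.exists(os.path.join(d, name)) for d in directories):
            missing.append(name)
    return missing
```

In a notebook cell, find_missing_libraries(["libcudart.so.10.0", "libcublas.so.10.0"], os.environ.get("LD_LIBRARY_PATH", "")) should return an empty list once the setting above is in effect and the CUDA directory is on that path.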
A colleague ran into the same problem under jupyter; his fix was to start jupyter without root, that is, without sudo.
References
Building a cloud work platform (JupyterHub + Rstudio Server)
JupyterHub on Kubernetes: Deployment and Usage Guide