Fixing "nvcc fatal : Unsupported gpu architecture 'compute_35'" when compiling FastDeploy

Background

I was using Paddle's FastDeploy, whose installation requires compiling the C++ SDK. The build failed with the error in the title. I later found a fix on GitHub.

Environment

  • GPU: RTX 3060 Ti
  • Ubuntu 22.04
  • CUDA 12.1.1
  • TensorRT-8.6.1.6
  • OpenCV 4.7
  • FastDeploy develop branch, commit id = cd0ee79c91d4ed1103abdc65ff12ccadd23d0827
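
To compare your own machine against the list above, a small check along these lines may help. It is only a sketch: cuda_release is a hypothetical helper name, and it assumes the usual "release X.Y" wording in the output of nvcc --version. The compute_cap query field of nvidia-smi is only available with reasonably recent drivers, so that call is allowed to fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the CUDA release number (e.g. "12.1")
# from `nvcc --version` output supplied on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "CUDA toolkit release: $(nvcc --version | cuda_release)"
else
  echo "nvcc not found on PATH"
fi

# Print GPU name and compute capability; older drivers may reject compute_cap.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader || true
fi
```

On the setup described here, the toolkit release should be 12.1 and an RTX 3060 Ti has compute capability 8.6 (SM_86).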

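Steps to reproduce

  1. Install CUDA 12.1.1 (see the download steps and link on the official site).
  2. Install OpenCV: git clone it from the official GitHub repository and build it manually. There is plenty of material on CSDN, so I won't link any here.
  3. Install TensorRT. According to the requirements on the Paddle website, CUDA Toolkit 12.0 goes with cuDNN v8.9.1, and PaddleTensorRT inference additionally needs TensorRT 8.6.1.6. (The official site links to a tar package; extract it and set the path. Downloading requires logging in with an NVIDIA developer account, which is free to register.)
  4. Install FastDeploy. I followed this tutorial; a few of the cmake options below need to be changed by hand.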
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
mkdir build && cd build
# Options to adjust (a comment cannot follow a trailing backslash, so they are listed here):
#   TRT_DIRECTORY:    change to the directory where you extracted the TensorRT tar package
#   OPENCV_DIRECTORY: no change needed if you built OpenCV from source and ran make install
cmake .. -DENABLE_ORT_BACKEND=ON \
         -DENABLE_PADDLE_BACKEND=ON \
         -DENABLE_OPENVINO_BACKEND=ON \
         -DENABLE_TRT_BACKEND=ON \
         -DWITH_GPU=ON \
         -DTRT_DIRECTORY=/Paddle/TensorRT-8.4.1.5 \
         -DCUDA_DIRECTORY=/usr/local/cuda \
         -DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk \
         -DENABLE_VISION=ON \
         -DOPENCV_DIRECTORY=/usr/lib/x86_64-linux-gnu/cmake/opencv4 \
         -DENABLE_TEXT=ON
make -j12
make install
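
Since TRT_DIRECTORY has to point at the extracted tar package, it can be worth checking the directory before running cmake. The sketch below is an assumption-laden illustration, not part of FastDeploy: check_trt_dir is a hypothetical helper, and it relies only on the standard tar layout (include/NvInfer.h and lib/libnvinfer.so*). The default value of TRT_DIR is just the placeholder path from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a directory looks like an extracted
# TensorRT tar package (header under include/, library under lib/).
check_trt_dir() {
  local dir="$1"
  if [ ! -f "$dir/include/NvInfer.h" ]; then
    echo "missing $dir/include/NvInfer.h" >&2
    return 1
  fi
  if ! ls "$dir"/lib/libnvinfer.so* >/dev/null 2>&1; then
    echo "no libnvinfer.so* under $dir/lib" >&2
    return 1
  fi
  return 0
}

# Placeholder path taken from the cmake command; override via the environment.
TRT_DIR="${TRT_DIR:-/Paddle/TensorRT-8.4.1.5}"
if check_trt_dir "$TRT_DIR" 2>/dev/null; then
  echo "TensorRT directory looks usable: $TRT_DIR"
else
  echo "TensorRT directory not usable, adjust TRT_DIRECTORY: $TRT_DIR"
fi
```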
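  • Note 1: the cmake options need adjusting, as described above.
  • Note 2: this looks to me like a FastDeploy problem. The failure is shown below; note the string of nvcc fatal lines. compute_35 is the old SM_35 compute architecture, while my GPU is SM_86, so the build should not be targeting it at all. CUDA 12 dropped support for Kepler (SM_35), so nvcc 12.1 rejects the flag outright.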
make -j16
[  3%] Built target extern_onnxruntime
[  6%] Built target extern_paddle_inference
[  8%] Built target extern_fast_tokenizer
[ 10%] Built target extern_paddle2onnx
[ 21%] Built target yaml-cpp
[ 21%] Built target yaml-cpp-parse
[ 22%] Built target yaml-cpp-read
[ 23%] Built target yaml-cpp-sandbox
Consolidate compiler generated dependencies of target fastdeploy
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o
[ 24%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o
[ 24%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o
[ 25%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/preprocessor.cc.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/resnet.cc.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/yolov5cls.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/model.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/postprocessor.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/preprocessor.cc.o
[ 27%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_postprocessor.cc.o
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec.cc.o
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/postprocessor.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_preprocessor.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:496:CMakeFiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:510:CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:706:CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:734:CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:720:CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:692:CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:310:CMakeFiles/fastdeploy.dir/all] Error 2
make: *** [Makefile:156:all] Error 2
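
To confirm that the error comes from the toolkit rather than from the GPU, you can ask nvcc which virtual architectures it accepts. This is a minimal sketch: supports_arch is a hypothetical helper, and it assumes nvcc --list-gpu-arch prints one compute_XX name per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the architecture list on stdin
# contains a line that is exactly compute_<arch>.
supports_arch() {
  grep -qx "compute_$1"
}

if command -v nvcc >/dev/null 2>&1; then
  for arch in 35 86; do
    if nvcc --list-gpu-arch | supports_arch "$arch"; then
      echo "compute_$arch: supported by this nvcc"
    else
      echo "compute_$arch: NOT supported by this nvcc"
    fi
  done
else
  echo "nvcc not found on PATH"
fi
```

With a CUDA 12.x toolkit, compute_35 would be expected to show up as unsupported while compute_86 is accepted, which matches the build failure above.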

Solution

The fix is simple: edit the file FastDeploy/cmake/cuda.cmake.

if(NOT WITH_GPU)
  return()
endif()

# This is to eliminate the CMP0104 warnings from cmake 3.18+.
# Instead of setting CUDA_ARCHITECTURES, we will set CMAKE_CUDA_FLAGS.
set(CMAKE_CUDA_ARCHITECTURES OFF)

if(BUILD_ON_JETSON)
  set(fd_known_gpu_archs "53 62 72")
  set(fd_known_gpu_archs10 "53 62 72")
else()
  message("Using New Release Strategy - All Arches Packge")
#  set(fd_known_gpu_archs "35 50 52 60 61 70 75 80 86")  # original
#  set(fd_known_gpu_archs10 "35 50 52 60 61 70 75")      # original

  set(fd_known_gpu_archs "50 52 60 61 70 75 80 86")  # modified
  set(fd_known_gpu_archs10 "50 52 60 61 70 75")      # modified

  set(fd_known_gpu_archs11 "50 60 61 70 75 80")
endif()

######################################################################################
# A function for automatic detection of GPUs installed  (if autodetection is enabled)
# Usage:
#   detect_installed_gpus(out_variable)

Near the top of the file, "fd_known_gpu_archs" and "fd_known_gpu_archs10" are the two places to change. After removing 35 from both, make completes successfully.

[100%] Linking CUDA device code CMakeFiles/fastdeploy.dir/cmake_device_link.o
[100%] Linking CXX shared library libfastdeploy.so
[100%] Built target fastdeploy
[100%] Built target patchelf_paddle_inference
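
The manual edit to cuda.cmake can also be scripted. The following is a sketch only: drop_sm35 is a hypothetical helper, it assumes the two set(...) lines look exactly like the original ones quoted in this post, and it deletes the 35 entry rather than commenting out the old lines. The default value of CUDA_CMAKE assumes you run it from the FastDeploy repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the leading "35 " entry from the
# fd_known_gpu_archs and fd_known_gpu_archs10 lists in the given file.
# A copy of the untouched file is kept next to it with a .bak suffix.
drop_sm35() {
  local file="$1"
  sed -i.bak -E '/set\(fd_known_gpu_archs(10)? /s/"35 /"/' "$file"
}

# Assumed location relative to the FastDeploy repository root.
CUDA_CMAKE="${CUDA_CMAKE:-cmake/cuda.cmake}"
if [ -f "$CUDA_CMAKE" ]; then
  drop_sm35 "$CUDA_CMAKE"
  # Show the resulting architecture lists for a quick visual check.
  grep -n 'fd_known_gpu_archs' "$CUDA_CMAKE"
else
  echo "$CUDA_CMAKE not found, run this from the FastDeploy root"
fi
```

After the file changes, re-run make from the build directory as before.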