Mastering Caffe for Deep Learning: Configuring the GPU Driver and Installing CUDA

Installing the NVIDIA driver

Other guides usually tell you to install the exact driver version matching your card; I simply installed nvidia-367 (to pair with CUDA 8.0) and it works as well.
You can also refer to: http://blog.csdn.net/xuzhongxiong/article/details/52717285

root@master# sudo add-apt-repository ppa:xorg-edgers/ppa
root@master# sudo apt-get update
root@master# sudo apt-get install nvidia-367
root@master# sudo apt-get install mesa-common-dev
root@master# sudo apt-get install freeglut3-dev
root@master# nvidia-smi
Sun Feb 11 11:18:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   41C    P5    N/A /  N/A |    129MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If nvidia-smi prints a table like this, the driver installation succeeded. (Note that it reports driver 384.111 here, a newer build than the nvidia-367 we requested; the PPA may supply an updated version.)



Installing CUDA 8.0

Download the toolkit from https://developer.nvidia.com/cuda-toolkit and be sure to choose version 8.0 (about 1.4 GB in total).
The installer prints a very long license agreement; keep pressing ENTER until you can type accept. The driver was already installed above, so decline the bundled driver here; accept the defaults or answer yes to everything else. Then:

root@master# sudo sh cuda_8.0.27_linux.run
root@master# vim /etc/profile
# append these two lines to /etc/profile:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
root@master# source /etc/profile
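
Before moving on to the samples, it is worth compiling a trivial kernel to confirm that nvcc, the driver, and the runtime agree. This is a minimal sketch, not part of the original walkthrough; the file name hello.cu is arbitrary:

// hello.cu: minimal end-to-end check that nvcc and the CUDA runtime work.
// Build and run (assuming the PATH set above):
//   nvcc -o hello hello.cu && ./hello
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(int *out)
{
    out[threadIdx.x] = threadIdx.x;   // each thread writes its own index
}

int main()
{
    int h[8] = {0};
    int *d = NULL;
    cudaMalloc(&d, sizeof(h));
    fill<<<1, 8>>>(d);                // launch 8 threads on the GPU
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    printf("GPU wrote: ");
    for (int i = 0; i < 8; i++)
    {
        printf("%d ", h[i]);          // expect 0 1 2 3 4 5 6 7
    }
    printf("\n");
    return 0;
}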



Testing the CUDA samples

root@master# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
root@master# sudo make
root@master# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 920M"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 2004 MBytes (2101542912 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z):  (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce 920M
Result = PASS
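
Under the hood, deviceQuery is little more than a loop over cudaGetDeviceProperties(). The following stripped-down sketch (hypothetical file mini_query.cu, not from the original post) prints just the fields the Monte Carlo sample below relies on:

// mini_query.cu: a stripped-down deviceQuery.
// Build and run: nvcc -o mini_query mini_query.cu && ./mini_query
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0)
    {
        printf("no CUDA-capable device found\n");
        return 1;
    }
    for (int i = 0; i < n; i++)
    {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("Device %d: \"%s\"\n", i, p.name);
        printf("  Compute capability: %d.%d\n", p.major, p.minor);
        printf("  Multiprocessors   : %d\n", p.multiProcessorCount);
        printf("  Global memory     : %zu MBytes\n", p.totalGlobalMem >> 20);
    }
    return 0;
}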



The Monte Carlo simulation program in the samples directory

The nvcc compile line for the sample. Each -gencode pair embeds code for one target architecture (compute_XX names the virtual PTX architecture, sm_XX the real chip), so the resulting fat binary runs on any of the listed GPUs; for the GeForce 920M only the compute_35/sm_35 entry actually applies:

root@master# "/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60
-gencode arch=compute_60,code=compute_60
-o MonteCarloMultiGPU.o -c MonteCarloMultiGPU.cpp

The program (MonteCarloMultiGPU.cpp):

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// includes, project
#include <multithreading.h>
#include <helper_functions.h>  // Helper functions (utilities, parsing, timing)
#include <helper_cuda.h>       // helper functions (cuda error checking and initialization)
#include <cuda_runtime.h>

#include "MonteCarlo_common.h"

int *pArgc = NULL;
char **pArgv = NULL;

#ifdef WIN32
#define strcasecmp _strcmpi
#endif

////////////////////////////////////////////////////////////////////////////////
// Common functions
////////////////////////////////////////////////////////////////////////////////
float randFloat(float low, float high)
{
    float t = (float)rand() / (float)RAND_MAX;
    return (1.0f - t) * low + t * high;
}

/// Utility function to tweak problem size for small GPUs
int adjustProblemSize(int GPU_N, int default_nOptions)
{
    int nOptions = default_nOptions;

    // select problem size
    for (int i = 0; i < GPU_N; i++)
    {
        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, i));
        int cudaCores = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor)
                        * deviceProp.multiProcessorCount;

        if (cudaCores <= 32)
        {
            nOptions = (nOptions < cudaCores / 2 ? nOptions : cudaCores / 2);
        }
    }

    return nOptions;
}

int adjustGridSize(int GPUIndex, int defaultGridSize)
{
    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, GPUIndex));
    int maxGridSize = deviceProp.multiProcessorCount * 40;
    return ((defaultGridSize > maxGridSize) ? maxGridSize : defaultGridSize);
}

///////////////////////////////////////////////////////////////////////////////
// CPU reference functions
///////////////////////////////////////////////////////////////////////////////
extern "C" void MonteCarloCPU(
    TOptionValue &callValue,
    TOptionData optionData,
    float *h_Random,
    int pathN
);

// Black-Scholes formula for call options
extern "C" void BlackScholesCall(
    float &CallResult,
    TOptionData optionData
);

////////////////////////////////////////////////////////////////////////////////
// GPU-driving host thread
////////////////////////////////////////////////////////////////////////////////
// Timer
StopWatchInterface **hTimer = NULL;

static CUT_THREADPROC solverThread(TOptionPlan *plan)
{
    // Init GPU
    checkCudaErrors(cudaSetDevice(plan->device));

    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan->device));

    // Start the timer
    sdkStartTimer(&hTimer[plan->device]);

    // Allocate intermediate memory for MC integrator and initialize RNG states
    initMonteCarloGPU(plan);

    // Main computation
    MonteCarloGPU(plan);
    checkCudaErrors(cudaDeviceSynchronize());

    // Stop the timer
    sdkStopTimer(&hTimer[plan->device]);

    // Shut down this GPU
    closeMonteCarloGPU(plan);
    cudaStreamSynchronize(0);

    printf("solverThread() finished - GPU Device %d: %s\n", plan->device, deviceProp.name);
    CUT_THREADEND;
}

static void multiSolver(TOptionPlan *plan, int nPlans)
{
    // allocate and initialize an array of stream handles
    cudaStream_t *streams = (cudaStream_t *) malloc(nPlans * sizeof(cudaStream_t));
    cudaEvent_t *events = (cudaEvent_t *) malloc(nPlans * sizeof(cudaEvent_t));

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaStreamCreate(&(streams[i])));
        checkCudaErrors(cudaEventCreate(&(events[i])));
    }

    // Init Each GPU
    // In CUDA 4.0 we can call cudaSetDevice multiple times to target each device
    // Set the device desired, then perform initializations on that device
    for (int i = 0; i < nPlans; i++)
    {
        // set the target device to perform initialization on
        checkCudaErrors(cudaSetDevice(plan[i].device));

        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan[i].device));

        // Allocate intermediate memory for MC integrator and initialize RNG state
        initMonteCarloGPU(&plan[i]);
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaDeviceSynchronize());
    }

    // Start the timer
    sdkResetTimer(&hTimer[0]);
    sdkStartTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));

        // Main computations
        MonteCarloGPU(&plan[i], streams[i]);
        checkCudaErrors(cudaEventRecord(events[i], streams[i]));
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        cudaEventSynchronize(events[i]);
    }

    // Stop the timer
    sdkStopTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        closeMonteCarloGPU(&plan[i]);
        checkCudaErrors(cudaStreamDestroy(streams[i]));
        checkCudaErrors(cudaEventDestroy(events[i]));
    }

    free(streams);
    free(events);
}

///////////////////////////////////////////////////////////////////////////////
// Main program
///////////////////////////////////////////////////////////////////////////////
#define DO_CPU
#undef DO_CPU

#define PRINT_RESULTS
#undef PRINT_RESULTS

void usage()
{
    printf("--method=[threaded,streamed] --scaling=[strong,weak] [--help]\n");
    printf("Method=threaded: 1 CPU thread for each GPU [default]\n");
    printf("       streamed: 1 CPU thread handles all GPUs (requires CUDA 4.0 or newer)\n");
    printf("Scaling=strong : constant problem size\n");
    printf("        weak   : problem size scales with number of available GPUs [default]\n");
}

int main(int argc, char **argv)
{
    char *multiMethodChoice = NULL;
    char *scalingChoice = NULL;
    bool use_threads = true;
    bool bqatest = false;
    bool strongScaling = false;

    pArgc = &argc;
    pArgv = argv;

    printf("%s Starting...\n\n", argv[0]);

    if (checkCmdLineFlag(argc, (const char **)argv, "qatest"))
    {
        bqatest = true;
    }

    getCmdLineArgumentString(argc, (const char **)argv, "method", &multiMethodChoice);
    getCmdLineArgumentString(argc, (const char **)argv, "scaling", &scalingChoice);

    if (checkCmdLineFlag(argc, (const char **)argv, "h") ||
        checkCmdLineFlag(argc, (const char **)argv, "help"))
    {
        usage();
        exit(EXIT_SUCCESS);
    }

    if (multiMethodChoice == NULL)
    {
        use_threads = false;
    }
    else
    {
        if (!strcasecmp(multiMethodChoice, "threaded"))
        {
            use_threads = true;
        }
        else
        {
            use_threads = false;
        }
    }

    if (use_threads == false)
    {
        printf("Using single CPU thread for multiple GPUs\n");
    }

    if (scalingChoice == NULL)
    {
        strongScaling = false;
    }
    else
    {
        if (!strcasecmp(scalingChoice, "strong"))
        {
            strongScaling = true;
        }
        else
        {
            strongScaling = false;
        }
    }

    // GPU number present in the system
    int GPU_N;
    checkCudaErrors(cudaGetDeviceCount(&GPU_N));
    int nOptions = 8 * 1024;
    nOptions = adjustProblemSize(GPU_N, nOptions);

    // select problem size
    int scale = (strongScaling) ? 1 : GPU_N;
    int OPT_N = nOptions * scale;
    int PATH_N = 262144;

    // initialize the timers
    hTimer = new StopWatchInterface *[GPU_N];

    for (int i = 0; i < GPU_N; i++)
    {
        sdkCreateTimer(&hTimer[i]);
        sdkResetTimer(&hTimer[i]);
    }

    // Input data array
    TOptionData *optionData = new TOptionData[OPT_N];
    // Final GPU MC results
    TOptionValue *callValueGPU = new TOptionValue[OPT_N];
    // "Theoretical" call values by Black-Scholes formula
    float *callValueBS = new float[OPT_N];
    // Solver config
    TOptionPlan *optionSolver = new TOptionPlan[GPU_N];
    // OS thread ID
    CUTThread *threadID = new CUTThread[GPU_N];

    int gpuBase, gpuIndex;
    int i;
    float time;
    double delta, ref, sumDelta, sumRef, sumReserve;

    printf("MonteCarloMultiGPU\n");
    printf("==================\n");
    printf("Parallelization method  = %s\n", use_threads ? "threaded" : "streamed");
    printf("Problem scaling         = %s\n", strongScaling ? "strong" : "weak");
    printf("Number of GPUs          = %d\n", GPU_N);
    printf("Total number of options = %d\n", OPT_N);
    printf("Number of paths         = %d\n", PATH_N);

    printf("main(): generating input data...\n");
    srand(123);

    for (i = 0; i < OPT_N; i++)
    {
        optionData[i].S = randFloat(5.0f, 50.0f);
        optionData[i].X = randFloat(10.0f, 25.0f);
        optionData[i].T = randFloat(1.0f, 5.0f);
        optionData[i].R = 0.06f;
        optionData[i].V = 0.10f;
        callValueGPU[i].Expected   = -1.0f;
        callValueGPU[i].Confidence = -1.0f;
    }

    printf("main(): starting %i host threads...\n", GPU_N);

    // Get option count for each GPU
    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].optionCount = OPT_N / GPU_N;
    }

    // Take into account cases with "odd" option counts
    for (i = 0; i < (OPT_N % GPU_N); i++)
    {
        optionSolver[i].optionCount++;
    }

    // Assign GPU option ranges
    gpuBase = 0;

    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].device = i;
        optionSolver[i].optionData = optionData + gpuBase;
        optionSolver[i].callValue = callValueGPU + gpuBase;
        optionSolver[i].pathN = PATH_N;
        optionSolver[i].gridSize =
            adjustGridSize(optionSolver[i].device, optionSolver[i].optionCount);
        gpuBase += optionSolver[i].optionCount;
    }

    if (use_threads || bqatest)
    {
        // Start CPU thread for each GPU
        for (gpuIndex = 0; gpuIndex < GPU_N; gpuIndex++)
        {
            threadID[gpuIndex] = cutStartThread((CUT_THREADROUTINE)solverThread,
                                                &optionSolver[gpuIndex]);
        }

        printf("main(): waiting for GPU results...\n");
        cutWaitForThreads(threadID, GPU_N);

        printf("main(): GPU statistics, threaded\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
            time = sdkGetTimerValue(&hTimer[i]);
            printf("Total time (ms.): %f\n", time);
            printf("Options per sec.: %f\n", OPT_N / (time * 0.001));
        }

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta = 0;
        sumRef = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref = callValueBS[i];
            sumDelta += delta;
            sumRef += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

    if (!use_threads || bqatest)
    {
        multiSolver(optionSolver, GPU_N);

        printf("main(): GPU statistics, streamed\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
        }

        time = sdkGetTimerValue(&hTimer[0]);
        printf("\nTotal time (ms.): %f\n", time);
        printf("\tNote: This is elapsed time for all to compute.\n");
        printf("Options per sec.: %f\n", OPT_N / (time * 0.001));

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta = 0;
        sumRef = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref = callValueBS[i];
            sumDelta += delta;
            sumRef += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

#ifdef DO_CPU
    printf("main(): running CPU MonteCarlo...\n");
    TOptionValue callValueCPU;
    sumDelta = 0;
    sumRef = 0;

    for (i = 0; i < OPT_N; i++)
    {
        MonteCarloCPU(callValueCPU, optionData[i], NULL, PATH_N);
        delta = fabs(callValueCPU.Expected - callValueGPU[i].Expected);
        ref = callValueCPU.Expected;
        sumDelta += delta;
        sumRef += fabs(ref);
        printf("Exp : %f | %f\t", callValueCPU.Expected, callValueGPU[i].Expected);
        printf("Conf: %f | %f\n", callValueCPU.Confidence, callValueGPU[i].Confidence);
    }

    printf("L1 norm: %E\n", sumDelta / sumRef);
#endif

    printf("Shutting down...\n");

    for (int i = 0; i < GPU_N; i++)
    {
        sdkDeleteTimer(&hTimer[i]);
        checkCudaErrors(cudaSetDevice(i));
    }

    delete[] optionSolver;
    delete[] callValueGPU;
    delete[] callValueBS;
    delete[] optionData;
    delete[] threadID;
    delete[] hTimer;

    printf("Test Summary...\n");
    printf("L1 norm        : %E\n", sumDelta / sumRef);
    printf("Average reserve: %f\n", sumReserve);
    printf("\nNOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.\n\n");
    printf(sumReserve > 1.0f ? "Test passed\n" : "Test failed!\n");
    exit(sumReserve > 1.0f ? EXIT_SUCCESS : EXIT_FAILURE);
}
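
For readers who want the math rather than the scaffolding: per option, MonteCarloGPU estimates the European call price as the discounted average payoff exp(-R*T) * mean(max(S_T - X, 0)) over simulated terminal prices S_T = S * exp((R - V^2/2)*T + V*sqrt(T)*z), with z ~ N(0,1). The following single-threaded, host-only sketch (hypothetical file mc_call.cpp, with plain rand() in place of the sample's GPU RNG) computes the same estimator:

// mc_call.cpp: host-only sketch of the estimator the sample runs per option.
// Build and run: g++ -o mc_call mc_call.cpp && ./mc_call
#include <cstdio>
#include <cstdlib>
#include <cmath>

// Box-Muller: turn two uniform draws into one standard normal sample.
static double gauss()
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);  // shift to avoid log(0)
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

// Price a European call: S spot, X strike, T years, R rate, V volatility.
static double mcCall(double S, double X, double T, double R, double V, int pathN)
{
    double sum = 0.0;
    for (int i = 0; i < pathN; i++)
    {
        double sT = S * exp((R - 0.5 * V * V) * T + V * sqrt(T) * gauss());
        sum += fmax(sT - X, 0.0);          // call payoff on this path
    }
    return exp(-R * T) * sum / pathN;      // discounted average
}

int main()
{
    srand(123);
    // one option drawn from the same parameter ranges the sample uses
    printf("MC call price: %f\n", mcCall(25.0, 15.0, 2.0, 0.06, 0.10, 262144));
    return 0;
}

With a path count like the sample's PATH_N = 262144, such an estimate typically lands very close to the Black-Scholes closed form; that gap is exactly what the sample's "L1 norm" and "Average reserve" statistics quantify.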

