
Mastering Caffe for Deep Learning: Configuring the GPU Driver and Installing CUDA




Installing the NVIDIA driver

Most other tutorials tell you to install the driver version that exactly matches your card. I simply installed nvidia-367 (to pair with CUDA 8.0), and that works as well.
You can also refer to: http://blog.csdn.net/xuzhongxiong/article/details/52717285

root@master# sudo add-apt-repository ppa:xorg-edgers/ppa
root@master# sudo apt-get update
root@master# sudo apt-get install nvidia-367
root@master# sudo apt-get install mesa-common-dev
root@master# sudo apt-get install freeglut3-dev
root@master# nvidia-smi
Sun Feb 11 11:18:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   41C    P5    N/A /  N/A |    129MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

If nvidia-smi prints output like this, the driver installation succeeded.



Installing CUDA 8.0

Download the toolkit from https://developer.nvidia.com/cuda-toolkit — make sure you choose version 8.0; the installer is about 1.4 GB.
The installer shows a very long license agreement; keep pressing ENTER until you can type accept. The driver was already installed above, so do not select the driver installation here. For everything else, take the defaults or answer yes.
Run:

root@master# sudo sh cuda_8.0.27_linux.run
root@master# vim /etc/profile
# append the following two lines to /etc/profile:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
root@master# source /etc/profile



Testing the CUDA samples

root@master# cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
root@master# sudo make
root@master# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce 920M"
  CUDA Driver Version / Runtime Version          9.0 / 8.0
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 2004 MBytes (2101542912 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            954 MHz (0.95 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size   (x,y,z):   (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce 920M
Result = PASS
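
Besides deviceQuery, you can sanity-check the toolchain by compiling a tiny kernel of your own with the freshly installed nvcc. The following is a minimal sketch of my own (the file name saxpy_test.cu and everything in it are illustrative, not part of the CUDA samples); build it with "nvcc saxpy_test.cu -o saxpy_test" and on a working setup it should print 5.000000 twice.

// saxpy_test.cu -- minimal stand-alone check that nvcc, the driver and the
// runtime work together (illustrative example, not one of the NVIDIA samples).
#include <cstdio>
#include <cuda_runtime.h>

// y[i] = a * x[i] + y[i]
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    // Managed (unified) memory keeps the example short; it needs CC >= 3.0,
    // which the GeForce 920M (CC 3.5) satisfies.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));

    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    // Every element should now be 3 * 1 + 2 = 5.
    printf("y[0] = %f, y[n-1] = %f\n", y[0], y[n - 1]);

    cudaFree(x);
    cudaFree(y);
    return 0;
}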



The Monte Carlo simulation program in the samples

The nvcc compile command

root@master# "/usr/local/cuda-8.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60
-gencode arch=compute_60,code=compute_60
-o MonteCarloMultiGPU.o -c MonteCarloMultiGPU.cpp
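
Each -gencode arch=compute_XX,code=sm_XX pair embeds a binary for one GPU architecture, and the final code=compute_60 entry adds PTX that newer devices can JIT-compile; at run time the driver picks the best match for the installed card. The GeForce 920M reported by deviceQuery is compute capability 3.5, so only the sm_35 variant is actually used on this machine. If you want to confirm the capability from code rather than from deviceQuery, a few runtime-API calls are enough — a minimal sketch of my own, not part of the sample:

// cc_check.cu -- print the compute capability of every visible GPU
// (illustrative helper, not part of MonteCarloMultiGPU).
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int n = 0;
    cudaGetDeviceCount(&n);

    for (int dev = 0; dev < n; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // prop.major / prop.minor are the numbers used in the -gencode flags
        // above (3.5 -> compute_35 / sm_35).
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}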

The program (MonteCarloMultiGPU.cpp, as shipped in the CUDA 8.0 samples)

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <cuda_runtime.h>

// includes, project
#include <multithreading.h>
#include <helper_functions.h>  // Helper functions (utilities, parsing, timing)
#include <helper_cuda.h>       // helper functions (cuda error checking and initialization)

#include "MonteCarlo_common.h"

int *pArgc = NULL;
char **pArgv = NULL;

#ifdef WIN32
#define strcasecmp _strcmpi
#endif

////////////////////////////////////////////////////////////////////////////////
// Common functions
////////////////////////////////////////////////////////////////////////////////
float randFloat(float low, float high)
{
    float t = (float)rand() / (float)RAND_MAX;
    return (1.0f - t) * low + t * high;
}

/// Utility function to tweak problem size for small GPUs
int adjustProblemSize(int GPU_N, int default_nOptions)
{
    int nOptions = default_nOptions;

    // select problem size
    for (int i = 0; i < GPU_N; i++)
    {
        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, i));
        int cudaCores = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor)
                        * deviceProp.multiProcessorCount;

        if (cudaCores <= 32)
        {
            nOptions = (nOptions < cudaCores / 2 ? nOptions : cudaCores / 2);
        }
    }

    return nOptions;
}

int adjustGridSize(int GPUIndex, int defaultGridSize)
{
    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, GPUIndex));
    int maxGridSize = deviceProp.multiProcessorCount * 40;
    return ((defaultGridSize > maxGridSize) ? maxGridSize : defaultGridSize);
}

///////////////////////////////////////////////////////////////////////////////
// CPU reference functions
///////////////////////////////////////////////////////////////////////////////
extern "C" void MonteCarloCPU(
    TOptionValue &callValue,
    TOptionData optionData,
    float *h_Random,
    int pathN
);

//Black-Scholes formula for call options
extern "C" void BlackScholesCall(
    float &CallResult,
    TOptionData optionData
);

////////////////////////////////////////////////////////////////////////////////
// GPU-driving host thread
////////////////////////////////////////////////////////////////////////////////
//Timer
StopWatchInterface **hTimer = NULL;

static CUT_THREADPROC solverThread(TOptionPlan *plan)
{
    //Init GPU
    checkCudaErrors(cudaSetDevice(plan->device));

    cudaDeviceProp deviceProp;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan->device));

    //Start the timer
    sdkStartTimer(&hTimer[plan->device]);

    // Allocate intermediate memory for MC integrator and initialize
    // RNG states
    initMonteCarloGPU(plan);

    // Main computation
    MonteCarloGPU(plan);
    checkCudaErrors(cudaDeviceSynchronize());

    //Stop the timer
    sdkStopTimer(&hTimer[plan->device]);

    //Shut down this GPU
    closeMonteCarloGPU(plan);
    cudaStreamSynchronize(0);

    printf("solverThread() finished - GPU Device %d: %s\n", plan->device, deviceProp.name);

    CUT_THREADEND;
}

static void multiSolver(TOptionPlan *plan, int nPlans)
{
    // allocate and initialize an array of stream handles
    cudaStream_t *streams = (cudaStream_t *) malloc(nPlans * sizeof(cudaStream_t));
    cudaEvent_t *events = (cudaEvent_t *) malloc(nPlans * sizeof(cudaEvent_t));

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaStreamCreate(&(streams[i])));
        checkCudaErrors(cudaEventCreate(&(events[i])));
    }

    //Init Each GPU
    // In CUDA 4.0 we can call cudaSetDevice multiple times to target each device
    // Set the device desired, then perform initializations on that device
    for (int i = 0; i < nPlans; i++)
    {
        // set the target device to perform initialization on
        checkCudaErrors(cudaSetDevice(plan[i].device));

        cudaDeviceProp deviceProp;
        checkCudaErrors(cudaGetDeviceProperties(&deviceProp, plan[i].device));

        // Allocate intermediate memory for MC integrator
        // and initialize RNG state
        initMonteCarloGPU(&plan[i]);
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        checkCudaErrors(cudaDeviceSynchronize());
    }

    //Start the timer
    sdkResetTimer(&hTimer[0]);
    sdkStartTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));

        //Main computations
        MonteCarloGPU(&plan[i], streams[i]);
        checkCudaErrors(cudaEventRecord(events[i], streams[i]));
    }

    for (int i = 0; i < nPlans; i++)
    {
        checkCudaErrors(cudaSetDevice(plan[i].device));
        cudaEventSynchronize(events[i]);
    }

    //Stop the timer
    sdkStopTimer(&hTimer[0]);

    for (int i = 0; i < nPlans; i++)
    {
        closeMonteCarloGPU(&plan[i]);
        checkCudaErrors(cudaStreamDestroy(streams[i]));
        checkCudaErrors(cudaEventDestroy(events[i]));
    }
}

///////////////////////////////////////////////////////////////////////////////
// Main program
///////////////////////////////////////////////////////////////////////////////
#define DO_CPU
#undef DO_CPU

#define PRINT_RESULTS
#undef PRINT_RESULTS

void usage()
{
    printf("--method=[threaded,streamed] --scaling=[strong,weak] [--help]\n");
    printf("Method=threaded: 1 CPU thread for each GPU [default]\n");
    printf("       streamed: 1 CPU thread handles all GPUs (requires CUDA 4.0 or newer)\n");
    printf("Scaling=strong : constant problem size\n");
    printf("        weak   : problem size scales with number of available GPUs [default]\n");
}

int main(int argc, char **argv)
{
    char *multiMethodChoice = NULL;
    char *scalingChoice = NULL;
    bool use_threads = true;
    bool bqatest = false;
    bool strongScaling = false;

    pArgc = &argc;
    pArgv = argv;

    printf("%s Starting...\n\n", argv[0]);

    if (checkCmdLineFlag(argc, (const char **)argv, "qatest"))
    {
        bqatest = true;
    }

    getCmdLineArgumentString(argc, (const char **)argv, "method", &multiMethodChoice);
    getCmdLineArgumentString(argc, (const char **)argv, "scaling", &scalingChoice);

    if (checkCmdLineFlag(argc, (const char **)argv, "h") ||
        checkCmdLineFlag(argc, (const char **)argv, "help"))
    {
        usage();
        exit(EXIT_SUCCESS);
    }

    if (multiMethodChoice == NULL)
    {
        use_threads = false;
    }
    else
    {
        if (!strcasecmp(multiMethodChoice, "threaded"))
        {
            use_threads = true;
        }
        else
        {
            use_threads = false;
        }
    }

    if (use_threads == false)
    {
        printf("Using single CPU thread for multiple GPUs\n");
    }

    if (scalingChoice == NULL)
    {
        strongScaling = false;
    }
    else
    {
        if (!strcasecmp(scalingChoice, "strong"))
        {
            strongScaling = true;
        }
        else
        {
            strongScaling = false;
        }
    }

    //GPU number present in the system
    int GPU_N;
    checkCudaErrors(cudaGetDeviceCount(&GPU_N));
    int nOptions = 8 * 1024;

    nOptions = adjustProblemSize(GPU_N, nOptions);

    // select problem size
    int scale = (strongScaling) ? 1 : GPU_N;
    int OPT_N = nOptions * scale;
    int PATH_N = 262144;

    // initialize the timers
    hTimer = new StopWatchInterface*[GPU_N];

    for (int i = 0; i < GPU_N; i++)
    {
        sdkCreateTimer(&hTimer[i]);
        sdkResetTimer(&hTimer[i]);
    }

    //Input data array
    TOptionData *optionData = new TOptionData[OPT_N];
    //Final GPU MC results
    TOptionValue *callValueGPU = new TOptionValue[OPT_N];
    //"Theoretical" call values by Black-Scholes formula
    float *callValueBS = new float[OPT_N];
    //Solver config
    TOptionPlan *optionSolver = new TOptionPlan[GPU_N];
    //OS thread ID
    CUTThread *threadID = new CUTThread[GPU_N];

    int gpuBase, gpuIndex;
    int i;
    float time;
    double delta, ref, sumDelta, sumRef, sumReserve;

    printf("MonteCarloMultiGPU\n");
    printf("==================\n");
    printf("Parallelization method  = %s\n", use_threads ? "threaded" : "streamed");
    printf("Problem scaling         = %s\n", strongScaling ? "strong" : "weak");
    printf("Number of GPUs          = %d\n", GPU_N);
    printf("Total number of options = %d\n", OPT_N);
    printf("Number of paths         = %d\n", PATH_N);

    printf("main(): generating input data...\n");
    srand(123);

    for (i = 0; i < OPT_N; i++)
    {
        optionData[i].S = randFloat(5.0f, 50.0f);
        optionData[i].X = randFloat(10.0f, 25.0f);
        optionData[i].T = randFloat(1.0f, 5.0f);
        optionData[i].R = 0.06f;
        optionData[i].V = 0.10f;
        callValueGPU[i].Expected   = -1.0f;
        callValueGPU[i].Confidence = -1.0f;
    }

    printf("main(): starting %i host threads...\n", GPU_N);

    //Get option count for each GPU
    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].optionCount = OPT_N / GPU_N;
    }

    //Take into account cases with "odd" option counts
    for (i = 0; i < (OPT_N % GPU_N); i++)
    {
        optionSolver[i].optionCount++;
    }

    //Assign GPU option ranges
    gpuBase = 0;

    for (i = 0; i < GPU_N; i++)
    {
        optionSolver[i].device = i;
        optionSolver[i].optionData = optionData + gpuBase;
        optionSolver[i].callValue = callValueGPU + gpuBase;
        optionSolver[i].pathN = PATH_N;
        optionSolver[i].gridSize = adjustGridSize(optionSolver[i].device, optionSolver[i].optionCount);
        gpuBase += optionSolver[i].optionCount;
    }

    if (use_threads || bqatest)
    {
        //Start CPU thread for each GPU
        for (gpuIndex = 0; gpuIndex < GPU_N; gpuIndex++)
        {
            threadID[gpuIndex] = cutStartThread((CUT_THREADROUTINE)solverThread, &optionSolver[gpuIndex]);
        }

        printf("main(): waiting for GPU results...\n");
        cutWaitForThreads(threadID, GPU_N);

        printf("main(): GPU statistics, threaded\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
            time = sdkGetTimerValue(&hTimer[i]);
            printf("Total time (ms.): %f\n", time);
            printf("Options per sec.: %f\n", OPT_N / (time * 0.001));
        }

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref   = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

    if (!use_threads || bqatest)
    {
        multiSolver(optionSolver, GPU_N);

        printf("main(): GPU statistics, streamed\n");

        for (i = 0; i < GPU_N; i++)
        {
            cudaDeviceProp deviceProp;
            checkCudaErrors(cudaGetDeviceProperties(&deviceProp, optionSolver[i].device));
            printf("GPU Device #%i: %s\n", optionSolver[i].device, deviceProp.name);
            printf("Options         : %i\n", optionSolver[i].optionCount);
            printf("Simulation paths: %i\n", optionSolver[i].pathN);
        }

        time = sdkGetTimerValue(&hTimer[0]);
        printf("\nTotal time (ms.): %f\n", time);
        printf("\tNote: This is elapsed time for all to compute.\n");
        printf("Options per sec.: %f\n", OPT_N / (time * 0.001));

        printf("main(): comparing Monte Carlo and Black-Scholes results...\n");
        sumDelta   = 0;
        sumRef     = 0;
        sumReserve = 0;

        for (i = 0; i < OPT_N; i++)
        {
            BlackScholesCall(callValueBS[i], optionData[i]);
            delta = fabs(callValueBS[i] - callValueGPU[i].Expected);
            ref   = callValueBS[i];
            sumDelta += delta;
            sumRef   += fabs(ref);

            if (delta > 1e-6)
            {
                sumReserve += callValueGPU[i].Confidence / delta;
            }

#ifdef PRINT_RESULTS
            printf("BS: %f; delta: %E\n", callValueBS[i], delta);
#endif
        }

        sumReserve /= OPT_N;
    }

#ifdef DO_CPU
    printf("main(): running CPU MonteCarlo...\n");
    TOptionValue callValueCPU;
    sumDelta = 0;
    sumRef   = 0;

    for (i = 0; i < OPT_N; i++)
    {
        MonteCarloCPU(callValueCPU, optionData[i], NULL, PATH_N);
        delta = fabs(callValueCPU.Expected - callValueGPU[i].Expected);
        ref   = callValueCPU.Expected;
        sumDelta += delta;
        sumRef   += fabs(ref);
        printf("Exp : %f | %f\t", callValueCPU.Expected, callValueGPU[i].Expected);
        printf("Conf: %f | %f\n", callValueCPU.Confidence, callValueGPU[i].Confidence);
    }

    printf("L1 norm: %E\n", sumDelta / sumRef);
#endif

    printf("Shutting down...\n");

    for (int i = 0; i < GPU_N; i++)
    {
        sdkDeleteTimer(&hTimer[i]);
        checkCudaErrors(cudaSetDevice(i));
        cudaDeviceReset();
    }

    delete[] hTimer;
    delete[] threadID;
    delete[] optionSolver;
    delete[] callValueBS;
    delete[] callValueGPU;
    delete[] optionData;

    printf("Test Summary...\n");
    printf("L1 norm        : %E\n", sumDelta / sumRef);
    printf("Average reserve: %f\n", sumReserve);
    printf("\nNOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.\n\n");
    printf(sumReserve > 1.0f ? "Test passed\n" : "Test failed!\n");
    exit(sumReserve > 1.0f ? EXIT_SUCCESS : EXIT_FAILURE);
}
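
For reference, BlackScholesCall — the routine the sample uses as ground truth, defined elsewhere in the sample sources — evaluates the closed-form Black-Scholes price of a European call. Below is a host-side sketch of that formula written for illustration (my own code, assuming the usual S/X/T/R/V fields of TOptionData, not copied from the sample):

// Closed-form Black-Scholes price of a European call option
// (illustrative host-side version; the sample ships its own implementation).
#include <math.h>

// Cumulative standard normal distribution via the complementary error function.
static double cnd(double d)
{
    return 0.5 * erfc(-d / sqrt(2.0));
}

// S: spot price, X: strike, T: time to expiry (years),
// R: risk-free rate, V: volatility -- same meaning as the TOptionData fields.
double blackScholesCall(double S, double X, double T, double R, double V)
{
    double d1 = (log(S / X) + (R + 0.5 * V * V) * T) / (V * sqrt(T));
    double d2 = d1 - V * sqrt(T);
    return S * cnd(d1) - X * exp(-R * T) * cnd(d2);
}

The Monte Carlo estimate for each option should converge to this value. The "Average reserve" statistic printed at the end compares the reported confidence interval with the observed error, and the test passes when it exceeds 1 on average.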

