Friday, August 13, 2010

Nvidia CUDA Toolkit 3.1 - with Fermi card optimizations

The latest Nvidia "Fermi" GPUs (Graphics Processing Units) are making their way to the stores now in the latest Nvidia graphics cards, which are definitely worth a look if it has been over a year since you upgraded your graphics card: the processing power per watt now is just unbelievable!  And the latest Nvidia CUDA Toolkit 3.1 release has some features specific to the new Fermi cards and architecture that you may want to check out if you are into GPU programming, even just for fun.

Nvidia (NASDAQ:NVDA) has moved to a modern 40nm manufacturing process for these new GPUs, which has allowed them to be much more power-efficient while cranking out tons of graphics horsepower for gaming and/or professional applications that use the stream processors (aka "CUDA cores") on the graphics card for high-performance computing (HPC) via massively parallel algorithms.  CUDA is Nvidia's parallel computing architecture, which enables dramatic increases in computing performance by harnessing the power of the GPU for applications including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing, and much more.

Get your NVidia Fermi Graphics Card
First, get hold of a new Fermi-based Nvidia CUDA graphics card (link: NVidia Graphics Card comparison matrix) to develop and run your new CUDA applications on.  There are some great cards out now that offer a lot of bang for the buck (aka "price-to-performance ratio"), including these:
  • Nvidia GeForce GTX 460 - a very reasonably priced (~$200) super-powerful mainstream desktop graphics card (aimed mainly at gamers) that smokes every other card on the market in this price range.  This card offers 336 CUDA processing cores and a gigabyte of RAM to run your new Nvidia CUDA Toolkit 3.1 applications on.
  • The brand new professional-class NVidia Quadro 4000 (NOT to be confused with the old Quadro FX 4000!) -- this ~$1000 card has 256 CUDA cores coupled to 2GB of GDDR5 RAM and is well suited to apps like CAD, Photoshop CS4 / CS5, and other CUDA-enabled professional apps. The card is quite power-efficient at only 142 watts max.
Now you can start putting some new CUDA abilities to work...

Nvidia CUDA Toolkit 3.1 Release Highlights
  • GPUDirect(tm) gives 3rd party devices direct access to CUDA Memory
  • Support for 16-way concurrency allows up to 16 different kernels to run at the same time on Fermi architecture GPUs
  • Runtime / Driver interoperability enables applications to mix-n-match use of the CUDA Driver API with the CUDA C Runtime and math libraries via buffer sharing and context migration
  • New language features added to CUDA C / C++ include:
    • Support for printf() in device code
    • Support for function pointers and recursion makes it easier to port many existing algorithms to Fermi GPUs
  • Unified Visual Profiler now supports both CUDA C/C++ and OpenCL, and now includes support for CUDA Driver API tracing
  • Math Libraries Performance Improvements, including:
    • Improved performance of selected transcendental functions from the log, pow, erf, and gamma families
    • Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
    • Streaming API now supported in CUBLAS for overlapping copy and compute operations
    • CUFFT Real-to-complex (R2C) and complex-to-real (C2R) optimizations for 2^n data sizes
    • Improved performance for GEMV and SYMV subroutines in CUBLAS
    • Optimized double-precision implementations of divide and reciprocal routines for the Fermi architecture
  • New and updated SDK code samples demonstrating how to use:
    • Function pointers in CUDA C/C++ kernels
    • OpenCL / Direct3D buffer sharing
    • Hidden Markov Model in OpenCL
    • Microsoft Excel GPGPU example showing how to run an Excel function on the GPU
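
To give a feel for the new CUDA C / C++ language features in the list above (printf() in device code, function pointers, and recursion), here is a minimal sketch, not taken from the Nvidia SDK samples.  The names factorial, square, apply_kernel, and apply_on_gpu are my own hypothetical examples, and the code assumes a Fermi-class (compute capability 2.0 or newer) GPU.  It also uses cudaDeviceSynchronize(), which comes from later toolkits; on a 3.1-era toolkit the equivalent call was cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer type for the small device functions selected at run time.
typedef unsigned int (*unary_op)(unsigned int);

// Recursive device function (recursion needs compute capability 2.0+).
// Keep n <= 12 so the result fits in a 32-bit unsigned int.
__device__ unsigned int factorial(unsigned int n)
{
    return (n <= 1u) ? 1u : n * factorial(n - 1u);
}

__device__ unsigned int square(unsigned int x)
{
    return x * x;
}

// Each thread picks a device function through a pointer, applies it to one
// element, and reports the result with device-side printf().
__global__ void apply_kernel(const unsigned int *in, unsigned int *out,
                             int n, int which)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    unary_op op = (which == 0) ? factorial : square;  // address taken on the device
    out[i] = op(in[i]);
    printf("thread %d: op(%u) = %u\n", i, in[i], out[i]);
}

// Hypothetical host helper: copies the input to the GPU, runs the kernel,
// and copies the results back. Returns false if any CUDA call fails.
bool apply_on_gpu(int which, const unsigned int *host_in,
                  unsigned int *host_out, int n)
{
    unsigned int *dev_in = NULL, *dev_out = NULL;
    size_t bytes = n * sizeof(unsigned int);

    bool ok = cudaMalloc((void **)&dev_in, bytes) == cudaSuccess
           && cudaMalloc((void **)&dev_out, bytes) == cudaSuccess
           && cudaMemcpy(dev_in, host_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 64;
        int blocks = (n + threads - 1) / threads;
        apply_kernel<<<blocks, threads>>>(dev_in, dev_out, n, which);
        // Synchronizing also flushes the device-side printf() buffer.
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_in);   // cudaFree(NULL) is a harmless no-op
    cudaFree(dev_out);
    return ok;
}

int main()
{
    const int n = 6;
    unsigned int in[n] = {0, 1, 2, 3, 4, 5};
    unsigned int out[n];

    // which == 0 runs factorial, which == 1 runs square.
    for (int which = 0; which < 2; ++which) {
        if (!apply_on_gpu(which, in, out, n)) {
            fprintf(stderr, "CUDA call failed (is a compute capability 2.0+ GPU present?)\n");
            return 1;
        }
    }
    return 0;
}
```

With a toolkit from this era you would build it with something like "nvcc -arch=sm_20 demo.cu" (the file name is just an example); newer toolkits have dropped sm_20, so pick an architecture that matches your card.  None of these three features work on pre-Fermi GPUs, which is exactly why they show up in this release.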


Financial Opportunities - Nvidia (NASDAQ:NVDA) stock?
Since this blog also focuses on stock-market and investing opportunities, I have to contemplate whether the new Nvidia Fermi cards are going to drive substantial sales and revenue gains, and associated profit gains, for Nvidia Corporation.  I cannot help thinking that it is inevitable, especially when so many of the online retailers I went to in search of a new Nvidia GTX 460 card were out of stock, backordered, and so forth.

And, these cards are out there already... people lucky enough to have gotten hold of them already are essentially uniformly impressed and satisfied with the performance of the GTX 460 card.  I have read all sorts of reviews from buyers saying how these cards have set a new standard in desktop gaming performance (frame-rates, etc) while also being rather reasonable in their power consumption.  Nvidia allows for running two cards together (in SLI-mode) for even higher performance, and from all the tests and reviews I have read: wow... these are FAST!

So, it seems to be nearly a guarantee that Nvidia is going to move a LOT of these cards.  The question is: at what margin?  They are being VERY competitive and aggressive with their pricing model, which suggests that margins may not be TOO large, but I do not know.  I will assume they are being sold for a profit, and that with enough volume, their margins will also be pretty decent.

And then there is the super-computing and professional market: THAT is what I am more interested in from an investing standpoint.  These cards are being used in top-of-the-line supercomputers and high-performance computing systems and clusters, where a single supercomputer may use hundreds or thousands of these cards.  And Nvidia's top Quadro 6000 graphics card lists for $6,000 -- targeting digital production firms (think: Adobe Photoshop and Premiere, e.g.) and engineering firms doing real-time 3D work and the like.  These firms WILL buy the new Fermi-based cards in order to gain efficiencies (since these cards are up to 8 times faster than the prior generation, meaning much time saved when rendering, etc).

Sure, the economy is "slow" right now, but what better way for companies to gain efficiency for a reasonable sum?  Move some processing off to new super-powered Nvidia GPUs!  If your employees spend less time waiting for computing operations to complete, perhaps you can get by with fewer employees (note: none of us like the sound of that, but it IS what helps drive "productivity"; I'd just prefer seeing any freed-up employee time redirected toward more creativity, product design and improvement, etc).

Bottom line: NVIDIA HAS SOME AWESOME GRAPHICS CARDS TO CONSIDER, and some updated tools to go with them!