About DirectCompute
After NVidia Cuda, OpenCL we got DirectX’s version (of GPGPU parallelism for numerical calculations) named DirectCompute. All three technologies are very similar and have one goal: to provide some sort of API and shader language for numerical calculations that GPGPU is capable of. And if you use it wisely the performance gains over normal CPU are huge, really huge. Not just that you offload CPU but the calculations due to GPGPU massive multithreading (and multicore) are much faster compared to even the most advanced x64 CPU.
It works like this: you prepare enough data that can be worked on in parallel, send it to GPGPU, wait till GPGPU finished processing the data and fetch the results back to the CPU where your application can use them. You can achieve up to a TFLOP with a graphics card sold today.
I guess this information is enough to see what is all about. For more information on the topic check out Cuda and OpenCL. I’d suggest to check out the DirectCompute web site but the reality is that I couldn’t find any official (or non-official) page that deals with it (it is the youngest of the three but still that could be more online information, don’t you think). However you might check the DirectCompute PDC 2009 presentation here and a simple tutorial.
Of the three I am most interested in DirectCompute because:
- Cuda is tied to NVidia GPGPU while OpenCL and DirectCompute aren’t
- I prefer using DirectX over OpenGL for a simple reason – I did work for a client using DirectX so I am more familiar with it
I have a NVidia silent GeForce 9600 series graphic card (cards from GeForce 8 and up are supported) and the good news is that NVidia has been supporting CUDA for a while and recently introduced support for DirectCompute as well. OpenCL is also supported. I don’t know the current state of support for ATI cards.
There is a potential drawback if you go with DirectCompute though. It is a part of DirectX 11 and will run on Windows 7, Windows 2008 and Vista SP2 (somebody correct me). The problem is if you have an older Windows OS like Windows XP or you want support older Windows. In that case you might consider alternatives.
Goal
The not that good aspect of a young technology such as DirectCompute is lack of samples, documentation and tutorials are almost nonexistent. What’s even worse for a .net developer as I am is apparent lack of managed support. Which turns to be there to some extent through Windows API Code Pack (WACP) managed code wrappers. Note, WACP isn’t very .netish such as Managed DirectX was but it is good enough.
So I tried the managed approach for BasicCompute11 C++ sample (that comes with DirectX SDK) through WACP and here are the steps to make it run. It should give you a good starting point if you are DirectComputing through managed code and even if you are using native code. Hopefully somebody will benefit from my experience.
BasicCompute11 sample is really a simple one. It declares a type holding an int and a float, creates two arrays of those with an incremental value ( {0, 0.0}, {1, 1.1}, {2, 2.2}, ….{8191, 8191.0}) and adds them together to a third array.
I’ll be working on Windows 7 x64 with NVidia GeForce 9600 graphics card but will create a Win32 executable.
Steps
1. Install the latest NVidia graphics drivers. Current version 195.62 works well enough.
2. Install Windows 7 SDK. It is required because WACP uses its include files and libraries.
3. Make sure that Windows 7 SDK version is currently selected. You’ll have to run [Program Files]\Microsoft SDKs\Windows\v7.0\Setup\SDKSetup.exe. Note that I had to run it from command prompt because it didn’t work from UI for some reason.
4. Install DirectX SDK August 2009 SDK.
- If it doesn't exist yet then declare an environment variable DXSDK_DIR that points to [Program Files (x86)]\Microsoft DirectX SDK (August 2009)\ path.
5. Install Windows API Code Pack. Or better, copy the source to a folder on your computer. Open those sources and do the following:
After these settings the project should compile. However, there are additional steps, because one of the methods in there is flawed. Let’s correct it.
- Open file Header Files\D3D11\Core\D3D11DeviceContext.h and find method declaration
Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, MapFlag mapFlags, MappedSubresource mappedResource);
(there are no overloads to Map method). Convert it to
MappedSubresource^ Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, MapFlag mapFlags); - Open file Source Files\D3D11\Core\D3D11DeviceContext.cpp and find the (same) method definition
DeviceContext::Map(D3DResource^ resource, UInt32 subresource, D3D11::Map mapType, D3D11::MapFlag mapFlags, MappedSubresource mappedResource)
Replace the entire method with this code:MappedSubresource^ DeviceContext::Map(D3DResource^ resource, UInt32 subresource,
D3D11::Map mapType, D3D11::MapFlag mapFlags) { D3D11_MAPPED_SUBRESOURCE mappedRes; CommonUtils::VerifyResult(GetInterface()->Map( resource->GetInterface(), subresource, static_cast(mapType), static_cast(mapFlags), &mappedRes)); return gcnew MappedSubresource(mappedRes); }
DirectX wrappers we need are now functional. Compile the project and close it.
6. Open DirectX Sample Browser (you’ll find it in Start\All Programs\Microsoft DirectX SDK (August 2009) and install BasicCompute11 sample somewhere on the disk.
7. Open the sample in Visual Studio 2008 and run it. The generated console output should look like this:
Pay attention to the output because if DirectX can’t find an adequate hardware/driver support for DirectCompute it will run the code with a reference shader – using CPU not GPGPU (“No hardware Compute Shader capable device found, trying to create ref device.”) or even won’t run at all.
If the sample did run successfully it means that your environment supports DirectCompute. Good for you.
8. Now try running my test project (attached to this article) that is pure managed C# code. It more or less replicates the original BasicCompute11 sample which is a C++ project. The main differences with the original are two:
- I removed the majority of device capability checking code (you’ll have to have compute shader functional or else…)
- I compile compute shader code (HLSL –> FX) at design time rather at run time. This is required because there are no runtime compilation managed wrappers in WACP because those are part of D3DX which isn’t part of the OS but rather redistributed separately. So you have to rely on FXC compiler (part of DirectX SDK). A note here: I’ve lost quite a lot of time (again!) figuring out why FXC doesn’t find a suitable entry point to my compute shader code when compiling (and errors out). I finally remembered that I’ve stumbled upon this problem quite a long time ago: FXC compiles only ASCII(!!!!!) files. Guys, is this 2009 or 1979? Ben 10 would say. “Oh man.”
Anyway, the FX code is included with project so you won’t have to compile it again. This time the output should look like this:
I didn’t document my rewritten BasicCompute11 project. You can search for comments in the original project.
Conclusion
Compute shaders are incredibly powerful when it comes to numerical calculations. And being able to work with them through managed code is just great. Not a lot of developers will need their power though, but if/when they’ll need them the support is there.
As for final word: I miss documentations, tutorials and samples. Hopefully eventually they’ll come but for now there is a good book on CUDA (remember, the technologies are similar) – the book comes in a PDF format along NVidia CUDA SDK.
Have fun DirectComputing!