CUDA is a parallel computing platform and application programming interface (API) created by NVIDIA. It lets developers write software that runs on NVIDIA GPUs (Graphics Processing Units), which can execute highly parallel workloads much faster than a CPU alone. CUDA makes the massively parallel processing power of GPUs available for general-purpose computing (GPGPU), particularly for computationally intensive tasks such as scientific computing, machine learning, and graphics rendering.
Massive Parallelism:
Programming Model:
Memory Hierarchy:
CUDA Cores:
Streams and Concurrency:
Libraries and Tools:
A simple example of a CUDA program to add two arrays element-wise is shown below:
#include <iostream>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void add(const int *a, const int *b, int *c, int N) {
    // Global index of this thread across all blocks
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    // Guard: the last block may contain threads past the end of the arrays
    if (index < N) {
        c[index] = a[index] + b[index];
    }
}

int main() {
    const int N = 1000;
    int a[N], b[N], c[N];

    // Initialize input arrays
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = i * 2;
    }

    int *d_a, *d_b, *d_c;

    // Allocate memory on the GPU
    cudaMalloc((void**)&d_a, N * sizeof(int));
    cudaMalloc((void**)&d_b, N * sizeof(int));
    cudaMalloc((void**)&d_c, N * sizeof(int));

    // Copy data from host to device
    cudaMemcpy(d_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all N elements
    // (4 blocks for N = 1000)
    add<<<(N + 255) / 256, 256>>>(d_a, d_b, d_c, N);

    // Copy result from device to host (waits for the kernel to finish)
    cudaMemcpy(c, d_c, N * sizeof(int), cudaMemcpyDeviceToHost);

    // Print the first 10 elements of the result
    for (int i = 0; i < 10; i++) {
        std::cout << "c[" << i << "] = " << c[i] << std::endl;
    }

    // Free GPU memory
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
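This program performs element-wise addition of two arrays using a CUDA kernel that executes in parallel on the GPU. Each thread computes one element of the result, and the index < N check keeps any threads beyond the end of the arrays from writing out of bounds.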
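The launch configuration `(N + 255) / 256` and the kernel's index formula can be checked on the CPU. The sketch below is plain C++, not CUDA: it loops over every (block, thread) pair that the launch would create and applies the same index computation and bounds guard as the kernel. The helper names `blocksNeeded` and `emulateAdd` are hypothetical, introduced only for this illustration, and the sequential loops merely stand in for work the GPU would do in parallel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: smallest number of blocks such that
// blocks * threadsPerBlock >= n (ceiling division).
int blocksNeeded(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical host-side emulation of the kernel's indexing.
// Visits every (block, thread) pair in the launch, applies the same
// guard as the kernel, and returns how many threads the guard skipped.
int emulateAdd(const std::vector<int>& a, const std::vector<int>& b,
               std::vector<int>& c, int threadsPerBlock) {
    assert(a.size() == b.size() && a.size() == c.size());
    const int n = static_cast<int>(a.size());
    const int blocks = blocksNeeded(n, threadsPerBlock);
    int idleThreads = 0;
    for (int block = 0; block < blocks; ++block) {               // stands in for blockIdx.x
        for (int thread = 0; thread < threadsPerBlock; ++thread) { // stands in for threadIdx.x
            // Same formula as the kernel, with blockDim.x == threadsPerBlock
            const int index = thread + block * threadsPerBlock;
            if (index < n) {
                c[index] = a[index] + b[index];
            } else {
                ++idleThreads;  // launched, but past the end of the arrays
            }
        }
    }
    return idleThreads;
}
```

For N = 1000 and 256 threads per block, `blocksNeeded` gives 4 blocks, so 1024 threads are launched and 24 of them fall outside the arrays. That is why the kernel needs its bounds check whenever N is not a multiple of the block size.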
Swift is a high-level, general-purpose programming language designed for performance and ease of use. Developed by Apple, it is most commonly used to build applications for iOS, macOS, watchOS, and tvOS. Swift also supports concurrent and parallel programming through features built into the language and its runtime.
Swift’s ecosystem and runtime provide tools to simplify parallel computing, especially through its concurrency features, which allow developers to manage tasks efficiently in parallel.
Concurrency Model:
GCD (Grand Central Dispatch):
Actors:
Parallel Collections:
DispatchQueue or other parallel mechanisms.
Swift for TensorFlow (S4TF):
A simple example demonstrating asynchronous parallel computation using Swift's async/await model:
import Foundation

// Simulate a computational task
func computeTask(_ value: Int) async -> Int {
    return value * value
}

@main
struct ParallelSwiftExample {
    static func main() async {
        // Each Task starts running as soon as it is created,
        // so the three computations can proceed concurrently
        let task1 = Task { await computeTask(2) }
        let task2 = Task { await computeTask(3) }
        let task3 = Task { await computeTask(4) }

        // Collect the results; each value is awaited in turn
        // while the other tasks keep running
        let results = await [task1.value, task2.value, task3.value]
        print("Results: \(results)") // Output: Results: [4, 9, 16]
    }
}
In this example:
async keyword marks functions as asynchronous.
await waits for the asynchronous tasks to finish, allowing the system to perform other work while waiting for results.

| Feature | CUDA | Swift |
|---|---|---|
| Target Hardware | NVIDIA GPUs, driven by code running on a host CPU | CPUs (primarily Apple platforms; also Linux and Windows) |
| Parallel Model | Thread-level parallelism, massive parallelism on GPU | Structured concurrency with async/await, GCD |
| Programming Language | C, C++, Fortran with CUDA extensions | Swift (native language) |
| Memory Model | Explicit management of GPU memory (e.g., shared, global, local) | Automatic Reference Counting (ARC) for reference types; actors for thread safety |
| Libraries | cuBLAS, cuDNN, cuFFT, and others for specialized tasks | Swift standard library, GCD, and the concurrency APIs |
| Typical Use Cases | High-performance computing, machine learning, simulations | iOS/macOS apps; machine learning experiments with Swift for TensorFlow (project archived in 2021) |
Both CUDA and Swift offer powerful tools for parallel and distributed computing but are best suited for different types of hardware and application domains.