Distributed-Memory Programming with PI

8 minread

1,387words

Intermediatelevel

Distributed-Memory Programming with PI (Parallel Implementation)

In distributed-memory systems, each processor has its own local memory, and there is no global memory that is shared between processors. To perform computations across such systems, data must be communicated between the processors. One of the key challenges is how to manage data distribution and communication effectively while ensuring correct results.

One common problem often used to explain distributed-memory programming is the computation of π (Pi), specifically using Monte Carlo methods or numerical integration. In this example, we'll look at how to compute π using distributed-memory programming concepts like MPI (Message Passing Interface), which is the standard for communication in distributed-memory systems.

Computing Pi Using Monte Carlo Method in Distributed-Memory Systems

The Monte Carlo method is a statistical technique that uses random sampling to estimate numerical results. For estimating π, a common approach is to simulate points within a square that contains a quarter circle and then calculate the ratio of points that fall inside the circle to the total number of points.

Steps to Estimate π:

Set up a square of side length 1, and within it, inscribe a quarter circle with radius 1 (centered at the origin).
Randomly generate points inside the square. Each point has an (x, y) coordinate where both x and y are between 0 and 1.
Check if the point is inside the quarter circle. A point (x, y) is inside the circle if: $x^2 + y^2 \leq 1$
Calculate the ratio of points inside the circle to the total points. This ratio is proportional to the area of the quarter circle compared to the area of the square, which leads to the estimation of π: $\pi \approx 4 \times \frac{\text{Points inside the circle}}{\text{Total points generated}}$

Distributed-Memory Programming Approach (MPI)

In a distributed-memory system, we can divide the total number of points to be generated among multiple processes (nodes). Each process will handle a portion of the points, and the results will be combined to compute the final estimate of π.

We'll use MPI (Message Passing Interface) for communication between processes. The basic steps are:

Divide the work: Split the total number of points among the processes.
Local computation: Each process computes its portion of the random points and counts how many fall inside the quarter circle.
Collect and combine results: The results (number of points inside the circle) are sent to a master process (rank 0), which aggregates them to estimate π.

Example Code: Computing Pi with MPI (C Version)

This example shows how you might implement the Monte Carlo method to compute π using MPI in a distributed-memory system.

1. Initialization:

We first initialize the MPI environment and determine how many processes we have, as well as the rank (ID) of each process.

2. Dividing the Work:

Each process generates a random set of points and counts how many fall inside the quarter circle. The total number of points is divided equally among all processes.

3. Communication:

The local counts from each process are sent to the root process (rank 0), which will sum the counts from all processes and calculate the final estimate for π.

MPI C Code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>
#include <time.h>

#define NUM_POINTS 1000000  // Total number of random points to generate

// Function to generate a random double between 0 and 1
double rand_double() {
    return (double) rand() / RAND_MAX;
}

int main(int argc, char *argv[]) {
    int rank, size, local_count = 0, total_count = 0;
    double x, y, pi_estimate;
    srand(time(NULL) + rank);  // Initialize random seed with rank

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // Get the rank of the process
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // Get the number of processes

    // Each process generates a subset of random points
    int points_per_process = NUM_POINTS / size;
    for (int i = 0; i < points_per_process; i++) {
        x = rand_double();
        y = rand_double();

        // Check if the point lies inside the quarter circle
        if (x * x + y * y <= 1.0) {
            local_count++;
        }
    }

    // Gather the local counts from all processes to the root process (rank 0)
    MPI_Reduce(&local_count, &total_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    // The root process calculates the estimate of pi
    if (rank == 0) {
        pi_estimate = 4.0 * total_count / NUM_POINTS;
        printf("Estimated value of pi: %f\n", pi_estimate);
    }

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

Explanation of the Code:

MPI Initialization:
The code starts by initializing the MPI environment with MPI_Init. Then, it retrieves the rank of the current process with MPI_Comm_rank and the total number of processes with MPI_Comm_size.
Generating Random Points:
Each process generates a subset of the total random points. It uses the rand_double() function to generate random numbers between 0 and 1 for the x and y coordinates of the points. It checks if the point falls inside the quarter circle using the condition $x^2 + y^2 \leq 1$ .
Local Count:
Each process maintains a local_count variable that counts how many of its points fall inside the circle. The number of points inside the circle is accumulated locally by each process.
Communication with MPI_Reduce:
Once all the processes have finished counting, MPI_Reduce is used to gather the local counts from all processes and sum them up in the total_count variable at the root process (rank 0). This function performs a reduction operation (in this case, summing the counts).
Calculating and Printing π:
The root process (rank 0) calculates the estimated value of π using the formula:
$\pi \approx 4 \times \frac{\text{total points inside the circle}}{\text{total points generated}}$
It then prints the result.
MPI Finalization:
Finally, MPI_Finalize is called to terminate the MPI environment and clean up any resources allocated by MPI.

Key MPI Concepts Used:

MPI_Init: Initializes the MPI environment.
MPI_Comm_rank: Gets the rank (ID) of the current process.
MPI_Comm_size: Gets the total number of processes in the communicator.
MPI_Reduce: Gathers data from all processes and performs a reduction operation (sum in this case) on that data at the root process.
MPI_Finalize: Finalizes the MPI environment.

Running the Code:

To run the above MPI program on a cluster or multi-node system, you would typically compile the program and then execute it using an MPI job scheduler (such as mpirun or mpiexec):

mpicc -o pi_mpi pi_mpi.c
mpirun -np 4 ./pi_mpi

Here, -np 4 specifies that 4 processes will be used. The number of processes can be adjusted based on the available system resources.

Advantages of Distributed-Memory Programming for Pi Estimation:

Scalability: By distributing the computation (random point generation) across multiple processes, we can handle much larger datasets and improve performance, especially when the total number of points becomes very large (millions or billions of points).
Parallelism: Each process works independently on its subset of the points, and the overall computation can be done faster as more processes are added.
Fault Tolerance: In a large-scale system, if one process fails, the other processes can continue, and techniques like checkpointing or retries can be used to recover from failure.
Efficient Use of Resources: Distributed-memory systems allow us to leverage the computational power of multiple machines, reducing the time needed to compute large-scale problems like estimating π.

Conclusion

In distributed-memory systems, programming requires explicit communication between processes. The MPI-based Monte Carlo method to estimate π is a good example of how to break a problem into smaller tasks, distribute them among multiple processes, and then combine the results. Using MPI for such parallel computations can significantly speed up processing time and make large-scale data analysis feasible across distributed systems.

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

#include <stdio.h> #include <stdlib.h> #include <math.h> #include <mpi.h> #include <time.h> #define NUM_POINTS 1000000 // Total number of random points to generate // Function to generate a random double between 0 and 1 double rand_double() { return (double) rand() / RAND_MAX; } int main(int argc, char *argv[]) { int rank, size, local_count = 0, total_count = 0; double x, y, pi_estimate; srand(time(NULL) + rank); // Initialize random seed with rank // Initialize the MPI environment MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get the rank of the process MPI_Comm_size(MPI_COMM_WORLD, &size); // Get the number of processes // Each process generates a subset of random points int points_per_process = NUM_POINTS / size; for (int i = 0; i < points_per_process; i++) { x = rand_double(); y = rand_double(); // Check if the point lies inside the quarter circle if (x * x + y * y <= 1.0) { local_count++; } } // Gather the local counts from all processes to the root process (rank 0) MPI_Reduce(&local_count, &total_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD); // The root process calculates the estimate of pi if (rank == 0) { pi_estimate = 4.0 * total_count / NUM_POINTS; printf("Estimated value of pi: %f\n", pi_estimate); } // Finalize the MPI environment MPI_Finalize(); return 0; }