Signal Simulation from Image

Signal Simulation from Image#

Overview#

In this part, we’ll simulate signals from an image. Let me give some more context. When we mean signal simulation from an image, we mean an inverse approach where we try to simulate signals received by a uniform linear array if the image represents the amplitude of scatters.

Setup#

The first step in setting up this signal simulation pipeline is to choose the physical bounds of the scatterers. These are represented by the parameters:

top-left coordinate
bottom-right coordinate.

These coordinates corresponds to the coordinates of the pixels in the top-left and bottom-right. The code then assigns the coordinates of the rest of the pixels using this information.

Similarly, this is followed by setting up the position of the sensor array. The sensor array parameters are as follows

sensor array origin: where the first sensor is placed.
sensor spacing: the arrray we’re using is a uniform linear array, which means that the sensors are equally spaced.
sensor-array direction: the sensor array direction corresponds to the vector pointing from the first sensor the rest of the sensors. Please note that the director vector is normalized.
emitter location: Corresponds to the location of the SONAR signal emitter.

In addition to the physical parameters, we also need to setup the signal parameters. The signal parameters are as follows

signal speed: speed of sound in water
signal frequency: frequency of signal emitted. Currently, we’re using a uniform signal but note that usually, chirp is used.
recording time: the amount of time the hydrophones are turned on to listen after the signal is emitted. This parameter defines the range of the system.
transmission duration: This is the length of the emitted signal.

Matlab Code#

We first start off by compiling the written CUDA code

%% Compiling CUDA code 
mexcuda signal_simulation_imaging_structure.cu

Next, we define the sensor-array parameters that will be used to simulate signals that arrive at the sensors of the signal array. We primarily setup the following parameters

sensor array origin: where the first sensor is placed.
sensor spacing: the arrray we’re using is a uniform linear array, which means that the sensors are equally spaced.
sensor-array direction: the sensor array direction corresponds to the vector pointing from the first sensor the rest of the sensors. Please note that the director vector is normalized.
emitter location: Corresponds to the location of the SONAR signal emitter.

%% Sensor Array Parameters
sensorarray_origin = [0;0];            % The location of the first sensor in the sensor-array
sensor_spacing = 1e-1;                 % the distance between sensors in the ULA
sensorarray_direction = [1;0];         % The direction from first sensor to rest of the sensors
sensorarray_direction = sensorarray_direction/(vecnorm(sensorarray_direction));
sensorarray_direction = sqrt(sensor_spacing)*sensorarray_direction;
num_sensors = 256;                     % number of sensors in the ULA
emitter_location = sensorarray_origin; % location of ping emitter

Next, we setup the parameters of the signals used in this setup.

signal speed: speed of sound in water
signal frequency: frequency of signal emitted. Currently, we’re the chirp frequency. The initial frequency of the sweep is this value and the maximum value is the half of sampling frequency.
recording time: the amount of time the hydrophones are turned on to listen after the signal is emitted. This parameter defines the range of the system.
transmission duration: This is the length of the emitted signal.
max range: The max range is the range of scan. It depends on the recording time.

%% Signal Parameters
c = 1500;                                                   % Speed of Sound in Water
signal_frequency = 100;                                     % Frequency of Transmitted Sine-wave
recording_time = 2e-1;                                      % Duration of Recording
sampling_frequency = 16e3;                                  % Sampling Frequency of Hydrophones
transmissionDuration = 1e-2;                                % Duration of transmission
timeArray = 0:(1/sampling_frequency):transmissionDuration;  % sampling intervals
transmissionSignal = chirp(timeArray, ...
                            100, ...
                            timeArray(end), ...
                            sampling_frequency/2);
max_range = transmissionDuration*c/2;

Next, we build the image that is used to represent the scatterers. We use the inbuilt Matlab function, phantom, of size 1024. We then assign the physical bounds of the scatterers using

top-left coordinate: coordinate of the top-left pixel
bottom-right coordinate: coordinate of the bottom-right pixel.

%% Buidling Signal Target
inputImage = phantom(1024);
inputImage(end/2, end/2) = 1;
top_left_coordinate = [0;-1*max_range];
bottom_right_coordinate = [max_range; 1*max_range];

Next, we call the script that does the signal simulation.

%% Calling Signal Simulation Script
tic
[distanceMatrix, ...
    amplitudeMatrix, ...
    signalMatrix] = ...
    signal_simulation_imaging_structure(sensorarray_origin, ...
                                        sensorarray_direction, ...
                                        sensor_spacing, ...
                                        num_sensors, ...
                                        emitter_location, ...
                                        inputImage, ...
                                        top_left_coordinate, ...
                                        bottom_right_coordinate, ...
                                        c, ...
                                        signal_frequency, ...
                                        recording_time, ...
                                        sampling_frequency, ...
                                        transmissionSignal);
toc

Next, we plot the resulting signal matrix.

%% Plotting sensor-wise distances
figure(1)
dynamicRangeAdjustedAmplitudeMatrix = log(1+amplitudeMatrix);
imagesc(dynamicRangeAdjustedAmplitudeMatrix); title("Pre-Convolution Output");

figure(2);
subplot(3,1,1); plot(signalMatrix(:, 1)); title("Signal Received By First Sensor");
subplot(3,1,2); plot(signalMatrix(:, 2)); title("Signal Received By Middle Sensor");
subplot(3,1,3); plot(signalMatrix(:, 3)); title("Signal Received By End Sensor");

Final Matlab Code#

Putting it all together, we get the following.

%{
=================================================================================
Aim: Simulate Signals From an Image
=================================================================================
%}

%% Basic setup
clc; clear;

%% Compiling CUDA code 
mexcuda signal_simulation_imaging_structure.cu

%% Sensor Array Parameters
sensorarray_origin = [0;0];            % The location of the first sensor in the sensor-array
sensor_spacing = 1e-1;                 % the distance between sensors in the ULA
sensorarray_direction = [1;0];         % The direction from first sensor to rest of the sensors
sensorarray_direction = sensorarray_direction/(vecnorm(sensorarray_direction));
sensorarray_direction = sqrt(sensor_spacing)*sensorarray_direction;
num_sensors = 256;                     % number of sensors in the ULA
emitter_location = sensorarray_origin; % location of ping emitter

%% Signal Parameters
c = 1500;                                                   % Speed of Sound in Water
signal_frequency = 100;                                     % Frequency of Transmitted Sine-wave
recording_time = 2e-1;                                      % Duration of Recording
sampling_frequency = 16e3;                                  % Sampling Frequency of Hydrophones
transmissionDuration = 1e-2;                                % Duration of transmission
timeArray = 0:(1/sampling_frequency):transmissionDuration;  % sampling intervals
transmissionSignal = chirp(timeArray, ...
                            100, ...
                            timeArray(end), ...
                            sampling_frequency/2);
max_range = transmissionDuration*c/2;


%% Buidling Signal Target
inputImage = phantom(1024);
inputImage = zeros(size(inputImage));
inputImage(end/2, end/2) = 1;
top_left_coordinate = [0;-1*max_range];
bottom_right_coordinate = [max_range; 1*max_range];

%% Calling Signal Simulation Script
tic
[distanceMatrix, ...
    amplitudeMatrix, ...
    signalMatrix] = ...
    signal_simulation_imaging_structure(sensorarray_origin, ...
                                        sensorarray_direction, ...
                                        sensor_spacing, ...
                                        num_sensors, ...
                                        emitter_location, ...
                                        inputImage, ...
                                        top_left_coordinate, ...
                                        bottom_right_coordinate, ...
                                        c, ...
                                        signal_frequency, ...
                                        recording_time, ...
                                        sampling_frequency, ...
                                        transmissionSignal);
toc


%% Plotting sensor-wise distances
figure(1)
dynamicRangeAdjustedAmplitudeMatrix = log(1+amplitudeMatrix);
imagesc(dynamicRangeAdjustedAmplitudeMatrix); title("Pre-Convolution Output");

figure(2);
subplot(3,1,1); plot(signalMatrix(:, 1)); title("Signal Received By First Sensor");
subplot(3,1,2); plot(signalMatrix(:, 2)); title("Signal Received By Middle Sensor");
subplot(3,1,3); plot(signalMatrix(:, 3)); title("Signal Received By End Sensor");

CUDA Code#

As usual, we start by putting together two definitions: the gateway function definition and kernel-definition.

Kernel Definition#

Final Kernel Definition#

// kernel definition
__global__ void PerPixel(mwSize *d_inputDimensions_Image,
                            double *d_bottom_left_corner,
                            double *d_top_right_corner,
                            double *d_sensor_origin,
                            double *d_sensor_direction,
                            double *d_outputPointer,
                            double *d_inputPointer_NumSensors,
                            double *d_inputPointer_SoundSpeed,
                            double *d_inputPointer_EmitterLocation,
                            double *d_inputPointer_SamplingFrequency,
                            double *d_inputPointer_InputImage,
                            double *d_inputPointer_RecordingTime,
                            double *d_outputPointer_AmplitudeMatrix)
{
    // getting address
    const int tidx_pixel = blockIdx.x;
    const int tidy_pixel = blockIdx.y;
    const int linearIndex2D_pixel = tidx_pixel + tidy_pixel*d_inputDimensions_Image[0];

    // Calculating inter-pixel spacing
    const double pixel_x_spacing = \
        (double)((d_top_right_corner[0] - d_bottom_left_corner[0])/((double)d_inputDimensions_Image[0]-1));
    const double pixel_y_spacing = \
        (double)((d_top_right_corner[1] - d_bottom_left_corner[1])/((double)d_inputDimensions_Image[1]-1));

    // Getting Pixel Characteristics
    const double pixel_x_coordinate = d_bottom_left_corner[0] + \
                                        (tidx_pixel-1)*pixel_x_spacing;
    const double pixel_y_coordinate = d_bottom_left_corner[1] + \
                                        (tidy_pixel-1)*pixel_y_spacing;

    // getting address
    const int tidx_sensor = threadIdx.x;
    const int tidy_sensor = threadIdx.y;
    const int tidz_sensor = threadIdx.z;
    const int linearIndex2D_sensor = tidx_sensor + \
                                        tidy_sensor*blockDim.x + \
                                        tidz_sensor*blockDim.x*blockDim.y;

    const int sensor_number = linearIndex2D_sensor;

    // Getting Sensor Characteristics
    const double sensor_x_coordinate = d_sensor_origin[0] + \
                                        (sensor_number-1)*d_sensor_direction[0];
    const double sensor_y_coordinate = d_sensor_origin[1] + \
                                        (sensor_number-1)*d_sensor_direction[1];

    // Calculating Distance Between Sensor and Pixel
    const double distance_between_pixel_and_sensor = \
        (double)sqrt(pow((sensor_x_coordinate-pixel_x_coordinate),2) + \
                        pow((sensor_y_coordinate-pixel_y_coordinate),2));

    // Calculating Distance Between Transmitter and Pixel
    const double distance_between_pixel_and_transmitter = \
        (double)sqrt(pow((sensor_x_coordinate - d_inputPointer_EmitterLocation[0]),2) +
                        pow((sensor_y_coordinate - d_inputPointer_EmitterLocation[1]),2));

    // Writing the Value
    if(tidx_pixel<d_inputDimensions_Image[0] && \
        tidy_pixel<d_inputDimensions_Image[1] && \
        sensor_number < (int)d_inputPointer_NumSensors[0])
    {
        // finding index to write to
        const int linearIndex2D_Output = \
            linearIndex2D_pixel + \
            sensor_number*d_inputDimensions_Image[0]*d_inputDimensions_Image[1];

        // finding time
        const double distance_travelled = distance_between_pixel_and_transmitter + \
                                            distance_between_pixel_and_sensor;
        const double time_travelled = (double)(distance_travelled/d_inputPointer_SoundSpeed[0]);
        
        // finding samples
        const int samples_travelled = (int)(time_travelled*d_inputPointer_SamplingFrequency[0]);

        // writing to the final output
        d_outputPointer[linearIndex2D_Output] = (double)samples_travelled;
    }
    
    // syncing threads
    __syncthreads();


    // Signal generation part 
    const int tidx_signal_generation = threadIdx.x + blockIdx.x*blockDim.x;
    const int tidy_signal_generation = threadIdx.y + blockIdx.y*blockDim.y;
    const int linearIndex2D_signal_generation = tidx_signal_generation + 
                                                tidy_signal_generation*gridDim.x;

    // Checking sensor validity
    if(linearIndex2D_signal_generation<d_inputPointer_NumSensors[0])
    {
        // updates
        const int current_sensor_loop = (int)linearIndex2D_signal_generation;
        const int num_samples_recording_time = \
            (int)(d_inputPointer_SamplingFrequency[0]*d_inputPointer_RecordingTime[0]);

        // Looping through all the pixels
        for(int i = 0; i<(int)d_inputDimensions_Image[0]; ++i)
        {
            for(int j = 0; j<(int)d_inputDimensions_Image[1]; ++j)
            {
                // finding linear index to the delay-num-samples
                const int linearIndex2D_loop_sample = \
                    i + \
                    j*d_inputDimensions_Image[0] + \
                    current_sensor_loop*d_inputDimensions_Image[0]*d_inputDimensions_Image[1];

                // obtaining num-sample delay
                const int num_samples_loop = d_outputPointer[linearIndex2D_loop_sample];

                // finding pixel-amplitude
                const int linearIndex2D_loop_pixel_amplitude = i + j*d_inputDimensions_Image[0];
                const int pixel_amplitude_loop = \
                    d_inputPointer_InputImage[linearIndex2D_loop_pixel_amplitude];

                // linear-indexing to the signal-matrix
                const int linearIndex2D_AmplitudeMatrix = \
                    num_samples_loop + \
                    current_sensor_loop*num_samples_recording_time;

                // accumulating
                d_outputPointer_AmplitudeMatrix[linearIndex2D_AmplitudeMatrix] = \
                    d_outputPointer_AmplitudeMatrix[linearIndex2D_AmplitudeMatrix] + \
                    (double)pixel_amplitude_loop;
            }
        }
    }

    // syncing threads
    __syncthreads();

}

Column-wise Convolution Kernel#

Final Kernel Defintiion#

__global__ void columnwiseConvolution(mwSize *d_outputDimensions_AmplitudeMatrix,
                                        double *d_outputPointer_AmplitudeMatrix,
                                        double *d_outputPointer_SignalMatrix,
                                        int numElements_TransmissionSignal,
                                        double *d_inputPointer_TransmissionSignal)
{
    // Declaring shared memory
    extern __shared__ double sharedMem[];
    
    // getting linear index into the pixels
    const int bidx = blockIdx.x; 
    const int bidy = blockIdx.y;
    const int bidz = blockIdx.z;
    const int linearIndex2D_ForMainIndex = bidx + \
                                            bidy*gridDim.x + 
                                            bidz*gridDim.x*gridDim.y;

    // mapping from linear to index of base-index
    const int signal_x = linearIndex2D_ForMainIndex % (int)d_outputDimensions_AmplitudeMatrix[0];
    const int signal_y = (int)(linearIndex2D_ForMainIndex / (int)d_outputDimensions_AmplitudeMatrix[0]);

    // getting thread index
    const int tidx_kernel = threadIdx.x;
    const int tidy_kernel = threadIdx.y;
    const int tidz_kernel = threadIdx.z;
    const int linearIndex2D_kernel = tidx_kernel + \
                                        tidy_kernel*blockDim.x + \
                                        tidz_kernel*blockDim.x*blockDim.y;
    const int kernel_length = numElements_TransmissionSignal;

    // fetching kernel value
    bool meta_condition = \
        (signal_x < (int)d_outputDimensions_AmplitudeMatrix[0]) && \
        (signal_y < (int)d_outputDimensions_AmplitudeMatrix[1]) && \
        (linearIndex2D_kernel < kernel_length) && \
        ((signal_x - linearIndex2D_kernel) >= (int)0 ) && \
        ((signal_x - linearIndex2D_kernel) < (int)d_outputDimensions_AmplitudeMatrix[0]) \
        ;

    // Initializing kernel
    if(linearIndex2D_kernel<kernel_length) sharedMem[linearIndex2D_kernel] = (double)0;

    // Checking the validity before copying
    if(meta_condition)
    {
        // calculating dot product and copying to shared memory
        sharedMem[linearIndex2D_kernel] = \
            d_outputPointer_AmplitudeMatrix[linearIndex2D_ForMainIndex - linearIndex2D_kernel]* \
            d_inputPointer_TransmissionSignal[linearIndex2D_kernel];
    }
    __syncthreads();

    // Getting first thread of all the block to write
    if(linearIndex2D_kernel == 0 && (signal_x < (int)d_outputDimensions_AmplitudeMatrix[0]) && (signal_y < (int)d_outputDimensions_AmplitudeMatrix[1]) )
    {
        // accumulating shared-memory
        double accuSum = 0;
        for(int i = 0; i<min(numElements_TransmissionSignal, signal_x); ++i)
        {
            accuSum = accuSum + (double)sharedMem[i];
        }

        // copying to the results
        d_outputPointer_SignalMatrix[linearIndex2D_ForMainIndex] = \
            (double)accuSum;

    }

    // syncing threads
    __syncthreads();
}

Gateway Function Definition#

Final gateway function definition#

// main function
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // meta stuff
    cudaError_t err;

    // checking number of outputs
    if(nlhs!=3)
        mexErrMsgTxt("Number of Expected Outputs are Wrong  \n");

    // Fetching the Inputs 
    CustomGPUObject SensorArrayOrigin(prhs[0]); 
    CustomGPUObject SensorArrayDirection(prhs[1]); 
    CustomGPUObject SensorArraySpacing(prhs[2]); 
    CustomGPUObject NumSensors(prhs[3]); 
    CustomGPUObject EmitterLocation(prhs[4]); 
    CustomGPUObject InputImage(prhs[5]); 
    CustomGPUObject BottomLeftCoordinate(prhs[6]); 
    CustomGPUObject TopRightCoordinate(prhs[7]); 
    CustomGPUObject SoundSpeed(prhs[8]); 
    CustomGPUObject SignalFrequency(prhs[9]); 
    CustomGPUObject RecordingTime(prhs[10]); 
    CustomGPUObject SamplingFrequency(prhs[11]); 
    CustomGPUObject TransmissionSignal(prhs[12]); 

    // Sending data to GPU
    SensorArrayOrigin.copyFromHostToDevice();
    SensorArrayDirection.copyFromHostToDevice();
    SensorArraySpacing.copyFromHostToDevice();
    NumSensors.copyFromHostToDevice();
    EmitterLocation.copyFromHostToDevice();
    InputImage.copyFromHostToDevice();
    BottomLeftCoordinate.copyFromHostToDevice();
    TopRightCoordinate.copyFromHostToDevice();
    SoundSpeed.copyFromHostToDevice();
    SignalFrequency.copyFromHostToDevice();
    RecordingTime.copyFromHostToDevice();
    SamplingFrequency.copyFromHostToDevice();
    TransmissionSignal.copyFromHostToDevice();
    
    // setting up the output-dimensions
    mwSize *outputDimensions = (mwSize *)malloc(3*sizeof(mwSize));
    outputDimensions[0] = InputImage.inputDimensions[0]; 
    outputDimensions[1] = InputImage.inputDimensions[1]; 
    outputDimensions[2] = (mwSize)NumSensors.inputPointer_real[0];

    // creating output-matrix
    plhs[0] = mxCreateNumericArray((mwSize)3, outputDimensions, mxDOUBLE_CLASS, mxREAL);
    CustomGPUObject outputPointer_class(plhs[0]); 
    outputPointer_class.copyFromHostToDevice();
    
    // Creating Outputs for the Signal
    mwSize *outputDimensions_AmplitudeMatrix = (mwSize *)malloc(2*sizeof(mwSize));
    outputDimensions_AmplitudeMatrix[0] = (mwSize)(RecordingTime.inputPointer_real[0]*SamplingFrequency.inputPointer_real[0]);
    outputDimensions_AmplitudeMatrix[1] = (mwSize)NumSensors.inputPointer_real[0];

    // Creating Second Output
    plhs[1] = mxCreateNumericArray((mwSize)2, outputDimensions_AmplitudeMatrix, mxDOUBLE_CLASS, mxREAL);
    CustomGPUObject AmplitudeMatrix(plhs[1]); AmplitudeMatrix.copyFromHostToDevice();

    // Creating Third Output
    plhs[2] = mxCreateNumericArray(AmplitudeMatrix.numDimensions, AmplitudeMatrix.inputDimensions, mxDOUBLE_CLASS, mxREAL);
    CustomGPUObject SignalMatrix(plhs[2]); SignalMatrix.copyFromHostToDevice();

    // Calling the Function 
    dim3 blockConfiguration(32,32,1);
    dim3 gridConfiguration((int)InputImage.inputDimensions[0], (int)InputImage.inputDimensions[1], 1);
    PerPixel<<<gridConfiguration,
                blockConfiguration>>>(InputImage.d_inputDimensions,
                                        BottomLeftCoordinate.d_inputPointer_real,
                                        TopRightCoordinate.d_inputPointer_real,
                                        SensorArrayOrigin.d_inputPointer_real,
                                        SensorArrayDirection.d_inputPointer_real,
                                        outputPointer_class.d_inputPointer_real,
                                        NumSensors.d_inputPointer_real,
                                        SoundSpeed.d_inputPointer_real,
                                        EmitterLocation.d_inputPointer_real,
                                        SamplingFrequency.d_inputPointer_real,
                                        InputImage.d_inputPointer_real,
                                        RecordingTime.d_inputPointer_real,
                                        AmplitudeMatrix.d_inputPointer_real);


    // Prepping for function call
    const int grid_square_root_ceil = (int)sqrt(SignalMatrix.numElements) + 1;
    int test00 = (int)sqrt(TransmissionSignal.numElements) + 1;
    const int block_square_root_ceil = pow((((int)log2(test00)) + 1),2);
    dim3 blockConfiguration_convolution(block_square_root_ceil, block_square_root_ceil, 1);
    dim3 gridConfiguration_convolution((int)grid_square_root_ceil, (int)grid_square_root_ceil, 1);

    // Function Call
    columnwiseConvolution<<<gridConfiguration_convolution,
                            blockConfiguration_convolution,
                            TransmissionSignal.numElements*sizeof(double)>>>\
                            (AmplitudeMatrix.d_inputDimensions,
                            AmplitudeMatrix.d_inputPointer_real,
                            SignalMatrix.d_inputPointer_real,
                            TransmissionSignal.numElements,
                            TransmissionSignal.d_inputPointer_real);

    // Copying the data from Device to Host
    outputPointer_class.copyFromDeviceToHost();
    AmplitudeMatrix.copyFromDeviceToHost();
    SignalMatrix.copyFromDeviceToHost();

    // Device Shut down
    err = cudaDeviceSynchronize();
    err = cudaDeviceReset();
    
}

Final CUDA Code#

/*
================================================================================
Aim: Demonstrate Signal Simulation from a High-frequency Image
================================================================================
*/ 

// header-file
#include "mex.h"
#include "booktools.h"

// kernel definition
__global__ void PerPixel(mwSize *d_inputDimensions_Image,
                            double *d_bottom_left_corner,
                            double *d_top_right_corner,
                            double *d_sensor_origin,
                            double *d_sensor_direction,
                            double *d_outputPointer,
                            double *d_inputPointer_NumSensors,
                            double *d_inputPointer_SoundSpeed,
                            double *d_inputPointer_EmitterLocation,
                            double *d_inputPointer_SamplingFrequency,
                            double *d_inputPointer_InputImage,
                            double *d_inputPointer_RecordingTime,
                            double *d_outputPointer_AmplitudeMatrix)
{
    // getting address
    const int tidx_pixel = blockIdx.x;
    const int tidy_pixel = blockIdx.y;
    const int linearIndex2D_pixel = tidx_pixel + tidy_pixel*d_inputDimensions_Image[0];

    // Calculating inter-pixel spacing
    const double pixel_x_spacing = \
        (double)((d_top_right_corner[0] - d_bottom_left_corner[0])/((double)d_inputDimensions_Image[0]-1));
    const double pixel_y_spacing = \
        (double)((d_top_right_corner[1] - d_bottom_left_corner[1])/((double)d_inputDimensions_Image[1]-1));

    // Getting Pixel Characteristics
    const double pixel_x_coordinate = d_bottom_left_corner[0] + \
                                        (tidx_pixel-1)*pixel_x_spacing;
    const double pixel_y_coordinate = d_bottom_left_corner[1] + \
                                        (tidy_pixel-1)*pixel_y_spacing;

    // getting address
    const int tidx_sensor = threadIdx.x;
    const int tidy_sensor = threadIdx.y;
    const int tidz_sensor = threadIdx.z;
    const int linearIndex2D_sensor = tidx_sensor + \
                                        tidy_sensor*blockDim.x + \
                                        tidz_sensor*blockDim.x*blockDim.y;

    const int sensor_number = linearIndex2D_sensor;

    // Getting Sensor Characteristics
    const double sensor_x_coordinate = d_sensor_origin[0] + \
                                        (sensor_number-1)*d_sensor_direction[0];
    const double sensor_y_coordinate = d_sensor_origin[1] + \
                                        (sensor_number-1)*d_sensor_direction[1];

    // Calculating Distance Between Sensor and Pixel
    const double distance_between_pixel_and_sensor = \
        (double)sqrt(pow((sensor_x_coordinate-pixel_x_coordinate),2) + \
                        pow((sensor_y_coordinate-pixel_y_coordinate),2));

    // Calculating Distance Between Transmitter and Pixel
    const double distance_between_pixel_and_transmitter = \
        (double)sqrt(pow((sensor_x_coordinate - d_inputPointer_EmitterLocation[0]),2) +
                        pow((sensor_y_coordinate - d_inputPointer_EmitterLocation[1]),2));

    // Writing the Value
    if(tidx_pixel<d_inputDimensions_Image[0] && \
        tidy_pixel<d_inputDimensions_Image[1] && \
        sensor_number < (int)d_inputPointer_NumSensors[0])
    {
        // finding index to write to
        const int linearIndex2D_Output = \
            linearIndex2D_pixel + \
            sensor_number*d_inputDimensions_Image[0]*d_inputDimensions_Image[1];

        // finding time
        const double distance_travelled = distance_between_pixel_and_transmitter + \
                                            distance_between_pixel_and_sensor;
        const double time_travelled = (double)(distance_travelled/d_inputPointer_SoundSpeed[0]);
        
        // finding samples
        const int samples_travelled = (int)(time_travelled*d_inputPointer_SamplingFrequency[0]);

        // writing to the final output
        d_outputPointer[linearIndex2D_Output] = (double)samples_travelled;
    }
    
    // syncing threads
    __syncthreads();


    // Signal generation part 
    const int tidx_signal_generation = threadIdx.x + blockIdx.x*blockDim.x;
    const int tidy_signal_generation = threadIdx.y + blockIdx.y*blockDim.y;
    const int linearIndex2D_signal_generation = tidx_signal_generation + 
                                                tidy_signal_generation*gridDim.x;

    // Checking sensor validity
    if(linearIndex2D_signal_generation<d_inputPointer_NumSensors[0])
    {
        // updates
        const int current_sensor_loop = (int)linearIndex2D_signal_generation;
        const int num_samples_recording_time = \
            (int)(d_inputPointer_SamplingFrequency[0]*d_inputPointer_RecordingTime[0]);

        // Looping through all the pixels
        for(int i = 0; i<(int)d_inputDimensions_Image[0]; ++i)
        {
            for(int j = 0; j<(int)d_inputDimensions_Image[1]; ++j)
            {
                // finding linear index to the delay-num-samples
                const int linearIndex2D_loop_sample = \
                    i + \
                    j*d_inputDimensions_Image[0] + \
                    current_sensor_loop*d_inputDimensions_Image[0]*d_inputDimensions_Image[1];

                // obtaining num-sample delay
                const int num_samples_loop = d_outputPointer[linearIndex2D_loop_sample];

                // finding pixel-amplitude
                const int linearIndex2D_loop_pixel_amplitude = i + j*d_inputDimensions_Image[0];
                const int pixel_amplitude_loop = \
                    d_inputPointer_InputImage[linearIndex2D_loop_pixel_amplitude];

                // linear-indexing to the signal-matrix
                const int linearIndex2D_AmplitudeMatrix = \
                    num_samples_loop + \
                    current_sensor_loop*num_samples_recording_time;

                // accumulating
                d_outputPointer_AmplitudeMatrix[linearIndex2D_AmplitudeMatrix] = \
                    d_outputPointer_AmplitudeMatrix[linearIndex2D_AmplitudeMatrix] + \
                    (double)pixel_amplitude_loop;
            }
        }
    }

    // syncing threads
    __syncthreads();

}

__global__ void columnwiseConvolution(mwSize *d_outputDimensions_AmplitudeMatrix,
                                        double *d_outputPointer_AmplitudeMatrix,
                                        double *d_outputPointer_SignalMatrix,
                                        int numElements_TransmissionSignal,
                                        double *d_inputPointer_TransmissionSignal)
{
    // Declaring shared memory
    extern __shared__ double sharedMem[];
    
    // getting linear index into the pixels
    const int bidx = blockIdx.x; 
    const int bidy = blockIdx.y;
    const int bidz = blockIdx.z;
    const int linearIndex2D_ForMainIndex = bidx + \
                                            bidy*gridDim.x + 
                                            bidz*gridDim.x*gridDim.y;

    // mapping from linear to index of base-index
    const int signal_x = linearIndex2D_ForMainIndex % (int)d_outputDimensions_AmplitudeMatrix[0];
    const int signal_y = (int)(linearIndex2D_ForMainIndex / (int)d_outputDimensions_AmplitudeMatrix[0]);

    // getting thread index
    const int tidx_kernel = threadIdx.x;
    const int tidy_kernel = threadIdx.y;
    const int tidz_kernel = threadIdx.z;
    const int linearIndex2D_kernel = tidx_kernel + \
                                        tidy_kernel*blockDim.x + \
                                        tidz_kernel*blockDim.x*blockDim.y;
    const int kernel_length = numElements_TransmissionSignal;

    // fetching kernel value
    bool meta_condition = \
        (signal_x < (int)d_outputDimensions_AmplitudeMatrix[0]) && \
        (signal_y < (int)d_outputDimensions_AmplitudeMatrix[1]) && \
        (linearIndex2D_kernel < kernel_length) && \
        ((signal_x - linearIndex2D_kernel) >= (int)0 ) && \
        ((signal_x - linearIndex2D_kernel) < (int)d_outputDimensions_AmplitudeMatrix[0]) \
        ;

    // Initializing kernel
    if(linearIndex2D_kernel<kernel_length) sharedMem[linearIndex2D_kernel] = (double)0;

    // Checking the validity before copying
    if(meta_condition)
    {
        // calculating dot product and copying to shared memory
        sharedMem[linearIndex2D_kernel] = \
            d_outputPointer_AmplitudeMatrix[linearIndex2D_ForMainIndex - linearIndex2D_kernel]* \
            d_inputPointer_TransmissionSignal[linearIndex2D_kernel];
    }
    __syncthreads();

    // Getting first thread of all the block to write
    if(linearIndex2D_kernel == 0 && (signal_x < (int)d_outputDimensions_AmplitudeMatrix[0]) && (signal_y < (int)d_outputDimensions_AmplitudeMatrix[1]) )
    {
        // accumulating shared-memory
        double accuSum = 0;
        for(int i = 0; i<min(numElements_TransmissionSignal, signal_x); ++i)
        {
            accuSum = accuSum + (double)sharedMem[i];
        }

        // copying to the results
        d_outputPointer_SignalMatrix[linearIndex2D_ForMainIndex] = \
            (double)accuSum;

    }

    // syncing threads
    __syncthreads();
}


// main function
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // meta stuff
    cudaError_t err;

    // checking number of outputs
    if(nlhs!=3)
        mexErrMsgTxt("Number of Expected Outputs are Wrong  \n");

    // Fetching the Inputs 
    CustomGPUObject SensorArrayOrigin(prhs[0]); 
    CustomGPUObject SensorArrayDirection(prhs[1]); 
    CustomGPUObject SensorArraySpacing(prhs[2]); 
    CustomGPUObject NumSensors(prhs[3]); 
    CustomGPUObject EmitterLocation(prhs[4]); 
    CustomGPUObject InputImage(prhs[5]); 
    CustomGPUObject BottomLeftCoordinate(prhs[6]); 
    CustomGPUObject TopRightCoordinate(prhs[7]); 
    CustomGPUObject SoundSpeed(prhs[8]); 
    CustomGPUObject SignalFrequency(prhs[9]); 
    CustomGPUObject RecordingTime(prhs[10]); 
    CustomGPUObject SamplingFrequency(prhs[11]); 
    CustomGPUObject TransmissionSignal(prhs[12]); 

    // Sending data to GPU
    SensorArrayOrigin.copyFromHostToDevice();
    SensorArrayDirection.copyFromHostToDevice();
    SensorArraySpacing.copyFromHostToDevice();
    NumSensors.copyFromHostToDevice();
    EmitterLocation.copyFromHostToDevice();
    InputImage.copyFromHostToDevice();
    BottomLeftCoordinate.copyFromHostToDevice();
    TopRightCoordinate.copyFromHostToDevice();
    SoundSpeed.copyFromHostToDevice();
    SignalFrequency.copyFromHostToDevice();
    RecordingTime.copyFromHostToDevice();
    SamplingFrequency.copyFromHostToDevice();
    TransmissionSignal.copyFromHostToDevice();
    
    // setting up the output-dimensions
    mwSize *outputDimensions = (mwSize *)malloc(3*sizeof(mwSize));
    outputDimensions[0] = InputImage.inputDimensions[0]; 
    outputDimensions[1] = InputImage.inputDimensions[1]; 
    outputDimensions[2] = (mwSize)NumSensors.inputPointer_real[0];

    // creating output-matrix
    plhs[0] = mxCreateNumericArray((mwSize)3, outputDimensions, mxDOUBLE_CLASS, mxREAL);
    CustomGPUObject outputPointer_class(plhs[0]); 
    outputPointer_class.copyFromHostToDevice();
    
    // Creating Outputs for the Signal
    mwSize *outputDimensions_AmplitudeMatrix = (mwSize *)malloc(2*sizeof(mwSize));
    outputDimensions_AmplitudeMatrix[0] = (mwSize)(RecordingTime.inputPointer_real[0]*SamplingFrequency.inputPointer_real[0]);
    outputDimensions_AmplitudeMatrix[1] = (mwSize)NumSensors.inputPointer_real[0];

    // Creating Second Output
    plhs[1] = mxCreateNumericArray((mwSize)2, outputDimensions_AmplitudeMatrix, mxDOUBLE_CLASS, mxREAL);
    CustomGPUObject AmplitudeMatrix(plhs[1]); AmplitudeMatrix.copyFromHostToDevice();

    // Creating Third Output
    plhs[2] = mxCreateNumericArray(AmplitudeMatrix.numDimensions, AmplitudeMatrix.inputDimensions, mxDOUBLE_CLASS, mxREAL);
    CustomGPUObject SignalMatrix(plhs[2]); SignalMatrix.copyFromHostToDevice();

    // Calling the Function 
    dim3 blockConfiguration(32,32,1);
    dim3 gridConfiguration((int)InputImage.inputDimensions[0], (int)InputImage.inputDimensions[1], 1);
    PerPixel<<<gridConfiguration,
                blockConfiguration>>>(InputImage.d_inputDimensions,
                                        BottomLeftCoordinate.d_inputPointer_real,
                                        TopRightCoordinate.d_inputPointer_real,
                                        SensorArrayOrigin.d_inputPointer_real,
                                        SensorArrayDirection.d_inputPointer_real,
                                        outputPointer_class.d_inputPointer_real,
                                        NumSensors.d_inputPointer_real,
                                        SoundSpeed.d_inputPointer_real,
                                        EmitterLocation.d_inputPointer_real,
                                        SamplingFrequency.d_inputPointer_real,
                                        InputImage.d_inputPointer_real,
                                        RecordingTime.d_inputPointer_real,
                                        AmplitudeMatrix.d_inputPointer_real);


    // Prepping for function call
    const int grid_square_root_ceil = (int)sqrt(SignalMatrix.numElements) + 1;
    int test00 = (int)sqrt(TransmissionSignal.numElements) + 1;
    const int block_square_root_ceil = pow((((int)log2(test00)) + 1),2);
    dim3 blockConfiguration_convolution(block_square_root_ceil, block_square_root_ceil, 1);
    dim3 gridConfiguration_convolution((int)grid_square_root_ceil, (int)grid_square_root_ceil, 1);

    // Function Call
    columnwiseConvolution<<<gridConfiguration_convolution,
                            blockConfiguration_convolution,
                            TransmissionSignal.numElements*sizeof(double)>>>\
                            (AmplitudeMatrix.d_inputDimensions,
                            AmplitudeMatrix.d_inputPointer_real,
                            SignalMatrix.d_inputPointer_real,
                            TransmissionSignal.numElements,
                            TransmissionSignal.d_inputPointer_real);

    // Copying the data from Device to Host
    outputPointer_class.copyFromDeviceToHost();
    AmplitudeMatrix.copyFromDeviceToHost();
    SignalMatrix.copyFromDeviceToHost();

    // Device Shut down
    err = cudaDeviceSynchronize();
    err = cudaDeviceReset();
    
}