Slide 23 of 90
gohan2021

Would having only two copies of diff work here (one for the current iteration and one for the next)? We only need to set the current diff to the sum of myDiff values for the current index and clear the next iteration's diff to 0 before the barrier, so we don't need an extra copy for the previous diff.

jaez

I have the same question ☝️.

juliewang

I believe you need three copies of diff. Consider the first round through the while loop: every thread adds its locally computed diff into diff[index] and clears diff[index+1] for the next round. Then there's a barrier so that everyone finishes updating diff[index]. After the barrier, threads can be doing one of three things (indices taken mod 3):
1. Checking diff[index] < TOLERANCE.
2. Accumulating values into diff[index+1].
3. Clearing diff[index+2].

If there were only two copies of diff, index+2 would wrap around to index: a thread that has already moved on to the second iteration could be clearing diff[index] while other threads are still in the first iteration, checking diff[index] against TOLERANCE.
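To make the three-copy scheme concrete, here is a minimal Python sketch of the loop described above. It assumes `threading.Barrier` and `threading.Lock` stand in for the lecture's barrier and lock, and it replaces the grid sweep with a hypothetical `compute_my_diff` whose result halves every iteration; `NUM_THREADS` and `exit_iters` are also illustrative names. For simplicity the tolerance check uses the raw total rather than a per-cell average.

```python
import threading

NUM_THREADS = 4
TOLERANCE = 0.01

diff = [0.0, 0.0, 0.0]               # three copies of the shared diff; diff[0] starts at 0
diff_lock = threading.Lock()
barrier = threading.Barrier(NUM_THREADS)
exit_iters = [None] * NUM_THREADS    # iteration at which each thread broke out


def compute_my_diff(tid, it):
    # Hypothetical stand-in for one sweep over this thread's rows:
    # every thread's local change halves each iteration.
    return 1.0 / (2 ** it)


def solve(tid):
    index = 0
    it = 0
    while True:
        my_diff = compute_my_diff(tid, it)
        with diff_lock:
            diff[index] += my_diff           # accumulate into this iteration's copy
        diff[(index + 1) % 3] = 0.0          # clear the copy the next iteration will use
        barrier.wait()                       # the only barrier: all contributions are in
        if diff[index] < TOLERANCE:          # every thread reads the same total
            break
        index = (index + 1) % 3
        it += 1
    exit_iters[tid] = it


threads = [threading.Thread(target=solve, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(exit_iters)  # [9, 9, 9, 9]: the total is 4 / 2**it, first below 0.01 at it = 9
```

Changing `% 3` to `% 2` would let a fast thread in iteration k+1 clear `diff[k % 2]` while a slower thread may still be reading it for the tolerance check, which is exactly the hazard described above (a toy run under Python's GIL may simply not show it).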

sanjayen

I agree with @juliewang! The key point is that right before the barrier inside the while loop, each thread sets the next index of diff to 0.0f to prepare for the subsequent iteration. But because threads can be up to one iteration apart, this causes interference with only two copies: if one thread is still working with index 0 while another has moved on to index 1, the second thread will set diff[0] back to 0.0f. It seems we could use either an array of size 3 with one barrier, or an array of size 2 with an extra barrier that protects clearing the next index of diff. My hunch is that the former is more efficient, because barrier synchronization costs much more than allocating space for one more float.
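The two-copy alternative can be sketched as follows. This is one possible placement of the extra barrier, not code from the lecture: it assumes Python's `threading.Barrier`/`threading.Lock`, and `compute_my_diff`, `NUM_THREADS`, and `exit_iters` are hypothetical names. The second barrier guarantees every thread has read `diff[index]` before any thread enters the next iteration and clears that same copy.

```python
import threading

NUM_THREADS = 4
TOLERANCE = 0.01

diff = [0.0, 0.0]                    # only two copies of the shared diff
diff_lock = threading.Lock()
barrier = threading.Barrier(NUM_THREADS)
exit_iters = [None] * NUM_THREADS


def compute_my_diff(tid, it):
    # Hypothetical stand-in for one sweep over this thread's rows.
    return 1.0 / (2 ** it)


def solve(tid):
    index = 0
    it = 0
    while True:
        my_diff = compute_my_diff(tid, it)
        with diff_lock:
            diff[index] += my_diff
        diff[1 - index] = 0.0            # clear the other copy for the next iteration
        barrier.wait()                   # barrier 1: all contributions are in
        converged = diff[index] < TOLERANCE
        barrier.wait()                   # barrier 2: everyone has read diff[index] before
                                         # the next iteration clears it again
        if converged:
            break
        index = 1 - index
        it += 1
    exit_iters[tid] = it


threads = [threading.Thread(target=solve, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(exit_iters)  # [9, 9, 9, 9]
```

This trades one float of storage for a second barrier per iteration, which is the cost comparison behind the hunch above.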

martigp

I am slightly confused about how the three copies of diff reduce the number of barriers. Do the two extra copies remove the need for the synchronization that two of the barriers provided?

gsamp

@martigp, the previous implementation used three barriers to protect updates to the global variables diff and done. The barriers force all threads to agree on a single value for these variables and prevent data races, e.g., one thread setting diff = 0.0f while another thread is doing diff += myDiff (in either a previous or a following iteration of the loop). So we need three barriers: one after diff = 0.0f (so no thread accumulates before the reset), one after diff += myDiff (so no thread checks the tolerance before every contribution is in), and one after done is set (so no thread resets diff for the next iteration while another is still checking it).
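For comparison, here is a minimal reconstruction of the three-barrier, single-diff loop described above. It is a sketch, not the lecture's exact code: it assumes Python's `threading.Barrier`/`threading.Lock`, and `compute_my_diff`, `NUM_THREADS`, and `sweeps` are hypothetical names. The comments mark which race each barrier prevents.

```python
import threading

NUM_THREADS = 4
TOLERANCE = 0.01

diff = 0.0                           # single shared diff
done = False                         # shared termination flag
diff_lock = threading.Lock()
barrier = threading.Barrier(NUM_THREADS)
sweeps = [None] * NUM_THREADS        # number of loop iterations each thread ran


def compute_my_diff(tid, it):
    # Hypothetical stand-in for one sweep over this thread's rows.
    return 1.0 / (2 ** it)


def solve(tid):
    global diff, done
    it = 0
    while not done:
        diff = 0.0                       # every thread resets the shared total
        barrier.wait()                   # barrier 1: no one accumulates before the reset
        my_diff = compute_my_diff(tid, it)
        with diff_lock:
            diff += my_diff
        barrier.wait()                   # barrier 2: all contributions are in before the check
        if diff < TOLERANCE:
            done = True
        barrier.wait()                   # barrier 3: everyone has checked diff and set done
                                         # before anyone resets diff in the next iteration
        it += 1
    sweeps[tid] = it


threads = [threading.Thread(target=solve, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sweeps)  # [10, 10, 10, 10]: convergence is detected on the tenth sweep
```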

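If consecutive iterations use different copies of diff in the array, these data races go away, and we need only one barrier per iteration, between accumulating into diff[index] and checking it against TOLERANCE. The global done variable is no longer needed either: in this example it has been replaced with a break, which is safe because every thread reads the same final value of diff[index].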
