In the parallel formulation of the quicksort algorithm on shared-address-space and message-passing architectures (Section 9.4.3), each iteration is followed by a barrier synchronization. Is barrier synchronization necessary to ensure the correctness of the algorithm? If not, then how does the performance change in the absence of barrier synchronization?