Oct 6, 2012

Java XOR Swap Performance

Hey,
just quick post, because I was working to polish my min heap and checked about different swap techniques.

From my experience I know there are two different versions to swap two integers in an array.

  1. XOR swap algorithm
  2. Swap using temporary variable

Since swapping is used pretty much everywhere, I decided to micro benchmark these against each other using Caliper.

I've seen many people using the XOR algorithm lately, I don't know if they know that this is inefficient. Many people seem to do not care about this, because bitshifting makes them kinda look cool probably?

However, here is my code so we can sort out the performance of both pretty easy:


public class SwapBenchmark extends SimpleBenchmark {

  @Param({ "10", "100", "1000", "10000", "100000", "1000000", "10000000", "100000000" })
  private int size;

  @Param
  SwapType type;

  public enum SwapType {
    XOR, TMP
  };

  int[] array;

  @Override
  protected void setUp() throws Exception {
    array = new int[size];
    Random r = new Random();
    for (int i = 0; i < size; i++) {
      array[i] = r.nextInt();
    }
  }

  public void timeSwap(int reps) {
    for (int rep = 0; rep < reps; rep++) {
      int sum = 0;
      for (int i = 0; i < size; i++) {
        final int x = i;
        final int y = size - i - 1;
        if (type == SwapType.XOR) {
          array[x] ^= array[y];
          array[x] ^= (array[y] ^= array[x]);
          sum += i;
        } else {
          int tmpIndex = array[x];
          array[x] = array[y];
          array[y] = tmpIndex;
          sum += i;
        }
      }
      System.out.println(sum);
    }

  }

  public static void main(String[] args) {
    Runner.main(SwapBenchmark.class, args);
  }
}

So basically I'm swapping n-times in an n-size array, where n ranges from ten to 100,000,000 which is really big.

The result isn't very exciting, I used my Nehalem i7 with 3,3GHZ and latest Java7u7:

      size type        us linear runtime
       10  XOR      2,90 =
       10  TMP      2,84 =
      100  XOR      3,06 =
      100  TMP      2,85 =
     1000  XOR      3,92 =
     1000  TMP      3,52 =
    10000  XOR     20,21 =
    10000  TMP     14,22 =
   100000  XOR    183,33 =
   100000  TMP    118,57 =
  1000000  XOR   1822,98 =
  1000000  TMP   1192,19 =
 10000000  XOR  19401,65 ==
 10000000  TMP  13266,78 ==
100000000  XOR 194173,73 ==============================
100000000  TMP 134364,67 ====================


The TMP swap is much more efficient in every case. Why is this?
Wikipedia states the following:

On modern CPU architectures, the XOR technique is considerably slower than using a temporary variable to do swapping. One reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order. If efficiency is of tremendous concern, it is advised to test the speeds of both the XOR technique and temporary variable swapping on the target architecture.

There is actually nothing more to add. So please don't do any pre-mature optimization!
Even if it looks cool to do a bit-shifting trick ;)