Tuesday, July 28, 2009

Order matters when adding doubles

I recently discovered (or rather, was told) that summing a set of doubles from a database query may produce different values, even if the database is in a read only state and no programmatic or system errors occurred. Struck me as a bit odd until I realized that order in which the results returned from a query are not guaranteed unless an order by is specified. As a result the sequence in which the doubles are added together may change, thus the sum of those doubles may be different!

An easy way to explain this is that if the doubles are returned in an order where large doubles appear first, the small doubles do not have time to cumulate and add up to anything meaningful. Instead these doubles may be truncated or even ignored because they individually can not impact the current rolling sum. I wanted to explore this a bit further, so i wrote a java program to generate some doubles and add them up in random, ascending and descending order.

Lesson of the day? If you're creating an application that requires all the significant figures available to you in a double either very careful in your arithmetic or use a datatype with precision to spare!

java AddingDoubles 10000 10
Sum in random order: 5.022355325590349E12
Sum in decending order: 5.022355325590318E12
Sum in increasing order: 5.022355325590351E12

Code for AddingDoubles.java below.


import java.util.Random;
import java.util.Date;

/** Adding doubles demonstrates how precision may be lost
 depending on the order in which doubles are added. This effect
 is especially true when some doubles are small relative to others.
 */
 
public class AddingDoubles {
    
    public double m_data[];
    
    public AddingDoubles(int nSmall, int nBig, int nMultiplier) {
        m_data = new double[nSmall + nBig];
        Random r = new Random();
        
        // create small doubles
        for(int i = 0; i < nSmall; i++) {
            m_data[i] = r.nextDouble();
        }
        
        // create 'large' doubles
        for(int i = 0; i < nBig; i++) {
            m_data[i + nSmall] = r.nextDouble() * nMultiplier;
        }
        
        // randomize the array
        shuffle(m_data);
    }
    
    private static void shuffle(double[] a) {
        int n = a.length;
        
        for (int i = 0; i < n; i++) {
            int r = i + (int) (Math.random() * (n-i)); 
            double temp = a[i];
            a[i] = a[r];
            a[r] = temp;
        }
    }
    
    public void sort( ) {
        java.util.Arrays.sort( m_data );
    }

    public double sum( boolean reverse ) {
        double total = 0;
        
        int i = reverse ? m_data.length - 1: 0;
        int inc = reverse ? -1: 1;
        
        while( i >= 0 && i < m_data.length ) {
            total += m_data[i];
            i += inc;
        }
        
        return total;
    }
    
    public static void main(String[] args) {
        int s = Integer.parseInt(args[0]);
        int b = Integer.parseInt(args[0]);
        AddingDoubles ad = new AddingDoubles(s,b,1000000000);
        
        System.out.println( "Sum in random order: " + ad.sum( true ) );
        ad.sort();
        System.out.println( "Sum in decending order: " +ad.sum( true ) );
        System.out.println( "Sum in increasing order: " + ad.sum( false ) );
    }
    
}

Sunday, July 5, 2009

ESXi 4.0 on the Thinkpad T61 (Intel ICH8M)

I sometimes need an ESX instance that I can potentially destroy at will. I also didn't want to have to have to purchase a supported system as the cost is many times that of normal commodity hardware. An ESXi whitebox (i.e. systems that are not on VMware's Hardware Compatibility List (HCL)) are perfect for this. ESXi found my T61 laptop's Intel ICH8M SATA controller and the 82566MM NIC. It allowed me to create a VMFS (but forced me to completely wipe the local drive, there were no options to only create the VMFS in free space). I could even create VMs and boot them. So, on the surface it all looked fine. But I quickly discovered performance was worse than expected (and I wasn't expecting much on a laptop HD), but what really doomed the setup was the that sometimes the storage would just stopped responding for minutes at a time. I'd open up storage browser and the system would just sit.

So, the summary is that yes ESXi will boot on a Thinkpad T61 (or X61), but you're going to need some NFS or iSCSI storage to create stable setup that will run VMs. I ended up switching to Hyper-V for now, but ultimately I'll move to KVM.