Correlation of stories and story points

At the OOP 2012 conference I attended the session “Story Points considered harmful” by Vasco Duarte and Joseph Pelrine. Their main hypothesis was that the sum of story points correlates so highly with the number of stories that it is more than enough to work with the number of stories alone and to drop story points completely.
I ran some spreadsheet experiments with a large number of possible story point sets. The spreadsheet generated one set for every possible combination of adjacent story point values from the base set “0.5  1  2  3  5  8  13  20  40  100”.
Examples:
• 100 sprints with random velocities based on 1, 2, 3, and 5 (= four adjacent story point values)
• 100 sprints with random velocities based on 3, 5, and 8 (= three adjacent story point values)
• 100 sprints with random velocities based on 13 and 20 (= two adjacent story point values)
• and so on
Each of these combinations has two aspects:
• x = number of adjacent story point values
• y = min/max ratio of the story point values, i.e. the smallest available value divided by the largest (a ratio close to 1 means the available values lie close together)
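The enumeration of these combinations and their two aspects can be sketched in Python. This is only an illustration, not the original spreadsheet: the names BASE_SET, adjacent_combinations and aspects are made up for this example, and it assumes that a combination consists of at least two neighbouring values.

```python
BASE_SET = [0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100]


def adjacent_combinations(base, min_size=2):
    """Return every run of at least min_size neighbouring values from base."""
    combos = []
    for size in range(min_size, len(base) + 1):
        for start in range(len(base) - size + 1):
            combos.append(base[start:start + size])
    return combos


def aspects(combo):
    """Return (x, y): the number of values and the min/max ratio."""
    return len(combo), min(combo) / max(combo)


combos = adjacent_combinations(BASE_SET)
print(len(combos), "combinations")
for combo in combos[:3]:
    x, y = aspects(combo)
    print(combo, "x =", x, "y =", round(y, 2))
```

With ten base values there are 9 pairs, 8 triples and so on down to the single full set, which adds up to the number of combinations used in the experiment.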
Take a look at the following chart. The size of each bubble shows the average correlation between “Number of Stories” and “Sum of Story Points (Velocity)” for 100 sprints filled with randomly distributed story points.
There are 45 distinct combinations, i.e. the calculated values are based on 4,500 simulated sprints.
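One such simulation can be sketched in Python. The way the spreadsheet filled a sprint is not described here, so the following is an assumption: each sprint gets a random number of stories (uniformly between 5 and 15), each story gets a value drawn uniformly from the given set, and the correlation is the Pearson coefficient. The function names and the seed are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(point_values, sprints=100, min_stories=5, max_stories=15, rng=None):
    """Correlate story count and velocity over a number of random sprints.

    Assumption: the story count per sprint and the value of each story
    are drawn uniformly at random.
    """
    if rng is None:
        rng = random.Random()
    counts, velocities = [], []
    for _ in range(sprints):
        n = rng.randint(min_stories, max_stories)
        counts.append(n)
        velocities.append(sum(rng.choice(point_values) for _ in range(n)))
    return pearson(counts, velocities)


rng = random.Random(2012)  # fixed seed so the run is repeatable
narrow = simulate([2, 3], rng=rng)
wide = simulate([0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100], rng=rng)
print("2 and 3 only:", round(narrow, 2))
print("full set:    ", round(wide, 2))
```

The absolute numbers depend on the assumed range of stories per sprint, so they will not match the spreadsheet exactly; only the comparison between narrow and wide sets is meaningful.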
Interpretation of the results:
• fewer available story point values lead to higher correlation
• the more story point values are available, the lower the correlation
• using the full set of the “traditional” story point values (from 0.5 to 100) leads to the lowest correlation
• higher “density” (min/max ratio) of available story point values leads to higher correlation
• widely spread story point values lead to lower correlation (= “epic” uncertainty factor)
• using only the story point values “2” and “3” leads to an average correlation of 0.96 (minimum correlation = 0.93), which is the highest of all average correlations calculated
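These observations match what a simple model predicts. The following is a sketch under assumptions that are not taken from the spreadsheet: a sprint contains N stories, the story point values X_1, ..., X_N are independent of each other and of N, each with mean \mu and variance \sigma^2, and the velocity is V = X_1 + ... + X_N. Then Cov(N, V) = \mu Var(N) and Var(V) = E[N] \sigma^2 + Var(N) \mu^2, which gives:

```latex
\[
\rho(N, V)
= \frac{\operatorname{Cov}(N, V)}{\sqrt{\operatorname{Var}(N)\,\operatorname{Var}(V)}}
= \frac{\mu\,\operatorname{Var}(N)}
       {\sqrt{\operatorname{Var}(N)\,\bigl(\mathrm{E}[N]\,\sigma^2 + \operatorname{Var}(N)\,\mu^2\bigr)}}
= \frac{1}{\sqrt{1 + \dfrac{\mathrm{E}[N]}{\operatorname{Var}(N)} \cdot \dfrac{\sigma^2}{\mu^2}}}
\]
```

Under these assumptions the correlation depends on the story point set only through \sigma / \mu, the spread of the values relative to their mean. That ratio is small when the available values lie close together (such as 2 and 3) and large when the set reaches from 0.5 to 100.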
My suggestion is as follows:
• Do not use story points for release planning. Use other size and epic indicators instead, e.g. T-shirt sizes. Reason: a release backlog contains both small and epic items, which leads to widely spread story point values. As soon as you work with real numbers, people tend to sum them up and do all kinds of arithmetic with them. Unfortunately, you can’t reliably predict that a 40-point item really takes 20 times the effort of a 2-point item. Treating the numbers that way is a useless traditional approach that leads to a false sense of security in project planning. Change your mindset and accept the empirical process control of agile planning and estimation methods.

What is your opinion and experience working with story points? I’m looking forward to getting your feedback.