Social Networks and Pork Barrel Spending?

by Doug Finke on October 27, 2008

I blogged about data mining US Earmarks here, here, here and here. I started wondering: is there a relationship between Senators and the sections of the bills they voted for?

Building on the initial PowerShell code (you can download it at the bottom), I used a simple similarity score and produced the graph below. This is a circular layout of the data. Target nodes are placed in the order they are passed to the AddEdge method; no weighting is done based on the calculated similarity.

I compared the votes of each of the following Senators to the rest of the Senators, looking for similarities: Clinton, Stevens, Kyl, Obama and Schumer. For each one I chose the 10 most similar Senators and graphed the connections.


Raw Data

Source  Target     Tanimoto
Clinton Levin           0.7
Clinton Isakson         0.7
Clinton Pryor           0.7
Clinton Leahy          0.71
Clinton Chambliss      0.71
Clinton Nelson         0.72
Clinton Reid           0.73
Clinton Vitter         0.75
Clinton Stabenow       0.81
Clinton Schumer        0.96
Stevens Boxer          0.67
Stevens Akaka          0.67
Stevens Menendez       0.67
Stevens Coleman        0.67
Stevens Klobuchar      0.68
Stevens Leahy          0.68
Stevens Chambliss      0.68
Stevens Schumer        0.69
Stevens Inouye         0.76
Stevens Reid           0.81
Kyl     Dorgan         0.41
Kyl     Klobuchar      0.41
Kyl     Bingaman       0.42
Kyl     Wyden          0.43
Kyl     Smith          0.45
Kyl     Coleman        0.45
Kyl     Murkowski      0.47
Kyl     Ensign         0.47
Kyl     Roberts         0.5
Kyl     Thune          0.53
Obama   Wyden          0.68
Obama   Reed            0.7
Obama   Brown           0.7
Obama   Lugar          0.71
Obama   Snowe          0.71
Obama   Collins        0.71
Obama   Roberts        0.72
Obama   Bayh           0.75
Obama   Martinez       0.75
Obama   Whitehouse     0.75
Schumer Shelby         0.71
Schumer Vitter         0.72
Schumer Durbin         0.73
Schumer Levin          0.73
Schumer Leahy          0.74
Schumer Chambliss      0.75
Schumer Reid           0.76
Schumer Nelson         0.76
Schumer Stabenow       0.79
Schumer Clinton        0.96

Six Degrees of Separation

The network graph is based on the Tanimoto Coefficient.

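The Tanimoto coefficient is closely related to cosine similarity, a measure of similarity between two vectors of n dimensions found from the angle between them and often used to compare documents in text mining. Applied to two sets, the Tanimoto coefficient is the size of the intersection divided by the size of the union.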

My interpretation: take two lists and find the intersection. Add the count of the first list to the count of the second, then subtract the count of the intersection. Divide the intersection count by this number.

The earmark data lists the bill sections each Senator voted for, so a Tanimoto coefficient can be calculated for, say, what Clinton and Schumer voted on.
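The calculation described above can be sketched in a few lines of PowerShell. This is only an illustration: Get-TanimotoCoefficient is a hypothetical name, the section labels are made up, and the function shipped in Do-Analysis.ps1 may be written differently.

```powershell
# Hypothetical helper; not necessarily the function in Do-Analysis.ps1.
function Get-TanimotoCoefficient {
    param(
        [string[]]$First,
        [string[]]$Second
    )

    # Treat both lists as sets so duplicates do not inflate the counts.
    $a = @($First  | Sort-Object -Unique)
    $b = @($Second | Sort-Object -Unique)

    # Count the items that appear in both lists (the intersection).
    $shared = @($a | Where-Object { $b -contains $_ }).Count

    # Count of first + count of second - count of intersection.
    $union = $a.Count + $b.Count - $shared
    if ($union -eq 0) { return 0.0 }

    # Intersection count divided by that number, rounded to two places.
    [math]::Round($shared / $union, 2)
}

# Made-up bill sections, for illustration only.
$clinton = 'Sec. 101', 'Sec. 102', 'Sec. 103', 'Sec. 104'
$schumer = 'Sec. 102', 'Sec. 103', 'Sec. 104', 'Sec. 105'

# 3 shared sections out of 5 distinct ones gives 0.6.
Get-TanimotoCoefficient $clinton $schumer
```

Two identical lists score 1; two lists with nothing in common score 0.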

PowerShell Code

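Line 1 dot-sources (loads) the code, which contains several functions. Line 2 transforms the data from nested hash tables into a hash table of string arrays keyed by each Senator's last name; this is what the coefficient is calculated from. Lines 3 through 5 compare each of the five Senators to the rest, keep the last 10 results for each, and pass them to Show-NetMap.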

   1: . .\Do-Analysis.ps1
   2: $set = Do-Transform
   3: list Clinton Stevens Kyl Obama Schumer | 
   4:  % { Do-Compare $set $_ | select -last 10 } | 
   5:   Show-NetMap C
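To make the compare-and-select step more concrete, here is a rough sketch of how a comparison like this might work. Compare-Senator is a hypothetical stand-in for Do-Compare, the sample data is made up, and the sketch assumes each Senator's list has no duplicate entries. Sorting ascending by score is an assumption that would explain why taking the last rows yields the most similar Senators.

```powershell
# Made-up sample data in the shape the post describes:
# Senator last name -> array of bill-section strings.
$set = @{
    Clinton = 'S1', 'S2', 'S3', 'S4'
    Schumer = 'S1', 'S2', 'S3', 'S4', 'S5'
    Kyl     = 'S4', 'S9'
    Obama   = 'S2', 'S3', 'S7'
}

# Hypothetical stand-in for Do-Compare; the real function may differ.
function Compare-Senator {
    param([hashtable]$Data, [string]$Source)

    $mine = @($Data[$Source])
    foreach ($target in $Data.Keys) {
        # Skip comparing a Senator with themselves.
        if ($target -eq $Source) { continue }

        $theirs = @($Data[$target])
        $shared = @($mine | Where-Object { $theirs -contains $_ }).Count
        $union  = $mine.Count + $theirs.Count - $shared
        $score  = if ($union -eq 0) { 0 } else { [math]::Round($shared / $union, 2) }

        # Emit one row per pair, mirroring the Raw Data columns.
        New-Object PSObject -Property @{
            Source   = $Source
            Target   = $target
            Tanimoto = $score
        }
    }
}

# Sort ascending so the most similar Senators come last, then keep the top 2.
Compare-Senator $set Clinton |
    Sort-Object Tanimoto |
    Select-Object Source, Target, Tanimoto -Last 2
```

With this sample data, the two rows kept for Clinton are Obama (0.4) and Schumer (0.8).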

Next Steps

This is a spike test to see if it makes sense to continue. Using the PowerShell command line enables quick data analysis. Running the above code without Show-NetMap displays the dataset, including the similarity rating.

Drilling down from the graph into the actual data is next. This should be straightforward: hook up the double-click events of the NetMap control to PowerShell code.

Also of interest is the Tanimoto Coefficient function included in Do-Analysis.ps1; it can be used on any list of strings in any application. Here is a version I posted using C#.


Hal Rottenberg 10.28.08 at 3:16 pm

Doug, does this mean that on pork bills, that Sens Clinton & Schumer vote almost identically? That’s how I’m reading what you are portraying here.

What about the top picture? Does it mean that Obama votes most like the Senators to which he’s linked? I’m not sure if I’m understanding that one right.

Doug Finke 10.28.08 at 4:02 pm

Hal, you got it, that’s right.

In the Raw Data you can also see that Clinton->Schumer have a Tanimoto Coefficient of .96.

1.0 is identical.

