Building on the initial PowerShell code (you can download it at the bottom), I used a simple similarity score and produced the graph below. This is a circular layout of the data. The target nodes are placed in the order they are passed to the AddEdge method; no weighting is done based on the calculated similarity.
I compared the votes of each of the following Senators to those of the rest of the Senators, looking for similarities: Clinton, Stevens, Kyl, Obama and Schumer. I then chose the top 10 most similar and graphed the connections.
Six Degrees of Separation
The network graph is based on the Tanimoto Coefficient.
The Tanimoto coefficient is often described as an extension of cosine similarity, a measure of similarity between two vectors of n dimensions found from the angle between them and often used to compare documents in text mining.
My interpretation: take two lists and find their intersection. Add the count of the first list to the count of the second, then subtract the count of the intersection. Finally, divide the intersection count by this number.
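To make that arithmetic concrete, here is a minimal sketch of the calculation in PowerShell. The function name Get-TanimotoCoefficient and its parameters are my own placeholders for illustration, not necessarily what Do-Analysis.ps1 uses, and the sketch treats each list as a set, ignoring case and duplicates.

```powershell
# Hypothetical helper (name and parameters are assumptions, not from Do-Analysis.ps1).
# Computes intersection / (count1 + count2 - intersection) for two lists of strings.
function Get-TanimotoCoefficient {
    param(
        [string[]]$First,
        [string[]]$Second
    )

    # Remove duplicates so the counts are set sizes (Sort-Object -Unique ignores case).
    $a = @($First  | Sort-Object -Unique)
    $b = @($Second | Sort-Object -Unique)

    # Count the items that appear in both lists (-contains also ignores case).
    $common = @($a | Where-Object { $b -contains $_ }).Count

    # Count of the first list plus the second, minus the intersection.
    $union = $a.Count + $b.Count - $common

    # Two empty lists have nothing in common; avoid dividing by zero.
    if ($union -eq 0) { return 0.0 }

    return [double]$common / $union
}

# Two items in common out of four distinct items: 2 / (3 + 3 - 2) = 0.5
Get-TanimotoCoefficient -First 'a', 'b', 'c' -Second 'b', 'c', 'd'
```

Identical lists score 1 and lists with nothing in common score 0, so the value can be read directly as a similarity rating.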
The earmark data lists the bill sections each Senator voted for, so a Tanimoto coefficient can be calculated for, say, what Clinton and Schumer voted on.
In the script, line 1 sources (loads) the code containing several functions. Line 2 transforms the data from nested hash tables into a hash table of string arrays keyed by the Senator's last name; this is what the coefficient is calculated from.
This is a spike test to see if it makes sense to continue. Using the PowerShell command line enables quick data analysis. Running the script without Show-Map displays the dataset, including the similarity rating.
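As a rough illustration of the ranking step (score one Senator against the rest and keep the most similar), here is a small self-contained sketch. The vote data is made up, and the names $votes, Get-Similarity and $ranked are hypothetical; this is not the code from Do-Analysis.ps1. It also assumes PowerShell 3.0 or later for [pscustomobject].

```powershell
# Made-up sample data: Senator last name -> bill sections voted for.
$votes = @{
    Clinton = 'S1', 'S2', 'S3', 'S4'
    Schumer = 'S1', 'S2', 'S3', 'S5'
    Obama   = 'S2', 'S3', 'S6'
    Kyl     = 'S7', 'S8'
}

# Compact Tanimoto helper (hypothetical name; assumes each list has no duplicates).
function Get-Similarity([string[]]$a, [string[]]$b) {
    $common = @($a | Where-Object { $b -contains $_ }).Count
    $union  = $a.Count + $b.Count - $common
    if ($union -eq 0) { 0.0 } else { [double]$common / $union }
}

$target = 'Clinton'
$top    = 10

# Score every other Senator against the target, highest similarity first.
$ranked = $votes.Keys |
    Where-Object { $_ -ne $target } |
    ForEach-Object {
        [pscustomobject]@{
            Senator    = $_
            Similarity = (Get-Similarity $votes[$target] $votes[$_])
        }
    } |
    Sort-Object Similarity -Descending |
    Select-Object -First $top

$ranked
```

With this sample data Schumer ranks first (3 shared sections out of 5 distinct, 0.6), then Obama (0.4), then Kyl (0). Each row of such a result would correspond to one edge in the graph.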
Drilling down from the graph into the actual data is next. This should be straightforward: hook up the double-click events of the NetMap control to PowerShell code.
Also of interest is the Tanimoto coefficient, included in Do-Analysis.ps1. It can be used on any list of strings in any application. Here is a version I posted using C#.