Spell Checking strings in your PowerShell scripts

May 30, 2011

in Microsoft, Peter Norvig, PowerShell, PowerShell Tokenizer, Spelling, Yahoo

Spelling misteaks irritate readers. And for most, checking spelling is a boring and error-prone job.

This is Jon Bentley’s opening line to his chapter “A Spelling Checker” in his book “Programming Pearls”.

The PowerShell Invoke-SpellCheck script loads a custom dictionary and checks the spelling of only the strings in a PowerShell script.

This PowerShell script extends my port of “How to Write a Spelling Corrector” by Peter Norvig, Director of Research at Google. He wrote the post to examine statistical approaches to language problems, like spelling correction, as used in search engines at Google, Yahoo and Microsoft.

Download the PowerShell script HERE.

Script Highlights

  • The script parses a PowerShell file using the built-in PowerShell Tokenizer. It finds strings in the script, such as Write-Host "The quick brown fox" or $result = "Jumps over the lazy dog", and spell checks them
  • You can spell check a single file
    • Invoke-SpellCheck .\spell.ps1

     

    Misspelled Corrections              StartLine StartColumn FullName
    ---------- -----------              --------- ----------- --------
    thex       {the, them, then, they}          1          10 .\spell.ps1
    plansx     {plans}                          1          10 .\spell.ps1
    fox        {box, fix, fog, for}             2          12 .\spell.ps1
    muse       {use, ruse, must, amuse}         3          12 .\spell.ps1
  • You can spell check an entire directory of scripts
    • dir . -Recurse *.ps1 | Invoke-SpellCheck
  • In under half a second, the script loads, parses and indexes over 100K words, then parses two PowerShell scripts, extracts their strings, splits them into words, looks up each word and proposes corrections for the words it does not find
  • You can add words to the dictionary using Notepad
  • You can pass in your own dictionary of words
  • If you want to learn more about implementing a probabilistic, statistical spelling correction algorithm, visit Mr. Norvig’s post and site. Plus his latest post “On Chomsky and the Two Cultures of Statistical Learning” is an interesting read
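
The tokenizer step from the first highlight can be sketched roughly as follows. This is a minimal illustration, not the code from the download: the function name matches the Get-StringTokens call used by Invoke-SpellCheck, but its body and the $Path parameter are assumptions. It relies on the built-in [System.Management.Automation.PSParser]::Tokenize method and keeps only the tokens whose Type is String.

```powershell
# Hypothetical body for Get-StringTokens (an assumption, not the downloadable script).
# Returns the string-literal tokens found in one PowerShell script file.
function Get-StringTokens {
    param (
        [Parameter(Mandatory=$true)]
        [string]$Path   # assumed parameter name
    )

    # Read the whole script as a single string
    $content = [IO.File]::ReadAllText((Resolve-Path $Path).ProviderPath)

    # Tokenize reports syntax errors through $errors instead of throwing
    $errors = $null
    $tokens = [System.Management.Automation.PSParser]::Tokenize($content, [ref]$errors)

    # Keep only string literals; each token carries Content, StartLine and StartColumn
    $tokens | Where-Object { $_.Type -eq 'String' }
}
```

A quick way to try it, assuming a throwaway file named demo.ps1:

```powershell
Set-Content -Path .\demo.ps1 -Value 'Write-Host "The quick brown fox"'
Get-StringTokens .\demo.ps1 | Select-Object Content, StartLine, StartColumn
```

Bare-word arguments such as Write-Host hello are tokenized as command arguments rather than strings, so a filter like this one would not pick them up.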

Invoke-SpellCheck

function Invoke-SpellCheck {
    <#
        .Synopsis
            Invoke-SpellCheck reads one or more PowerShell script files,
            extracts the strings in them, then checks each word
            against a dictionary
        .Description
            Tokenizes each script, splits every string token into words,
            and reports the words missing from the dictionary together
            with the candidate corrections found in the dictionary
        .Example
            Invoke-SpellCheck .\spell.ps1

        .Example
            dir *.ps1 | Invoke-SpellCheck
    #>
    param (
        [Parameter(ValueFromPipelineByPropertyName=$true)]
        [string]$FullName = "C:\spell.ps1",
        [string]$Dictionary = "$pwd\holmes.txt"
    )

    Begin {
        # Build the word table once, before any file is processed
        # (train is not shown in this listing)
        $nwords = train ([IO.File]::ReadAllText($Dictionary))
    }

    Process {
        # Get-StringTokens is not shown in this listing
        ForEach ($token in (Get-StringTokens $FullName)) {
            ForEach ($word in [regex]::Split($token.Content.ToLower(), '\W+')) {
                # Split can return empty strings at the edges; skip them
                if ($word -and !$nwords.ContainsKey($word)) {

                    # Candidate spellings for $word; these four helpers
                    # are not shown in this listing
                    $theSet = (deletion) +
                              (transposition) +
                              (alteration) +
                              (insertion)

                    # Keep only the candidates that are dictionary words
                    $corrections = @()
                    foreach ($item in $theSet) {
                        if ($nwords.ContainsKey($item)) {
                            $corrections += $item
                        }
                    }

                    # Emit one result per misspelled word, in a fixed column order
                    New-Object PSObject -Property @{
                        Misspelled  = $word
                        StartLine   = $token.StartLine
                        StartColumn = $token.StartColumn
                        Corrections = $corrections
                        FullName    = $FullName
                    } | Select-Object Misspelled,
                        Corrections,
                        StartLine,
                        StartColumn,
                        FullName
                }
            }
        }
    }
}
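
The listing calls train and four edit helpers (deletion, transposition, alteration, insertion) without showing them. The sketch below illustrates the idea behind them, following the edit-distance-1 approach from Peter Norvig's post. The names Get-WordCounts and Get-Edits1 are hypothetical stand-ins, not the functions in the download, and the sketch folds the four edit kinds into one function for brevity.

```powershell
# Hypothetical stand-in for train: counts how often each word occurs in a text.
function Get-WordCounts {
    param ([string]$Text)

    $counts = @{}
    foreach ($match in [regex]::Matches($Text.ToLower(), '[a-z]+')) {
        $w = $match.Value
        if ($counts.ContainsKey($w)) { $counts[$w]++ } else { $counts[$w] = 1 }
    }
    $counts
}

# Hypothetical stand-in for the four edit helpers combined:
# returns every string that is one edit away from $Word.
function Get-Edits1 {
    param ([string]$Word)

    $alphabet = 'abcdefghijklmnopqrstuvwxyz'.ToCharArray()
    # A set, so duplicate candidates are stored once
    $edits = New-Object 'System.Collections.Generic.HashSet[string]'

    for ($i = 0; $i -le $Word.Length; $i++) {
        $a = $Word.Substring(0, $i)   # prefix before the edit position
        $b = $Word.Substring($i)      # suffix starting at the edit position

        if ($b.Length -gt 0) {
            # deletion: drop the first character of the suffix
            [void]$edits.Add($a + $b.Substring(1))
            # alteration: replace the first character of the suffix
            foreach ($c in $alphabet) { [void]$edits.Add($a + $c + $b.Substring(1)) }
        }
        if ($b.Length -gt 1) {
            # transposition: swap the first two characters of the suffix
            [void]$edits.Add($a + $b[1] + $b[0] + $b.Substring(2))
        }
        # insertion: add one character at the edit position
        foreach ($c in $alphabet) { [void]$edits.Add($a + $c + $b) }
    }
    $edits
}
```

Filtering the candidates against the word table mirrors the $corrections loop in the listing:

```powershell
$nwords = Get-WordCounts 'the quick brown fox jumps over the lazy dog'
Get-Edits1 'thex' | Where-Object { $nwords.ContainsKey($_) }
```

Like Norvig's original, the alteration step also produces the unchanged word when a letter is replaced by itself. That is harmless here because only words missing from the dictionary reach this step.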

Download

Creative Commons License

© 2007-2012, Doug Finke