I had an XML file with names that I needed to verify in a database table, so I thought this would be a perfect task for PowerShell. The actual format of the XML file looked like this:
<menu>
<id>symbol:U_F80_T410.ADDITIONS</id>
<name>ADDITIONS</name>
<click>True</click>
<en>True</en>
</menu>
Obtaining the XML nodes was simple with PowerShell. I needed the symbol name, so I selected the <menu> nodes whose <id> starts with 'symbol:' and stripped that 7-character prefix with Substring():
$menuSymbols = $xml.SelectNodes("//menu[starts-with(id, 'symbol:')]") | % { $_.id.Substring(7) }
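To try the selection step on its own, here is a minimal sketch. It builds the document from an inline string instead of a file, and it assumes the <menu> elements sit under a root element called <menus>; that name and the second <menu> entry are made up for illustration, since the real file's root isn't shown. It uses the XPath starts-with() function to match the prefix.

```powershell
# Sample document; the <menus> root and the second <menu> entry are assumptions.
$xml = [xml]@'
<menus>
  <menu>
    <id>symbol:U_F80_T410.ADDITIONS</id>
    <name>ADDITIONS</name>
  </menu>
  <menu>
    <id>page:HOME</id>
    <name>HOME</name>
  </menu>
</menus>
'@

# Keep only <menu> nodes whose <id> starts with 'symbol:',
# then drop that 7-character prefix.
$menuSymbols = $xml.SelectNodes("//menu[starts-with(id, 'symbol:')]") |
    ForEach-Object { $_.id.Substring(7) }

$menuSymbols   # U_F80_T410.ADDITIONS
```

Only the first entry matches, so the second one is filtered out before Substring() ever runs.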
Next I needed the tag name of each symbol (U_F80_T410 in the sample above). Some symbols don’t have tag names, and one tag name can be shared by several symbols, e.g. U_F80_T410.ADDITIONS and U_F80_T410.OVERVIEW.
Method 1: Declare array first, then use the | % ForEach-Object alias
$menuTagNames = @()
$menuSymbols | % {
    $index = $_.IndexOf('.')
    if ($index -gt 0) {
        $tagName = $_.Substring(0, $index)
        if ($menuTagNames -notcontains $tagName) {
            $menuTagNames += $tagName
        }
    }
}
I thought there must be a more concise way to get the unique tag names. A quick look at Stack Overflow showed 'select -uniq' and 'sort -uniq' (short for Select-Object -Unique and Sort-Object -Unique) as ways to remove duplicates in PowerShell.
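As a rough illustration of how the two differ, here is a small sketch with made-up tag names. The main visible difference is ordering: one keeps the input order, the other sorts.

```powershell
# Made-up tag names with one duplicate.
$tags = 'U_F80_T410', 'U_F10_T100', 'U_F80_T410'

# Select-Object -Unique keeps the first occurrence of each value, in input order.
$selected = $tags | Select-Object -Unique   # U_F80_T410, U_F10_T100

# Sort-Object -Unique sorts the values and drops the duplicates.
$sorted = $tags | Sort-Object -Unique       # U_F10_T100, U_F80_T410
```

Either works for this task, since the order of the tag names doesn't matter here.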
Method 2: Using Sort -Unique
$menuTagNames = $menuSymbols | % {
    $index = $_.IndexOf('.')
    if ($index -gt 0) { $_.Substring(0, $index) }
} | Sort -Unique
Note that Method 2 does not always return an array; the result type depends on how many items come out of the pipeline. If no items are returned, the result is $null. If one item is returned, the result is a single string, since the underlying data type is string. If two or more items are returned, the result is an array.
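The three cases can be checked with a short sketch. Get-TagName is a hypothetical helper that just wraps the Method 2 pipeline so it can be run against different inputs; the symbols are made up. The last lines show one common way around the issue, wrapping the call in @() to force an array.

```powershell
# Hypothetical helper wrapping the Method 2 pipeline.
function Get-TagName {
    param([string[]]$Symbols)
    $Symbols | ForEach-Object {
        $index = $_.IndexOf('.')
        if ($index -gt 0) { $_.Substring(0, $index) }
    } | Sort-Object -Unique
}

$none = Get-TagName -Symbols 'NOTAG'
$one  = Get-TagName -Symbols 'U_F80_T410.ADDITIONS', 'U_F80_T410.OVERVIEW'
$many = Get-TagName -Symbols 'U_F80_T410.ADDITIONS', 'U_F10_T100.OVERVIEW'

$null -eq $none          # True
$one.GetType().Name      # String
$many.GetType().Name     # Object[]

# Wrapping the call in @() forces an array however many items come back.
$always = @(Get-TagName -Symbols 'NOTAG')
$always.Count            # 0
```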
One other point: the PowerShell [array] is backed by the .NET System.Array type, which has a fixed size, so each += in Method 1 copies the whole array into a new one, and -notcontains scans the array item by item. Neither is optimized for large results. I haven’t tested the performance in this scenario, but a [hashtable] should theoretically be more efficient for lookups, at the cost of more memory and more code. I started by converting Method 2 to use a hashtable:
Method 3: Method 2 using a hashtable
$menuTagNames = @{}
$menuSymbols | % {
    if ($_.IndexOf('.') -gt 0) {
        $_.Substring(0, $_.IndexOf('.'))
    }
} | Sort -Unique | % { $menuTagNames.Add($_, $_) }
I like Method 3, but it could be optimized further: instead of sorting first, each tag name produced inside the 'if' block could be tested for duplicates immediately using the hashtable ContainsKey() method:
Method 4: Back to Method 1 except with a hashtable
$menuTagNames = @{}
$menuSymbols | % {
    $index = $_.IndexOf('.')
    if ($index -gt 0) {
        $tagName = $_.Substring(0, $index)
        if ( !$menuTagNames.ContainsKey($tagName) ) {
            $menuTagNames.Add($tagName, $tagName)
        }
    }
}
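Once the hashtable is filled, its keys are the unique tag names and ContainsKey() gives a quick lookup. The sketch below is a small variation on Method 4, not the same code: it uses indexer assignment, which adds or overwrites a key, so the explicit duplicate check can be dropped. The symbols are made up for illustration.

```powershell
# Made-up symbols; 'NOTAG' has no '.' and therefore no tag name.
$menuSymbols = 'U_F80_T410.ADDITIONS', 'U_F80_T410.OVERVIEW', 'U_F10_T100.SUMMARY', 'NOTAG'

$menuTagNames = @{}
foreach ($symbol in $menuSymbols) {
    $index = $symbol.IndexOf('.')
    if ($index -gt 0) {
        # Indexer assignment adds the key or overwrites it if it already exists.
        $menuTagNames[$symbol.Substring(0, $index)] = $true
    }
}

$menuTagNames.Count                       # 2
$menuTagNames.ContainsKey('U_F80_T410')   # True
$menuTagNames.Keys | Sort-Object          # U_F10_T100, U_F80_T410
```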
The more I play with PowerShell, the more I like it.