So as part of a recent upgrade I was performing, I upgraded a couple of Netscaler Access Gateways from version 10.1 to version 10.5. The upgrade went very smoothly, no errors, no user calls… for a while. The next day, we started receiving some calls regarding issues with launching apps via Storefront. Some users were receiving the “SSL Error 43: The proxy denied access to…” error with their STA ticket when clicking on their application icons on the web page.

Tracking down the servers based on the STA ID in the ticket, I noticed that users only had issues when they were attempting to authenticate to Windows 2012 R2 delivery controllers. The Windows 2008 R2 delivery controllers were not denying the STA requests. Jumping on one of the Windows 2012 R2 delivery controllers, I noticed the System event log was flooded with Schannel errors for Event ID 36874 (An TLS 1.2 connection request was received from a remote client application, but none of the cipher suites supported by the client application are supported by the server. The SSL connection request has failed.) and Event ID 36888 (A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 40. The Windows SChannel error state is 1205.).

Well, we obviously have an SSL issue, but these codes aren’t exactly pointing me anywhere. Looking up the error code in the RFC for the TLS protocol (http://tools.ietf.org/html/rfc5246), I found that error code 40 is a handshake failure (you can find this in Appendix A.3, Alert Messages). I can’t remember where exactly I found the enum definition for the Schannel 1205 code, but it basically means that a fatal error was sent to the endpoint and the connection was being forcibly terminated. At least I now knew there was an issue with the SSL handshake between the Netscalers and the Windows 2012 R2 delivery controllers. Time for some network tracing.

Firing up Wireshark on the delivery controller, I could see that the connection was getting immediately reset by the server after the Client Hello from the Netscaler.

Windows_2012_R2_RST_ACK
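
The alert-code lookup described above can be sketched in a few lines of Python. This is only an illustration: the table is a subset of the alert descriptions listed in RFC 5246 Appendix A.3, and the `describe_alert` helper and its message parsing are my own assumptions, not anything Schannel provides.

```python
import re

# Subset of the AlertDescription values from RFC 5246, Appendix A.3.
TLS_ALERTS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    46: "certificate_unknown",
    47: "illegal_parameter",
    48: "unknown_ca",
    70: "protocol_version",
    71: "insufficient_security",
    80: "internal_error",
}


def describe_alert(event_message):
    """Pull the TLS alert code out of an Event ID 36888 message (hypothetical helper)."""
    found = re.search(r"fatal error code is (\d+)", event_message)
    if not found:
        return None
    code = int(found.group(1))
    return code, TLS_ALERTS.get(code, "unknown")


message = (
    "A fatal alert was generated and sent to the remote endpoint. "
    "This may result in termination of the connection. "
    "The TLS protocol defined fatal error code is 40. "
    "The Windows SChannel error state is 1205."
)
alert = describe_alert(message)
print(alert)
```

The Schannel error state (1205 here) is deliberately left out, since I don't have an authoritative table for it.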

Expanding the Client Hello packet in the capture, I could see a list of ciphers currently being offered by the Netscaler. (Note – for the sake of easier troubleshooting, I left the default grouping of ciphers in place as it was a large group of widely accepted ciphers until I identified the issue and then trimmed down the cipher list. You should limit the number of ciphers available on the virtual server of your Access Gateway to just what you need and leverage the more current stronger methods available such as AES 256 over RC4 and MD5, etc. if possible.)

Cipher suites

Next, I configured the SSL Cipher Suite Order on the Windows server to match what the Netscaler was presenting in the Client Hello packet, or at least the top 10 or so entries. This can be done using either gpedit.msc for local policy or the Group Policy Management Console, as follows:

  1. In either editor, expand Computer Configuration/Administrative Templates/Network.
  2. Click on SSL Cipher Suite Order in the SSL Configuration Settings folder.
  3. Select the Enabled option and then follow the instructions in the Help section of the policy. Basically, all the ciphers you want will be listed on a single line separated by commas with no spaces anywhere.
  4. You must reboot the server for the changes to take effect.

SSL Cipher Order Policy
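
Since the policy wants every cipher on a single comma-separated line with no spaces, it can help to build that string programmatically instead of by hand. The following is a minimal sketch; `build_cipher_order` is a hypothetical helper, the four suite names are just examples of Schannel-style names, and the 1,023-character limit is an assumption to check against the policy's Help text.

```python
# Assumed upper bound on the policy value; verify against the policy's Help text.
MAX_POLICY_LENGTH = 1023


def build_cipher_order(suites):
    """Join cipher suite names into one comma-separated line with no spaces."""
    cleaned = []
    for name in suites:
        name = name.strip()
        if not name:
            continue
        if " " in name or "," in name:
            raise ValueError(f"invalid cipher suite name: {name!r}")
        if name not in cleaned:  # drop duplicates, keep the first position
            cleaned.append(name)
    value = ",".join(cleaned)
    if len(value) > MAX_POLICY_LENGTH:
        raise ValueError("cipher list exceeds the assumed policy length limit")
    return value


cipher_order = build_cipher_order([
    "TLS_RSA_WITH_AES_256_CBC_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA256",
    " TLS_RSA_WITH_AES_256_CBC_SHA ",  # stray whitespace is trimmed
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_128_CBC_SHA",    # duplicate is ignored
])
print(cipher_order)
```

The order of the input list is preserved, which matters because the policy treats the list as a preference order.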

Even after the reboot, the SChannel errors were still present and the network captures still showed the handshake failing with a reset from the server. I’ll save you the time you would spend re-ordering the ciphers on both the Netscaler and the Windows Server 2012 R2 delivery controller, along with the multitude of reboots that go with it; it simply won’t work (at least at the time I published this). I stepped back and decided to try tweaking the TLS protocol versions since I wasn’t getting anywhere with the cipher suites. For the sake of brevity: after much additional testing, headbanging, and googling, I was able to get the handshake to work when I disabled TLS 1.2 on the Windows 2012 server. This forced the server to negotiate TLS 1.1 with the Netscaler, which worked with the cipher suites I tested that were supported by both the OS and the Netscaler. I did find a nice article supporting this here for additional reference.

To disable TLS 1.2 on the server, you need to modify a registry key:

  1. Go to HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols.

  2. If the TLS 1.2 key does not exist, create it.

  3. Inside the **TLS 1.2** key, create another key called **Client**.

  4. Within the Client key, create two REG_DWORD values:

    a. DisabledByDefault (set value to 1).

    b. Enabled (set value to 0).

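You will need to reboot one more time for the changes to take effect. This finally cleared up my SChannel errors and allowed me to add the controllers back as STAs in the virtual server, in a green status this time. One caveat: Microsoft documents the Client subkey as controlling outbound (client-role) connections and a sibling Server subkey as controlling inbound ones. The delivery controller is on the receiving end of the Netscaler’s Client Hello, so if the change has no effect for you, set the same two values under a Server key as well.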
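
If you have to repeat the registry steps above on several controllers, one option is to generate a .reg file once and merge it on each server. The sketch below only renders the file text; `render_reg` is a hypothetical helper, and how you save and import the result (file encoding, regedit or reg.exe) is left as something to verify in your environment.

```python
# Registry path from the steps above, written out in full .reg notation.
KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client"
)


def render_reg(key, dwords):
    """Render REG_DWORD values for one key as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dwords.items():
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


reg_text = render_reg(KEY, {"DisabledByDefault": 1, "Enabled": 0})
print(reg_text)
```

A reboot is still required after merging, just as with the manual edit.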

A few examples of using regular expressions in group targeting in SCOM.

String pattern matching

`(?i:fs)` – This simple pattern will match “fs” anywhere in a string and is case-insensitive. For example, DFW-FS01 would match. The parenthesis and question mark stipulate a non-capturing group. A capture group stores regex matches for use later in the expression. Since we don’t need to do anything with the match, the non-capturing group makes more sense and is optimized for this case. The i: after the question mark is a modifier that stipulates a case-insensitive match. This would effectively match fs, Fs, fS, FS.

`(?i:fs|ps)` – This expands on the previous example to match alternatives in the non-capturing group. Let’s say we wanted to add both file servers and print servers into a group expression. This example would match both DFW-FS01 and DFW-PS01. Think of the pipe symbol like an “or” conditional operator.

`(?i:[pf]s)` – This is another way to get the same results as the previous example. In this case we know that our server names will contain either FS or PS, so we put the “p” and the “f” in brackets, which means match either “p” or “f” immediately followed by “s”.

`(?i:[a-z][a-z][a-z]-sql-cl[^c-z])` – We can also match by character ranges and exclude characters as well. The brackets with a-z inside mean to match any single character “a” through “z”. The caret inside the last bracket negates the match, so this means match any single character except “c” through “z”. This would match something like DFW-SQL-CLA, but not DFW-SQL-CLD.
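
A quick way to sanity-check the string patterns above before pasting them into a group definition is to run them against a few sample names. The sketch below uses Python’s `re` module rather than the .NET engine SCOM uses; as far as I know these particular constructs behave the same way in both, but treat that as an assumption. The server names and the `matching` helper are made up for illustration.

```python
import re

# Patterns discussed above, keyed by a short description.
patterns = {
    "fs anywhere": r"(?i:fs)",
    "fs or ps": r"(?i:fs|ps)",
    "p or f, then s": r"(?i:[pf]s)",
    "sql cluster a/b": r"(?i:[a-z][a-z][a-z]-sql-cl[^c-z])",
}

# Hypothetical computer names.
names = ["DFW-FS01", "DFW-PS01", "dfw-web01", "DFW-SQL-CLA", "DFW-SQL-CLD"]


def matching(pattern, candidates):
    """Return the candidates in which the pattern matches anywhere."""
    return [name for name in candidates if re.search(pattern, name)]


for label, pattern in patterns.items():
    print(f"{label:16} {matching(pattern, names)}")
```

Note that `[^c-z]` excludes only the letters c through z, so a digit or hyphen in that position would also match.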

Number pattern matching

Parentheses, brackets, etc. still have the same function with numbers. IP ranges are a good example of common pattern matching in SCOM; some examples are as follows:

`^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$` – While not the best method, if you are not worried about validating that a valid number was entered in each octet, this is a simple match for an IPv4 pattern. Let’s break this down.

  1. The caret at the beginning means that this is the start of the string, there should be no characters or digits before this in the match.
  2. ([0-9]{1,3}) is a capturing group similar to what we used in the string matching earlier with the parenthesis. The [0-9] means to match any single digit between 0 and 9. The {1,3} means to repeat the match 1 to 3 times. This is how we match one octet regardless of if there are one, two, or three digits.
  3. The `\.` or “backslash dot” is how we match the dot between octets. The “dot” is called a meta-character or special character, which is used to match any single character in an expression. Since we want it to actually match the “dot”, we use the “backslash” to escape it and tell the expression to match the “dot” exactly, not as a meta-character.
  4. We then repeat this pattern again for each octet coming to the $ dollar sign at the end. This simply means that this should be the end of the string and no more characters or digits should come after it. Since we are matching an IP in this example expressly, we don’t expect to see anything afterwards. If you needed to match an IP address as part of a string or sentence where you expect characters after the IP address, simply remove the dollar sign.
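
To see what the escaped dot buys you, here is a small sketch that compares the pattern with and without the backslashes. It uses Python’s `re` module as a stand-in for the .NET engine, and the sample strings are invented.

```python
import re

# The simple IPv4 pattern, with the dots escaped.
SIMPLE_IPV4 = re.compile(r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$")

# Same pattern with bare dots, where "." matches any single character.
UNESCAPED = re.compile(r"^([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3})$")

for text in ["192.168.1.10", "999.999.999.999", "10x20x30x40", "10.1.1"]:
    print(f"{text:16} escaped={bool(SIMPLE_IPV4.match(text))} "
          f"unescaped={bool(UNESCAPED.match(text))}")

# Each pair of parentheses captures one octet.
octets = SIMPLE_IPV4.match("192.168.1.10").groups()
print(octets)
```

The escaped version rejects `10x20x30x40`, while the unescaped one accepts it. Both still accept 999.999.999.999, because neither one checks the numeric range of an octet.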

As I mentioned earlier, this is a quick method to match an IP address; however, it will also match 999.999.999.999, which doesn’t fall into any IPv4 scheme I’ve worked with. Let’s say we want to match a specific host address (ending in .10.250) on any 10.x network. The following would meet that criteria: `^(10)\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.(10)\.(250)$`. We already understand the first, third and fourth octet, but what about the second? Let’s split it up at each pipe:

  1. `[0-9]` – match any digit between zero and nine. This takes care of single digits.
  2. `[1-9][0-9]` – match anything between 10 and 99.
  3. `1[0-9][0-9]` – match anything between 100 and 199.
  4. `2[0-4][0-9]` – match anything between 200 and 249. This is where we start limiting to the allowed IPv4 range.
  5. `25[0-5]` – match anything between 250 and 255.

That’s it. That will cover any number between 0 and 255 in the second octet.
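
Range alternations like this are easy to get subtly wrong, so it is worth testing them exhaustively. The sketch below, again in Python rather than .NET, checks every value from 0 to 999 in the second octet. It also shows why the 100-199 branch needs `[0-9]` in its last position: with `[1-9]` there, values ending in zero slip through.

```python
import re

# One octet, 0 through 255.
OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
PATTERN = re.compile(rf"^(10)\.{OCTET}\.(10)\.(250)$")

# Try every candidate value in the second octet.
accepted = [n for n in range(0, 1000) if PATTERN.match(f"10.{n}.10.250")]
print(accepted[0], accepted[-1], len(accepted))

# Variant whose 100-199 branch ends in [1-9] instead of [0-9].
BUGGY_OCTET = r"([0-9]|[1-9][0-9]|1[0-9][1-9]|2[0-4][0-9]|25[0-5])"
missed = [n for n in range(256) if not re.fullmatch(BUGGY_OCTET, str(n))]
print(missed)
```

The first check should accept exactly 0 through 255; the second lists the values the `[1-9]` variant would miss.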

Another example might be to match anything on the 172.24.x.0 network where the last octet has to be 224 or 225. This would look something like this: `^(172)\.(24)\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.(224|225)$`
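
Because the “any valid octet” alternation keeps coming back, one option is to define it once and assemble address patterns from parts. This is a sketch of that idea in Python; `ipv4_pattern` is a hypothetical helper, and the resulting pattern string is what you would paste into SCOM, not the Python code itself.

```python
import re

# Non-capturing "any octet from 0 to 255" fragment.
OCTET = r"(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"


def ipv4_pattern(*parts):
    """Build an anchored IPv4 pattern; None stands for any valid octet."""
    if len(parts) != 4:
        raise ValueError("an IPv4 pattern needs exactly four parts")
    pieces = [OCTET if part is None else f"(?:{part})" for part in parts]
    return re.compile("^" + r"\.".join(pieces) + "$")


net = ipv4_pattern("172", "24", None, "224|225")
print(net.pattern)

for address in ["172.24.7.224", "172.24.255.225", "172.24.7.226", "172.24.256.224"]:
    print(f"{address:16} {bool(net.match(address))}")
```

Wrapping each part in its own group keeps an alternation such as `224|225` from leaking into the neighbouring octets.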

String and number pattern matching

What if we had multiple four-node clusters using a naming convention similar to DFW-SQLCL1, DFW-SQLCL2, DFW-SQLCL3, DFW-SQLCL4 and we wanted to group only the first and second nodes from all sites into a group? We would use the following expression: `(?i:[a-z][a-z][a-z]-sqlcl[12])` or, to shorten it, `(?i:[a-z]{3}-sqlcl[12])`. Note that a pipe is not needed inside brackets; `[1|2]` would also match a literal “|” character.
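
Here is a small Python sketch of the cluster-node selection, using invented names. It also shows how a pipe behaves inside a character class: there it is just another literal character, so `[12]` is the safer way to say “1 or 2”.

```python
import re

# First and second cluster nodes at any three-letter site code.
NODES_1_2 = re.compile(r"(?i:[a-z]{3}-sqlcl[12])")

# Same idea with a pipe inside the brackets, for comparison.
PIPE_CLASS = re.compile(r"(?i:[a-z]{3}-sqlcl[1|2])")

# Hypothetical computer names; the last one is deliberately odd.
names = ["DFW-SQLCL1", "DFW-SQLCL2", "DFW-SQLCL3", "DFW-SQLCL4",
         "atl-sqlcl2", "DFW-SQLCL|"]

selected = [name for name in names if NODES_1_2.search(name)]
print(selected)

# The pipe inside the class matches a literal "|".
print(bool(PIPE_CLASS.search("DFW-SQLCL|")), bool(NODES_1_2.search("DFW-SQLCL|")))
```

Neither pattern is anchored, so a name such as DFW-SQLCL10 would also match; add a `$` at the end if that matters in your naming scheme.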

It helps to use a regular expression tester when working with these. A good one is https://regex101.com

I was recently asked to help the DBA and Storage teams with an issue related to backup authentication. From what I was told, they had been testing different authentication methods to access a Data Domain device as a backup target on several SQL clusters. When attempting to normalize everything on a single account and authentication mechanism, they ran into authentication issues getting back to the share on the Data Domain due to cached Kerberos tickets. The vendor recommended that they purge the Kerberos cache on each of the devices to clear the tickets. The kicker was that there were quite a few servers involved, so logging on and manually running klist.exe would have been fairly time consuming. The DBAs were not very keen on my first suggestion to just remotely reboot the passive nodes and let clustering work its magic. They responded by calling me crazy and making absurd claims about production outage this and change control that, etc. Geesh (chuckle)!

Having been shot down as a cluster-reboot-comedian, I threw together the following script to remotely run klist on each of the servers via Invoke-Command:

<#
	.SYNOPSIS
		Deletes all current kerberos tickets on specified machines
	
	.DESCRIPTION
		Uses klist.exe to purge Kerberos tickets on designated servers/workstations.
	
	.PARAMETER Targets
		String array of computer names.
	
	.EXAMPLE
		PS C:\> .\Remove-KerbTickets.ps1 -Targets Server01, Server02, Server03

	.EXAMPLE
		PS C:\> $arr = Get-ADComputer -LDAPFilter "(name=*FS01)" | Select-Object -ExpandProperty name
		PS C:\> $arr | .\Remove-KerbTickets.ps1
#>
	
[CmdletBinding()]
param
(
	[Parameter(Mandatory = $true,
			   ValueFromPipeline = $true)]
	[string[]]$Targets
)

process {
	$CurrentSessions = @()
	
	$scriptcontent = { param ($SessionItem); klist -li $SessionItem purge}
	
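	foreach ($Server in $Targets) {
		$CurrentSessions = @() # Start empty for each server so logon IDs from one host are not replayed on the next
		$Error.Clear()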
		$CurrentLogonSessions = Get-WmiObject -ComputerName $Server -Class Win32_LogonSession -ErrorAction SilentlyContinue
		
		if (!$Error) {
			foreach ($Session in $CurrentLogonSessions) {
				$UserID = [convert]::ToString($Session.LogonID, 16) # Convert the LogonID value from decimal to hex
				$UserID = '0x' + $UserID # Prepend the hex prefix that klist expects
				$CurrentSessions += $UserID # Add string to the array
			}
			
			foreach ($SessionItem in $CurrentSessions) {
				$Error.Clear()
				$results = Invoke-Command -ComputerName $Server -ScriptBlock $scriptcontent -ArgumentList $SessionItem -ErrorAction SilentlyContinue
				If ($Error[0].Exception.Message -match "The client cannot connect") {
					Write-Host "$Server - Unable to connect via WinRM"
					break
				}
				<# Example placeholder to handle klist errors				
				if ($results -match "0xc000005f") {
					Write-Host "Session no longer exists or may have been terminated."	
				}
				#>
				if ($results -match "purged") {
					Write-Host "$Server - Ticket(s) purged"
				}
			}
			$Error.Clear()
		}
		else {
			Write-Host "$Server - WMI Error: $($Error[0].Exception.Message)"
		}
	}
	$CurrentSessions.Clear()
}
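
The one non-obvious step in the script is the logon ID conversion: WMI returns LogonId as a decimal value, while `klist -li` takes it in hex with a `0x` prefix. A minimal Python sketch of that conversion follows; the two helper functions are hypothetical, and the sample IDs are the well-known SYSTEM, LOCAL SERVICE, and NETWORK SERVICE sessions.

```python
def logon_id_to_klist_arg(logon_id):
    """Convert a decimal logon ID (int or string) to the 0x-prefixed hex form."""
    return f"0x{int(logon_id):x}"


def purge_command(logon_id):
    """Build the klist command line the script runs for one logon session."""
    return ["klist", "-li", logon_id_to_klist_arg(logon_id), "purge"]


# 999 is the SYSTEM logon session, which klist addresses as 0x3e7.
for sample in ["999", "997", "996"]:
    print(sample, "->", " ".join(purge_command(sample)))
```

This only builds the command; running it remotely is still a job for Invoke-Command, as in the script.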

I put these steps together using information and videos from this site http://blogs.technet.com/b/xdot509/.

  1. On the 2008 CA, create the following directory structure:

    a. C:\CABackupDatabase
    b. C:\CABackupConfig

  2. Open the Certification Authority snap-in, right click on the CA name, and select All Tasks/Back up CA.

  3. Click Next when the wizard launches.

  4. At the next screen, check the boxes next to Private key and CA certificate and Certificate database and certificate database log. Select the C:\CABackupDatabase folder created in Step 1. Click Next.

  5. Enter a password to secure the private key and CA certificate. Click Next.

  6. Verify that the certificate and database files were exported in the target directory.

  7. Export the **HKLM\SYSTEM\CurrentControlSet\Services\CertSvc\Configuration** key to the *C:\CABackupConfig* directory created in Step 1.
    ![alt text](http://assets.afinn.net/ca_config_regkey-1.png "ca_config_regkey")

  8. Copy the C:\CABackupDatabase and C:\CABackupConfig directories to the Windows 2012 server.

  9. I maintained the same servernames for my CA’s when I migrated so we’ll do the same here. Rename the Windows 2012 CA server to the same computer name as the 2008 root CA. As these are both offline and not joined to a domain, there shouldn’t be any naming collision in AD. If there is a static A record in DNS, that will need to be updated if the IP address is different.

  10. Add the **Active Directory Certificate Services** role to the Windows 2012 server.

  11. After restarting, finish the configuration of the ADCS.

  12. Select Certification Authority, then Next.

  13. Accept the default option for Standalone CA, as the server is not domain joined, and click Next.

  14. Select Root CA as the CA Type and click Next.

  15. At the Specify the type of the private key screen, select Use existing private key and the option Select a certificate and use its associated private key. Click Next.

  16. Click the **Import** button and browse to the pfx certificate backed up from the Windows 2008 root CA. Enter the password, then click OK. It will take a few seconds for the certificate to appear; when it does, select the certificate name and click **Next**.

  17. Accept the defaults for the database and log locations or specify a different location. Click Next.

  18. Verify the settings in the Confirmation page and click Configure.

  19. When the process completes, launch the **Certification Authority** mmc and verify the CA is visible and started.

  20. Right click on the name of the CA and select All Tasks/Restore CA.

  21. Click OK to stop the services.

  22. When the wizard launches, select the checkbox next to Certificate database and certificate database log. Browse to the location of the database files backed up from the Windows 2008 Root CA. Click Next.

  23. Click Finish to start the process.

  24. Select Yes to start the services when prompted.

  25. On the Windows 2012 root CA, export the same registry key from Step 7.

  26. Go to the C:\CABackupConfig folder on the Windows 2012 root CA server, right click on the .reg file and select Merge.

  27. In the Certification Authority mmc, right click the server name and stop, then start the services one more time.

  28. This will create a new CRL which can be copied to the CRL Distribution Points.

  29. If any scripts were previously copied over, they may be restored now.

One of the items currently on my plate is to move the PKI infrastructure from the Windows 2008 servers (which were also upgraded from Windows 2003) to Windows 2012 servers. I plan to break this process down into the following steps:

  1. Remove stale requests and certificates and defrag/compact the databases on the Subordinate CA’s to remove the whitespace.
  2. Backup the offline Root CA certificate, keys, and database and restore to the Windows 2012 offline Root CA.
  3. Backup the certificates, keys, database, custom scripts on each of the Subordinate CA’s and then restore to the Windows 2012 Subordinate CA’s.

This post will cover the easy part, which is cleaning up and shrinking the databases. Since I have to do this more than once and will need to perform ongoing cleanup maintenance on the databases at later dates, I put the process into a simple batch file shown below. I added a lot of inline documentation because this isn’t something that is done very often and I don’t want to keep pulling up old articles and documentation explaining what I did the last time. I also plan on handing this off to someone else for maintenance once the migration is completed, so the script documentation will help that person as well.

As a side note, the certutil -deleterow process will appear to throw errors in the exit codes after it displays how many rows were affected (deleted). This is expected: there is a limit on how many records the utility can delete at one time, and if you have more records to delete than that limit, the process returns an error. The script below uses a loop to cycle through the process until all the specified records have been removed, at which time it will complete with an exit code of 0 (zero).

@echo off
SETLOCAL ENABLEEXTENSIONS

:: *** SET THE TARGET DATE ***
REM Change the following date to reflect the date from which all prior
REM records will be removed. For example: 1/15/2014 will remove all records
REM BEFORE 1/15/2014.
SET targetdate=12/31/2014

:: *** SET THE RECORD TYPE ***
REM Change the following to reflect the type of record to be deleted.
REM I have only tested with Request and Cert but the following are available:
REM (returned from certutil -deleterow -?)
REM Request - Failed and pending requests (submission date)
REM Cert    - Expired and revoked certificates (expiration date)
REM Ext     - Extension table
REM Attrib  - Attribute table
REM CRL     - CRL table (expiration date)
SET requesttype=cert

:: *** SPECIFY BACKUP OPTION ***
REM Set the following variable to "true" (no quotes) to enable backups.
SET performbackup=false

:: *** SPECIFY BACKUP TARGET LOCATION ***
REM Specify the directory path where the backup will be stored.
REM If the performbackup variable is false, this value has no effect.
SET backuploc=C:\CertDBBackupDir

:: *** SPECIFY DEFRAGMENTATION OF DATABASE ***
REM Set the following variable to "true" (no quotes) to compact the database.
REM NOTE - The certificate services will be stopped for this process as the
REM CA cannot be online during compaction.
SET performdefrag=false

:: *** SPECIFY TEMP DATABASE LOCATION ***
REM Specify the directory path where current CA database is located.
REM When the process completes, the current (fragmented) database
REM will be copied to a new location and renamed with a random name.
REM If the performdefrag variable is false, this value has no effect.
SET currentdb=C:\Path\To\Current\CA_Database.edb

REM The certutil.exe utility has a set number of records it can delete at
REM a given time and will stop when that value is reached. To keep the
REM process continuing until all specified records are removed, we
REM implement a loop. Depending on how many records will be deleted, this
REM process can take a LONG time to complete.
:TOP
certutil -deleterow %targetdate% %requesttype%
if %ERRORLEVEL% EQU -939523027 goto TOP

REM Perform backup if specified earlier in the script.
IF "%performbackup%"=="true" (
	start /wait certutil -backupDB %backuploc%
)

REM Perform database compaction if specified earlier in the script.
IF "%performdefrag%"=="true" (
	net stop CertSvc
	start /wait esentutl /d %currentdb% > results.log
	net start CertSvc
)

ECHO.
ECHO Process has completed. If database compaction was selected,
ECHO you must copy the database from the temporary location back 
ECHO to the original database location and rename it to match the
ECHO original database file name. When completed, restart the 
ECHO Certificate Services service.
ECHO.
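
For anyone who would rather drive the cleanup from a script with a safety stop, here is a rough Python sketch of the same retry loop. Everything beyond the certutil command line and the -939523027 exit code is an assumption: the function names are hypothetical, and the unsigned-to-signed conversion reflects how Python on Windows typically surfaces negative exit codes, which you should confirm on your own system.

```python
import subprocess

MORE_ROWS_REMAIN = -939523027  # exit code the batch file loops on


def to_signed32(code):
    """Fold an unsigned 32-bit exit code back into a signed value."""
    return code - 2**32 if code >= 2**31 else code


def run_certutil(args):
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(args).returncode


def delete_rows(target_date, table, run=run_certutil, max_passes=10000):
    """Repeat 'certutil -deleterow' until it stops reporting the batch-limit code."""
    for passes in range(1, max_passes + 1):
        code = to_signed32(run(["certutil", "-deleterow", target_date, table]))
        if code != MORE_ROWS_REMAIN:
            return code, passes
    raise RuntimeError("gave up: certutil still reports more rows after max_passes")


# Dry run with a stand-in runner (made-up exit codes, no certutil needed).
fake_codes = iter([3355444269, -939523027, 0])
result = delete_rows("12/31/2014", "cert", run=lambda args: next(fake_codes))
print(result)
```

Unlike the batch loop, this stops after `max_passes` attempts, so a persistent error cannot spin forever.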