Resolving a pesky ORA-12545 during EBS patching

I'm working with a client for the next few weeks to help them meet baseline patching requirements for EBS 11i Extended Support. You know what that means:

  1. A bazillion browser tabs
  2. An SR or two
  3. Piles of patch READMEs
  4. Resolving prerequisites until your eyes cross
  5. Memorizing individual patch numbers, whether you like it or not
  6. The reward for all the hard research work: "15 jobs running, 235 ready to run, and 50682 waiting."
  7. Sporadic ORA-12545 errors in the adworker logs

Yeah, okay, #7 caught me a bit by surprise, too. The really tricky bit was that the worker usually wasn't marked as "Failed" when it received an ORA-12545. Instead, it stayed in a "Running" state, and eventually the entire patch session went idle until I manually restarted the errored worker.

There is no shortage of troubleshooting notes for ORA-12545, but the essential condition is that the client cannot resolve the hostname of the database server. Considering how sporadic the errors were, this was more than a bit strange: a wrong hostname or a misconfigured listener should fail every time, not once in a while. So we tested client connectivity using the various permutations of the EBS database hostnames and verified that there wasn't any weirdness in the RAC listener setup, and still the problems persisted.
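For what it's worth, the hostname-permutation check can be scripted rather than done by hand. The following is only a minimal sketch of that idea, not what we actually ran: `check_names` is a hypothetical helper, and the hostnames in the usage note are placeholders. It asks the system resolver for each name and reports which ones fail.

```python
import socket


def check_names(names, resolver=socket.getaddrinfo):
    """Map each name to its sorted list of IPs, or None if it did not resolve.

    `resolver` is injectable only so the function can be exercised without
    a real network; by default it uses the system resolver.
    """
    results = {}
    for name in names:
        try:
            infos = resolver(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[name] = None
    return results
```

You would call it with every form of the name that appears in the apps tier configuration, for example `check_names(["dbhost1", "dbhost1.example.com", "dbhost1-vip.example.com"])` (all made-up names). A single pass like this only catches names that never resolve; it says nothing about a lookup that fails one time in a thousand.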

At a high level, it's always disturbing to have sporadic network wonkiness. More practically, it wasn't very much fun to contemplate writing production patching instructions that read, "Babysit the adpatch session very closely. Manually restart any workers that fail with ORA-12545." (I could also note that it's not very much fun to have a multi-hour patch test run grind to a halt 10 minutes after leaving the office for the day, but that would be whining, so I won't do that.) I was just starting to consider running SQL*Net traces to try to capture more information about this 1-in-1000 error condition when one of the client's DBAs mentioned that the test systems and the DNS server were in separate data centers, connected over a WAN.

Rather than suggesting that we march over to the network admins and claim that DNS requests were being dropped on the floor (what a great way to make new friends!), and lacking the time to put together an easily-reproducible test to back up such a claim, I recommended adding the hostnames and IPs of the database servers to the apps tier's /etc/hosts file. All subsequent patch runs were free of ORA-12545 errors, which left me free to concentrate on other, more interesting errors. I'm not a big fan of solution-by-assumption, but in this case I was able to carry the work forward, and the investigation into potential DNS problems can wait for a quieter time.
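If that quieter time ever arrives, the reproducible test I didn't have time to build might look something like the sketch below. This is an assumption about how one could gather evidence, not something I ran at the client: `probe_resolution` is a hypothetical helper that resolves the same name many times and records every failure along with how long the lookup took.

```python
import socket
import time


def probe_resolution(hostname, attempts=1000, delay=0.0,
                     resolver=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and return a list of failures.

    Each failure is (attempt_index, seconds_elapsed, error_text).
    `resolver` is injectable so the loop can be tested without DNS.
    """
    failures = []
    for attempt in range(attempts):
        started = time.monotonic()
        try:
            resolver(hostname, None)
        except socket.gaierror as exc:
            failures.append((attempt, time.monotonic() - started, str(exc)))
        if delay:
            time.sleep(delay)
    return failures
```

Run from the apps tier before the /etc/hosts change, something like `probe_resolution("dbhost1.example.com", attempts=5000, delay=0.1)` (placeholder hostname) would give a failure count and timings to bring to the network admins. Two caveats: a local caching daemon can hide dropped lookups, and once the /etc/hosts entries are in place the lookup will typically be answered locally, so the probe would no longer exercise DNS at all.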
