The excitement of solving mysteries makes me bad at math
In my last post, I failed to notice that a HugePages_Total of 4645, while satisfyingly greater than zero, is definitely not 8196, the value I had configured. Upon reflection, the reason was pretty obvious: the OS will only build each hugepage from a contiguous block of memory, and the server had been up long enough that memory was fragmented, so it could not find enough contiguous blocks to satisfy the full request. One reboot later, things were much better:
sles10db:~ # cat /proc/meminfo | grep -i huge
HugePages_Total:  8196
HugePages_Free:   8196
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
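Since the whole problem was that I eyeballed the number instead of comparing it, here is a minimal sketch of letting a script do the comparison. It assumes the usual one-counter-per-line layout of /proc/meminfo; the function names `parse_hugepages` and `hugepages_shortfall` are made up for illustration, and 8196 is simply the value configured on this server.

```python
import re


def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters and Hugepagesize (in kB) found in meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        # Matches lines such as "HugePages_Total:  8196" or "Hugepagesize:  2048 kB".
        match = re.match(r"^(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def hugepages_shortfall(meminfo_text, expected_total):
    """Return how many huge pages are missing relative to expected_total (0 if none)."""
    total = parse_hugepages(meminfo_text).get("HugePages_Total", 0)
    return max(expected_total - total, 0)
```

On the server itself, the text would come from reading /proc/meminfo, and a non-zero result from `hugepages_shortfall(text, 8196)` would have flagged the problem before I declared victory.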
Autoconfig giveth, and Autoconfig taketh away
When an E-Business Suite system is behaving in a way that suggests one or two mangled (some say "corrupt," but I try not to be judgmental) configuration files, a common first response is to run Autoconfig to see if the system can sort itself out. Running Autoconfig is also frequently necessary during normal maintenance, e.g. during some patching, or when renaming or relocating Oracle Apps nodes. This weekend, I was reminded of an interesting side effect of running it. After the process was complete, Grid Control started reporting metric collection errors and "Unknown" status for some components on the application tier nodes. Autoconfig was just doing its job: it generated new versions of the apps tier configuration files, complete with their original file permissions, and in doing so wiped out the permission changes I had made so the Grid Control agent could monitor some of the OC4J components. Good thing I keep that list of chmod commands in an easy-to-find reference location.
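An alternative to a saved list of chmod commands is to record the permission bits before running Autoconfig and put them back afterwards. The following is only a rough sketch of that idea, not what I actually ran: `snapshot_modes` and `restore_modes` are hypothetical helpers, and the list of files to track is whatever you changed for the Grid Control agent on your own system.

```python
import os
import stat


def snapshot_modes(paths):
    """Record the current permission bits of each path."""
    return {path: stat.S_IMODE(os.stat(path).st_mode) for path in paths}


def restore_modes(snapshot):
    """Reapply recorded permission bits; return the paths that had to be changed."""
    changed = []
    for path, mode in snapshot.items():
        # Only touch files whose permissions no longer match the snapshot.
        if stat.S_IMODE(os.stat(path).st_mode) != mode:
            os.chmod(path, mode)
            changed.append(path)
    return changed
```

Take the snapshot once the agent-friendly permissions are in place, run Autoconfig, then call `restore_modes` on the snapshot. The returned list doubles as a record of which files Autoconfig reset.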
Oh yeah, and...
I also learned that using Wireshark to debug a problem with a connection to an Oracle database over VPN is kind of like trying to listen to AM radio in a tunnel. You can tell there's a signal in there, but good luck making anything of it.