NOW they’re fixed…

A few of our test EBS instances had developed a well-deserved reputation of taking a long time to start up. Here’s what we saw:

  1. Database ground to a near-halt shortly after running adstrtal.sh
  2. Most concurrent manager processes were being marked as “dead” in the internal manager log not long after startup, and the internal manager attempted to restart them.
  3. Consequence of #2: Three to four times as many FNDLIBR processes running on the app tier server as expected.
  4. Consequence of #2 and #3: The “Active sessions” graph in Grid Control resembled Mt. Kilimanjaro, and all three database server load average numbers (1, 5, and 15-minute) on were over 100.

Needless to say, neither users (who could not connect to the test instance), developers (who could not connect to the Apps database), nor DBAs (who had to answer endless “when will the instance be up?” emails) were very happy.

The culprit? This little query, executed for each FNDLIBR process as it started up, generating a ridiculous number of “control file sequential read” waits along the way:

SELECT count(*)
  FROM v$thread;

When I took a closer look at the query, this is what I found:

SQL> select /*+ gather_plan_statistics */ count(*)
  2  from v$thread;

  COUNT(*)                                                                                                              
----------                                                                                                              
         2                                                                                                              

Elapsed: 00:00:11.17
SQL>  select *
  2  from table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                                       
------------------------------------------------------------------------------------------------------------------------
SQL_ID  057v054v2svhp, child number 0                                                                                   
-------------------------------------                                                                                   
select /*+ gather_plan_statistics */ count(*) from v$thread                                                             

Plan hash value: 3150894624                                                                                             

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |      1 |        |      1 |00:00:11.16 |       |       |          |
|   1 |  SORT AGGREGATE           |                 |      1 |      1 |      1 |00:00:11.16 |       |       |          |
|   2 |   NESTED LOOPS            |                 |      1 |      1 |      2 |00:00:00.37 |       |       |          |
|   3 |    MERGE JOIN CARTESIAN   |                 |      1 |      1 |   3969 |00:00:00.17 |       |       |          |
|*  4 |     FIXED TABLE FULL      | X$KCCTIR        |      1 |      1 |     63 |00:00:00.09 |       |       |          |
|   5 |     BUFFER SORT           |                 |     63 |      1 |   3969 |00:00:00.08 | 73728 | 73728 |          |
|   6 |      FIXED TABLE FULL     | X$KCCCP         |      1 |      1 |     63 |00:00:00.05 |       |       |          |
|*  7 |    FIXED TABLE FIXED INDEX| X$KCCRT (ind:1) |   3969 |      1 |      2 |00:00:11.00 |       |       |          |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     

   4 - filter("TR"."INST_ID"=USERENV('INSTANCE'))                                                                       
   7 - filter(("RT"."INST_ID"=USERENV('INSTANCE') AND "RTNLF"<>0 AND "TIRNUM"="RTNUM" AND "CPTNO"="RTNUM"))

That’s a lot of work to do for 2 rows. Clearly the internal manager was expecting a faster start time from its children, which explains why it kept attempting to start new ones. Repeatedly. Until we had over 200 FNDLIBR processes running instead of our expected 80-ish.

After gathering fixed object statistics, the query behaved a bit better, and we therefore expect that our “slow to awaken” instance should be a bit more speedy:

SQL> exec dbms_stats.gather_fixed_objects_stats

PL/SQL procedure successfully completed.

Elapsed: 00:03:11.45
SQL> select /*+ gather_plan_statistics */ count(*) from v$thread
  2  ;

  COUNT(*)                                                                                                              
----------                                                                                                              
         2                                                                                                              

Elapsed: 00:00:00.03
SQL>  select *
  2  from table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                                       
------------------------------------------------------------------------------------------------------------------------
SQL_ID  f1pmbmcstp1rj, child number 0                                                                                   
-------------------------------------                                                                                   
select /*+ gather_plan_statistics */ count(*) from v$thread                                                             

Plan hash value: 93051267                                                                                               

-----------------------------------------------------------------------------------------------                         
| Id  | Operation                  | Name             | Starts | E-Rows | A-Rows |   A-Time   |                         
-----------------------------------------------------------------------------------------------                         
|   0 | SELECT STATEMENT           |                  |      1 |        |      1 |00:00:00.02 |                         
|   1 |  SORT AGGREGATE            |                  |      1 |      1 |      1 |00:00:00.02 |                         
|   2 |   NESTED LOOPS             |                  |      1 |      2 |      2 |00:00:00.02 |                         
|   3 |    NESTED LOOPS            |                  |      1 |      2 |      2 |00:00:00.01 |                         
|   4 |     FIXED TABLE FULL       | X$KCCRT          |      1 |      2 |      2 |00:00:00.01 |                         
|   5 |     FIXED TABLE FIXED INDEX| X$KCCCP (ind:1)  |      2 |      1 |      2 |00:00:00.01 |                         
|   6 |    FIXED TABLE FIXED INDEX | X$KCCTIR (ind:1) |      2 |      1 |      2 |00:00:00.02 |                         
-----------------------------------------------------------------------------------------------                         

LISP

After my recent public demo of Oracle E-Business Suite Diagnostics, people at my current employer are more interested in using the tool. In the course of applying the most recent patches to the IZU product, I came across a few quirks that I thought I'd share. Please note that these are 11i-related issues, so the relevance half-life is short.

First, if you have the right combination of patches, you may find that wildcard searches for input fields no longer work. We first noticed this behavior last fall, after applying the July 2010 CPU patch. The workaround is to search without wildcards (just leave the field blank), though that can get unwieldy for large result sets. This issue was recently added to Note 235307.1, which states that there is no known solution at this time, but if you're running 11i, patch 1004979 resolves the problem. Note that this is an AD patch, not an IZU patch, but the payload is pretty light.

I also discovered, after applying all of the latest patches for 11i Diagnostics, that some test submissions were failing with "Page not found" errors. The corresponding errors in the Apache logs looked like:

[Fri Feb 31 16:12:74 1942] [error] [client xx.xx.xx.x] mod_security: Access denied with code 400. 
Pattern match "!^([-_@|#!=A-Za-z0-9/ :.$]){0,255}([-_@|#!=A-Za-z0-9/ :.$]){0,255}$" at ARGS_NAMES. 
[hostname "fakehost.goofynet.org"] [uri "/OA_HTML/jtfqaadv.jsp"] [unique_id TfKkxAoAZtUQAHE3Eeg]

The ARGS_NAMES filter in IAS_ORACLE_HOME/Apache/Apache/conf/security.conf, intended to filter out potentially-nasty values in URLs, looks like this:

SecFilterSelective ARGS_NAMES "!^([-_@|#!=A-Za-z0-9/ :.$]){0,255}([-_@|#!=A-Za-z0-9/ :.$]){0,255}$"

That looks like a reasonably permissive list, but there are (two) ((very)) (((important))) ((((characters)))) (((((missing))))). Any test with parentheses in the input field names (yes, the names of the fields, not the values) will fail this filter, leading to the errors described about. In my case, I was looking at AOL setup tests, and a few of them have "Timeout (seconds)" fields.

A My Oracle Support search turns up a patch that is supposed to fix this problem (10324904), but the tests I was running did not receive updates in that patch; the parentheses were still in the field names. During a short consultation with support, I was reminded of the chances (low) that a minor bug logged against 11i Diagnostics would get very much attention at this point in the product lifecycle. Instead, I deployed the workaround described in Note 1313128.1. While the note states that changes to the ARGS_NAMES filter will not survive AutoConfig runs, it's easy enough to make the changes stick by creating a custom template for security_ux_ias1022.conf in FND_TOP/admin/template/custom, and changing the ARGS_NAMES filter as follows:

SecFilterSelective ARGS_NAMES "!^([-_@|#!=A-Za-z0-9/ ():.$]){0,255}([-_@|#!=A-Za-z0-9/ :.$]){0,255}$"

Oracle Apps DBAs, have you empowered your users?

Last week, I gave a talk to my local Oracle Applications User Group about the value of the Oracle E-Business Suite Diagnostics product (IZU). Feedback was positive, though I was still somewhat surprised that I'd been asked to discuss the topic in the first place. I've been a fan of the tool since 2006, so I've fallen into the trap of assuming that everyone knows about it already. My audience was primarily business analysts and privileged end-users, so I didn't get into technical nuts and bolts, not that there are very many. My general message was, "Here's a free tool from Oracle to help you track down EBS problems in your area of responsibility, all within the scope of the system privileges you already have."

For Apps DBAs, the same message applies: It benefits you to provide this easy-to-configure, easy-to-use EBS product to your business analysts and "Power Users." They'll get access to diagnostic tests that will help them to identify problems and potential solutions (with period close, errant sales orders, wayward invoices, stuck workflows, etc) before they even have to involve Oracle Support. This, in turn, can engender a sense of ownership and understanding of the working of the EBS products.

If the warm fuzzies derived from having empowered users aren't quite enough to motivate installing EBS Diagnostics, how about some enlightened misanthropy? With EBS Diagnostics, properly trained and encouraged users may be more likely to leave you alone. :-) Consider the usual opening salvo from Oracle Support when an EBS service request is raised:

  1. "Please run ACT/RDA/Diagnostic Apps Check for the product in question and upload the results"
  2. "What database version/applications version/product patchset are you running?"
  3. "What version of file X is installed on your system?"
  4. "Have you applied any patches recently?
  5. "Please check the value of profile option Y"

Ordinarily, unless you have very lax security rules, answering these questions (except for #5, and maybe #2) requires input from a DBA, sysadmin, or someone else with privileged access to the system (via OAM, or ad-hoc database queries). Anyone with access to EBS Diagnostics, however, can run the RDA report (see #1 above) for an EBS product and answer all of those questions:


EBS Diagnostics ACT Report Summary section

(Yes, that's a screenshot from 11i, because that's the version I used for my talk. R12 looks much the same, just bluer).

Maybe I'm preaching to the choir here, but I do still find people who either don't know about EBS Diagnostics, or don't see the value of the tool. If you've not done so already, you owe it to yourself to take a look at the Diagnostics product, starting with Note 167000.1: E-Business Suite Diagnostics Installation Guide . Put the tools in your users' hands, and see what happens. Sure, some of them might still just throw things over the wall to you, but others will undoubtedly surprise you.

Presentation slides: Using Oracle EBS Diagnostics

I somehow got roped into giving a presentation at the BC Oracle Applications User Group meeting on May 26, on the topic of Using Oracle E-Business Suite Diagnostic Tools. Moment of weakness, I guess, but with less than 12 hours to go, it'd be very bad form to bail now. ;-)

The talk itself is mostly a demo, and the slides all by themselves don't convey the full impact of me talking too fast, talking too much, screwing up the timing on my jokes, involuntarily edging toward the door, and muttering prayers to whatever god is tuned in that my demo VM doesn't misbehave too badly. Nevertheless, there are a few references linked in the slides (whose brilliant idea was that?), and I'll be promising my audience that they can visit my blog to find the slides.

Thanks to the members of the BCOAUG for the hospitality! Hope I didn't bore too many of you to death. :-) You can find the slides on Slideshare.

Why merge EBS patches?

We already know this stuff, right?

“Everyone knows” that merging E-Business Suite patches is a good idea. It should be the number one, first-pass answer to the question of how to speed up patching, before staged APPL_TOP, distributed AD, database tuning, or even adding more workers. As part of my continuing series, “Captain Obvious over-explains the basics,” I've provided an anecdote to illustrate why merging patches is a good idea.

First things first

A few short notes about when you should and should not merge E-Business Suite patches, and a disclaimer:

  • Don't merge AD patches with non-AD patches. AD patches sometimes change the same utilities that you're using to apply patches, and should be addressed separately from other product patches.
  • Don't merge a patch if Oracle says that you shouldn't. It's rare that this will happen, but you still need to study your READMEs carefully.
  • Don't merge patches across codelines (11.5.x, 12.0, 12.1).
  • Other than that, go nuts. Seriously. There's no reason you can't merge, for example, HR, Inventory, and Order Management patches. You'd be surprised how many of the same files and patch actions are delivered with patches for separate, functionally distinct products.
  • Your results may not be as dramatic as what I'm describing in this post. Patch payloads differ, systems differ.

Save time before the work even starts

The larger the set of patches you have to apply, the more potential benefit you'll derive from merging the patches. A good example of a large patching exercise is the application of Extended Support baseline patches for EBS 11.5.10.2. I ended up with 123 patches to apply across multiple products, from rollups to family packs, with all of their small-to-mid-size pre-requisites. That’s a lot of work. So much work, in fact, that running the merged patch in noapply mode took 23 minutes:

[applmgr@testsrv log]$ grep ‘AutoPatch started’ u_baseline_final.log
AutoPatch started at: Thu Mar 03 2011 15:27:45
[applmgr@testsrv log]$ ls -l u_baseline_final.log
-rw-r—r— 1 applmgr oinstall 10390236 Mar 3 15:50 u_baseline_final.log

That's serious "think time" for what amounts to a long sequence of file version checks. But if you if you think that’s bad, watch what happened when I tried to run those patches in noapply mode individually. FWIW, the alias testpatch_this runs adpatch with a defaults file and apply=no, so I could get a timing run free of waits for user intervention.

[applmgr@testsrv baseline_src]$ time for i in `ls`; do cd $i; testpatch_this; cd ..; done
<lots of patch files later…>
real 135m5.485s
user 19m31.403s
sys 5m41.994s

Fewer decisions = less work to do

So, where did the time savings come from? The admrgpch utility does a pretty simple job. It:

  1. Unzips all the patches to be merged
  2. Parses the individual patch driver files and to identify the actions required by the patch
  3. Merges actions from the current patch driver file with those of previously-parsed patches based on file version
  4. Copies the required files to the new merged patch directory
  5. Assembles the necessary patch actions into a new merged patch driver file, based on on the contents of the individual driver files

Step 3 is where we saved nearly 2 hours. Here's a summary of the results of all of the merging and version-checking:

SQL> select count(1) comparisions
  2  from patch_comps
  3  /

COMPARISIONS
------------
      190038

SQL> select merge_action          
  2       , count(1) as file_count
  3    from patch_comps
  4   group by merge_action
  5   order by merge_action 
  6  /

MERGE_ACTI FILE_COUNT
---------- ----------
addnew		96925
addnover	11243
newer		14481
older		15257
same		50056
skipnover	 2076

6 rows selected.

There were just over 190000 potential files in our merged patch, accounting for at least one action each -- sometimes more, since there are forms to be generated, PL/SQL to be compiled, and executables to link after the files are copied into place. Along the way, admrgpch discarded 65000 files with the same or older version, and replaced 14000 with more recent versions. There's some repetition in the 'newer' and 'addnover' categories, but the rough result in this case is that close to 80000 of the possible files in this collection of patches have been discarded before adpatch is even run. No matter how fast your system is, if you hand it a pile of work that's 40% lighter than it could be, you can probably expect to finish a bit sooner.

But wait, there's more! (sort of)

In this case, merging patches has already saved us some significant time just in terms of cutting down the evaluation work that needs to be done at patch time. It's also interesting to look a bit more closely at the individual patch actions that are being culled. After all, if all we're doing is preventing several thousand files from being copied to the OA_HTML directory, well...it's nice, but not exactly compelling. If we can save time in the relinking and forms and reports generation phases, on the other hand, that's more real work that can be avoided. There's a small caveat here, of course: some of this "saved work" might be avoided by adpatch anyway, if the forms/reports to be generated are older than what's already on the system. Still, why relink a bunch of executables 20 times in one patch session, when once will do?

SQL> select patch_action
  2       , repeats
  3       ,  count(1) as file_count
  4       , repeats*count(1) as duplicates
  5        from
  6	   ( select filename
  7		      , patch_action
  8		      , count(1) - 1 as repeats
  9	       from patch_actions
 10	      where patch_action in ('genform','genrep', 'link')
 11	      group by filename, patch_action
 12	   )
 13   group by patch_action, repeats
 14   order by patch_action desc, repeats desc
PATCH_ACTION				    REPEATS FILE_COUNT DUPLICATES
---------------------------------------- ---------- ---------- ----------
link						 25	     2	       50
						 24	     9	      216
						 23	    33	      759
						 22	   313	     6886
						 21	     6	      126
						 19	     1	       19
						 15	     1	       15
						 13	     3	       39
						 12	     1	       12
						 11	     2	       22
						  4	     2		8
						  3	     8	       24
						  2	     2		4
						  1	     2		2
						  0	     4		0
****************************************		       ----------
Total Dups							     8182

genrep						 18	     1	       18
						  8	     1		8
						  4	     1		4
						  3	     3		9
						  2	    19	       38
						  1	   421	      421
						  0	   379		0
****************************************		       ----------
Total Dups							      498

genform 					 21	    13	      273
						 20	    12	      240
						 19	    53	     1007
						 17	     4	       68
						 15	     6	       90
						 14	     1	       14
						 10	   119	     1190
						  9	     1		9
						  7	     1		7
						  6	     5	       30
						  5	    21	      105
						  4	   107	      428
						  3	    77	      231
						  2	   394	      788
						  1	   742	      742
						  0	  1245		0
****************************************		       ----------
Total Dups							     5222

Installing EBS 11.5.10.2 Vision? Quick list of useful links.

My EBS 11i Vision VM bit the dust recently, and I found myself needing to reinstall. I know, yes, backups. I know, okay? I KNOW. I know. *whimper*

Anyhow, time to party like it's mid-2005! This isn't going to be as super-long as my R12 Vision install series (which is now an eBook that you should totally buy). Instead, it's a quick list of links to help you get started on something you'll hopefully only need to do once, if ever. At least I now have public notes in case, heaven forfend, I need to do this again myself. Please note that these links are biased toward an installation on Oracle Enterprise Linux 5, because that's the 32-bit Linux media I had closest to hand. There are a few quirks when installing on OEL5, but thankfully they're all well-documented.

Start with the basics

Installing Oracle Applications: A Guide to Using Rapid Install Release 11i (11.5.10.2)
Just in case you need a refresher on system resource requirements and various other setup bits.
Note 316803.1: Oracle Applications Release Notes, Release 11i (11.5.10.2)
Not a lot of meat here, but it does provide a reference to the most recent Rapid Install/"startCD" patch, in the unlikely event that it changes. Don't hold your breath; the product's in Extended Support, and the release notes document hasn't been updated in 2 years.
Note 316806.1: Oracle Applications Installation Update Notes, Release 11i (11.5.10.2)
Here's the really useful stuff, listing required kernel versions, OS packages, and special instructions for various OS releases. The information in this note gets you 95% of what's needed to do the install. Make sure to apply patches 6365595 and 6078836, to avoid errors on "afmkinit.sh INSTE8_SETUP 127" and "libdb.so.3: cannot open shared object file," respectively.
Most recent Rapid Install patch
Why go digging through the release notes for the patch number?
Note 316843.1: MD5 Checksums for 11i10.2 Rapid Install Media
Just in case you think your old disk-based staging area may be suspect, and you're wondering if you need to go back to the install DVDs. You know, theoretically.

OEL5-specific stuff

Setup for Oracle's public YUM server
You might need this setup to download additional packages, if you don't want to search your install media.
Oracle Open-source "compatibility" project
This is where to grab the additional required RPMs listed in note 316806.1. If you're wondering where to find binutils 2.15, it's in compat-binutils215-2.15.92.0.2-24.i386.rpm.
Note 730444.1: Oracle Applications 11i Installation on OEL5 or RHEL5
This note provides solutions to a pair of quirks related to installation on OEL5.
Note 451994.1: Unable to Find 'kshell' in Path When Running adcfgclone.pl
Rather than chasing down and installing pdksh, as suggested in Note 730444.1, I found that a simple export KSH_VERSION='@(#)PD KSH v5.2.14 99/07/13.2', as suggested in this note, to be sufficient to satisfy the installer's requirements.
Note 747424.1: Installation of 11.5.10.2 On OEL Fails
To avoid an interruption in the installation process, use the steps described in this note (replacing LD_ASSUME_KERNEL in adgetlnxver.sh) before running the zip commands described in the pre-installation tasks for patch 6365595 in Note 316806.1. Depending on your version of OEL, this may not be necessary, but why risk it?

Thanks for joining me on this trip down memory lane. Please return your Wayback Machines, Deloreans, and telephone booths to their assigned places. Be excellent to each other.

EBS R12 Vision Install Guide is now an eBook (for Kate)

Chet and Jake have both already covered this topic, but please allow me the small conceit that someone out there might read about it here first. :) About a year ago, I wrote a series of guest posts for the ORACLENERD blog about installing a Vision instance of Oracle E-Business Suite Release 12. It proved to be inexplicably popular, and even spawned something Chet called the EBS Challenge, so Chet and I have bundled the whole series together into a PDF document that flows a bit better than a handful of posts spread over two blogs. Same quirky style, same nigh-endless stack of highly informative screenshots, but way less clicking, and you can take it with you!

Donationware

I was initially resistant to asking people to pay for content that I'd already made available for free, even considering the value-add of content restructuring and packaging. When we considered a charitable angle, though, I was all in. If you're a regular reader of Chet's blog, you'll know that one of the primary passions in his life is his daughter, Kate, who has been diagnosed with PDD-NOS. All proceeds from sales of the eBook, after Paypal takes their cut, will be directed to defraying the costs of Kate's care.

If you're interested in building your very own R12 E-Business Suite playground, I hope you'll consider picking up a copy of the PDF version of the guide. If you're one of the thousands (that's still a bit mind-boggling) of people who have already read the guide on Chet's blog, and if you derived some value from the experience, I hope you'll consider buying a copy to show your appreciation. Heck, buy a copy for everyone in your office! Think of it as an opportunity to make a direct contribution to making life easier for Kate and her parents. You even get a chance to "try before you buy;" all the content in the eBook is already available for you to review on our blogs. I probably left in the same embarrassing typos and bad grammar. You can't lose, and if you buy, Kate wins.

Finding debug and trace profiles in EBS

It happens all too often in E-Business Suite systems, especially in test and development. Someone sets a profile option to gather trace information for an SR, or to debug a pesky series of forms. They follow the instructions in the relevant My Oracle Support note to the letter, but in the ensuing rush to ship the trace info to the support analyst or to dive into the debug files, they forget to disable the profile option that produces all of that lovely verbose output.

Time passes. Then the email starts to come in.

From your systems administrators:

"Hey, any idea why /opt is at 95% on the dev database server?"

"Dude, why are there several thousand .dbg files in /var/tmp?"

From that one extra-special user who remembers that you like detail in your problem statements, and whose only noticeable flaw is a reluctance to be cloned:

"I've noticed that when I try to do anything at all in Order Management on the test system, it's really dragging. Other parts of the system seem fine, though. Could you check if anything weird is happening with the three forms shown in the attached screenshots?"

From the Service Desk:

"We have lots of people complaining that E-Business Suite is slow. Have you guys changed anything?"

After a long afternoon of looking around, you learn that:

  • /var/tmp is full of debug files from a concurrent request that runs every 10 minutes
  • /opt is filling up because someone (hey, don't snicker, maybe it was you. We've all been there) decided to set the "Initialization SQL Statement - Custom" profile for your most active user to perform a 10046 database event trace. At level 12. Without restricting by application or responsibility.
  • Thankfully, Order Management debug settings mostly affect only the Order Management module. Unfortunately, when they're not also restricted by user, everyone suffers.
  • Most of the other users complaining about slow E-Business Suite performance aren't even using the instance where the above problems arose. They're just struggling with old desktop machines, trying to keep 20 Forms windows open at once, or failing to resist adding one more really cute widget to their browser toolbar. Hey, not every E-Business Suite performance problem starts on the server side. ;)

All is well again, but it took forever to track down all of those different settings. Wouldn't it be nice if there was a quick way to find debug-related system profile options that had been set recently (or not-so-recently)? You could use a script like this, but that might be a bit too noisy if there are frequent profile option changes.

Here's a script that I use to try to capture as many of the debug and trace profile settings as possible. It's based on the query that I just linked, but filters on some common strings so you're more likely to get results that are relevant to identifying errant debug settings. It also attempts to ignore options that have been set as defaults at install or patch time. I'm not too proud of the multiple scalar subqueries inside the decode() statement. Clearly I stopped somewhere between "okay, it works" and "let me polish this 'til it gleams." :)

Oh, and I'm toying with the idea of dropping my various scripts and snippets into Teh Cloud™ at GitHub. I don't claim to be organized enough to rate a repository, but I think the idea of gists is pretty nifty.

-- check_ebs_trace_profs.sql
--  Author: John Piwowar
-- Purpose: Identify E-Business Suite system profile option settings that may
--          be related to performance-degrading debug/trace activity
--   Notes: Prompts for a cutoff date for when profile options were set
--          May need additional tweaking for multi-language installations 
set pagesize 9999
set linesize 120
set verify off
col "Profile Option" for a25
col "Option Level" for a13
col "set for" for a20
col "Value" for a20
col "Set On" for a11
col "Blame" for a20
PROMPT Enter date value in form DD-MON-YYYY for check_since
   select tl.user_profile_option_name "Profile Option"
        , decode( val.level_id
                , 10001, 'Site'
                , 10002, 'Application'
                , 10003, 'Responsibility'
                , 10004, 'User'
                , 10005, 'Server'
                , 10006, 'Organization'
                , 10007, 'Server+Resp'
                , 'No idea, boss') "Option Level"
        , decode( val.level_id
                , 10001
                , 'EVERYWHERE!'
                , 10002
                , (select application_name
                     from fnd_application_tl
                    where application_id = val.level_value)
                , 10003
                , (select responsibility_name
                     from fnd_responsibility_tl
                    where responsibility_id = val.level_value
                      and application_id = val.level_value_application_id)
                , 10004
                , (select user_name
                     from fnd_user
                    where user_id = val.level_value)
                , 10005
                , (select host || '.' || domain
                     from fnd_nodes
                    where node_id = val.level_value)
                , 10006
                , (select name
                     from hr_all_organization_units
                    where organization_id = val.level_value)
                , 10007
                , 'Look it up' --per specification El-Ay-Zed-why
                , '''Tis a mystery') "Set for"
        , val.profile_option_value "Value"
        , val.last_update_date "Set on"
        , usr.user_name "Set By"
     from fnd_profile_options opt,
          fnd_profile_option_values val,
          fnd_profile_options_tl tl,
          fnd_user usr
    where opt.profile_option_id = val.profile_option_id
      and opt.profile_option_name = tl.profile_option_name
      and regexp_like( tl.user_profile_option_name
                     , '(trace|log|debug|audit|diag|sql)'
                     , 'i'
                     )
      and not(regexp_like( tl.user_profile_option_name
                         , '(catalog|file|login|utilities)'
                         , 'i'
                         )
             )
      and usr.user_id = val.last_updated_by
      and usr.user_name  not in ( 'AUTOINSTALL'
                                , 'INITIAL SETUP'
                                , 'ANONYMOUS')
      and val.last_update_date > '&check_since'
    order by val.last_update_date desc
;

Grid Control follies: Looking for perl? Oh, it’s over there, in the perlBin.

Long time since I talked about Grid Control, but this one was weird enough to share. Hostnames and installation paths obfuscated as usual.

After a recent reconfiguration exercise for an Oracle 11g Grid Control agent, all targets monitored by the agent were showing "unknown" status, agent metrics were unavailable, and the performance page host target itself was throwing strange (to me, anyway) "No such metric, Switching to last 24 hours view" errors.

A quick look at the emagent.trc file in $ORACLE_HOME/sysman/log showed a lot of errors of the following form, for every monitored target on the system:

2010-12-03 21:50:35,716 Thread-3027233680 ERROR engine: [oracle_emd,server.domain:1830,ProcessInfo] :
 nmeegd_GetMetricData failed : Missing Properties : [perlBin]

perlBin? That's a new one for me. None of the recent work on the agent configuration had touched the perl installation, and lightweight command-line testing of the perl installed in the ORACLE_HOME didn't reveal any problems. That meant it was time to start digging in $ORACLE_HOME/sysman/config, where there are lots of files with .properties extensions. For once, the most obvious place to look turned out to be the right place.

The culprit? Oh, just one little character in the emd.properties file:

/perlBin=ORACLE_AGENT_HOME/perl/bin

Change /perlBin to perlBin, restart the agent, and all of a sudden everything looks way more sane. Whew. How that wayward / got to be there in the first place is a mystery that can wait for another day. Probably my fault; it often is. :)

“No manager available?” Say what, EBS?

Here's a quick note for those of you running Oracle E-Business Suite 11.5.10.2 in a non-PCP RAC environment. It may be an edge case, but perhaps it'll help someone else out there.

I'd received a report from a member of our development team that certain transaction managers in our E-Business Suite 11.5.10.2 test system were "broken"...but only intermittently. Don't you just love that? Actions in the Forms interface that would submit a request to a transaction manager as 'immediate' requests would return the message, "No concurrent manager is defined to process this request." This problem could persist for most of a day, only affect one developer, disappear for several days, then return and plague the entire team for a week. During that time, other concurrent managers were functioning as expected, and there weren't any obvious performance problems causing concurrent managers to become unresponsive. When tracing a Forms session shown to trigger the error, we saw the following query appear in the trace file right before the "No manager defined" error was thrown:

SELECT /*+ ORDERED USE_NL (fa fcp fr fcpp fcq) INDEX (fcq,FND_CONCURRENT_QUEUES_N1)
INDEX (fcpp,FND_CONC_PROCESSOR_PROGRAMS_U2) */ 
FCQ.CONCURRENT_QUEUE_ID 
FROM FND_APPLICATION FA, FND_CONCURRENT_PROGRAMS FCP, FND_CONC_PROCESSOR_PROGRAMS FCPP, FND_RESPONSIBILITY FR, FND_CONCURRENT_QUEUES FCQ, FND_CONCURRENT_PROCESSES FCPR 
WHERE FCQ.PROCESSOR_APPLICATION_ID = FCPP.PROCESSOR_APPLICATION_ID 
AND FCQ.CONCURRENT_PROCESSOR_ID = FCPP.CONCURRENT_PROCESSOR_ID 
AND FCPP.CONCURRENT_PROGRAM_ID = FCP.CONCURRENT_PROGRAM_ID 
AND FCPP.PROGRAM_APPLICATION_ID = FCP.APPLICATION_ID 
AND FCP.APPLICATION_ID = FA.APPLICATION_ID 
AND FA.APPLICATION_SHORT_NAME = 'xx'
AND FCP.CONCURRENT_PROGRAM_NAME = 'yyyyy' 
AND FR.RESPONSIBILITY_ID = nnnnn 
AND FR.APPLICATION_ID = nnn 
AND FR.DATA_GROUP_ID = FCQ.DATA_GROUP_ID 
AND FCQ.MANAGER_TYPE = '3' 
AND FCPR.CONCURRENT_QUEUE_ID = FCQ.CONCURRENT_QUEUE_ID
AND FCPR.QUEUE_APPLICATION_ID = FCQ.APPLICATION_ID 
AND FCPR.PROCESS_STATUS_CODE = 'A' 
AND FCPR.INSTANCE_NUMBER = USERENV('instance') ORDER BY DBMS_RANDOM.RANDOM

This query would return zero rows, and cause EBS to report "No manager available," even though we'd verified before running the test that the required transaction manager was running and successfully processing requests from other sources.

But wait. See that bit right there, where it checks USERENV('instance')? Yeah, it made me think, too.

So what was going on?

The previous query shows that the code submitting the concurrent request was not just checking for a running transaction manager, but for a manager that was running on the same database instance as the user's current session. Since we were running in a RAC environment, it was entirely possible for the transaction manager process and the user's Forms session to be connected to different database nodes. If the CM and Forms sessions were both connected to RAC node 1, or both to RAC node 2, everything worked fine. But if the CM process were running on node 1 and the Forms session connected to node 2, or vice versa, things blew up.

Why did the transaction managers care which RAC node we were using when we submitted a request? The "old way" of running EBS transaction managers in a RAC environment required the use of DBMS_PIPE, which meant that there needed to be a transaction manager assigned to each RAC node. This is the default behavior for EBS 11i. In Release 12, the default is to use Advanced Queueing (AQ) for transaction managers, so the need for a 1:1 mapping between managers and RAC nodes is eliminated. Up-to-date versions of the ATG Family Pack also allow the use of queues for transaction managers in 11i. This behavior is controlled by setting the EBS system profile option "Concurrent: TM Transport Type," which takes the value PIPE or QUEUE.

This is not new information. The instructions for setting the "Concurrent: TM Transport Type" profile option in an EBS RAC environment are very clearly stated...but only in the context of configuring Parallel Concurrent Processing (PCP). Since we were running on a single applications tier at the time, it was decided not to configure PCP until we had deployed a second apps node. We therefore did not address the TM Transport Type profile option. Once we changed the option from PIPE to QUEUE in our single-app-node, 2-db-node environment, the problems reported by the developers were resolved.

So there you go. "Concurrent: TM Transport Type=QUEUE" -- it's not just for PCP anymore. Hey, that would've made a catchy post title!

References

Note 458453.1: What is the difference between PIPE and QUEUE for profile Concurrent:TM Transport Type ? (pay particular attention to the required init.ora adjustments; they vary depending on RDBMS release)
Note 240818.1: Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment
Note 1057802.1: Best Practices for Performance for Concurrent Managers in E-Business Suite
Note 823586.1: Using Oracle 11g Release 2 Real Application Clusters with Oracle E-Business Suite Release 11i (or whichever note is appropriate for your database release)