Saturday, December 11, 2010

EBS Internal Concurrent Manager (ICM) starts all concurrent managers every minute

On Friday morning at about 10:00 am, we needed to bounce the database tier of the integration instance (INT) to disable archive log mode and apply a small patch in the GL area. The database was slow to shut down and start up. The application patch takes only about 6 minutes in DEV. In INT, it took about an hour.

The database slowness caused a big problem when starting up the application tier. There were timeouts when I tried to start all nodes (two web/forms nodes and two concurrent manager and admin server nodes). Furthermore, the load average on the database server climbed rapidly to about 500 and the server stopped responding. We had to reboot the server.

After some digging, I also found a large number of processes on the concurrent manager tier. The internal concurrent manager (ICM) log showed that ICM was starting all concurrent managers every minute. Because several concurrent processes are configured for each manager, starting a concurrent manager starts all of its concurrent processes, and each concurrent process has a corresponding database process. As a result, the database server was overloaded and could not respond to new client requests.

The database slowness was traced to the hugepages setting on Linux. Hugepages had originally been sized for an 8GB SGA, but the INT database’s SGA had since been changed to 16GB, so the larger SGA no longer fit in the hugepage pool. After resetting the SGA back to 8GB, INT returned to normal.
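To make the mismatch concrete, here is a rough sketch of the arithmetic. It assumes a 2 MB hugepage size, which is the usual default on x86-64 Linux but should be confirmed from the Hugepagesize line in /proc/meminfo. The function name is mine, and a real vm.nr_hugepages value normally needs a little headroom on top of the bare SGA size.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def hugepages_needed(sga_bytes, hugepage_bytes=2 * MIB):
    """Return the minimum number of hugepages that can hold an SGA of this size.

    hugepage_bytes defaults to 2 MB, an assumption; check /proc/meminfo.
    """
    # Ceiling division: a partially used page still has to be reserved.
    return -(-sga_bytes // hugepage_bytes)


for sga_gib in (8, 16):
    print(f"{sga_gib} GB SGA -> {hugepages_needed(sga_gib * GIB)} hugepages")
```

Under that assumption, a pool sized for the 8GB SGA holds only half of what the 16GB SGA needs.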

However, the fact that ICM kept starting all concurrent managers when the database was 10 times slower than normal exposed a design fault in ICM. It appears that ICM compares fnd_concurrent_queues.target_processes with running_processes. If the two columns match, ICM goes to sleep. Otherwise, it starts the concurrent managers again. It does not appear to check for timeouts, Oracle errors, and so on.
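The effect of that kind of check can be shown with a toy simulation. This is not Oracle's code, only a model of the behavior I inferred from the ICM log. The function name, its parameters, and the "track pending startups" alternative are all my own assumptions for illustration.

```python
def simulate(target, startup_minutes, cycles, track_pending):
    """Return how many processes get spawned over `cycles` one-minute checks.

    Toy model only: every name here is hypothetical, not part of EBS.
    """
    running = 0
    pending = []  # minutes left until each spawned process counts as running
    spawned = 0
    for _ in range(cycles):
        # One minute passes; finished startups now count as running.
        pending = [m - 1 for m in pending]
        running += sum(1 for m in pending if m <= 0)
        pending = [m for m in pending if m > 0]

        # The check as I understand it: compare target with running only.
        deficit = target - running
        if track_pending:
            # Assumed fix: do not re-spawn what is still starting up.
            deficit -= len(pending)
        for _ in range(max(deficit, 0)):
            pending.append(startup_minutes)
            spawned += 1
    return spawned


print(simulate(10, 1, 5, track_pending=False))   # healthy database
print(simulate(10, 10, 5, track_pending=False))  # startup slower than the check
print(simulate(10, 10, 5, track_pending=True))   # same, but pending is counted
```

With a fast startup the naive check spawns each process once. When startup takes longer than the one-minute check interval, the naive check spawns a full new batch every minute, which matches the pile-up of processes I saw. Counting startups that are still in progress keeps the total at the target.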

A service request has been opened with Oracle asking for a patch to stop ICM from restarting the concurrent managers every minute.
