Channel: Database Administration Tips

19c Clusterware Fails to Start with CRS-41053: checking Oracle Grid Infrastructure for file permission issues

One of my 19c cluster nodes crashed, and afterwards the clusterware failed to start with this error:

[root@fzppon05vs1n ~]# crsctl start crs
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
PRVG-11960 : Set user ID bit is not set for file "/u01/grid/12.2.0.3/bin/extjob" on node "fzppon05vs1n".
PRVG-2031 : Owner of file "/u01/grid/12.2.0.3/bin/extjob" did not match the expected value on node "fzppon05vs1n". [Expected = "root(0)" ; Found = "oracle(54321)"]
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.


That's weird, because the file named in the error message already has the correct ownership. It is supposed to be owned by the Grid Infrastructure owner (oracle in my setup), not by root as the error message suggests:

[root@fzppon05vs1n ~]# ll /u01/grid/12.2.0.3/bin/extjob
-rwxr-xr-x 1 oracle oinstall 2.9M Mar  4 11:42 /u01/grid/12.2.0.3/bin/extjob


The other RAC node shows the same permissions and ownership:

[oracle@fzppon06vs1n ~]$ ls -l /u01/grid/12.2.0.3/bin/extjob
-rwxr-xr-x 1 oracle oinstall 2.9M Mar  4 12:57 /u01/grid/12.2.0.3/bin/extjob
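
The two PRVG messages test the file's owner and its set-user-ID bit, and both can be read directly rather than from the listing. The following is a minimal sketch, assuming GNU `stat` as found on Linux. The helper names `file_owner`, `has_setuid` and `report_file` are made up for illustration, and nothing here modifies the file.

```shell
# Hypothetical helpers: read-only checks, nothing is modified.

# Print the owning user name of a file (assumes GNU stat, as on Linux).
file_owner() {
    stat -c '%U' "$1"
}

# Succeed (exit status 0) if the set-user-ID bit is set on the file.
has_setuid() {
    [ -u "$1" ]
}

# Print a one-line report: path, owner, and setuid state.
report_file() {
    if has_setuid "$1"; then
        suid="setuid bit set"
    else
        suid="setuid bit not set"
    fi
    echo "$1: owner=$(file_owner "$1"), $suid"
}
```

Running `report_file /u01/grid/12.2.0.3/bin/extjob` on each node gives a line that is easy to compare across nodes.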


I tried stopping the clusterware on this node with the force option and starting it again, but this didn't help.

Before restarting the OS, I decided to check the clusterware background processes, and here is the catch:

[root@fzppon05vs1n ~]# ps -ef | grep -v grep| grep '\.bin'
root     19786     1  1 06:18 ?        00:00:39 /u01/grid/12.2.0.3/bin/ohasd.bin reboot
root     19788     1  0 06:18 ?        00:00:00 /u01/grid/12.2.0.3/bin/ohasd.bin reboot
root     19850     1  0 06:18 ?        00:00:13 /u01/grid/12.2.0.3/bin/orarootagent.bin
root     19958     1  0 06:18 ?        00:00:14 /u01/grid/12.2.0.3/bin/oraagent.bin

...

More than one ohasd.bin process is running, although there should be only one.
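
The "only one ohasd.bin" check can be sketched as a small filter over `ps -ef` output. The function name `count_ohasd_bin` is hypothetical, and the sketch assumes the usual `ps -ef` column layout, where the command path is the eighth field.

```shell
# Hypothetical helper: count ohasd.bin processes in `ps -ef` output read from stdin.
# Assumes the standard `ps -ef` layout: UID PID PPID C STIME TTY TIME CMD...
count_ohasd_bin() {
    # Match only lines whose command path (field 8) ends in /ohasd.bin;
    # "n + 0" prints 0 instead of an empty string when nothing matched.
    awk '$8 ~ /\/ohasd\.bin$/ { n++ } END { print n + 0 }'
}
```

Used as `ps -ef | count_ohasd_bin`, any result above 1 points to the situation described here.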

Checking all ohasd-related processes:

[root@fzppon05vs1n ~]# ps -ef | grep -v grep | grep ohasd
root      1900     1  0 06:17 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run>/dev/null 2>&1 </dev/null
root      1947  1900  0 06:17 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run>/dev/null 2>&1 </dev/null
root     19786     1  1 06:18 ?        00:00:00 /u01/grid/12.2.0.3/bin/ohasd.bin reboot
root     19788     1  0 06:18 ?        00:00:00 /u01/grid/12.2.0.3/bin/ohasd.bin reboot


Now, let's kill all ohasd processes and give it a try:

[root@fzppon05vs1n ~]# kill -9 1900 1947 19786 19788
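
Rather than copying the PIDs by hand, the list can be derived from the same `ps -ef` output. This is only a sketch: the name `ohasd_kill_command` is hypothetical, it assumes the PID is the second column, and it prints the command instead of running it so that it can be reviewed first.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print (not run)
# a kill command covering every process whose command line mentions ohasd.
# Assumes the PID is the second column, as in standard `ps -ef` output.
ohasd_kill_command() {
    # Skip the awk/grep processes themselves, which also mention "ohasd".
    pids=$(awk '/ohasd/ && !/awk|grep/ { printf "%s ", $2 }')
    if [ -n "$pids" ]; then
        # Strip the trailing space left by the printf above.
        echo "kill -9 ${pids% }"
    else
        echo "no ohasd processes found"
    fi
}
```

Used as `ps -ef | ohasd_kill_command`, it prints a single line that can be checked and then run as root.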

Starting the clusterware again:

[root@fzppon05vs1n ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.


Voilà! Started up.

Conclusion:

The error message above may look vague, I know. Moreover, it may name a different file than extjob.
Don't rush to change the file's ownership as the error message advises. First check for redundant clusterware background processes and kill them, then try to start the clusterware. If that doesn't help, restart the node and check again for redundant processes.

