Tuesday, March 4, 2014

How to resume 12c manual upgrade

A few weeks ago, I attended the "12c Upgrade Workshop". In 12c, there is a new upgrade script, catctl.pl.

The catupgrd.sql upgrade utility is replaced with the catctl.pl utility. The new utility lets you take advantage of multiple CPUs by running the upgrade process in parallel, thereby reducing the upgrade time.
In my test upgrade, I ran into a problem and the upgrade process broke:

So, after fixing the problem, I tried to restart the upgrade with the -p parameter (based on the docs plus some wrong examples from the net):

Option “-n” specifies the degree of parallelism; the default is 4 and the maximum is 8.
Option “-p” supports rerun of the upgrade by skipping successful steps.
-p Restarts the specified phase (skip successful phases on a rerun).

But without success.

Important: the $ORACLE_HOME/diagnostics directory contains log files about the upgrade. DO NOT DELETE THEM, because then the upgrade cannot be restarted!!
The number of files depends on the degree of parallelism.

You get this error:

When we look into the file itself (catctl.pl), we find the usage near the start of the file:

Usage: catctl [-u username] [-n processes] [-d directory]
                [-t table] [-l directory] [-s script]
                [-e] [-i] [-c] filename

  Supported Options:

     -u username (prompts for password)
     -n the number of processes to use for parallel operations (default 4)
     -d directory containing the files to be run
     -t table name for phase logging
     -l directory to use for spool log files
     -s SQL script to initialize sessions
     -e sets echo off while running the scripts
     -p restart phase (skip successful phases on a rerun)
     -i identifier to use when creating spool log files
     -y display phases only

There is no -p in the usage line, although it is listed in the options. But it still did not work.

Investigating further in the file, we found this part of the code:

Then we figured out that it must be the phase number from which we want to restart; Oracle is not clever enough to find that in the log files!!! :-)

And then, finally:

is what we were looking for!!! The rerun succeeded.
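Putting it together, a resumed run looks roughly like this (the phase number 18 is only an example; take the real one from the point where your catupgrd*.log files stopped):

```shell
# Rerun the 12c upgrade, skipping the already-successful phases and
# restarting at phase 18. Paths follow the standard 12c layout.
cd $ORACLE_HOME/rdbms/admin
$ORACLE_HOME/perl/bin/perl catctl.pl -n 4 -p 18 catupgrd.sql
```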

The Oracle doc is not very explicit about this feature, and nowhere is there a correct example of rerunning/resuming an upgrade on 12c.

Thursday, April 4, 2013

Exadata BCU script does not see/list pool from Sun ZFS Storage Appliance

A few weeks ago, I had to configure an RMAN backup on Exadata with a Sun ZFS Storage Appliance.
All of the configuration can, of course, be done manually (a good starting point is the Sun ZFS Backup Appliance Installation and Configuration Guide), but Oracle provides a utility that is a great help with such a configuration.
The utility can be found under Oracle ZFS Storage Appliance Plugin Downloads (Oracle Exadata Backup Configuration Utility v1.1.2):  Oracle Exadata Backup Configuration Utility

So, when you unpack the utility (a bunch of scripts) and start the main exadata_bcu script, you provide some answers (IP, name, shares, ...) and the script connects to the Sun ZFS Storage Appliance and fetches the pools, so that you can choose a pool for the RMAN backup.

But in our case, the script did not see the pool on the ZFS Storage Appliance:
We went through all the scripts in the directory and found one named list_pools.sh! When we run it, it is obvious that from Exadata we can see the pool on the ZFS Storage:
OK, so the problem is somewhere in the main script or another one, after the pools are fetched from the ZFS Storage.

After some digging, we found this line (152) in s7k_config.py:
So the script expects the owner of the pool to be the same as the nodename; but when we run list_pools.sh, we get a result without an owner.
We went through all the menus and choices in the ZFS Storage web interface and ssh commands, but there is no possibility anywhere to set the owner.
Then we commented out that line and added this:
and from then on everything worked perfectly:
and configuring RMAN went as expected.
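The exact lines from s7k_config.py are not preserved in this post, but the change amounted to something like this (a minimal sketch; the function and variable names here are my own, not the script's):

```python
# Sketch of the fix in s7k_config.py (names approximate). The original
# check around line 152 accepted a pool only when its owner matched the
# appliance node name, something like:
#
#     if pool_owner == nodename:
#
# On a single (non-clustered) appliance, list_pools.sh reports the pool
# with no owner at all, so that check never matches. We commented the
# original line out and relaxed the condition to also accept a missing
# or empty owner:

def pool_is_visible(pool_owner, nodename):
    # Accept the pool if the owner matches the node name (clustered case)
    # or if no owner is reported at all (single-appliance case).
    return pool_owner == nodename or not pool_owner

# Clustered appliance: the owner is set and must match
assert pool_is_visible("zfs-node1", "zfs-node1")
# Single appliance: no owner is reported
assert pool_is_visible("", "zfs-node1")
assert pool_is_visible(None, "zfs-node1")
# A pool owned by the other cluster head is still skipped
assert not pool_is_visible("zfs-node2", "zfs-node1")
```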

We think the reason the script expects an owner is that the Sun ZFS Storage Appliance usually comes as a cluster, where the owner is the first or second appliance in the cluster; in our case there is only one appliance, and then there is no owner at all.

I couldn't find anything about this problem on Metalink, forums, blogs, etc., so I decided to write a post about it; maybe it will help someone. Anyway, the above is how you can solve the problem if you ever hit the same situation! :-)

Wednesday, September 19, 2012

STANDBY did not release space in FRA, despite large reclaimable space

At a customer site, we have the following configuration: a 3-node ASM RAC and a 1-node ASM standby, both with the FRA also in ASM, version.

A few days ago, we got an error: archiving to the standby had stopped.

In the alert.log on the primary site, for log_archive_dest_2 (to the standby), there was an error:
ORA-07286: sksagdi: cannot obtain device information
In the alert.log on the standby site:

Did you see that? 0.00% used, about 200 GB available. The FRA is 250 GB in size.

I did not want to delete archivelogs; at this site there is no tape to back them up with RMAN, and no disk space to back them up to disk either. After checking the ASM disk group used for the FRA, I saw that there was space to expand the FRA:

So, to quickly resolve the error, I increased the FRA:

and immediately after that, archivelogs started arriving and being applied.
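The quick fix itself is a one-liner (the new size here matches what I used, 250 GB up to 450 GB; yours will differ):

```sql
-- Expand the FRA so archiving can resume immediately:
ALTER SYSTEM SET db_recovery_file_dest_size = 450G SCOPE=BOTH;
```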

But it bothered me that, with the archivelog deletion policy set to "TO APPLIED ON ALL STANDBY" and regular RMAN backups on the primary, archivelogs should be deleted, or at least marked as reclaimable, immediately after they are applied on the standby db.

When I queried v$recovery_file_dest (before increasing):

Well, Oracle should now release the reclaimable space and start using this new free space. But it did not.

Here is how you can check whether the archivelogs became reclaimable:

The following query lists all the archivelogs included in v$flash_recovery_area_usage; those with a rectype of 11 are the reclaimable ones (I limited it to the last 3 with rownum, to compare with the max values in the next 2 queries):

Which archivelogs are applied:

Which archivelogs are transferred to the standby side:
This tells me that the space should be free, because the archivelogs were obviously marked as reclaimable as soon as they were applied on the standby (the max values from the last two queries are the same). So I opened an SR on Metalink and tried to investigate further...
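The screenshots of those queries are gone, but the checks against the documented views were along these lines (a sketch, not the exact queries; the per-record rectype query used an internal view that I won't try to reproduce from memory):

```sql
-- Overall FRA usage: how much of each file type is used and reclaimable.
SELECT file_type, percent_space_used, percent_space_reclaimable, number_of_files
  FROM v$flash_recovery_area_usage;

-- Highest archivelog sequence already applied on the standby...
SELECT MAX(sequence#) FROM v$archived_log WHERE applied = 'YES';

-- ...versus the highest sequence transferred at all. If they match,
-- everything shipped has been applied and should be reclaimable.
SELECT MAX(sequence#) FROM v$archived_log;
```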

In the note Archiver Hung On Ora-15041, FRA not Cleaning Files Automatically [ID 316281.1], I found the same symptoms as mine. But that case applies when the log_archive_dest_1 parameter points directly at some mount point or ASM group: files get generated there but are not considered by the FRA algorithm for cleanup when space pressure mounts up.
You must set log_archive_dest_1='location=use_db_recovery_file_dest'; then Oracle will check for files whose space is eligible for reclaiming.

But I had this already, and the note contains a little query to verify it:

All 3 locations show YES in the IS_RECOVERY_DEST_FILE column (v$archived_log.IS_RECOVERY_DEST_FILE = YES), so the space should be automatically managed.
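The verification is essentially this (a sketch; the note's exact query may differ):

```sql
-- Every destination's files should report YES when the FRA manages them:
SELECT DISTINCT dest_id, is_recovery_dest_file
  FROM v$archived_log;
```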

After a few reviews and uploads with Oracle Support, we found that this is a BUG!!! :-)

Yes, and this is a bug from version :-) (Bug 14227959: STANDBY DID NOT RELEASE SPACE IN FRA)

So, here is the workaround:

First, you can set this parameter to get more detailed information about the space pressure:
and the solution (which worked for me):
After that, I decreased the FRA from 450 GB to 370 GB (the used files took about 360 GB), to check how Oracle would free the space.

Then I watched the alert.log and the space in the FRA for 2-3 days, and now it always looks something like this:

I also asked for a recommended value for how often to execute this refresh function for aged files in the FRA; the answer was to execute dbms_backup_restore.refreshAgedFiles on the standby at regular intervals (once an hour).
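The workaround call itself is short (run as sysdba on the standby; scheduling it hourly, e.g. via a dbms_scheduler job, was the suggestion):

```sql
-- Refresh the aged/reclaimable status of files in the FRA:
EXEC dbms_backup_restore.refreshAgedFiles;
```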

Finally, here are all the notes I referenced for this error:

FRA: Flashback logs are not being deleted when space is needed for archive logs [ID 1481739.1]
V$Flash_recovery_area_usage is not being updated [ID 1471471.1]
Space issue in Flash Recovery Area( FRA ) [ID 829755.1]
Database Crashed With ORA-19815 ORA-19809 ORA-16038 [ID 829254.1]
How is the space pressure managed in the Flash Recovery Area - An Example. [ID 315098.1]

Hope this helps someone.

Tuesday, May 22, 2012

OEM does not send alert notification mails! Did you maybe change the SYSMAN password? :-)

A few days ago, I got a new db to maintain and administer. After a first look, I started the EM console and asked the customer for the SMTP parameters.
Of course, there were no SYS and SYSMAN passwords :-). After consulting with the customer, we decided to change the SYSMAN password, so I could configure mail alerting from OEM.
I successfully reset the SYSMAN password and created an account with the sysdba privilege, because we didn't have the SYS password:
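A sketch of those two steps (connected with OS authentication, since no SYS password was available; the user name and passwords below are made up):

```sql
-- $ sqlplus / as sysdba
ALTER USER sysman IDENTIFIED BY "NewSysmanPwd_1";
CREATE USER dba_admin IDENTIFIED BY "StrongPwd_1";
GRANT sysdba TO dba_admin;
```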

OK, now everything is ready for the EM console. I set the SMTP server, mail account, rules, notifications, schedule, ... The test mail for the account and mail server arrives. But when I scheduled alerting and lowered some metric thresholds (CPU, tablespace used, ...) to get a first alert mail (I didn't want to wait for a real alert :-) ), nothing happened!!!

Hmmm... I double-checked everything again... nothing...
Then I found something strange: the SYSMAN account was getting locked!
So I unlocked it, but after a few seconds it was locked again. Aha, something was obviously still trying to use SYSMAN with the old password.
OK, I solved this temporarily by creating another profile with failed_login_attempts = unlimited and moving SYSMAN to that new profile. (SYSMAN was in the DEFAULT profile with failed_login_attempts = 10.)
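Roughly like this (the profile name is mine):

```sql
-- Stop the old-password login attempts from locking SYSMAN:
CREATE PROFILE em_unlocked LIMIT failed_login_attempts UNLIMITED;
ALTER USER sysman PROFILE em_unlocked;
ALTER USER sysman ACCOUNT UNLOCK;
```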

But mail alerting was still not working!

A quick check of the job parameters, and do I have any alerts at all? Quick searching on forums, the net, Metalink... nothing very useful.
Then I tried one "trick"... I started to create an SR on Metalink, describing my problem in detail with keywords; at step 3 or 4 you get suggestions to look at some notes, based on the problem/keywords you entered while describing your issue. Four notes popped up in my case; 3 of them were the usual general notes, but one was not! :-)
It was: How To Change the Password of the Database User Sysman (DB Control Repository Schema) [ID 259379.1]

Waaaiit ?!!!

I HAD changed the SYSMAN password, and I was writing an SR about mail alerting!! Of course, after reading the note, I found that when you change the SYSMAN password, you need to stop/restart the EM console!! Given my DB version, it was enough to: (before that, you must manually change the emoms.properties file) After starting the EM console, my mailbox got 25 messages immediately! :-) And finally, I moved the SYSMAN user back to the DEFAULT profile, and everything was OK, without locking!
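For DB Control, the restart part amounts to something like this (the emoms.properties step with the SYSMAN repository password entry is described in the note; verify the exact property names for your release):

```shell
# After editing emoms.properties with the new SYSMAN repository password,
# bounce the console so it picks up the new credentials:
emctl stop dbconsole
emctl start dbconsole
```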

What I learned today:
1) Always back up the old password hash before you change a user's password :-(
2) Use Metalink; it is there for us :-)

Thursday, February 9, 2012

RAC crash and ora-27504 error

A few days ago, at a customer site, a 10g RAC went down. The RAC has 2 nodes, rac1 and rac2 (ASM).
What happened was that on the rac2 node the CPU fan stopped working and all the local disks (2 in RAID1) were broken, so I managed to connect successfully only to rac1.

On rac1, CRS was healthy, but the ASM and DB instances were down??

After a little investigating, I found in the ASM alert log an error saying something about "instance stopped ** ORA-27504"!
I tried to start ASM manually, but got this:

After reading Metalink and the forums, I found that it could be something with the interconnect.
This was helpful: RAC Single Instance (ASM) startup fails with ORA-27300/ORA-27301/ORA-27302/ORA-27303 [ID 331934.1]
From Note:
"This is caused by the private NIC is not connected to a switch. The NIC driver detects that there is no cable connected and, as oracle tests the state of the NIC, the ASM instance startup fails with above error. Similarly any RAC database single instance would also fails with the same error stack."

So we then focused on the network and system logs, and found out that 2 days earlier there had been some failover between interconnect network interfaces because of a failure.
After calling the customer to check, they reported to us that the switch had died as well! :-)
OK, while the customer was finding a new switch, I wanted to try to start that ASM instance as the error suggested: set _disable_interface_checking = TRUE
After changing init+ASM1.ora and issuing the startup command, I got an error about a wrong init.ora!
Fortunately, the switch was replaced with a new one in a short time, so after that I started the ASM instance without problems. The DB instance also started, and everybody was happy.

But it kept bugging me: could I somehow use that parameter in such a case, to start ASM without it checking the interface?
So, after a lot of Googling and reading, I opened an SR on Metalink and found out this:

1) You cannot modify it manually and put it in init.ora.
2) You should first fix the network interface/switch issues, then restart the db and asm, then issue the command below for future use.

3) Run it for both the ASM and the database instance.
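The verbatim command from the SR is not reproduced above, but for a hidden parameter like this, the usual spfile approach looks like this (an assumption on my part, not the exact SR text):

```sql
-- Takes effect at the next startup; hidden parameters should only be
-- touched on Oracle Support's advice:
ALTER SYSTEM SET "_disable_interface_checking" = TRUE SCOPE=SPFILE;
```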

And, in the future, you should not get this particular error.

Every day we learn something, right ? :-)

Friday, May 6, 2011

ORAPWD file- recreate, adding new entries, limits, ORA-01996,...

Over the last few days, I learned a lot about the Oracle password file! :-)

What I wanted was to increase the value of the ENTRIES parameter from the default to some greater value, to be able to add a few more users with the sysdba privilege. So, before starting that exhibition on production databases, I wanted to be sure the whole process had been tested!
First, I was not sure whether I should shut down the db after changing the orapw file or not. Well, with a little help from MOS and a test DB, the answer is: NO!! More precisely, it is necessary to restart the DB only after you create the password file for the first time.
OK, the first problem was solved. I already had the orapw files created, so I just copied the old original files to have a backup in case something went wrong.

Then I changed the 'entries' value, to test adding some new users.
First, you can see how many and which users are in the password file:

So, for the test, I added some users and tested increasing the value of the ENTRIES parameter:
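The test cycle looked roughly like this (ORACLE_SID, file names and passwords are illustrative; note that force=y overwrites the file, which is exactly why the backup copy matters):

```shell
cd $ORACLE_HOME/dbs
cp orapwORCL orapwORCL.bak                          # keep the original safe
orapwd file=orapwORCL password=MySysPwd1 entries=6 force=y
# then grant sysdba to test users in SQL*Plus and count them:
#   GRANT sysdba TO test_user1;
#   SELECT * FROM v$pwfile_users;
```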

With entries=2, I could put 5 users with sysdba grants; entries=5 --> 5 users; entries=6 --> 9 users; entries=7 --> 9 users.. ??
Well, that was strange!
But the answer is in the Oracle doc :-)

Proof that a different value of the ENTRIES parameter is ok

"The actual number of allowable entries can be higher than the number of users, because the ORAPWD utility continues to assign password entries until an operating system block is filled. For example, if your operating system block size is 512 bytes, it holds four password entries. The number of password entries allocated is always a multiple of four."

Finally, I wondered whether you can see the current value of the ENTRIES parameter.
Well, NO! You cannot see the value of the ENTRIES parameter in a view or anywhere else (v$pwfile_users only shows the users).
And another important thing at the end:
If you recreate the password file with a larger value for ENTRIES, you will lose all the current SYSDBA users!!! You only set the password for the SYS user, and the other users have to be granted again by SYS:

Now is really the end!   :-)

Wednesday, March 30, 2011

How to execute sql script against multiple databases?

A friend of mine asked me that question a few days ago... At first, I gave him a bash script I had for connecting to multiple hosts through ssh and then executing some commands, and told him he could easily change it to connect with sqlplus using some list of connections and then run the sql script.
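That bash approach boils down to something like this (file names and credentials are placeholders):

```shell
#!/bin/bash
# Run script.sql against every TNS alias listed (one per line) in dbs.txt.
while read -r conn; do
    echo "=== ${conn} ==="
    sqlplus -s "scott/tiger@${conn}" @script.sql
done < dbs.txt
```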
But after some time, I tried to find a better way to do this...
After some reading, I found, and of course successfully tried, these ways of doing it:

1) EM grid control

In the book "Oracle Enterprise Grid Control 10g Handbook", chapter 14, "Database Management in Grid Control", page 447, there is a workshop "Run a SQL Script Against Multiple Databases" that explains in detail how to achieve what we wanted.

preview: Oracle Enterprise grid control 10g handbook

 2) Toad ScriptManager

I will not copy/paste someone else's work :-).. Here is a link with a detailed explanation:

Toad method 1

3) Toad Project Manager 

And, my favorite, fastest and most elegant way:

Toad method 2

So much for this time!