I had an interesting issue with the osuser name externalized in the v$session view.
The following query shows that ‘oracle’ is the osuser for background processes:

[ @ SID ]> SELECT DISTINCT osuser FROM v$session WHERE TYPE = 'BACKGROUND' AND osuser IS NOT NULL;
OSUSER
------------------------------
oracle

But when I logged in to the Linux box hosting the database, I saw something different:

$ ps -eaf | grep smon | grep -v grep

orasid   16674     1  0  2015 ?        00:05:26 ora_smon_SID
oracle   30826     1  0 Jan29 ?        00:12:27 ora_smon_SID11

There are two databases: SID owned by the ‘orasid’ OS user and SID11 owned by ‘oracle’.
So why does the above query show ‘oracle’ instead of ‘orasid’?

I also checked that both the ‘oracle’ and ‘orasid’ users belong to the ‘dba’ OS group, and it turns out that when you start the instance from another ‘dba’ account, that account’s name is populated into the x$ksuse table, which is the source for the v$session view.
So somebody simply started the SID instance as ‘oracle’ instead of ‘orasid’, and this leads to such strange output.
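
Since v$session is built on x$ksuse, you can confirm the value at the source (a sketch: x$ tables require a SYSDBA connection, and I assume here that KSUSEUNM is the column behind OSUSER):

select distinct ksuseunm
from   x$ksuse
where  ksuseunm is not null;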


How to determine sessions with invalid package states

Tanel Poder gave us a solution for Oracle 9i and 10g here and here. However, I didn’t find a script for 11g and 12c, so I tried to figure it out on my own.

WITH librarycache_subquery AS (
 select 
    l.kgllkuse    -- (v$session.saddr)
   ,l.kgllksnm    sid      -- (v$session.sid)
   ,l.user_name   username -- user name
   ,l.kgllkest
   ,l.kgllksqlid  sql_id   -- sql_id
   ,l.kglnaobj    sql_text -- sql_text
   ,o.kglobsta
   ,o.kglobtt0
   ,od.kglhddmk   -- heap mask
   ,od.kglnaown   package_owner  -- dependent package owner
   ,od.kglnaobj   package_name   -- dependent package name
 from  x$kgllk l, x$kglob o, x$kgldp d, x$kglob od
 where l.kglnahsh = o.kglnahsh -- and l.kglhdpar = o.kglhdpar and l.kgllkhdl = o.kglhdadr
  and l.kgllkest is not null
  and l.kgllkflg = 1
  and o.kglobtyd = 'CURSOR'
  and bitand(o.KGLHDFLG/power(2,24),4)=4  -- o.KGLHDFLG=335618321
  and d.kglnahsh = o.kglnahsh
  and od.kglnahsh = d.kglrfhsh -- (v$object_dependency)
  and od.kglhdadr = d.kglrfhdl -- (v$object_dependency)
  and od.kglobtyd = 'PACKAGE'
  and od.kglhdflg > 0 
)
select ls.sid, ls.username, ls.sql_id, ls.sql_text, ls.package_owner, ls.package_name
from   librarycache_subquery ls
where  ls.kglobsta=5
  and  ls.kglhddmk=29 -- [hex 001D] 
union all
select ls.sid, ls.username, ls.sql_id, ls.sql_text, ls.package_owner, ls.package_name
from   librarycache_subquery ls
where  ls.kglobsta=1
  and  ls.kglobtt0 > ls.kgllkest
  and  ls.kglhddmk=29 -- [hex 001D] 
union all
select ls.sid, ls.username, ls.sql_id, ls.sql_text, ls.package_owner, ls.package_name
from   librarycache_subquery ls
where  ls.kglobsta=1
  and  ls.kglobtt0 > ls.kgllkest
  and  ls.kglhddmk=8221 -- [hex 201D] 
;

The WITH clause just selects ‘candidates’ for the final report, i.e. invalidated cursors [bitand(o.KGLHDFLG/power(2,24),4)=4] for which there is a dependent package in the library cache.

It’s the source data for a final UNION ALL that lists different scenarios.

Heap mask kglhddmk=29 means the dependent package was changed/recompiled and has not been executed even once since then.

kglhddmk=8221 means the package has already been successfully executed by at least one session (but other sessions may still get the ORA-04068 error).

kglobtt0 holds the timestamp when the shared cursor was first executed after it had been invalidated.

kglobsta=5 means the shared cursor is invalidated and not yet re-executed by any session.

Breaking the UNION ALL into three parts, I list all sessions that would get ORA-04068 when: (a) the dependent package was changed/recompiled and no session has executed the broken cursor yet; (b) a session touched the invalidated shared cursor for the first time and has already got the ORA-04068 error, but other sessions are still ‘pending’ to get it (these sessions will be displayed by the SELECT); and (c) a session has executed the cursor and package successfully (the second run), but there are still other sessions with a broken state (these will be displayed by the SELECT).
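
To see the script in action, here is a minimal way to put a session into an invalid package state (a sketch; pkg_state is a hypothetical package):

-- session 1: create a package with session state and instantiate it
create or replace package pkg_state as
  g_counter number := 0;   -- package-level variable = session state
  procedure bump;
end;
/
create or replace package body pkg_state as
  procedure bump is begin g_counter := g_counter + 1; end;
end;
/
exec pkg_state.bump

-- session 2: recompile the package, discarding session 1's state
alter package pkg_state compile;

-- now the query above should list session 1; its next call raises
-- ORA-04068 and only the call after that succeeds
exec pkg_state.bump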

ORA-04062 error detection

It’s possible to detect the ORA-04062 (‘timestamp of package XYZ has been changed’) problem before the error is thrown.

Assuming that REMOTE_DEPENDENCIES_MODE is set to TIMESTAMP (the default), it’s enough to compare the KGLNATIM column in the x$kglob table between the remote and the local (dependent) procedure/package.

The KGLNATIM column holds the timestamp of the last change. So whenever the remote package changes in a way that also changes its timestamp, just compare the remote package specification’s KGLNATIM value with the local package body’s KGLNATIM value; if the latter is older, you may hit ORA-04062.
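
A minimal sketch of that comparison (SCOTT.REMOTE_PKG and SCOTT.LOCAL_PKG are hypothetical names; x$ tables require a SYSDBA connection on each side):

-- on the remote database: timestamp of the remote package specification
select kglnaown, kglnaobj, kglnatim
from   x$kglob
where  kglnaown = 'SCOTT' and kglnaobj = 'REMOTE_PKG' and kglobtyd = 'PACKAGE';

-- on the local database: timestamp recorded for the local (dependent) package body
select kglnaown, kglnaobj, kglnatim
from   x$kglob
where  kglnaown = 'SCOTT' and kglnaobj = 'LOCAL_PKG' and kglobtyd = 'PACKAGE BODY';

If the local body’s KGLNATIM is older than the remote specification’s, the next call into the remote package may raise ORA-04062.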

You will hit this error when a procedure of the local package calls the remote package. But if you call another procedure of the local package, one that does not call the remote package, you won’t get ORA-04062; it’s a runtime error.

The question is: which changes to the remote package lead to a timestamp change? Only a package specification change (even just adding a comment) changes the timestamp. Changing the package body logic, or recompiling the body or the spec, does _not_ affect KGLNATIM (the timestamp).
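
If I read the dictionary right, the specification timestamp is also externalized in the TIMESTAMP column of user_objects, so the rule is easy to verify (remote_pkg is a hypothetical name):

select object_name, object_type, timestamp
from   user_objects
where  object_name = 'REMOTE_PKG';

alter package remote_pkg compile;
-- rerun the select: TIMESTAMP is unchanged; re-create the spec (even with
-- just an added comment) and TIMESTAMP changes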

INVISIBLE indexes seem visible and 20-index limitation over database link

Recently I got a call regarding the slow performance of a distributed query after an index had been created on a remote table. The curious thing was that the index wasn’t used in the execution plan at all. But its mere existence changed the optimizer’s behaviour, and the plan switched from an index scan (using another index) to a full table scan. The case got even more interesting: I made the new index invisible, but the plan was still bad. All went fine only after the new index was completely dropped.

This is how I encountered the 20-index limitation on distributed queries. The case is easy to reproduce, as shown below.

Let’s create a table with 20 indexes on the remote db:

DROP TABLE t_remote;

CREATE TABLE t_remote (
col01 NUMBER,
col02 NUMBER,
col03 VARCHAR2(50),
col04 NUMBER,
col05 NUMBER,
col06 VARCHAR2(50),
col07 NUMBER,
col08 NUMBER,
col09 VARCHAR2(50),
col10 NUMBER,
col11 NUMBER,
col12 VARCHAR2(50),
col13 NUMBER,
col14 NUMBER,
col15 VARCHAR2(50),
col16 NUMBER,
col17 NUMBER,
col18 VARCHAR2(50),
col19 NUMBER,
col20 NUMBER,
col21 VARCHAR2(50)
);

alter table t_remote modify (col01 not null);

INSERT INTO t_remote
SELECT
rownum, rownum, rpad('*',50,'*'),
rownum, rownum, rpad('*',50,'*'),
rownum, rownum, rpad('*',50,'*'),
rownum, rownum, rpad('*',50,'*'),
rownum, rownum, rpad('*',50,'*'),
rownum, rownum, rpad('*',50,'*'),
rownum, rownum, rpad('*',50,'*')
FROM dual
CONNECT BY level <= 10000;

COMMIT;

create unique index t_remote_i01_pk on t_remote (col01);
alter table t_remote add (constraint t_remote_i01_pk primary key (col01) using index t_remote_i01_pk);

create index t_remote_i02 on t_remote (col02);
create index t_remote_i03 on t_remote (col03);
create index t_remote_i04 on t_remote (col04);
create index t_remote_i05 on t_remote (col05);
create index t_remote_i06 on t_remote (col06);
create index t_remote_i07 on t_remote (col07);
create index t_remote_i08 on t_remote (col08);
create index t_remote_i09 on t_remote (col09);
create index t_remote_i10 on t_remote (col10);
create index t_remote_i11 on t_remote (col11);
create index t_remote_i12 on t_remote (col12);
create index t_remote_i13 on t_remote (col13);
create index t_remote_i14 on t_remote (col14);
create index t_remote_i15 on t_remote (col15);
create index t_remote_i16 on t_remote (col16);
create index t_remote_i17 on t_remote (col17);
create index t_remote_i18 on t_remote (col18);
create index t_remote_i19 on t_remote (col19);
create index t_remote_i20 on t_remote (col20);

exec dbms_stats.gather_table_stats(user,'T_REMOTE');

Now, let’s prepare a test table on the local database:

drop table t_local;
CREATE TABLE t_local (
col01 NUMBER,
col02 NUMBER,
col03 VARCHAR2(50)
);

INSERT INTO t_local
SELECT
rownum, rownum, rpad('*',50,'*')
FROM dual
CONNECT BY level <= 50;

COMMIT;

create index t_local_i01 on t_local (col01);
create index t_local_i02 on t_local (col02);
create index t_local_i03 on t_local (col03);

exec dbms_stats.gather_table_stats(user,'t_local');

create database link dblink_remote connect to (...);

On the local database I run the following simple query:

select l.col01, l.col02,r.col01, r.col14
from t_local l, t_remote@dblink_remote r
where l.col01=r.col01(+)
order by 1;

select * from table( dbms_xplan.display_cursor(null, null, 'typical LAST') );

Every time I execute the above query I get the following plan, as long as the t_remote table has at most 20 indexes:

Plan hash value: 901377631

------------------------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |       |       |    54 (100)|          |        |      |
|   1 |  SORT ORDER BY      |          |    50 |  1600 |    54   (2)| 00:00:01 |        |      |
|   2 |   NESTED LOOPS OUTER|          |    50 |  1600 |    53   (0)| 00:00:01 |        |      |
|   3 |    TABLE ACCESS FULL| T_LOCAL  |    50 |   300 |     3   (0)| 00:00:01 |        |      |
|   4 |    REMOTE           | T_REMOTE |     1 |    26 |     1   (0)| 00:00:01 | DBLIN~ | R->S |
------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------

4 - SELECT "COL01","COL14" FROM "T_REMOTE" "R" WHERE :1="COL01" (accessing
'DBLINK_REMOTE' )

Please note: the remote query uses the ‘WHERE :1=COL01’ predicate, which lets the remote Oracle db use the index on the COL01 column. This is a good plan: the nested loops join iterates over T_LOCAL and grabs a single matching row per iteration from the remote table (the big one).

OK, let’s create another index on the remote table:

create index t_remote_i21 on t_remote (col21);

Now watch what happens to the plan once the 21st index on the t_remote table has been added:

Plan hash value: 425453934

------------------------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |       |       |   120 (100)|          |        |      |
|   1 |  SORT ORDER BY      |          |    50 |  1600 |   120   (2)| 00:00:02 |        |      |
|*  2 |   HASH JOIN OUTER   |          |    50 |  1600 |   119   (1)| 00:00:02 |        |      |
|   3 |    TABLE ACCESS FULL| T_LOCAL  |    50 |   300 |     3   (0)| 00:00:01 |        |      |
|   4 |    REMOTE           | T_REMOTE | 10000 |   253K|   115   (0)| 00:00:02 | DBLIN~ | R->S |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("L"."COL01"="R"."COL01")

Remote SQL Information (identified by operation id):
----------------------------------------------------

4 - SELECT "COL01","COL14" FROM "T_REMOTE" "R" (accessing 'DBLINK_REMOTE' )

A full table scan on t_remote! And this even though col21, on which the new index was created, is not mentioned in the query at all. There appears to be a threshold on the number of indexes that influences plan stability over database links. Unfortunately this plan is not acceptable, because it pulls the whole big table from the remote database just to match 50 rows.
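
If you suspect this threshold, you can count the remote table’s indexes from the local side (a sketch reusing the dblink from this test case):

select count(*)
from   all_indexes@dblink_remote
where  table_name = 'T_REMOTE';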

I hoped that making the index invisible would bring the good performance back.

On the remote database:

alter index t_remote_i21 invisible;

And run the query again to see what changes:

Plan hash value: 425453934

------------------------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |       |       |   120 (100)|          |        |      |
|   1 |  SORT ORDER BY      |          |    50 |  1600 |   120   (2)| 00:00:02 |        |      |
|*  2 |   HASH JOIN OUTER   |          |    50 |  1600 |   119   (1)| 00:00:02 |        |      |
|   3 |    TABLE ACCESS FULL| T_LOCAL  |    50 |   300 |     3   (0)| 00:00:01 |        |      |
|   4 |    REMOTE           | T_REMOTE | 10000 |   253K|   115   (0)| 00:00:02 | DBLIN~ | R->S |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("L"."COL01"="R"."COL01")

Remote SQL Information (identified by operation id):
----------------------------------------------------

4 - SELECT "COL01","COL14" FROM "T_REMOTE" "R" (accessing 'DBLINK_REMOTE' )

Nope, still FULL table scan.

Let’s drop the 21st index on the remote database:

drop index t_remote_i21;

And run the query one more time:

Plan hash value: 901377631

------------------------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |       |       |    54 (100)|          |        |      |
|   1 |  SORT ORDER BY      |          |    50 |  1600 |    54   (2)| 00:00:01 |        |      |
|   2 |   NESTED LOOPS OUTER|          |    50 |  1600 |    53   (0)| 00:00:01 |        |      |
|   3 |    TABLE ACCESS FULL| T_LOCAL  |    50 |   300 |     3   (0)| 00:00:01 |        |      |
|   4 |    REMOTE           | T_REMOTE |     1 |    26 |     1   (0)| 00:00:01 | DBLIN~ | R->S |
------------------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------

4 - SELECT "COL01","COL14" FROM "T_REMOTE" "R" WHERE :1="COL01" (accessing
'DBLINK_REMOTE' )

Yes, the good plan is back again.

RMAN backup problems after DataDomain upgrade from 5.4.0.6 to 5.5.0.7

We encountered some problems with RMAN backups to DataDomain after we upgraded DD from 5.4.0.6 to 5.5.0.7.

The RMAN job failed with the following error message:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on t2 channel at 11/07/2014 07:41:24
ORA-19506: failed to create sequential file, name="ar_DB_43006_1_862990882", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
sbtbackup: Could not create file ar_DB_43006_1_862990882 on host datadomain, error 5034

After some investigation I looked at the DataDomain configuration, and it turned out that our DD user (ddoracleuser) had lost its assignment to the storage unit (oracle_su). I tried the following ddboost modify command and it sorted the problem out:

ddboost storage-unit modify oracle_su user ddoracleuser
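
For the record, the storage-unit configuration (including the assigned user) can be reviewed with the command below, though I’m citing the syntax from memory and it may differ between DD OS releases:

ddboost storage-unit show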

LpxMemFree() while mapping XML data to the wrong datatype

Recently a user was getting an ORA-03113 error while trying to extract an XML file from a BLOB column using the XMLTYPE constructor.

The function used for this task was similar to this one:

FUNCTION blob_to_xmltype (blob_in IN BLOB)
  RETURN XMLTYPE
AS
  v_clob    CLOB := NULL;
  v_varchar VARCHAR2(32767);
  v_start   PLS_INTEGER := 1;
  v_buffer  PLS_INTEGER := 32767;
BEGIN
  DBMS_LOB.CREATETEMPORARY(v_clob, TRUE);

  IF DBMS_LOB.GETLENGTH(blob_in) > 0 THEN
    -- copy the BLOB into the temporary CLOB in 32K chunks
    FOR i IN 1 .. CEIL(DBMS_LOB.GETLENGTH(blob_in) / v_buffer)
    LOOP
      v_varchar := UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(blob_in, v_buffer, v_start));
      DBMS_LOB.WRITEAPPEND(v_clob, LENGTH(v_varchar), v_varchar);
      v_start := v_start + v_buffer;
    END LOOP;
  END IF;

  RETURN XMLTYPE(v_clob);
END blob_to_xmltype;

And here is the error:

select blob_to_xmltype(t.blob_output) from blob_table t;

ERROR:
ORA-03113: end-of-file on communication channel

In the alert.log an ORA-07445 was logged:

ORA-07445: exception encountered: core dump [LpxMemFree()+200] [SIGSEGV] [ADDR:0x0] [PC:0x2654A28] [SI_KERNEL(general_protection)] []

All the XML data was in a similar format, with the following heading:

<?xml version="1.0" encoding="UTF-8"?>
<x name="nameX1">
<y name="nameY1" value="valueY1"/>
<y name="nameY2" value="valueY2"/>
<y name="nameY3" value="valueY3"/>
</x>

While analyzing the problem I found that the BLOB column contained a proper XML file in UTF-8 encoding, with some national characters.
The database, on the other hand, was using NLS_CHARACTERSET=EE8MSWIN1250 and NLS_NCHAR_CHARACTERSET=UTF8. The above function uses UTL_RAW.CAST_TO_VARCHAR2, and this converted some national characters to the EE8MSWIN1250 character set.
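
Both character sets are easy to check with the standard dictionary view:

select parameter, value
from   nls_database_parameters
where  parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');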

The XMLTYPE constructor then found some ‘invalid’ characters in the file and couldn’t parse it properly.
I replaced UTL_RAW.CAST_TO_VARCHAR2 with UTL_RAW.CAST_TO_NVARCHAR2, VARCHAR2 with NVARCHAR2 and CLOB with NCLOB to keep the data in UTF8, and this worked out.

FUNCTION blob_to_xmltype (blob_in IN BLOB)
  RETURN XMLTYPE
AS
  v_clob    NCLOB := NULL;        -- NCLOB instead of CLOB, to stay in UTF8
  v_varchar NVARCHAR2(32767);     -- NVARCHAR2 instead of VARCHAR2
  v_start   PLS_INTEGER := 1;
  v_buffer  PLS_INTEGER := 32767;
BEGIN
  DBMS_LOB.CREATETEMPORARY(v_clob, TRUE);

  IF DBMS_LOB.GETLENGTH(blob_in) > 0 THEN
    -- copy the BLOB into the temporary NCLOB in 32K chunks,
    -- casting to the national character set instead of the database one
    FOR i IN 1 .. CEIL(DBMS_LOB.GETLENGTH(blob_in) / v_buffer)
    LOOP
      v_varchar := UTL_RAW.CAST_TO_NVARCHAR2(DBMS_LOB.SUBSTR(blob_in, v_buffer, v_start));
      DBMS_LOB.WRITEAPPEND(v_clob, LENGTH(v_varchar), v_varchar);
      v_start := v_start + v_buffer;
    END LOOP;
  END IF;

  RETURN XMLTYPE(v_clob);
END blob_to_xmltype;
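
With this version the same SELECT against blob_table returned the XMLTYPE documents without crashing the session.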

How to list & delete expired Oracle backups from the TSM console

When you back up your Oracle databases to a TSM server, you probably schedule RMAN scripts that delete expired backups. This is because the recommended TDPO policy for Oracle databases, with verdeleted=0, retonly=0 and backdelete=yes, keeps all backup files active and lets the Oracle client (RMAN) delete backups when they are meant to expire.
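
Such a cleanup job typically looks something like this (a sketch; the tdpo.opt path is hypothetical):

run {
  allocate channel t1 type 'sbt_tape'
    parms 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
  crosscheck backup;               # mark pieces missing from TSM as EXPIRED
  delete noprompt expired backup;  # remove them from the RMAN repository
  delete noprompt obsolete;        # delete backups outside the retention policy
  release channel t1;
}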

But there is a problem when you switch your Oracle database from one TSM server to another, or move and register your db in a new RMAN catalog. In both cases the RMAN delete script will not be able to delete the old backups (with a new server this is obvious; with a new catalog it may happen that not all the required backup history is stored in the controlfile, so a resync will not populate the new catalog with it). If these backups are not deleted some other way, they will stay in the library forever.

I had both problems some time ago and found it useful to delete expired backups directly from the dsmc console. I prepared a script that lists Oracle backups stored in TSM that are at least N days old and generates the ‘delete’ commands that help an administrator get rid of these files.

#!/usr/bin/ksh

#----------------------------------------------------------------------------
# File: tsm_get_obsolete_backups.ksh
#
# This script:
#    a) lists all obsolete Oracle backups if LIST parameter is provided
#    b) generates dsmc DELETE commands for all obsolete Oracle backups if GENDELETE parameter is provided
#
# All Oracle backups older than N days (the second parameter) are considered obsolete
#
#----------------------------------------------------------------------------

if [ $# -ne 2 ]
then
  echo "SYNTAX : $0 <LIST|GENDELETE> <Ndays>"
  exit 1
fi

ACTION=$1
DAYS=$2

TSMid=_your_tsm_user_
TSMpa=_your_tsm_user_password_

if [ "$ACTION" != "LIST" -a "$ACTION" != "GENDELETE" ]
then
  echo "ACTION should be either LIST or GENDELETE. Aborting.."
  exit 1
fi

rm -f /tmp/gob.txt

dsmadmc -id=$TSMid -pa=$TSMpa -TABdelimited "select NODE_NAME from nodes where PLATFORM_NAME like '%Oracle%'" 2>/dev/null | while read nodes
do
  echo $nodes >> /tmp/gob.txt
done

for NODE in $(tail -n +10 /tmp/gob.txt | grep -v ANS8002I)
do
  echo "INFO> node: $NODE"
  echo
  rm -f /tmp/gob2.txt

  if [ "$ACTION" == "LIST" ]
  then
    dsmadmc -id=$TSMid -pa=$TSMpa -TABdelimited "select 'NODE: '||node_name||'     FILE: '||filespace_name||hl_name||ll_name||'     BACKUP_DATE: ',backup_date from backups where node_name='${NODE}' and backup_date<timestamp(current date -${DAYS} day,'00:00:00') order by BACKUP_DATE asc" |while read nodes
    do
      echo $nodes >> /tmp/gob2.txt
    done
  else
    dsmadmc -id=$TSMid -pa=$TSMpa -TABdelimited "select 'delete backup '||filespace_name||hl_name||ll_name||' -subdir=yes' from backups where node_name='${NODE}' and backup_date<timestamp(current date -${DAYS} day,'00:00:00')" |while read nodes
    do
      echo $nodes >> /tmp/gob2.txt
    done
  fi

  tail -n +10 /tmp/gob2.txt | grep -v "^AN"

done

rm -f /tmp/gob.txt

To list Oracle backups older than 1 year:

$ tsm_get_obsolete_backups.ksh LIST 365

To generate a dsmc DELETE script for all Oracle backups older than 1 year:

$ tsm_get_obsolete_backups.ksh GENDELETE 365
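
The GENDELETE output is a list of dsmc client commands, one per backup file, that you can review and then paste into a dsmc session, e.g. (hypothetical file name):

delete backup /oracle_fs/ar_DB_43006_1_862990882 -subdir=yes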