Friday, March 19, 2010

Alert Log Scraping with Oracle's ADRCI Utility

Alert Log Scraping with Oracle's ADRCI Utility
By James Koopmann



Oracle’s new ADR with command interface shows promise for database administrators who like to script their own solution for quickly scraping the alert log and automatically looking for errors.

Oracle’s alert log is a chronological log of important messages and errors that are generated using an Oracle database. Often times it is the first place DBAs look to see if there are errors being generated or checking to see if the Oracle database is healthy, or at least not spitting out errors and complaining about something. If you have yet to pry open the hood of Oracle and have never seen an alert log, below is a snippet of some of the errors (ORA-) and messages that Oracle keeps in the alert log. Notice that I’ve included two different snippets. One is what most DBAs are familiar with, a pure text file, and one that is new to many, an XML alert log. Both are generated but the prior is, as suggested by Oracle, not being maintained and not receiving the same messaging as the XML alert log.

From the text alert log:

Wed Dec 09 14:55:16 2009
Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
From the XML alert log:

client_id='' type='UNKNOWN' level='16'
module='sqlplus@ludwig (TNS V1-V3)' pid='14798'>
Errors in file /opt/app/oracle/diag/rdbms/db11/db11/trace/db11_ora_14798.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3



Clearly, they are very similar except for the XML tags. These XML tags cause some interesting problems for those of us that like to scrape the alert log with system utilities. It isn’t as easy as pointing the scripts at a new file out on disk. Thankfully Oracle has provided a utility (ADRCI) that can be used to extract diagnostic information generated from an Oracle database, including the alert log.

When used interactively by issuing the “adrci” command from a system prompt, which is often the way it is used, a user can easily extract alert log detail with very simple commands such as the following:

SHOW ALERT –TAIL; To display the last 10 lines of the alert log.
SHOW ALERT -TAIL 50; To display the last 50 lines of the alert log.
SHOW ALERT -P "MESSAGE_TEXT LIKE '%ORA-%'"; search for alert lines that have ORA- in them.
However, like many other DBA tasks we clearly hate logging into a database server and issuing commands. What is more expected is to have some set of commands that can be executed through a schedule to check and alert us, such as by email. Additionally if we schedule something to check on a regular interval, say 10 minutes, it becomes a better and more reliable monitoring methodology then having a DBA waste time logging into 10s or 100s of database servers every 10 minutes. Therefore, and to not belabor the point, here are the scripts that I’ve started to develop. I hope that you can use them:

This solution makes use of a very simple directory structure under the oracle user’s home directory. Here are the directories/files used:

/home/oracle/alert
This is where the main script and supporting files exist

/home/oracle/alert/CHKALERT
This file contains ORA- errors, or any other errors we are interested in checking for in the alert log

/home/oracle/alert/CHKALERT.sh
This is the script that will need to be executed

/home/oracle/alert/CHKALERT.curr
File that contains information on the last time the alert log was scanned; containing information about the errors found and put into the CHKALERT.yyyy-mm-dd files

/home/oracle/alert/lst
The directory where output is generated

/home/oracle/alert/lst/CHKALERT.adrci
Is dynamically generated during runtime and will be used as a script to the ADRCI utility

/home/oracle/alert/lst/CHKALERT.lst
Is dynamically generated during runtime and will be used as output to a SQL*Plus call to get some variables to be used when calling ADRCI

/home/oracle/alert/lst/CHKALERT.sql
Is the SQL*Plus SQL used to generate the CHKALERT.lst output

/home/oracle/alert/lst/CHKALERT.tmp
Contains header output from the alert log to get a timezone

/home/oracle/alert/lst/CHKALERT.yyyy-mm-dd
will be files generated for each day that contains ORA- errors for that day


Below is the script itself. To use, just execute by typing in at the command prompt:

[oracle@ludwig alert]$ ./CHKALERT.shThe file you will be interested in looking at, if you’ve setup the /home/oracle/alert/CHKALERT file properly to look for specific ORA errors, will be the /home/oracle/alert/lst/CHKALERT.log file. This file contains each of the ORA errors found for the last run and looks like the following:

CHKALERT.log
::::::::::::::
001:2010:03:04:02:32:29:ORA-00312: online log 1 thread 1: '/oradata/db11/redo01.log'
001:2010:03:04:02:32:29:ORA-00312: online log 2 thread 1: '/oradata/db11/redo02.log'
001:2010:03:04:02:32:29:ORA-00312: online log 3 thread 1: '/oradata/db11/redo03.log'
To fully automate the process, not covered in this article, would be to interrogate this file and if it contains something, it should be sent to the appropriate DBAs to figure out what to do with the errors found. Have fun with the script. Change as you see fit and begin to take advantage of Oracle new ADRCI utility for scraping the alert logs.

#!/bin/sh
#--File: CHKALERT.sh

#--setup
PGM="CHKALERT"
ALRT=$HOME/alert
LST=${ALRT}/lst
LOG=${LST}/${PGM}.log
CURR=${ALRT}/${PGM}.curr

#--Unix environment variables
ECHO=echo; export ECHO
CAT=/bin/cat; export CAT
RM=/bin/rm; export RM
TOUCH=/bin/touch; export TOUCH
GREP=/bin/grep; export GREP
AWK=/bin/awk; export AWK
WC=/usr/bin/wc; export WC
TAIL=/usr/bin/tail; export TAIL
HEAD=/usr/bin/head; export HEAD
SUM=/usr/bin/sum; export SUM

#--Oracle environment variables
ORACLE_SID=db11; export ORACLE_SID
ORACLE_HOME=`${GREP} ${ORACLE_SID}: /etc/oratab | ${AWK} -F: '{print $2}'`; export ORACLE_HOME
PATH=$ORACLE_HOME/bin:$PATH; export PATH

#--code
start=`date "+%Y:%m:%d:%H:%M:%S"`
${RM} ${LOG}
${TOUCH} ${LOG}

#--execute SQL to get some diagnostic variables
echo "set echo off" > ${LST}/${PGM}.sql
echo "set feedback off" >> ${LST}/${PGM}.sql
echo "set heading off" >> ${LST}/${PGM}.sql
echo "set linesize 40" >> ${LST}/${PGM}.sql
echo "set pagesize 55" >> ${LST}/${PGM}.sql
echo "set verify off" >> ${LST}/${PGM}.sql
echo "set linesize 300" >> ${LST}/${PGM}.sql
echo "SELECT 'homepath:'||replace(homepath.value,adrbase.value||'/','')" >> ${LST}/${PGM}.sql
echo " FROM v\$diag_info homepath, v\$diag_info adrbase" >> ${LST}/${PGM}.sql
echo " WHERE homepath.name = 'ADR Home'" >> ${LST}/${PGM}.sql
echo " AND adrbase.name = 'ADR Base';" >> ${LST}/${PGM}.sql
echo "SELECT 'day:'||to_char(sysdate ,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "SELECT 'nextday:'||to_char(sysdate+1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "SELECT 'prevday:'||to_char(sysdate-1,'yyyy-mm-dd') FROM dual;" >> ${LST}/${PGM}.sql
echo "exit" >> ${LST}/${PGM}.sql
sqlplus -s '/as sysdba' @${LST}/${PGM}.sql > ${LST}/${PGM}.lst

#-- get diag information variables just queried from the database
homepath=`${GREP} homepath ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
day=`${GREP} "^day" ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
nextday=`${GREP} nextday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`
prevday=`${GREP} prevday ${LST}/${PGM}.lst | ${AWK} -F":" '{print $2}'`

#-- get the timezone from the alert log (safest place to get)
#-- the proper timezone is needed to properly filter the alert log for date ranges you
#-- want to look at
echo "set echo off" > ${LST}/${PGM}.adrci
echo "set termout off" >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.tmp" >> ${LST}/${PGM}.adrci
echo "show alert -tail 1" >> ${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null
timezone=`${HEAD} -1 ${LST}/${PGM}.tmp | ${AWK} -F" " '{print $3}'`

#-- extract alert log errors for the current day (today) and previous day (yesterday)
#-- previous day alerts will be used if the current file has yesterday's day as last day;
#-- meaning that we have had a switch to a new day and might have errors still to
#-- process from the previous day
echo "set echo off" > ${LST}/${PGM}.adrci
echo "set termout off" >> ${LST}/${PGM}.adrci
echo "set homepath ${homepath}" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${day}" >> ${LST}/${PGM}.adrci
echo "show alert -P \"ORIGINATING_TIMESTAMP BETWEEN '${day} 00:00:00.000000 ${timezone}' AND
'${nextday} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'\" -term" >>
${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
echo "spool ${LST}/${PGM}.${prevday}" >> ${LST}/${PGM}.adrci
echo "show alert -P \"ORIGINATING_TIMESTAMP BETWEEN '${prevday} 00:00:00.000000 ${timezone}' AND
'${day} 00:00:00.000000 ${timezone}' AND MESSAGE_TEXT LIKE '%ORA-%'\" -term" >>
${LST}/${PGM}.adrci
echo "spool off" >> ${LST}/${PGM}.adrci
adrci script=${LST}/${PGM}.adrci 1>/dev/null 2>/dev/null

#-- get current contents of the current file
#-- default to current day if no current file
if [ -r "${CURR}" ]
then
#-- if the current exists then get the information it contains
daychecksum=`${GREP} day ${CURR} | ${AWK} -F":" '{print $2}'`
daylastline=`${GREP} day ${CURR} | ${AWK} -F":" '{print $3}'`
daylastday=`${GREP} day ${CURR} | ${AWK} -F":" '{print $4}'`
else
#-- if the current does not exist then default to today
daychecksum=0
daylastline=3
daylastday=${day}
fi

#-- set the days to search through for alerts
#-- if last day in current file was yesterday then include previous day
#-- if last day in current file is not yesterday then just scan today's alerts
if [ "${daylastday}" = "${prevday}" ]
then
alertdays="${prevday} ${day}"
else
alertdays="${day}"
fi

#-- for each of the days to scan for alerts
for theday in ${alertdays}
do
#-- check alert errors for the last day.
if [ -r "${LST}/${PGM}.${theday}" ]
then
#-- If the checksum generated is DIFFERENT we should start reporting from the top.
#--
#-- If the checksum generated is the SAME we should start reporting from end of
#-- the previously generated output.
new_daychecksum=`${HEAD} -4 ${LST}/${PGM}.${theday} | ${SUM} | ${AWK} '{print $1}'`
if [ ${new_daychecksum} -ne ${daychecksum} ]
then
daychecksum=${new_daychecksum}
daylastline=3
fi

#-- get the number of lines in the generated errors so we can report to the
#-- end of the file and we know where to start next time.
new_daylastline=`${WC} -l ${LST}/${PGM}.${theday} | ${AWK} -F" " '{print $1}'`

#-- if the number of lines in the output is 3 then there are no errors found.
if [ ${new_daylastline} -ne 3 ]
then
#-- if number of lines in extracted alerts is the same as last time then no new alerts
if [ ${new_daylastline} -ne ${daylastline} ]
then
#-- find the line to begin reporting new alerts from
fromline=`expr ${new_daylastline} - ${daylastline}`
#-- produce alert lines for alerts defined in file CHKALERT
${TAIL} -${fromline} ${LST}/${PGM}.${theday} |
while read LINE
do
for ORAS in `${CAT} ${ALRT}/CHKALERT`
do
ora=`${ECHO} ${LINE} | ${GREP} ${ORAS}`
if [ $? -eq 0 ]
then
#-- you might want to do something here
#-- that is specific to certain ORA- errors
err="001"
echo "${err}:${start}:${LINE}" >> ${LOG}
fi
done
done
fi
fi
daylastline=${new_daylastline}
#-- update the current file only if the day being processed is current day
if [ "${theday}" = "${day}" ]
then
${ECHO} "day:"${daychecksum}":"${daylastline}":"${day} > ${CURR}
fi
fi
done

No comments:

Post a Comment