Configuring a client TNS entry for TAF

Discussion:

Will Beldman

2018-09-24 21:06:21 UTC

I have two databases configured under Data Guard:
* PRIM - Usually the primary
* STBY - Usually the standby

From what I read, the following configuration should work:
===============
PRIM =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = primHost)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = PRIM)
(FAILOVER_MODE = (BACKUP = STBY)(METHOD=basic)(TYPE=select)(RETRIES=10)
(DELAY=10))
)
)

STBY =
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=stbyHost)(PORT=1521))
)
(CONNECT_DATA=(SERVICE_NAME=STBY))
)
===============

I can successfully switch over the database.

My intent is for the client to *attempt* to connect to PRIM, but if that
fails, the connection passes through to STBY instead.

(Since PRIM is now a standby but still listening for connections, to ensure
failure on PRIM, I completely shut off the database. More on this later)

As expected, my connection to STBY is perfect:
===============
sqlplus system/******@STBY

...

SQL>select DATABASE_ROLE from v$database;

DATABASE_ROLE
----------------
PRIMARY
===============

However, as *not* expected, my connection to PRIM fails immediately:
===============
sqlplus system/******@PRIM

SQL*Plus: Release 12.1.0.2.0 Production on Mon Sep 24 16:52:52 2018

Copyright (c) 1982, 2014, Oracle. All rights reserved.

ERROR:
ORA-12514: TNS:listener does not currently know of service requested in
connect
descriptor
===============
It looks like RETRIES and DELAY are not respected here as the failure is
immediate.

So a few questions:
1. As is, what am I not understanding? Why isn't my configuration doing
anything I expect it to?
2. Obviously I have another issue where PRIM is in a MOUNT status and
"accepting" connections. So my connections to PRIM produced ORA-01033. What is
the preferred way to configure TAF with Data Guard in a RAC environment? I
have a two instance primary and a two instance standby. This is my ultimate
goal but achieving success by shutting off PRIM after a switchover is a good
start.
3. I have read many who recommend complementing a switchover with a DNS update or a
quick tnsnames.ora update to fool the client into "thinking" it is still
connecting to the same database it always was. I recognize this is *a*
solution but I'm looking for something more elegant that is in line with what
Oracle describes in the documentation.

Andrea Monti

2018-09-24 22:32:32 UTC

Hi Will

If you want to use connect-time failover you should configure a custom
service_name and a tns entry like

CONNECT_TIME_FAILOVER =
(DESCRIPTION =
(ADDRESS_LIST =
(FAILOVER=on)
(LOAD_BALANCE=on) # optional: random address order; omit to go from first to last
(ADDRESS = (PROTOCOL = TCP)(HOST = primHost)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = stbyHost)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME=CUSTOM_SERVICE_NAME)
(failover_mode =
(RETRIES = 180)
(DELAY = 5)
)
)
)

This will work, as long as the service CUSTOM_SERVICE_NAME is only
active on the PRIMARY database.

If you want to use TAF you should use Oracle RAC rather than Oracle Data
Guard. TAF will protect your sessions from instance crash; additionally,
TAF may protect your select and your transactions, too (check how to use
FAILOVER_METHOD!).
However, TAF will not protect you from database failures: Data Guard and
site switchover will protect you in case of a database failure, but they
will not protect your ongoing selects and transactions, as far as I know.

regards

Andrea

Seth Miller

2018-09-25 16:35:25 UTC

Will,

Using only client-side failover is not sufficient for what you are trying
to do. As Andrea mentioned, there are server-side services configurations
that have to be done as well. The Oracle MAA group puts out a really good
white paper on this.
https://www.oracle.com/technetwork/database/availability/client-failover-2280805.pdf

Will Beldman

2018-09-25 19:10:37 UTC

Thank you,

I have a poor understanding of Oracle Services and maybe TAF is the wrong term
so sorry for the confusion. Perhaps what I'm trying to achieve can't actually
be done.

To clarify my configuration, the primary is only a two node cluster and all
clients connect to either node through the SCAN address equally. The standby
is also a two node cluster.

1. I can gently take my database instances in and out of the cluster such that
clients are mostly unaffected (shutdown transactional).
2. I can do a switchover back and forth from the primary to the standby and
vice versa.
3. I'm trying to add a switchover to a secondary site *such that clients
continue to be unaffected* (or at least as minimal as possible).

Unfortunately today, the only way I can achieve this is to fudge the name
through a DNS update (I have to involve the NOC) or to instruct the users to
update their TNS entries and restart. Both are VERY disruptive!

If I am understanding you correctly, and if this can be done, I should be
creating a custom service that INCLUDES the standby instances (call it, say,
CUSTOM_SERVICE_NAME) and configuring the clients to point to that service
instead (as per the tnsnames.ora example you provided)?

Here's the config I have today:
======================
$ srvctl config database -db PRIM
Database unique name: PRIM
Database name: PRIM
...
Start options: open
Stop options: immediate
Database role: PRIMARY
...
Services: <-----------------EMPTY!
...
======================
======================
$ srvctl config database -db STBY
Database unique name: STBY
Database name: STBY
...
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
...
Services: <-----------------EMPTY!
...
======================
So after creating CUSTOM_SERVICE_NAME, both PRIM and STBY should identify
themselves as part of the CUSTOM_SERVICE_NAME service?

The other thing I read is 11g eliminated the need to use database triggers or
manually attempting to manage the service after a switchover. Is this
accurate? After a switchover, the service on the old primary should be
disabled and on the new primary should be automatically enabled? Am I reading
this correctly or is there still some manual intervention required after a
switchover?

Seth Miller

2018-09-25 19:48:56 UTC

Will,

Using the acronym TAF is fine. There are various client and server side
connection switchover/failover mechanisms but most database folks will
understand what you mean.

Yes, it can be done without dns changes, without tnsnames changes, and with
minimal disruption to end users.

Yes, you need to create a custom service as described in the MAA document.

Yes, this will work with or without RAC and can be combined with instance
failover.

srvctl will never show the default service so it is normal that you don't
see anything from those commands.
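
To see which user-defined services exist, and what the listener has registered, something like the following could be used. This is only a sketch: it assumes the 12c srvctl syntax and the database names PRIM and STBY used earlier in the thread, and the `run` wrapper is a hypothetical helper that prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# User-defined services registered with Grid Infrastructure for each database.
run srvctl config service -db PRIM
run srvctl config service -db STBY

# Where those services are currently running.
run srvctl status service -db PRIM

# Everything the local listener knows about, including the default services.
run lsnrctl services
```

With no custom services defined, the srvctl commands return nothing useful, while the listener output still lists the default service that clients have been connecting to.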

The document should give you everything you need to set this up correctly.
A valuable lesson to take away from this is never give end users the
default service. Always create one or more services so that you can
customize, migrate, failover, change, upgrade, limit, report on, etc. the
end user's service without requiring client connection string updates.

Will Beldman

2018-09-26 14:48:34 UTC

Post by Seth Miller
A valuable lesson to take away from this is never give end users the
default service. Always create one or more services so that you can
customize, migrate, failover, change, upgrade, limit, report on, etc. the
end user's service without requiring client connection string updates.

Thank you! This has been insightful!

Stefan Knecht

2018-09-27 04:23:04 UTC

The document posted by Seth is THE reference for setups like this.

If you're somewhat new to all this it may be a bit overwhelming, but I'd
highly recommend studying every last sentence of that paper. It has helped
me a lot in the past (it existed for older versions as well).

What makes it difficult is that there are many different major technologies
at play here - you have RAC, you have Data Guard. And then there are the
dreaded users.

To get the best possible outcome you need to first get a clear
understanding of how your applications are connecting to the database. OCI
driver? JDBC?

The next key thing to understand is that certain features only work if the
client / application is designed to support them. The best example is TAF
(Transparent Application Failover), a RAC feature that allows a connection
to fail over to a different RAC instance. This isn't something
you can just set up on the database server and be done with it,
unfortunately. It's also somewhat limited in which scenarios can and
cannot fail over cleanly. But I wouldn't worry about this too much: from
what you said earlier, it seems your main concern is transparency to the
users with regard to the primary/standby locations.

From a TNS / connectivity point of view, what you want to ensure is that
connections are routed to whatever RAC instance is available. This means
the endpoint to use when connecting to either the primary or the standby
cluster should be the SCAN listener's hostname. The second part of the
puzzle is then to ensure that clients can use a single "name" to connect to
your environment, regardless of which site happens to be the primary or
standby at the time.

To set up the TNS configuration to also support that scenario, Seth's
linked MAA document is fabulous. What we used to do before all this was to
create a startup trigger that started a service based on the database role
(e.g. if the database was the primary, it started the service
"APPLICATION_RW"; if it was a standby, the service wasn't started).
Nowadays, Oracle Grid Infrastructure does that for you: you use srvctl to
create a service with role affinity, and the clusterware then makes sure
that it is properly handled, even across role transitions.
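
As a rough sketch of that role-affinity setup, the same service can be defined on both clusters with the PRIMARY role. The assumptions here are mine, not from the MAA paper: 12c srvctl syntax, the service name oltpworkload borrowed from the paper's client example, and hypothetical instance names PRIM1/PRIM2 and STBY1/STBY2. The `run` wrapper only prints the commands unless DRY_RUN=0 is set, so check the flags against `srvctl add service -help` on your release before running anything.

```shell
#!/bin/sh
# Hypothetical helper: print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Define one role-based service on a cluster.
# $1 = database unique name, $2 = comma-separated preferred instances (assumed names).
add_primary_service() {
  run srvctl add service -db "$1" -service oltpworkload \
    -preferred "$2" -role PRIMARY
}

# The first call belongs on the primary cluster, the second on the standby cluster.
add_primary_service PRIM PRIM1,PRIM2
add_primary_service STBY STBY1,STBY2

# Start the service on the cluster where the database is currently the primary.
run srvctl start service -db PRIM -service oltpworkload
```

Because the service is tied to the PRIMARY role on both sides, it should only be running on whichever database currently holds that role, which is what lets a single client alias follow the primary.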

Finally, the client configuration also needs to include all this. Again,
from Seth's link on page 12 there is a clear example how to do this:

SALES= (DESCRIPTION= (FAILOVER=on)
(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
(ADDRESS_LIST= (LOAD_BALANCE=on)
(ADDRESS=(PROTOCOL=TCP)(HOST=prmy-scan)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=stby-scan)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=oltpworkload)))

The key bits here are:
- The FAILOVER is directly in the DESCRIPTION entry. This means that each
entry in the ADDRESS_LIST will be contacted.
- Use the timeout parameters to control how long to spend to try and
connect.
- The ADDRESS_LIST entries are tried in listed order only when LOAD_BALANCE
is off; with LOAD_BALANCE=on, as in this example, they are tried in random
order. Either way, if the active site is not the first one contacted,
connections may be slower. You can tune that using the timeout settings.
- The RAC connection strings for each of the sites are using the SCAN
(prmy-scan / stby-scan).
- The SERVICE_NAME is a service you define with srvctl with role affinity
(the document also tells you how to do that). This is key to make this work.

HTH

Stefan

Mladen Gogala

2018-09-26 02:36:10 UTC

Hi!

Replies are in-line

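Post by Will Beldman
1. I can gently take my database instances in and out of the cluster such that
clients are mostly unaffected (shutdown transactional).

No, you can't. If you take one of your instances down, the clients
connected to that instance will experience an error. They may reconnect,
but they will experience an error.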

Post by Will Beldman
2. I can do a switchover back and forth from the primary to the standby and
vice versa.

Yes you can. DG broker has made that process very, very easy.

Post by Will Beldman
3. I'm trying to add a switchover to a secondary site *such that clients
continue to be unaffected* (or at least as minimal as possible).

You can create a special service and switch it using a startup trigger,
like described here:

https://www.realdbamagic.com/data-guard-client-settings/

However, I am unclear on why you would do that. If the purpose is a DR
test, then you should assume that you will lose the whole primary data
center. A manager in a company that survived Hurricane Sandy
said that his data center was sleeping with the fishes. Unfortunately,
that was not a joke, although it was a clear reference to the movie "The
Godfather" and therefore very funny. What happened next is that the DR
site was activated and resumed processing, with the total downtime
being 10 minutes. You will need spare application servers, spare DNS
server(s), spare web servers, spare routers, spare firewalls and
standby databases. Spare application servers will have DNS pointing to
the DR TNS entries. My point is that using the primary application
server redirected to the standby database does not constitute a valid
DR test and tells you nothing about whether your company can survive
something like a hurricane, which is ultimately the goal of any DR test.

Post by Will Beldman
Unfortunately today, the only way I can achieve this is to fudge the name
through a DNS update (I have to involve the NOC) or to instruct the users to
update their TNS entries and restart. Both are VERY disruptive!

Why are you switching database roles? Don't switch roles and there will
be no disruption.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217

--
http://www.freelists.org/webpage/oracle-l