Discussion:
experience w/ UUID's for surrogate keys across entire schema?
Chris Stephens
2018-09-13 21:17:19 UTC
Permalink
we have an application that will be deployed at an unknown number of
locations. the application requires the ability to ingest data from every
location into a central repository. the application will also store data in
non-oracle databases at some locations. all locations will have the same
schema. all surrogate keys for all tables across all locations have to be
unique. some of these locations running Oracle will be "small" and some
will require beefy RAC systems to support the workload.

2 methods of satisfying this requirement are being considered:

1) use a multi-column or concatenated column PK with some sort of "site
id" + sequence based values for oracle systems and figure out the
equivalent in the other database products. To allow for scalability on busy
RAC systems I currently plan to recomment a site_id + inst_id +
mod(session_id, <some number>) + sequence value.

2) use UUID's generated in the application code for all surrogate keys
everywhere.

I'm being asked to weigh in on the advantages/disadvantages of both. Option
1 is ideal from an Oracle perspective if performance is primary concern
(there will be locations that will stress any hardware we throw at it).
Option 2 is preferred by the developers and I think is functionally fine.

I think the primary concern for option 2 is the completely random nature of
UUIDs and the fact that all these PK/Unq indexes will need to be fully
cached to maintain performance which will stress oracle's buffer management
and result in far more physical I/O for inserts as well as for other
database activity that gets aged out of buffer cache. I have explained that
concern as best I can but the developers are (justifiably asking for more
concrete answers other than "this will likely become a problem at some
point if the application activity exceeds the ability of Oracle, on the
current hardware, to maintain active blocks in the buffer cache". I am
going to try and come up with a test to demonstrate the degraded
performance on our current 3-node 12.2 RAC system (child's play for what
eventual production and test systems will be). Does anyone have suggestions
on how to best set up that test and demonstrate the downfalls of UUID
approach at some currently unknown scale? Also, does anyone have any
experience using externally generated UUID's as surrogate keys in Oracle?
Good/bad/indifferent?

==
==

I'm most certainly reaching out to oracle-l more than ever before as these
are problems I've never had to deal with and truly appreciate all the
people that take the time to chime in.

Thanks!
Chris
"Reen, Elizabeth " (Redacted sender "elizabeth.reen" for DMARC)
2018-09-13 21:45:56 UTC
Permalink
For option 2, how do they plan to sync the UUIDs across all the sites? How do they expect to handle network latency? Is there going to be a key master?

It may be functionally fine, but I see an operational nightmare.


Liz

Elizabeth Reen
CPB Database Group Manager

Service Now Group: CPB-ORACLE-DB-SUPPORT


From: oracle-l-***@freelists.org [mailto:oracle-l-***@freelists.org] On Behalf Of Chris Stephens
Sent: Thursday, September 13, 2018 5:17 PM
To: oracle-l
Subject: experience w/ UUID's for surrogate keys across entire schema?

we have an application that will be deployed at an unknown number of locations. the application requires the ability to ingest data from every location into a central repository. the application will also store data in non-oracle databases at some locations. all locations will have the same schema. all surrogate keys for all tables across all locations have to be unique. some of these locations running Oracle will be "small" and some will require beefy RAC systems to support the workload.

2 methods of satisfying this requirement are being considered:

1) use a multi-column or concatenated column PK with some sort of "site id" + sequence based values for oracle systems and figure out the equivalent in the other database products. To allow for scalability on busy RAC systems I currently plan to recomment a site_id + inst_id + mod(session_id, <some number>) + sequence value.

2) use UUID's generated in the application code for all surrogate keys everywhere.

I'm being asked to weigh in on the advantages/disadvantages of both. Option 1 is ideal from an Oracle perspective if performance is primary concern (there will be locations that will stress any hardware we throw at it). Option 2 is preferred by the developers and I think is functionally fine.

I think the primary concern for option 2 is the completely random nature of UUIDs and the fact that all these PK/Unq indexes will need to be fully cached to maintain performance which will stress oracle's buffer management and result in far more physical I/O for inserts as well as for other database activity that gets aged out of buffer cache. I have explained that concern as best I can but the developers are (justifiably asking for more concrete answers other than "this will likely become a problem at some point if the application activity exceeds the ability of Oracle, on the current hardware, to maintain active blocks in the buffer cache". I am going to try and come up with a test to demonstrate the degraded performance on our current 3-node 12.2 RAC system (child's play for what eventual production and test systems will be). Does anyone have suggestions on how to best set up that test and demonstrate the downfalls of UUID approach at some currently unknown scale? Also, does anyone have any experience using externally generated UUID's as surrogate keys in Oracle? Good/bad/indifferent?

==
==

I'm most certainly reaching out to oracle-l more than ever before as these are problems I've never had to deal with and truly appreciate all the people that take the time to chime in.

Thanks!
Chris
Chris Stephens
2018-09-14 12:30:09 UTC
Permalink
Thanks for response Elizabeth. Part of the reason for choosing UUID's is
that they are always (almost) guaranteed to be unique so there doesn't need
to be any syncing. There is no referential integrity across sites.

The different sites will not need to communicate with each other to operate
so network latency isn't much of a concern. However, bandwidth might be
though not my problem at the moment. :) There will be situations where data
will need to be extracted from one site and loaded into the central
repository site in a batch oriented matter.
Post by "Reen, Elizabeth " (Redacted sender "elizabeth.reen" for DMARC)
For option 2, how do they plan to sync the UUIDs across
all the sites? How do they expect to handle network latency? Is there
going to be a key master?
It may be functionally fine, but I see an operational nightmare.
Liz
Elizabeth Reen
CPB Database Group Manager
Service Now Group: CPB-ORACLE-DB-SUPPORT
*Sent:* Thursday, September 13, 2018 5:17 PM
*To:* oracle-l
*Subject:* experience w/ UUID's for surrogate keys across entire schema?
we have an application that will be deployed at an unknown number of
locations. the application requires the ability to ingest data from every
location into a central repository. the application will also store data in
non-oracle databases at some locations. all locations will have the same
schema. all surrogate keys for all tables across all locations have to be
unique. some of these locations running Oracle will be "small" and some
will require beefy RAC systems to support the workload.
1) use a multi-column or concatenated column PK with some sort of "site
id" + sequence based values for oracle systems and figure out the
equivalent in the other database products. To allow for scalability on busy
RAC systems I currently plan to recomment a site_id + inst_id +
mod(session_id, <some number>) + sequence value.
2) use UUID's generated in the application code for all surrogate keys everywhere.
I'm being asked to weigh in on the advantages/disadvantages of both.
Option 1 is ideal from an Oracle perspective if performance is primary
concern (there will be locations that will stress any hardware we throw at
it). Option 2 is preferred by the developers and I think is functionally
fine.
I think the primary concern for option 2 is the completely random nature
of UUIDs and the fact that all these PK/Unq indexes will need to be fully
cached to maintain performance which will stress oracle's buffer management
and result in far more physical I/O for inserts as well as for other
database activity that gets aged out of buffer cache. I have explained that
concern as best I can but the developers are (justifiably asking for more
concrete answers other than "this will likely become a problem at some
point if the application activity exceeds the ability of Oracle, on the
current hardware, to maintain active blocks in the buffer cache". I am
going to try and come up with a test to demonstrate the degraded
performance on our current 3-node 12.2 RAC system (child's play for what
eventual production and test systems will be). Does anyone have suggestions
on how to best set up that test and demonstrate the downfalls of UUID
approach at some currently unknown scale? Also, does anyone have any
experience using externally generated UUID's as surrogate keys in Oracle?
Good/bad/indifferent?
==
==
I'm most certainly reaching out to oracle-l more than ever before as these
are problems I've never had to deal with and truly appreciate all the
people that take the time to chime in.
Thanks!
Chris
"Reen, Elizabeth " (Redacted sender "elizabeth.reen" for DMARC)
2018-09-14 14:20:07 UTC
Permalink
How do create them? I am not familiar with this UUID.

Liz

Elizabeth Reen
CPB Database Group Manager

Service Now Group: CPB-ORACLE-DB-SUPPORT


From: Chris Stephens [mailto:***@gmail.com]
Sent: Friday, September 14, 2018 8:30 AM
To: Reen, Elizabeth [ICG-IT]
Cc: oracle-l
Subject: Re: experience w/ UUID's for surrogate keys across entire schema?

Thanks for response Elizabeth. Part of the reason for choosing UUID's is that they are always (almost) guaranteed to be unique so there doesn't need to be any syncing. There is no referential integrity across sites.

The different sites will not need to communicate with each other to operate so network latency isn't much of a concern. However, bandwidth might be though not my problem at the moment. :) There will be situations where data will need to be extracted from one site and loaded into the central repository site in a batch oriented matter.
On Thu, Sep 13, 2018 at 4:47 PM Reen, Elizabeth <***@citi.com<mailto:***@citi.com>> wrote:
For option 2, how do they plan to sync the UUIDs across all the sites? How do they expect to handle network latency? Is there going to be a key master?

It may be functionally fine, but I see an operational nightmare.


Liz

Elizabeth Reen
CPB Database Group Manager
Service Now Group: CPB-ORACLE-DB-SUPPORT


From: oracle-l-***@freelists.org<mailto:oracle-l-***@freelists.org> [mailto:oracle-l-***@freelists.org<mailto:oracle-l-***@freelists.org>] On Behalf Of Chris Stephens
Sent: Thursday, September 13, 2018 5:17 PM
To: oracle-l
Subject: experience w/ UUID's for surrogate keys across entire schema?

we have an application that will be deployed at an unknown number of locations. the application requires the ability to ingest data from every location into a central repository. the application will also store data in non-oracle databases at some locations. all locations will have the same schema. all surrogate keys for all tables across all locations have to be unique. some of these locations running Oracle will be "small" and some will require beefy RAC systems to support the workload.

2 methods of satisfying this requirement are being considered:

1) use a multi-column or concatenated column PK with some sort of "site id" + sequence based values for oracle systems and figure out the equivalent in the other database products. To allow for scalability on busy RAC systems I currently plan to recomment a site_id + inst_id + mod(session_id, <some number>) + sequence value.

2) use UUID's generated in the application code for all surrogate keys everywhere.

I'm being asked to weigh in on the advantages/disadvantages of both. Option 1 is ideal from an Oracle perspective if performance is primary concern (there will be locations that will stress any hardware we throw at it). Option 2 is preferred by the developers and I think is functionally fine.

I think the primary concern for option 2 is the completely random nature of UUIDs and the fact that all these PK/Unq indexes will need to be fully cached to maintain performance which will stress oracle's buffer management and result in far more physical I/O for inserts as well as for other database activity that gets aged out of buffer cache. I have explained that concern as best I can but the developers are (justifiably asking for more concrete answers other than "this will likely become a problem at some point if the application activity exceeds the ability of Oracle, on the current hardware, to maintain active blocks in the buffer cache". I am going to try and come up with a test to demonstrate the degraded performance on our current 3-node 12.2 RAC system (child's play for what eventual production and test systems will be). Does anyone have suggestions on how to best set up that test and demonstrate the downfalls of UUID approach at some currently unknown scale? Also, does anyone have any experience using externally generated UUID's as surrogate keys in Oracle? Good/bad/indifferent?

==
==

I'm most certainly reaching out to oracle-l more than ever before as these are problems I've never had to deal with and truly appreciate all the people that take the time to chime in.

Thanks!
Chris
Chris Stephens
2018-09-14 16:47:45 UTC
Permalink
Thanks Jonathan!

The UUID/GUID's will all be generated externally by the python code. Not
sure if that is dependent on OS mechanisms or not. Will definitely need to
check that out.

I found the thread you mentioned yesterday and read through. Lots of
interesting info.

I don't think the application will be doing any sort of range (psuedo-time)
based queries that would lend themselves to sequence option but will
definitely need to confirm that. I'm more worried about the buffer cache
"pollution" and index splits at this point.

Chris
In Oracle there's a sys_guid() function so, e.g.,
select sys_guid() from dual.
It returns a raw(16)
I've just been reading some notes on a few tests I did of sys_guid() a
couple of years ago, and there's all sorts of details to worry about. In
particular there are several diffent types of UUID, and it looks as if the
linux and Windows implementations of sys_guid() return different different
types - which could mean an increased chance of collisions.
There also the interesting oddity that the linux implementation shows a
"sequence like" starting 6 bytes if you select lots of values from a single
session. (The Windows version shows random leading bytes).
There's some interesting detail highlighted in a discussion on the Oracle
dev com database forum: https://community.oracle.com/thread/3994173
Regards
Jonathan Lewis
________________________________________
Sent: 14 September 2018 15:20
To: 'Chris Stephens'
Cc: oracle-l
Subject: RE: experience w/ UUID's for surrogate keys across entire schema?
How do create them? I am not familiar with this UUID.
Liz
Elizabeth Reen
CPB Database Group Manager
Service Now Group: CPB-ORACLE-DB-SUPPORT
Sent: Friday, September 14, 2018 8:30 AM
To: Reen, Elizabeth [ICG-IT]
Cc: oracle-l
Subject: Re: experience w/ UUID's for surrogate keys across entire schema?
Thanks for response Elizabeth. Part of the reason for choosing UUID's is
that they are always (almost) guaranteed to be unique so there doesn't need
to be any syncing. There is no referential integrity across sites.
The different sites will not need to communicate with each other to
operate so network latency isn't much of a concern. However, bandwidth
might be though not my problem at the moment. :) There will be situations
where data will need to be extracted from one site and loaded into the
central repository site in a batch oriented matter.
For option 2, how do they plan to sync the UUIDs across
all the sites? How do they expect to handle network latency? Is there
going to be a key master?
It may be functionally fine, but I see an operational nightmare.
Liz
Elizabeth Reen
CPB Database Group Manager
Service Now Group: CPB-ORACLE-DB-SUPPORT
On Behalf Of Chris Stephens
Sent: Thursday, September 13, 2018 5:17 PM
To: oracle-l
Subject: experience w/ UUID's for surrogate keys across entire schema?
we have an application that will be deployed at an unknown number of
locations. the application requires the ability to ingest data from every
location into a central repository. the application will also store data in
non-oracle databases at some locations. all locations will have the same
schema. all surrogate keys for all tables across all locations have to be
unique. some of these locations running Oracle will be "small" and some
will require beefy RAC systems to support the workload.
1) use a multi-column or concatenated column PK with some sort of "site
id" + sequence based values for oracle systems and figure out the
equivalent in the other database products. To allow for scalability on busy
RAC systems I currently plan to recomment a site_id + inst_id +
mod(session_id, <some number>) + sequence value.
2) use UUID's generated in the application code for all surrogate keys everywhere.
I'm being asked to weigh in on the advantages/disadvantages of both.
Option 1 is ideal from an Oracle perspective if performance is primary
concern (there will be locations that will stress any hardware we throw at
it). Option 2 is preferred by the developers and I think is functionally
fine.
I think the primary concern for option 2 is the completely random nature
of UUIDs and the fact that all these PK/Unq indexes will need to be fully
cached to maintain performance which will stress oracle's buffer management
and result in far more physical I/O for inserts as well as for other
database activity that gets aged out of buffer cache. I have explained that
concern as best I can but the developers are (justifiably asking for more
concrete answers other than "this will likely become a problem at some
point if the application activity exceeds the ability of Oracle, on the
current hardware, to maintain active blocks in the buffer cache". I am
going to try and come up with a test to demonstrate the degraded
performance on our current 3-node 12.2 RAC system (child's play for what
eventual production and test systems will be). Does anyone have suggestions
on how to best set up that test and demonstrate the downfalls of UUID
approach at some currently unknown scale? Also, does anyone have any
experience using externally generated UUID's as surrogate keys in Oracle?
Good/bad/indifferent?
==
==
I'm most certainly reaching out to oracle-l more than ever before as these
are problems I've never had to deal with and truly appreciate all the
people that take the time to chime in.
Thanks!
Chris
Sayan Malakshinov
2018-09-14 12:46:49 UTC
Permalink
Hi Chris,

Of course, you can extend option 2 as multicolumn (SITE_ID,UUID) and use
partitioning by SITE_ID to avoid some performance problems, but anyway
because of the nature of UUID all inserts will require split operations on
indexes, so sequences usually are better choice. It's much easier to solve
problems with concurrency for right block of index (for example, using hash
partitioning) than deal with constant splits
--
Best regards,
Sayan Malakshinov
http://orasql.org
Jonathan Lewis
2018-09-15 11:33:49 UTC
Permalink
Some of my email has failed to arrive at the Oracle-L archive again, so I thought I'd try posting directly to the list rather than trying any "reply" option.
One of the posts I made on the UUID question contained a link to a discussion about UUIDs on Oracle-L from a couple of years ago that might be worth reading.

https://community.oracle.com/thread/3994173

Regards
Jonathan Lewis
--
http://www.freelists.org/webpage/oracle-l
"Reen, Elizabeth " (Redacted sender "elizabeth.reen" for DMARC)
2018-09-17 15:46:52 UTC
Permalink
Never thought of those side effects, it's good to know.

Thanks,

Liz

Elizabeth Reen
CPB Database Group Manager

Service Now Group: CPB-ORACLE-DB-SUPPORT


-----Original Message-----
From: oracle-l-***@freelists.org [mailto:oracle-l-***@freelists.org] On Behalf Of Jonathan Lewis
Sent: Saturday, September 15, 2018 7:34 AM
To: oracle-***@freelists.org
Subject: experience w/ UUID's for surrogate keys across entire schema?


Some of my email has failed to arrive at the Oracle-L archive again, so I thought I'd try posting directly to the list rather than trying any "reply" option.
One of the posts I made on the UUID question contained a link to a discussion about UUIDs on Oracle-L from a couple of years ago that might be worth reading.

https://urldefense.proofpoint.com/v2/url?u=https-3A__community.oracle.com_thread_3994173&d=DwIFAw&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=yWMFosURAngbt8VLeJtKLVJGefQxustAZ9UxecV7xpc&m=H40jgiVKzasljyh2NBepPgYSRoH-gbQv6-BACelXVVw&s=UBHa_R1QzdbkUw3ufFXzmio0vKIrJJPA_34wX1w494Q&e=

Regards
Jonathan Lewis
--
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.freelists.org_webpage_oracle-2Dl&d=DwIFAw&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=yWMFosURAngbt8VLeJtKLVJGefQxustAZ9UxecV7xpc&m=H40jgiVKzasljyh2NBepPgYSRoH-gbQv6-BACelXVVw&s=hQWOXjjroLAaZ0VEIo8mJ5syGDBVz52mMcOb5bQ9J1A&e=


--
http://www.freelists.org/webpage/oracle-l

Loading...