Book a Demo

Author Topic: Inferring PK - FK Relationships  (Read 5481 times)

TyreeJackson

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
  • I love YaBB 1G - SP1!
    • View Profile
Inferring PK - FK Relationships
« on: June 13, 2006, 04:44:21 pm »
I have 3 database systems that I am responsible for inheriting from another development team.  The schemas are not documented and furthermore, no foreign relationships were ever defined in the databases, even though the data is relational.

Does anyone know if EA can infer relationships when reverse engineering a database that only has unique indexes and primary keys?  If not, does anyone know if there is a plugin for EA that can add that functionality.

Thanks!
Tyree Jackson

sargasso

  • EA Practitioner
  • ***
  • Posts: 1406
  • Karma: +1/-2
  • 10 COMFROM 30; 20 HALT; 30 ONSUB(50,90,10)
    • View Profile
Re: Inferring PK - FK Relationships
« Reply #1 on: June 13, 2006, 05:07:23 pm »
Even if it could, I would not trust the resulting model as far as I could kick it.  Consider a simple case of two indices on two integral columns as possible FKs to two other tables, which one should it pick for which one.

bruce
"It is not so expressed, but what of that?
'Twere good you do so much for charity."

Oh I forgot, we aren't doing him are we.

Paolo F Cantoni

  • EA Guru
  • *****
  • Posts: 8626
  • Karma: +259/-129
  • Inconsistently correct systems DON'T EXIST!
    • View Profile
Re: Inferring PK - FK Relationships
« Reply #2 on: June 13, 2006, 05:51:09 pm »
Quote
I have 3 database systems that I am responsible for inheriting from another development team.  The schemas are not documented and furthermore, no foreign relationships were ever defined in the databases, even though the data is relational.

Does anyone know if EA can infer relationships when reverse engineering a database that only has unique indexes and primary keys?  If not, does anyone know if there is a plugin for EA that can add that functionality.

Thanks!
Tyree Jackson
Just to strongly echo bruce's point, and (possibly to put some theoretical framework around it).  Value based Relational Database systems (such as all the current) ones, support the relational model but also allow you to join "apples" with "oranges", if the values in the columns just happen to be the same datatype.

Thus, to describe the underlying data as relational without defining the formal relationships is actually a non-sequitur.  It is exactly those relationships that give the model its power.  In particular, if the RDBMS hasn't defined what are technically known as Inversion Indexes (the indexes on the local table that are the inverse end of the Foreign Key relation) then any set of local columns could be mappable, if the data type is correct.

One of the primary functions of a Data Model is to clarify communication and embed the business rules so that inconsistent data can't be entered into the database (because of the formal referential integrity).  If the development team hasn't realised this thus far, then this is the time to document the model as NO ONE knows what's going on!  Time and time again, I have found that the definition of a data model (and enforcing the referential integrity) has picked up errors that have passed every other test (the most recent and interesting included a system that had passed three levels of system and user testing and had been in production for over six months - no-one had picked up the fact that invalid data was being propagated.

If the argument is: "We can't have referential integrity will slow down the system"
I have two observations:
1)  Ask the enquirer:  "How quick do you want the wrong answer?"
2) I have also had association with systems that were designed to be fast (before being right).  The system mentioned above (having full temporal referential integrity) ended up being, at times, 33 times faster than the alternate system - which like the one you mention had no relationships and did everything in memory to maintain "throughput".  Every time a developer tells you you can't use referential integrity because of speed reasons - suspect a poor design.  These assertions are often made with NO empirical evidence whatsoever.

The following quote from a topical historical figure may be pertinent:
Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he is going. Practice should always be based on a sound knowledge of theory.
[size=10]Leonardo da Vinci (1452–1519)[/size]

HTH,
Paolo
Inconsistently correct systems DON'T EXIST!
... Therefore, aim for consistency; in the expectation of achieving correctness....
-Semantica-
Helsinki Principle Rules!

TyreeJackson

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
  • I love YaBB 1G - SP1!
    • View Profile
Re: Inferring PK - FK Relationships
« Reply #3 on: June 16, 2006, 11:33:37 am »
Quote
Just to strongly echo bruce's point, and (possibly to put some theoretical framework around it).  Value based Relational Database systems (such as all the current) ones, support the relational model but also allow you to join "apples" with "oranges", if the values in the columns just happen to be the same datatype.

Thus, to describe the underlying data as relational without defining the formal relationships is actually a non-sequitur.  It is exactly those relationships that give the model its power.  In particular, if the RDBMS hasn't defined what are technically known as Inversion Indexes (the indexes on the local table that are the inverse end of the Foreign Key relation) then any set of local columns could be mappable, if the data type is correct.

One of the primary functions of a Data Model is to clarify communication and embed the business rules so that inconsistent data can't be entered into the database (because of the formal referential integrity).  If the development team hasn't realised this thus far, then this is the time to document the model as NO ONE knows what's going on!  Time and time again, I have found that the definition of a data model (and enforcing the referential integrity) has picked up errors that have passed every other test (the most recent and interesting included a system that had passed three levels of system and user testing and had been in production for over six months - no-one had picked up the fact that invalid data was being propagated.

If the argument is: "We can't have referential integrity will slow down the system"
I have two observations:
1)  Ask the enquirer:  "How quick do you want the wrong answer?"
2) I have also had association with systems that were designed to be fast (before being right).  The system mentioned above (having full temporal referential integrity) ended up being, at times, 33 times faster than the alternate system - which like the one you mention had no relationships and did everything in memory to maintain "throughput".  Every time a developer tells you you can't use referential integrity because of speed reasons - suspect a poor design.  These assertions are often made with NO empirical evidence whatsoever.

The following quote from a topical historical figure may be pertinent:
Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he is going. Practice should always be based on a sound knowledge of theory.
[size=10]Leonardo da Vinci (1452–1519)[/size]
HTH,
Paolo


Perhaps I should have clarified my intent and situation here.  I'm not attempting to define foreign key relationships for the purpose of deploying those relationships back into the database.  I am only attempting to understand 3 databases, that I have been made responsible for, that were acquired from another vendor.  

The other vendor, having already transferred these assets to us, are now out of the picture and can not be called on to provide any assistance.  Additionally, this vendor never documented the database thoroughly.  As such, I am now attempting to analyze an existing set of systems that are already running in production.

I am looking for a way to reverse engineer the schemas with inferred relationships for the sole purpose of trying to understand the schema and how data "may" relate within.  I obviously would not trust the inferred relationships, but at least it would give me a running start.  I'm only looking for possible relationships from which I will then identify the probable ones and subsequently verify those.

FYI> Embarcadero's ER Studio has this functionality.  However, I am an enthusiastic user of EA, and as such was hoping that EA may have someway to do this.  Plus I already have a corporate license to EA, and am not looking forward to having to go to my management to explain that I need a $3000 license for ER Studio for this one task.
From: http://www.embarcadero.com/resources/faqs/erfaq.html#Q32
.  ER/Studio can reverse engineer relationships even though relationships are not specified in a database. When you reverse-engineer a database you are given the option to Infer Primary Keys or to Infer Foreign Keys (based on indexes or unique constraints).

Again, allow me to reiterate.  I am not looking to trust the results of any such inferrence at all.  I'm just looking to leverage the results as clues in my investigation of the existing systems.  That said, I appreciate your reply and your implications of my not understanding theory (btw: I have 9 years of experience in designing relational databases).

However, my original question still stands:

Does anyone know if EA can infer relationships when reverse engineering a database that only has unique indexes and primary keys?  If not, does anyone know if there is a plugin for EA that can add that functionality.

Thanks.
Tyree Jackson

Paolo F Cantoni

  • EA Guru
  • *****
  • Posts: 8626
  • Karma: +259/-129
  • Inconsistently correct systems DON'T EXIST!
    • View Profile
Re: Inferring PK - FK Relationships
« Reply #4 on: June 16, 2006, 05:40:21 pm »
Tyree, apologies for the inference - it wasn't intended.  The Leonardo quote wasn't directed at you, but developers who have no Data Management background and try to second guess DB performance.  I've seen it all too often.

I've actually been a user of ERStudio (on-and-off ) for about 8 years (and it took 5 of those to finally get them to start to fix the broken internal conceptual and physical model of the product).

Thanks for the clarification of your intent, it simplifies matters considerably!

Given the state of the system you took over, and from my knowledge of the ERStudio internals (having spent most of that time writing automation to move around ER's internal model) I'd still NOT rely on that functionality since it relies too much on name mapping (in addition to indexes and constraints) - which, if the DB is like many DBs of this type I've come across, a highly suspect notion.  That is, if they haven't even enforced RI then how can you trust the naming?   In addition, I know for a fact (since I've had to discover ways to get around the problem ) that ER Studio internally handles columns involved in  multiple foreign keys in a broken way.

Since, like bruce, we now all agree we wouldn't trust the ER Studio inference too far, and you intend to manually verify the inferences, you might as well do that in EA.  I would download the trial version of ER Studio, reverse engineer, infer the relationships and then generate a new empty database which I'd then reverse engineer into EA.   Since you are only really interested in a a one-off process, there's no point in keeping ER Studio any longer.

NOTE: EA still has a way to go with database support, but Sparx tell us Views are coming in 6.5.

NOTE ALSO: also my comments refer to versions of ER Studio up to 6.6.1 (I haven't used 7 yet - but based on Embarcadero's track record, not that much is likely to have changed).

I'm not saying EA is better than ERStudio for DB work (and if you search the forum you'll see I'm no apologist for Sparx), but if you're trying to justify $3000 versus $300...

So, again, apologies for any slight I inadvertently caused,you...

As for the DB design you've inherited, my comiserations:
       "The structure of a system tends to mirror the structure of the group producing it."
-Mel Conway
[/color]
Cheerz,
Paolo
Inconsistently correct systems DON'T EXIST!
... Therefore, aim for consistency; in the expectation of achieving correctness....
-Semantica-
Helsinki Principle Rules!

Dave_Bullet

  • EA User
  • **
  • Posts: 295
  • Karma: +0/-0
    • View Profile
Re: Inferring PK - FK Relationships
« Reply #5 on: June 19, 2006, 03:36:27 pm »
I have a similar requirement.

You can write some SQL to populate the t_connector table with the implied PK / FK relationships.  I haven't worked it out yet but when I do I will post it here.

Please note - it is likely you will have to re-run any SQL script to recreate PK / FK relationships as on a re-import and sync - EA will probably blow them away (which it should) since they don't physically exist.

Cheers,
David.
"I know I'm close to a good design, but it's like the balloon animals, squeeze in one spot and the problem moves down the line"

sargasso

  • EA Practitioner
  • ***
  • Posts: 1406
  • Karma: +1/-2
  • 10 COMFROM 30; 20 HALT; 30 ONSUB(50,90,10)
    • View Profile
Re: Inferring PK - FK Relationships
« Reply #6 on: June 19, 2006, 04:29:07 pm »
As a final note of warning, I have found on many occassions when faced with rdb's without FK definitions, that people have populated the physical database with whatever relationships they desire and then code the "RI" rules  themselves.

IOW even if you re-model a nice hypothetical db - the existent data and existent code can often mean you take your nice model, screw it up and throw it away.

bruce
« Last Edit: June 19, 2006, 04:29:32 pm by sargasso »
"It is not so expressed, but what of that?
'Twere good you do so much for charity."

Oh I forgot, we aren't doing him are we.