Book a Demo

Author Topic: Modelling attribute dependencies for an ETL proces  (Read 7551 times)

Mostard

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
    • View Profile
Modelling attribute dependencies for an ETL proces
« on: April 29, 2011, 12:12:56 am »
Hi all,

I am evaluating the product to model data dependencies for an ETL process for a data warehouse. In other words, I want a way to document data "lineage" dependencies for fields in database tables.

The idea is that both data warehouse developers and source system developers would have a centralized place for inter-system impact analysis.

To do so, I reverse engineered the source database system into one model, and created my data warehouse database model in another.

Alas, I don't find a way to create "dependencies" from the table fields in my "data warehouse" db to the table fields in the "source system" (other than just typing them in the attribute notes, but I can do that in excel, don't need EA for that!)

Or I am wrong?

I also tried it another way: instead of creating attributes for my table fields, I created child "objects" which I could then connect (using dependency connectors) to the tables in the source system model, and model the source system field it is dependent on, as the connector's "name". At least I can model that, and create an RTF report on these dependencies.

However, this is still not really what I want:
- I cannot assign datatypes (etc...) to these uml objects the way I do that with table attributes
- These dependencies are not included in the HTML documents (which I prefer over the RTF stuff)

Does someone have some hints on how to approach this kind of modelling? I know, the above explanation is maybe not enough, but if someone would like to help me out, I'll send/post the EA project/screenshots to further explain :)

Thanks!


Sunshine

  • EA Practitioner
  • ***
  • Posts: 1350
  • Karma: +121/-10
  • Its the results that count
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #1 on: April 29, 2011, 07:19:09 am »
If I understand you correctly I think you are trying to map attributes in one table to attributes in another? i.e. source to target. You could try this approach;
    [1]Create an relationship between the source and target tables. Information Flow, trace or  dependency might be better than association to distinguish relationship type from that of the database.

    [2]Right click on one end of the relationship and select "
link to element feature". A dialog appears so select attribute in the feature type and then the attribute in the list that appears below. Click OK button.

[3] Repeat step 2 for the other end of the relationship choosing the attribute to map to.

[4] Repeat steps 1 to 3 to map other tables and attributes.

[/list]
You should end up with relationship that map one attribute in source table to an attribute in the target table.
Hope that is what you were looking for.  :)
Happy to help
:)

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13505
  • Karma: +572/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Modelling attribute dependencies for an ETL pr
« Reply #2 on: April 29, 2011, 05:28:25 pm »
Another way to create relations between attributes is to use tagged values of type "RefGUID".

Geert

Mostard

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #3 on: April 29, 2011, 09:03:02 pm »
Quote

Nice, I didn't know that could be done. It's definately better than creating another set of "objects". In the diagram that relation is indeed visualizing source/attributes involved. But this information still doesn't appear in the HTML documentation - these relations appear in the "other links" section of the table documentation, but they don't show the source/target attributes etc.

Can I adapt the HTML templates so that these (flow/dependency) relations appear for example right with the attribute detail information, in the "attributes" section of the table documentation? (The HTML template editor scares me, to be honest...  :-[)

Thanks,
M.

Mostard

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #4 on: April 29, 2011, 09:18:00 pm »
Quote
Another way to create relations between attributes is to use tagged values of type "RefGUID".

Geert

Thanks for the answer, but I'm afraid I lost you there.

You're saying I add tagged values to these relations? Or to the attributes? I don't see a "type" specifier. I would paste the value from "copy node GUID to clipboard" option when I right click something, but I don't manage to do this for table attributes.

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13505
  • Karma: +572/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Modelling attribute dependencies for an ETL pr
« Reply #5 on: April 29, 2011, 09:39:34 pm »
To be able to use this you'll have to define your own tagged values type.
In that definition you can tell EA what type of information this tagged value should accept, and for which types it is applicable.
This allow you to create tagged values that refer to other model elements.
You don't have to worry about the actual GUID's, EA will shown a nice select dialog when you click on the [...]
More explanation about tagged values configuration here: http://www.sparxsystems.com/enterprise_architect_user_guide/8.0/modeling_languages/predefinedtaggedvaluetypes.html

Geert

Mostard

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #6 on: April 29, 2011, 10:20:14 pm »
Thanks, I'm getting there!

I created such a structured tagged value with in the "detail" definition:

Code: [Select]
Type=RefGUID;
Values=Attribute,Operation;

Indeed I can now select select an attributes using the [...]

But when I generate HTML documentation, it now shows, in the attribute details of the table page:

[...]

public static varchar(10)
  Volgnr

Details:

<<key>>Ordered
SourceSystemDependency=«column»Volgnummer

[...]

As you can see, I lost information about the actual source table. Can I fix that? (And, by the way, I already know it's a column. Don't need that explicit stereotype, but that's a detail.

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13505
  • Karma: +572/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Modelling attribute dependencies for an ETL pr
« Reply #7 on: April 29, 2011, 11:11:33 pm »
I guess in that case you'll have to modify the html template to add the attributes owner name. (and remove the stereotype).

Unfortunately I don't know the first thing about modifying the html template for EA... :-[

Geert

Sunshine

  • EA Practitioner
  • ***
  • Posts: 1350
  • Karma: +121/-10
  • Its the results that count
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #8 on: May 02, 2011, 05:52:14 pm »
Quote

Nice, I didn't know that could be done. It's definately better than creating another set of "objects". In the diagram that relation is indeed visualizing source/attributes involved. But this information still doesn't appear in the HTML documentation - these relations appear in the "other links" section of the table documentation, but they don't show the source/target attributes etc.

Can I adapt the HTML templates so that these (flow/dependency) relations appear for example right with the attribute detail information, in the "attributes" section of the table documentation? (The HTML template editor scares me, to be honest...  :-[)

I don't see why you can't modify the HTML template to put in the information you want.  You have just got grab the bull by the horns and try it out. Remember backup your stuff regularly and you have those backups as a kind of parachute if anything goes wrong.
Before you start just think of answering the following question?
What information do I have to supply to whom and what format is best? If all you need to do is provide some document of the mappings from one database to another for an ETL developer to implement then consider a RTF document instead of HTML if it scares you.

As the SAS say "Who dares wins".
Or put another way by Chaucer "Nothing ventured, nothing gained".
;D

Happy to help
:)

Sunshine

  • EA Practitioner
  • ***
  • Posts: 1350
  • Karma: +121/-10
  • Its the results that count
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #9 on: May 02, 2011, 06:04:48 pm »
Quote
Another way to create relations between attributes is to use tagged values of type "RefGUID".

Geert

Interesting I didn't realise you could do that. Learn something everyday with EA. Upon further investigation you may wish to consider using RefGUIDList as some ETL mappings are not one to one but one to many or vice versa. For instance one attribute in a table in a database may go to many attributes in many tables in many other databases.

I prefer using the "link to element feature" with the trace relationship because it shows the data flows graphically between table and databases.
:)
Happy to help
:)

Mostard

  • EA Novice
  • *
  • Posts: 5
  • Karma: +0/-0
    • View Profile
Re: Modelling attribute dependencies for an ETL pr
« Reply #10 on: May 03, 2011, 01:06:27 am »
Quote
I prefer using the "link to element feature" with the trace relationship because it shows the data flows graphically between table and databases.
:)

Of the two solutions, that's what I like most too, because it contains the data "two ways".

But for now, I after experimenting some more, I don't manage to properly document them either in RTF or HTML

To be honest, the output format RTF/HTML is not crucial, but I need "some" way of spitting out a report to document the system

1 - RTF:
I can report on connector linked features, but I can only show the name/value/stereotype and type. And I don't see a way to override the "value" evalution to not show as "<<column>> ColumnName" but instead show the owner table.

2 - HTML:
The tags that can be used seem to be very limited.
As per the documentation (http://www.sparxsystems.com.au/downloads/resources/booklets/reporting_uml_models.pdf), the "link item" section has no tags for showing any connections with their linked elements; and neither has the "content - attributes item".

There is a way to document tagged values on the attributes (cfr. Geert's solution), but that just documents as (for example) "<<column>> ColumnName>>" (omitting the owner table) as described earlier. And again, I don't find a way to adapt the behaviour of the #TAGS# structure.