Sparx Systems Forum
Enterprise Architect => Automation Interface, Add-Ins and Tools => Topic started by: Ian Mitchell on February 26, 2020, 11:11:24 pm
-
I need to create a performance enhancement bit of SQL, but I think I just went beyond my abilities.
The problem is to get
(1) all the elements in a package and
(2) where they have connections to other element, details of the other element as well.
I managed to get 1/2 way with:
SELECT *
FROM (T_OBJECT LEFT OUTER JOIN T_CONNECTOR ON T_OBJECT.OBJECT_ID = T_CONNECTOR.START_OBJECT_ID)
WHERE T_OBJECT.package_ID=1235
(no, I won't be using Select *...)
This is great - it returns all the elements, and details of their connectors. OK, there are some duplicate rows where an element has multiple connectors, but I'm fine with that.
What I can't seem to get to work it to re-join this back to t_object, to get all the details of each related object.
I've tried all kinds of variations with more joins, lots of brackets/no brackets, but I now feel like a monkey writing random SQL and hoping for an answer...
-
I am sure Geert will reply with a fancy SQL statement using Inner Joins etc, but I don't tend use these, will the following work for you?
SELECT source.Name, source.Object_Type, t_connector.Connector_Type, target.Name, target.Object_Type
FROM t_object source, t_connector, t_package, t_object target
WHERE source.Object_ID = t_connector.Start_Object_ID
AND target.Object_ID = t_connector.End_Object_ID
AND source.Package_ID = t_package.Package_ID
AND t_package.Package_ID = 1235
ORDER By source.Name ASC
This query uses alias as specified in the FROM clause
All the best
Phil
-
Thanks Phil, but this one doesn't seem to return elements which don't have connectors, and I need them as well.
I guess I could just run a second query without the clever 'join' bits..
Cheers
-
select * from t_object where Package_ID=1235
or exists (select * from t_connector, t_object other_end where other_end.Package_ID=1235 and t_object.Object_ID=t_connector.Start_Object_ID and other_end.Object_ID=t_connector.End_Object_ID)
or exists (select * from t_connector, t_object other_end where other_end.Package_ID=1235 and t_object.Object_ID=t_connector.End_Object_ID and other_end.Object_ID=t_connector.Start_Object_ID)The eap file I used to test could not handle checking both connector ends in the one exists and ended up doing a complete scan over the entire t_object, t_connector space.
-
It's really not that hard. ???
You need to to a left join to t_connector and another left join to t_object.
You need left joins because you also want to return those that don't have connectors.
This is the query I would write for SQL Server
select * from t_object o
left join t_connector c on o.Object_ID in (c.Start_Object_ID, c.End_Object_ID)
left join t_object o2 on o2.Object_ID in (c.Start_Object_ID, c.End_Object_ID)
and o2.Package_ID <> o.Package_ID
where o.Package_ID = 1235
I join with t_connector on both start and end object ID
and then I join with t_object again, but with the extra condition that the packageID has to be different (otherwise we would get duplicates as it would join with itself, or with elements from the same package
Now if you want to make this query work on .eap files as well you will have to dumb it down.
Consider the JET SQL interpreter as a slightly mentally challenged 6 year old
select * from ((t_object o
left join t_connector c on (o.Object_ID = c.Start_Object_ID
or o.Object_ID = c.End_Object_ID))
left join t_object o2 on (o2.Object_ID = c.Start_Object_ID
or o2.Object_ID = c.End_Object_ID))
where o.Package_ID = 1235
and ( o2.Package_ID is null or o2.Package_ID <> o.Package_ID)
It's exactly the same thing, just written differently.
Geert
-
Thanks Geert - this works perfectly, of course.
The only bit I don't understand is the last line:
and ( o2.Package_ID is null or o2.Package_ID <> o.Package_ID)Without this, I get lots more duplicate data, but I don't really understand why. Other than the nice one was done by you, and other by me...
Also, I was very brave, and did another join from 'o2' to any t_xref entries (to get additional stereotype information).
Something like
select * from
((
(t_object o left join t_connector c on (o.Object_ID = c.Start_Object_ID or o.Object_ID = c.End_Object_ID) ) left join t_object o2 on (o2.Object_ID = c.Start_Object_ID or o2.Object_ID = c.End_Object_ID))
left join t_xref x on (o2.Ea_guid= x.Client)
)
where o.package_ID IN (1235,1236)
and ( o2.Package_ID is null or o2.Package_ID <> o.Package_ID)
I think I may have discovered Geerts Secret of SQL - add lots of brackets, everywhere, until it works. Then stop.
-
Thanks Geert - this works perfectly, of course.
The only bit I don't understand is the last line:
and ( o2.Package_ID is null or o2.Package_ID <> o.Package_ID)Without this, I get lots more duplicate data, but I don't really understand why. Other than the nice one was done by you, and other by me...
That part is actually the equivalent of the second join constraint in the "regular" format.
It says that element o2 should be from a different package as o. (and it also allows for o2 to be null, otherwise we would have an inner join again)
If you don't include that constraint the query would join at least each element o with itself in the form of o2.
You see, we didn't specify if o or o2 had to be the source or the target. So it could be twice the source and therefore join with itself.
I hope I'm making sense to you... :-\
Geert
-
Is the query I suggested any easier to understand?
-
If I'm honest, no.
But that says more about my SQL knowledge that the quality of your reply.
I'll have a go at yours, when I have some spare brain space.
I finally decided what I really, really wanted.
("Tell me what you ...")
It's a monster query which, for the elements in a given set of packages...
- the elements, and their t_xref stereotype data
- all their connectors, with their clever stereotype data from t_xref
- all the target elements, with their t_xref data as well.
..and all playing nicely when there are no connectors, or no t_xref stereotype data for anyone.
At the moment, it's two queries, joined in code. I tried with one, and my head exploded.
-
My query isn't returning any information about any connectors, but it can easily be extended to get the full stereotype description.
Get all the objects in a package
select t_object.*, t_xref.Description
from t_object left join t_xref on (t_xref.Client=t_object.ea_guid and t_xref.Name='Stereotypes')
where Package_ID=1235
Also include any objects where a connector is linked to that object with the other end in the target package.
or exists (select * from t_connector, t_object other_end where other_end.Package_ID=1235 and t_object.Object_ID=t_connector.Start_Object_ID and other_end.Object_ID=t_connector.End_Object_ID)
or exists (select * from t_connector, t_object other_end where other_end.Package_ID=1235 and t_object.Object_ID=t_connector.End_Object_ID and other_end.Object_ID=t_connector.Start_Object_ID)Each of those lines is a bit like doing a join on the tables in the inner query, without actually doing the join.
-
Just as an FYI for other users, I just replaced some of my own SQL, which used the kind of non-explict join (which Phil also uses above) like WHERE table1.col1 = tale2.col2
with the Geert style, using explicit JOIN statements.
The two sets of SQL were pretty much the same, but the Geert-style was around 60x faster. 2500ms vs 42.
I always believed that DBMS optimizers were smart enough to make the two SQLs run the same way, but it seems not.
BTW - each SQL is running via the EA 'SQLQuery' API.
I'm not saying this is a universal result, but it has encouraged my to be smarter in my SQL.
@Eve - sorry, haven't tried your "OR EXISTS" approach yet, so don't know how it compares.
-
I can imagine that there is sometimes a difference.
With the JOIN style you already limit the table before joining.
With the WHERE style, it first makes a gigantic cartesian product (joining each row of each table B with each row of table B, with each row of table C,...) and only then you tell it do not include certain results (like 99,99% of the results probably)
I guess the more advanced databases (SQL Server, MySQL) will be better at optimizing stuff like that then the more basic ones (JET == MS Access)
Query optimization is sometimes weird. I've have wildly different query times between SQL Server and MySQL depending on the query style i used
The difference was unions against "or" statements
simplified
select * from t_object o
where o.Object_Type = 'Class'
union
select * from t_object o
where o.Object_Type = 'Actvity'versus
select * from t_object o
where o.Object_Type = 'Class'
or o.Object_Type = 'Activity'
The union variant was much faster on SQL Server, whereas the "or" variant was much faster on MySQL
And with "much" I mean a difference between 20 seconds and 20 minutes
Geert