Author Topic: Parallel script execution  (Read 10799 times)

kjourdan

  • EA User
  • **
  • Posts: 71
  • Karma: +0/-0
Parallel script execution
« on: November 07, 2016, 11:28:51 pm »
With a DBMS repository, it appears possible to run multiple scripts simultaneously. I believe I could run multiple instances of EA and run a different script in each instance, or alternatively multiple users could each run a different script from their own machine. Currently, I have some scripts that take a significant amount of time to execute (e.g. 16 hrs). Some of these scripts could be split up and executed in parallel (e.g. each script operating on specific stereotypes).

Has anybody had any experience with running multiple scripts in parallel and seen a reduction in overall execution time?

qwerty

  • EA Guru
  • *****
  • Posts: 13584
  • Karma: +397/-301
  • I'm no guru at all
Re: Parallel script execution
« Reply #1 on: November 08, 2016, 01:25:59 am »
The bottleneck is most likely the database, not the client. So parallelizing things will not have that big an effect. Often it's simply counter-productive.

q.

Uffe

  • EA Practitioner
  • ***
  • Posts: 1859
  • Karma: +133/-14
  • Flutes: 1; Clarinets: 1; Saxes: 5 and counting
Re: Parallel script execution
« Reply #2 on: November 08, 2016, 02:31:38 am »
Hello,


Well it all depends.

If a significant proportion of the total processing is done in the client, meaning your server is mostly idling, then it's worth a try. If your server is already going flat out, parallelizing the clients won't help because the amount of work for the server won't change.

But there's also the question of whether the client side can be parallelized at all. Are your scripts independent of one another? It's alright if they just pull stuff out of the repository, but if they make changes as well you need to make sure that they don't interfere with each other -- otherwise you might end up with something that runs fast but produces incorrect results.

Whether you can make any performance gains from running multiple EA clients on the same machine I can't say. That's also worth a try.

Finally, as with any optimization job, it's measure, measure, measure. You have to find where the bottlenecks actually are, otherwise you're just groping in the dark. Check the server first of all: if it's staggering under the load, that's where you need to spend your resources.
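
One way to start measuring on the client side is to accumulate wall-clock time per phase, so you can see whether a script mostly waits on repository calls or mostly does its own processing. A minimal Python sketch; the `PhaseTimer` helper and the phase names are made up for illustration and are not part of EA's API, and the `sleep` call merely stands in for a repository round trip:

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates elapsed wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Return (phase, seconds, calls) rows, slowest phase first."""
        rows = [(n, self.totals[n], self.calls[n]) for n in self.totals]
        return sorted(rows, key=lambda r: r[1], reverse=True)


timer = PhaseTimer()

for _ in range(3):
    with timer.phase("repository"):
        time.sleep(0.02)   # stand-in for a repository call (e.g. an SQL query)
    with timer.phase("client"):
        sum(range(1000))   # stand-in for client-side processing

for name, seconds, calls in timer.report():
    print(f"{name:12s} {seconds:8.3f}s over {calls} calls")
```

If the "repository" share dominates, the server or the number of round trips is the place to look; if "client" dominates, splitting the work across instances has a better chance of paying off.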

HTH,


/Uffe
My theories are always correct, just apply them to the right reality.

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13523
  • Karma: +574/-33
  • Make EA work for YOU!
    • Enterprise Architect Consultant and Value Added Reseller
Re: Parallel script execution
« Reply #3 on: November 08, 2016, 05:49:39 am »
Instead of trying to run the scripts in parallel, try to make them smarter.
What the hell are you doing that takes 16 hours to complete? :o

The most obvious performance gains are
- use SQL selects instead of iterating through the whole model
- make sure you don't iterate an EA.Collection more than once (otherwise put the objects in an ArrayList or something like that and iterate over that list). EA.Collections are not pure "in-memory" lists
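
As a rough illustration of the second point, the sketch below copies a collection into a plain list once and then works on the list as often as needed. `Count` and `GetAt` are the accessors EA.Collection exposes through automation; the `FakeCollection` class is only a stand-in so the example runs without EA, and its `fetches` counter is an invention for the demo:

```python
def snapshot(collection):
    """Copy an EA.Collection-like object into a list with a single pass."""
    return [collection.GetAt(i) for i in range(collection.Count)]


class FakeCollection:
    """Stand-in for EA.Collection (assumption: only Count and GetAt are used)."""

    def __init__(self, items):
        self._items = list(items)
        self.fetches = 0          # counts simulated item fetches

    @property
    def Count(self):
        return len(self._items)

    def GetAt(self, index):
        self.fetches += 1
        return self._items[index]


elements = FakeCollection(["Order", "Customer", "Invoice"])
cached = snapshot(elements)       # one pass over the collection

# Any number of later passes work on the in-memory copy only.
names_upper = [name.upper() for name in cached]
longest = max(cached, key=len)

print(elements.fetches)           # 3: each item was fetched exactly once
```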

Sometimes turning off GUI response can be a big help too.

Geert

qwerty

  • EA Guru
  • *****
  • Posts: 13584
  • Karma: +397/-301
  • I'm no guru at all
Re: Parallel script execution
« Reply #4 on: November 08, 2016, 07:12:28 am »
And the most (un-)obvious: use hash tables instead of collections. My Perl and Python scripts are often faster than any C-stuff simply because they support associative arrays in a very smart way.
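
A small sketch of the difference, in Python: a linear scan pays for the whole list on every lookup, while a hash table is built once and then answers each lookup directly. The rows are hypothetical sample data, standing in for whatever a script has read from the model:

```python
from collections import defaultdict

# Hypothetical rows: (element id, name, stereotype)
rows = [
    (1, "Engine", "component"),
    (2, "IFuel", "interface"),
    (3, "Tank", "component"),
    (4, "IFuel", "interface"),
]


def find_by_scan(rows, name):
    """Linear scan: every lookup walks the whole list."""
    return [r[0] for r in rows if r[1] == name]


def build_index(rows):
    """One pass to build a hash table from name to element ids."""
    index = defaultdict(list)
    for element_id, name, _stereotype in rows:
        index[name].append(element_id)
    return index


index = build_index(rows)
print(find_by_scan(rows, "IFuel"))   # [2, 4]
print(index["IFuel"])                # [2, 4], without rescanning the rows
```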

q.

kjourdan

  • EA User
  • **
  • Posts: 71
  • Karma: +0/-0
Re: Parallel script execution
« Reply #5 on: November 17, 2016, 06:12:52 am »
Some additional info. I have split one of these scripts into multiple smaller scripts and run them from multiple instances successfully, with a significant performance improvement (all 8 cores on my computer working hard instead of just 1).
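
For anyone wanting to try a similar split, one possible way to divide the work is to group stereotypes into batches of roughly equal element count, one batch per EA instance. The Python sketch below uses a greedy largest-first assignment; the stereotype names and counts are made up, and this is not necessarily how the scripts in this thread were split:

```python
import heapq


def partition_by_load(counts, workers):
    """Assign each stereotype to the currently lightest batch, largest first."""
    heap = [(0, i) for i in range(workers)]      # (total elements, batch index)
    heapq.heapify(heap)
    batches = [[] for _ in range(workers)]
    for stereotype, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)
        batches[i].append(stereotype)
        heapq.heappush(heap, (load + n, i))
    return batches


# Hypothetical element counts per stereotype.
counts = {"interface": 9000, "component": 4000, "port": 3500,
          "signal": 1500, "datatype": 1000}

for i, batch in enumerate(partition_by_load(counts, workers=3)):
    total = sum(counts[s] for s in batch)
    print(f"instance {i}: {batch} ({total} elements)")
```

Each stereotype lands in exactly one batch, so two instances never pick up the same source elements. That alone does not guarantee the scripts are independent if they also modify shared target elements.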

Some background on the problem. I have a significant number of classes that contain references (fully qualified paths) to other classes as tagged values. The script should use these tagged values to create connections between the source element and the referenced element. One of the problems has to do with duplicate classes and non-unique element names. For example, if 2 components have a common interface (one provides and one requires), both components may define the same interface. The tagged value for one component would point to its interface definition within its package structure. An SQL query on all interfaces having a specific name would return multiple results. A check of the fully qualified path would be needed to ensure the correct element is being connected.
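
The name-then-path check can be sketched as follows. This is a simplified illustration in Python against an in-memory SQLite database, not against a real repository: the tables mimic EA's `t_object` and `t_package` but are cut down to the columns used here, the "." separator is an assumption, and elements nested inside other elements are ignored:

```python
import sqlite3

# In-memory stand-in for two EA repository tables, reduced to the columns used.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t_package (Package_ID INTEGER, Name TEXT, Parent_ID INTEGER);
    CREATE TABLE t_object  (Object_ID INTEGER, Name TEXT, Object_Type TEXT,
                            Package_ID INTEGER);
    INSERT INTO t_package VALUES (1, 'Model', 0), (2, 'CompA', 1), (3, 'CompB', 1);
    INSERT INTO t_object VALUES (10, 'IFuel', 'Interface', 2),
                                (11, 'IFuel', 'Interface', 3);
""")


def package_path(db, package_id):
    """Walk Parent_ID links up to the root and join the package names."""
    names = []
    while package_id:
        row = db.execute(
            "SELECT Name, Parent_ID FROM t_package WHERE Package_ID = ?",
            (package_id,)).fetchone()
        if row is None:            # dangling parent reference: stop walking
            break
        names.append(row[0])
        package_id = row[1]
    return ".".join(reversed(names))


def resolve(db, qualified_name):
    """Return the Object_ID whose package path plus name matches, else None."""
    name = qualified_name.rsplit(".", 1)[-1]
    candidates = db.execute(
        "SELECT Object_ID, Package_ID FROM t_object "
        "WHERE Name = ? AND Object_Type = 'Interface'", (name,)).fetchall()
    for object_id, package_id in candidates:
        if f"{package_path(db, package_id)}.{name}" == qualified_name:
            return object_id
    return None


print(resolve(db, "Model.CompB.IFuel"))   # 11
print(resolve(db, "Model.CompC.IFuel"))   # None
```

The query by name returns both `IFuel` interfaces; only the path comparison picks the right one.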

I would like to make my script(s) work smarter and reduce the results from the SQL queries. One thought is to use the Alias field to hold the fully qualified path of all my elements. Doing so would allow me to do an SQL query with the fully qualified path instead of just the element name, stereotype, etc. Is there any issue with using Alias for this purpose? Does EA use this for anything specifically?

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13523
  • Karma: +574/-33
  • Make EA work for YOU!
    • Enterprise Architect Consultant and Value Added Reseller
Re: Parallel script execution
« Reply #6 on: November 17, 2016, 06:45:43 am »
I wouldn't use the alias for this purpose.

What you can do is start building a dictionary of elements (using their fully qualified name as key)
In the past this has greatly increased the performance of some scripts having to use the fully qualified name often.
This means that you only have to visit a part of the tree once, and you only have to look for the part that isn't in the dictionary yet.

See this VBScript: https://github.com/GeertBellekens/Enterprise-Architect-VBScript-Library/blob/master/Framework/Utils/ModelInfo.vbs
This especially speeds up the process if you have to resolve a lot of closely related fully qualified names. (where a big part of the name is the same)
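
The general idea can be sketched in Python as below. This is an illustration of the caching technique only, not a port of the linked VBScript: `PathResolver`, `find_child`, the "." separator and the sample tree are all assumptions, and `find_child` stands in for whatever repository lookup the real script would do:

```python
class PathResolver:
    """Caches every resolved prefix so shared path segments are looked up once."""

    def __init__(self, find_child, root_id):
        self._find_child = find_child      # (parent_id, name) -> child id or None
        self._root_id = root_id
        self._cache = {}                   # fully qualified name -> id

    def resolve(self, qualified_name):
        parts = qualified_name.split(".")
        # Find the longest prefix that is already cached.
        depth = len(parts)
        while depth > 0 and ".".join(parts[:depth]) not in self._cache:
            depth -= 1
        node = self._cache[".".join(parts[:depth])] if depth else self._root_id
        # Resolve only the remaining segments, caching each new prefix.
        for i in range(depth, len(parts)):
            node = self._find_child(node, parts[i])
            if node is None:
                return None
            self._cache[".".join(parts[: i + 1])] = node
        return node


# Hypothetical model tree: (parent id, name) -> id. Id 0 is the root.
TREE = {(0, "Model"): 1, (1, "CompA"): 2, (1, "CompB"): 3,
        (2, "IFuel"): 10, (3, "IFuel"): 11}
lookups = []


def find_child(parent_id, name):
    lookups.append((parent_id, name))      # record each simulated repository hit
    return TREE.get((parent_id, name))


resolver = PathResolver(find_child, root_id=0)
print(resolver.resolve("Model.CompA.IFuel"))   # 10, after 3 lookups
print(resolver.resolve("Model.CompB.IFuel"))   # 11, only 2 new lookups
print(len(lookups))                            # 5
```

The second name shares the `Model` prefix with the first, so only the segments below it cause new lookups.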

Geert


RodneyRichardson

  • EA Novice
  • *
  • Posts: 11
  • Karma: +0/-0
Re: Parallel script execution
« Reply #7 on: November 18, 2016, 10:15:19 pm »
A couple of thoughts:
* Have you considered using a Dependency rather than a tagged value to represent the relationship? This would involve a single look-up, instead of having to search for each element.
* I've had quite a lot of success using C# Interop for Automation (http://sparxsystems.com/enterprise_architect_user_guide/13.0/automation/setup.html). This would allow you to control the number of threads/processes, and hence do things in parallel. This involves the same object model, but a separate executable which connects to a single running instance of EA.

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13523
  • Karma: +574/-33
  • Make EA work for YOU!
    • Enterprise Architect Consultant and Value Added Reseller
Re: Parallel script execution
« Reply #8 on: November 18, 2016, 10:58:37 pm »
* Have you considered using a Dependency rather than a tagged value to represent the relationship? This would involve a single look-up, instead of having to search for each element.
Or simply a RefGUID tagged value to reference an element instead of using the fully qualified name.

Geert

kjourdan

  • EA User
  • **
  • Posts: 71
  • Karma: +0/-0
Re: Parallel script execution
« Reply #9 on: November 23, 2016, 05:40:07 am »
The problem is related to XMI files that are being imported. Connections between elements in these XMI files are captured as qualified paths. So after importing these XMI files, connections need to be established between various elements so that if an element is later moved, the relationship is still captured (the textual relationship could be updated based on the connection between the two elements).

The script is expected to look for these tagged values and see if the referenced element exists.  If so, a connection between the elements is created.  Otherwise, a warning is logged to indicate the referenced element does not exist.
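
In outline, that workflow might look like the Python sketch below. It works on plain dictionaries rather than on EA objects: the `elements` structure, the tag name `ref` and the sample data are hypothetical, and actually creating the connectors through EA's automation interface is left out:

```python
import logging

logging.basicConfig(level=logging.WARNING, format="%(levelname)s: %(message)s")
log = logging.getLogger("linker")


def plan_connectors(elements, tag_name="ref"):
    """Return (source id, target id) pairs for every resolvable reference.

    `elements` maps a fully qualified name to a dict with an 'id' and a
    'tags' dict. Unresolvable references are logged as warnings.
    """
    planned = []
    for qualified_name, element in elements.items():
        target_name = element["tags"].get(tag_name)
        if target_name is None:
            continue                       # no reference on this element
        target = elements.get(target_name)
        if target is None:
            log.warning("%s references missing element %s",
                        qualified_name, target_name)
            continue
        planned.append((element["id"], target["id"]))
    return planned


# Hypothetical data standing in for elements read after an XMI import.
elements = {
    "Model.CompA.Engine": {"id": 1, "tags": {"ref": "Model.CompA.IFuel"}},
    "Model.CompA.IFuel":  {"id": 2, "tags": {}},
    "Model.CompB.Tank":   {"id": 3, "tags": {"ref": "Model.CompC.IFuel"}},
}

print(plan_connectors(elements))   # [(1, 2)], plus one warning for Tank
```

Separating the planning step from the write step also makes it easier to check what a run would change before it touches the repository.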

I will look into C# but had heard that performing various processes outside EA generally took longer than doing these from internal scripts. In this case, the ability to run multiple concurrent jobs might be beneficial.