Sparx Systems Forum
Enterprise Architect => General Board => Topic started by: Thomas Mercer-Hursh on February 08, 2008, 10:29:26 am
-
Any of us who have been around for a while have seen an enormous explosion, not just in the capacity of hardware, but in the size of databases and such. These days, a 100GB ERP database is almost commonplace. This got me wondering, though: what is a large EA database, both in terms of size and in terms of the number of objects, components, connectors, and the like? I.e., what is making them large ... lots of code, lots of text, or is it just the complexity of the model? Is it a particular type of system that makes them large? And are those large repositories still running on JET, or are they mostly on "real" databases?
No real significance ... just curious.
-
No one seems to want to chip in on this, so I'll throw one out and you can tell me if it is small, medium, or large. This is an empirical component model of an existing ABL ERP system, so no Use Cases or Requirements or Sequence diagrams or any of that abstract side of things. Also, basically no diagrams yet. Just the database, code components, menu system, the links among them, and tags to describe things.
The database (OpenEdge, of course) is just under 250MB and there are over 27K objects, 100K attributes, 37K object tags, 70K connectors, and 410K connector tags (many giving properties to the links between program components and data).
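For anyone wanting comparable figures from their own repository, counts like these come down to simple row counts. What follows is only a minimal sketch: it assumes the usual EA repository table names (t_object, t_attribute, t_objectproperties, t_connector, t_connectortag), the helper names are made up for illustration, and it runs against an in-memory SQLite database with stub tables purely so the example is self-contained. Against a real repository the same SELECT COUNT(*) statements would go through whatever driver the DBMS provides.

```python
import sqlite3

# Assumed EA repository tables, mapped to the kinds of figures quoted above.
TABLES = {
    "objects": "t_object",
    "attributes": "t_attribute",
    "object tags": "t_objectproperties",
    "connectors": "t_connector",
    "connector tags": "t_connectortag",
}


def count_rows(conn, tables=TABLES):
    """Return {label: row count} for each table in `tables`."""
    counts = {}
    for label, table in tables.items():
        # Table names come from the fixed mapping above, not from user input.
        row = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        counts[label] = row[0]
    return counts


def make_demo_repository():
    """Build a stand-in repository: stub tables holding 1, 2, 3, ... rows."""
    conn = sqlite3.connect(":memory:")
    for n, table in enumerate(TABLES.values(), start=1):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
        conn.executemany(
            f"INSERT INTO {table} (id) VALUES (?)", [(i,) for i in range(n)]
        )
    return conn


if __name__ == "__main__":
    for label, count in count_rows(make_demo_repository()).items():
        print(f"{label}: {count}")
```

The stub tables only carry an id column; a real repository has many more columns, which does not matter for counting rows.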
-
... small, medium, or large...
What answer do you expect? A value in kilogram?
Edit: What is "cold" for an Inuit? What is "warm" for a bushman?
-
Perhaps take a moment to import the .Net and Java libraries into a project; use different root packages.
Build a small application using one or both of the above, referring to types defined in the libraries.
EA should handle this fine, and should let you engineer the necessary code in both directions.
How does the result compare with what you're looking at? Is the performance OK?
As Thomas Killian points out, there are no objective criteria in your question. You'll have to be the judge.
David
-
I wasn't expecting an absolute answer any more than I would expect an absolute measure of "big" for ABL ERP systems. But, empirically, I know that these ERP databases used to be "big" if they were multiple GB, but today they are very often in excess of 100GB, with a fair number in excess of 0.5TB. Similarly, I know a number of the code bases for these installations which are 2-4 Mloc of ABL code. One typically thinks in terms of one line of ABL being equivalent to 4-10 lines of C or Java, so that is upwards of perhaps 20-40 Mloc equivalent. I have heard of one house that is approaching 10 Mloc of ABL.
So, empirically, that gives me a baseline for "big" in the context of these ERP systems. It is just empirical observation of what is out there in the world. And, as noted for the databases, something that is changing with time as disk gets cheaper and people keep more historical data.
I was merely curious what sort of numbers were typical for EA and what database they were using to support it.
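The ABL-to-C/Java equivalence above is plain multiplication, sketched here in Python. The only inputs are the figures quoted in this post; the function and constant names are invented for illustration.

```python
def equivalent_mloc(abl_mloc, ratio):
    """Convert Mloc of ABL to an equivalent Mloc of C or Java."""
    return abl_mloc * ratio


# Figures quoted in the post above.
ABL_MLOC_RANGE = (2, 4)   # typical large installations, in Mloc of ABL
RATIO_RANGE = (4, 10)     # lines of C/Java per line of ABL

# Extremes of both ranges: smallest code base at the lowest ratio,
# largest code base at the highest ratio.
low = equivalent_mloc(ABL_MLOC_RANGE[0], RATIO_RANGE[0])
high = equivalent_mloc(ABL_MLOC_RANGE[1], RATIO_RANGE[1])

print(f"{low}-{high} Mloc equivalent")
```

Taking the extremes of both ranges gives 8 to 40 Mloc, so the 20-40 figure corresponds to the upper part of that span.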
-
That makes sense.
It also means that the experiment I mentioned would be of very limited value, since you are looking at the amount of stored code as well as the aggregate record count.
David
-
Indeed ... we aren't pulling the actual code into the model yet, but clearly a couple of Mloc of code is going to add a little heft to the DB without actually changing its complexity much.
-
Actually, now that I've thought it over a bit, I doubt you'll have much trouble.
Since you're talking about source code, you won't likely be searching and sorting the stuff. You'll mostly be 'holding' it between forward and reverse engineering and transformations.
Most of the code will likely be stored in BLOB form. This won't have much downside effect on aggregate record count, it just increases file size.
If you get into multi-GB files, then the Jet engine will peter out fairly quickly. However, the various other back ends should keep up with you all the way.
Perhaps you can use an OpenEdge repository; that would certainly carry the freight!
David
-
I'm already using an OpenEdge DB as a repository ... in fact, I am populating the model using ABL. I have just gotten working a tool which reads:
1) The schema directly from the target OE database;
2) A "bill of materials" produced from Analyst, a tool from Joanju software, which provides the call structure of the code including internal procedures, functions, superprocedures, etc. along with all links between program units and data tables and fields with access mode and field lists;
3) The menu tables from the customer data base (site specific);
4) Program and module descriptions from the customer data base (site specific); and
5) A text file of additional program descriptions (site specific).
It builds a very complete data model and a quite thorough component model with the menu structure at the top coming down to what we call a Functional Unit, i.e., a container for all code accessed from a particular menu selection, through to the compile units, the program and include files which go into those compile units, and the internal procedures and functions inside the programs and include files.
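The containment chain described above (menu, Functional Unit, compile unit, program and include files, internal procedures and functions) can be pictured as a simple tree. This is only an illustrative sketch in Python, not the actual tool, which is written in ABL; the class, the kind labels, and every element name in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of the component model (hypothetical structure)."""
    kind: str   # e.g. "Menu", "FunctionalUnit", "CompileUnit", "Program"
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child element and return it for further nesting."""
        self.children.append(child)
        return child

    def count(self) -> int:
        """Number of elements in this subtree, including this one."""
        return 1 + sum(child.count() for child in self.children)


# Invented sample data: one menu selection and the code reachable from it.
menu = Node("Menu", "Order Entry")
unit = menu.add(Node("FunctionalUnit", "Enter Order"))
compile_unit = unit.add(Node("CompileUnit", "ordent.r"))
program = compile_unit.add(Node("Program", "ordent.p"))
compile_unit.add(Node("Include", "ordvars.i"))
program.add(Node("InternalProcedure", "validateOrder"))

print(menu.count())
```

In a real model each of these nodes would also carry tags and connectors to the data tables and fields it touches, which the sketch leaves out.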
Seems like a nice running head start on analyzing the legacy system. I'll be publishing all this shortly and will post here when the info is up.
Meanwhile, I am still curious about how big a really big model is.
-
Well done!
I know you've been at this a while. It sounds like it is really coming along.
I'm also pleased that you have made such headway on the legacy code breakdown. I feared that would prove a real stumbling block.
David
-
I am very pleased with how it has come together thus far. I think we really have a pretty good model for analyzing a lot about these large legacy systems. I am optimistic about being able to implement roundtripping, which has already been done for OO ABL, and I am pretty sure that we can do some concrete simple transformations at this level. Of course, the holy grail is a transformation to a more abstract representation followed by MDA to a new architecture, but that will be a *little* bit harder! :)
-
;)
I have a database with the exact location of every electron in the current universe. It's about 10^213.68 terrabytes.
The data model is 36k.
;)
Next question.
b.
-
Sure thing bruce,
Does it store the spin states? You could probably do it with only one electron per electron state.
The big question is not how big the data model is, but how well it performs for things like forward and reverse engineering. For example, can we change the repository and regenerate the universe the new way? Does it handle deletion of obsolete items gracefully?
I'm still waiting for the changes they promised for the upcoming Reality build 2...
-
It's about 10^213.68 terrabytes.
Using JET, no doubt.
-
With a zip utility.
-
terrabytes
No doubt the appropriate measure. Or maybe Sunbites (sic!) would be better?
-
Using JET, no doubt.
Thomas,
Don't knock JET - the biggest Access DB in the world is surprisingly big... I can't recall the details but it was impressive!
Paolo
-
Merely big is easy. Performance, stability, integrity ... count for a lot.
-
JET 4.0 has a 2GB limit. That is per .mdb file, but you can link files to exceed that limit.
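A quick way to see how close a file is to that ceiling, sketched in Python. The limit is taken here as 2 GiB, which is an assumption about how the 2GB figure is counted, and the function names are invented for illustration.

```python
import os

# 2GB per .mdb file, taken here as 2 GiB (assumption).
JET4_LIMIT_BYTES = 2 * 1024**3


def jet_headroom(size_bytes, limit=JET4_LIMIT_BYTES):
    """Return the unused fraction of the per-file limit (negative if over)."""
    return 1 - size_bytes / limit


def file_headroom(path):
    """Same check for a file on disk; the path is supplied by the caller."""
    return jet_headroom(os.path.getsize(path))


if __name__ == "__main__":
    # Sample size only: 250MB, the figure mentioned earlier in the thread.
    print(f"{jet_headroom(250 * 1024**2):.1%} of the limit still free")
```

This only measures a single file; linked files each get their own limit.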
-
:)
Now I'm working on getting up the "position and velocity" database.....
I'll be back a bit later.
bruce
-
You're working too hard on this bruce,
Just get them all stored in your database. Now you can answer the questions easily for any given electron:
Position = here.
Velocity = 0.
You'll need some kind of identifier though, so users can specify the electron they want the above answers for. Perhaps something along the lines of a new UUEID (Universally Unique Electron IDentifier).
David
[Of course now you run the risk of confusion with really big EA models, where the new schema will have to accommodate a UUEID (Universally Unique Element IDentifier) field. Perhaps for UML we will have to accept something that's just globally unique - the GUEID - along with the corollary that our designs will only work on Earth, while the rest of the universe is on its own.]
[So there you have it. A big model should be able to accommodate everything on Earth correctly, but is likely to be weak at handling the entire universe. But once bruce gets the universe into his database, we don't really need to store it in EA.]
[Problem solved!]