Author Topic: Breaking long element names  (Read 8046 times)

qwerty

  • EA Guru
  • *****
  • Posts: 9665
  • Karma: +176/-150
  • I'm no guru at all
Re: Breaking long element names
« Reply #15 on: October 18, 2016, 11:13:36 am »
Probably no English speaker wants to talk to the captain of a Danube steamship ;D

German allows you to assemble substantives almost at will. The captain is built from five different ones, and you could easily add more.

q.

Glassboy

  • EA Practitioner
  • ***
  • Posts: 1108
  • Karma: +77/-72
Re: Breaking long element names
« Reply #16 on: October 18, 2016, 11:24:42 am »
Quote
German allows you to assemble substantives almost at will. The captain is built from five different ones, and you could easily add more.

English allows compound words, but there's a subconscious bias towards not mixing words derived from Greek, Latin, French, or Old English (Germanic).

Paolo F Cantoni

  • EA Guru
  • *****
  • Posts: 6248
  • Karma: +104/-89
  • Inconsistently correct systems DON'T EXIST!
Re: Breaking long element names
« Reply #17 on: October 18, 2016, 11:28:10 am »
Quote
Probably no English speaker wants to talk to the captain of a Danube steamship ;D

German allows you to assemble substantives almost at will. The captain is built from five different ones, and you could easily add more.

q.
Indeed!  ;D

It seems to me, however, that since (for English) the Unspaced Cased form is an artefact in which even naturally lowercase words are "corrupted" for the purposes of tokenising (ThisMakesSenseToMe), the same could be said of Germanic languages: the name Donaudampfschiffahrtsgesellschaftskapitän would become the token DonauDampfschiffahrtsGesellschaftsKapitän.

I am assuming that such Germanic languages would wish to break tokens at substantive boundaries.
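
A minimal sketch of that tokenising idea, assuming the break points are simply the uppercase letters of the cased token. The function name split_cased_token is hypothetical, and runs of capitals (acronyms) would need extra handling that is not shown here.

```python
def split_cased_token(token: str) -> list[str]:
    """Split an unspaced cased token into parts, one per uppercase letter."""
    parts: list[str] = []
    for ch in token:
        # Start a new part at every uppercase letter (or at the very first character).
        if ch.isupper() or not parts:
            parts.append(ch)
        else:
            parts[-1] += ch
    return parts


print(split_cased_token("DonauDampfschiffahrtsGesellschaftsKapitän"))
```

Each returned part ends at a candidate wrap point, so a renderer could break the name between any two parts.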

Paolo
Inconsistently correct systems DON'T EXIST!
... Therefore, aim for consistency; in the expectation of achieving correctness....
-Semantica-
Helsinki Principle Rules!

Paolo F Cantoni

  • EA Guru
  • *****
  • Posts: 6248
  • Karma: +104/-89
  • Inconsistently correct systems DON'T EXIST!
Re: Breaking long element names
« Reply #18 on: October 18, 2016, 11:29:29 am »
Quote
German allows you to assemble substantives almost at will. The captain is built from five different ones, and you could easily add more.

English allows compound words, but there's a subconscious bias towards not mixing words derived from Greek, Latin, French, or Old English (Germanic).
Television being "an exception that proves the rule..."  ;)

Paolo

RoyC

  • EA Administrator
  • EA Practitioner
  • *****
  • Posts: 1187
  • Karma: +10/-3
  • Read The Help!
Re: Breaking long element names
« Reply #19 on: October 18, 2016, 02:26:19 pm »
Quote
German allows you to assemble substantives almost at will. The captain is built from five different ones, and you could easily add more.
In my youth, I visited Hamburg with a group of friends. We wanted to inspect and possibly rent a couple of rooms in a small hotel, but the hotelier took a dislike to us for any number of reasons - we were a group of boys and girls, some of us were English, some American, most of us spoke bad German or no German at all, some of us were not especially keen on the hotel in the first place... - and he thought he could get away with insulting us in German. However, one of our party was a Bavarian, who did not make his presence felt until the Hamburg hotelier was well under way. The rest of us then got a fascinating lesson in 'assembling substantives' as the two of them swapped insults, taking deeper and deeper breaths in order to shout an ever-increasing string of words at each other.

We didn't get to look at the rooms.
Best Regards, Roy

Glassboy

  • EA Practitioner
  • ***
  • Posts: 1108
  • Karma: +77/-72
Re: Breaking long element names
« Reply #20 on: October 18, 2016, 04:56:22 pm »
Quote
In my youth, I visited Hamburg with a group of friends. We wanted to inspect and possibly rent a couple of rooms in a small hotel, but the hotelier took a dislike to us for any number of reasons - we were a group of boys and girls, some of us were English, some American, most of us spoke bad German or no German at all, some of us were not especially keen on the hotel in the first place... - and he thought he could get away with insulting us in German. However, one of our party was a Bavarian, who did not make his presence felt until the Hamburg hotelier was well under way. The rest of us then got a fascinating lesson in 'assembling substantives' as the two of them swapped insults, taking deeper and deeper breaths in order to shout an ever-increasing string of words at each other.

Most Germans I've met tell you that Bavarians aren't Germans.  All the Bavarians I've met have been friendly chaps; good to have a beer with.

qwerty

  • EA Guru
  • *****
  • Posts: 9665
  • Karma: +176/-150
  • I'm no guru at all
Re: Breaking long element names
« Reply #21 on: October 18, 2016, 07:59:55 pm »
Many (most?) Bavarians say they are Bavarians, not necessarily Germans, and they want to be independent (Bavexit; at least since they have had to pay more than they formerly got from the state union). I guess the Bavarian won the insult contest. They can get down to a grunt of vowels where you can smell that it's something you don't want to know in detail.

q.

skiwi

  • EA Practitioner
  • ***
  • Posts: 1758
  • Karma: +36/-53
Re: Breaking long element names
« Reply #22 on: March 28, 2017, 07:04:54 am »
It seems to me that when the "algorithmic approach" to breaking long tokens (which are common in IT) breaks down, there are two further approaches:
1) simply break the token at an arbitrary point, e.g. halfway, to enable the string to wrap
2) allow the (human) author to provide guidance on where to split the token (e.g. a hard return), which is otherwise ignored (or held separately).
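
The two options can be sketched roughly as follows. This is an illustration only, not how EA behaves: the function names are hypothetical, and a hard return ("\n") is assumed as the author's hint marker.

```python
HINT = "\n"  # assumed author-supplied break marker (a hard return)


def break_halfway(token: str) -> tuple[str, str]:
    """Option 1: split at an arbitrary point, here the midpoint."""
    mid = len(token) // 2
    return token[:mid], token[mid:]


def break_at_hints(stored: str) -> tuple[list[str], str]:
    """Option 2: wrap only where the author placed a hint.

    Returns the display lines and the hint-free name.
    """
    return stored.split(HINT), stored.replace(HINT, "")


token = "Donaudampfschiffahrtsgesellschaftskapitän"
first, second = break_halfway(token)
print(first, second)

lines, clean = break_at_hints("Donaudampfschiffahrts\ngesellschafts\nkapitän")
print(lines, clean == token)
```

Option 1 needs no input from the author but ignores word structure; option 2 keeps the hint-free name available separately, for example for comparison or search.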
Orthogonality rules
Using EA14.0 (1422) on Windows 10 Enterprise/64 bit. Repositories in SQLServer2014 R2 & Access2003/JET4.0

qwerty

  • EA Guru
  • *****
  • Posts: 9665
  • Karma: +176/-150
  • I'm no guru at all
Re: Breaking long element names
« Reply #23 on: March 28, 2017, 08:03:55 am »
I'd prefer 2 and would not like to see 1. Arbitrary breaks could lead to awkward names. E.g. if you split the German word Urinstinkt (meaning basic instinct) as Urin stinkt, it would mean "pee stinks". Not a long word, but a good example of "breaking bad".

q.

Simon M

  • EA Administrator
  • EA Guru
  • *****
  • Posts: 6445
  • Karma: +55/-6
Re: Breaking long element names
« Reply #24 on: March 28, 2017, 08:47:42 am »
The algorithmic approach already provides for option 2.

Insert a zero width space into the text.
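
For anyone preparing names outside EA, here is a small sketch of building such a string in Python. The helper add_break_hints is hypothetical; how EA wraps at the character is as described above and is not reproduced here.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def add_break_hints(parts: list[str]) -> str:
    """Join name parts with an invisible zero width space between them."""
    return ZWSP.join(parts)


name = add_break_hints(["Donaudampfschiffahrts", "gesellschafts", "kapitän"])
print(unicodedata.name(ZWSP))
# The stored name is longer than the visible text by one character per hint.
print(len(name), len(name.replace(ZWSP, "")))
```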
« Last Edit: March 28, 2017, 08:51:56 am by Simon M »
Simon

support@sparxsystems.com

skiwi

  • EA Practitioner
  • ***
  • Posts: 1758
  • Karma: +36/-53
Re: Breaking long element names
« Reply #25 on: March 28, 2017, 10:03:59 am »
Quote
The algorithmic approach already provides for option 2.

Insert a zero width space into the text.
Great, thanks
Now if I could do this easily from within the EA UI ...
or find it documented in the EA user guide

Paolo F Cantoni

  • EA Guru
  • *****
  • Posts: 6248
  • Karma: +104/-89
  • Inconsistently correct systems DON'T EXIST!
Re: Breaking long element names
« Reply #26 on: March 28, 2017, 10:39:03 am »
Quote
The algorithmic approach already provides for option 2.

Insert a zero width space into the text.
Simon,

What does this do to searches?  Particularly the Project Search?

Paolo

Simon M

  • EA Administrator
  • EA Guru
  • *****
  • Posts: 6445
  • Karma: +55/-6
Re: Breaking long element names
« Reply #27 on: March 28, 2017, 10:58:57 am »
skiwi,

Quote
Now if I could do this easily from within the EA UI ...
You want us to re-invent a Unicode character entry system that should exist as part of your OS?

Quote
or find it documented in the EA user guide
And document the Unicode standard within our user guide?

Quote
What does this do to searches?  Particularly the Project Search?
It's an extra character where you've used it. You would have to include that in your search term if you wanted to find elements using it.
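
The effect is easy to reproduce with plain string matching. This sketch uses ordinary Python substring tests, not EA's Project Search, and the element name is made up for the example.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Hypothetical element name with a break hint in the middle.
stored_name = "Order" + ZWSP + "Processing"

# A search term typed without the invisible character does not match...
found_raw = "OrderProcessing" in stored_name
print(found_raw)  # False

# ...unless the character is stripped from the stored name before comparing.
found_stripped = "OrderProcessing" in stored_name.replace(ZWSP, "")
print(found_stripped)  # True
```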
Simon

Paolo F Cantoni

  • EA Guru
  • *****
  • Posts: 6248
  • Karma: +104/-89
  • Inconsistently correct systems DON'T EXIST!
Re: Breaking long element names
« Reply #28 on: March 28, 2017, 11:22:55 am »
Quote
[SNIP]
What does this do to searches?  Particularly the Project Search?
It's an extra character where you've used it. You would have to include that in your search term if you wanted to find elements using it.
Hence why it won't work...  I think I tested this, years ago.

What are the odds of an option to remove the zero width space (ZWS) for search purposes?

Obviously, in an enterprise environment, one user can't "see" where another user has placed the ZWS.

So, ultimately the solution - isn't.

Paolo

Simon M

  • EA Administrator
  • EA Guru
  • *****
  • Posts: 6445
  • Karma: +55/-6
Re: Breaking long element names
« Reply #29 on: March 29, 2017, 02:50:04 pm »
Okay, what I should have said was that search depends on the collation settings for the database being used. You can already see this effect where searching will be case-sensitive or case-insensitive depending on the database collation.

I couldn't find evidence that any database allows customization of the collation (or offers a collation that ignores the zero width space), but it's possible. Maybe there's another character in the Unicode standard that is considered a possible line break point and is ignored in a standard collation type.

I don't think it's a perfect solution. But as far as I can see, it's the only solution offered by the Unicode standard.

PS. I don't think an option to ignore that character in the search could work. It would require doing the replace in SQL to compare with the search term. On any non-trivial database you'll be in a lot of trouble if you even try that.
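
For illustration only, this is roughly what "doing the replace in SQL" could look like, using an in-memory SQLite database from Python. The table elements and its name column are hypothetical and are not EA's repository schema. Because the column is wrapped in REPLACE(), a database generally has to evaluate the expression for every row instead of using an ordinary index on the column, which is presumably the kind of trouble meant above.

```python
import sqlite3

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO elements (name) VALUES (?)",
    [("Order" + ZWSP + "Processing",), ("OrderProcessing",), ("Invoice",)],
)

# Plain comparison: the row containing the zero width space is missed.
exact = conn.execute(
    "SELECT COUNT(*) FROM elements WHERE name = ?",
    ("OrderProcessing",),
).fetchone()[0]

# Stripping the character in SQL finds both rows, at the cost of
# computing REPLACE() on the name column for each row examined.
stripped = conn.execute(
    "SELECT COUNT(*) FROM elements WHERE REPLACE(name, ?, '') = ?",
    (ZWSP, "OrderProcessing"),
).fetchone()[0]

print(exact, stripped)
conn.close()
```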
Simon
