Author Topic: UTF-8 Encoding for JSON generation with Schema Composer  (Read 8592 times)

MrSnow

  • EA User
  • **
  • Posts: 20
  • Karma: +1/-0
    • View Profile
UTF-8 Encoding for JSON generation with Schema Composer
« on: January 21, 2019, 11:09:58 pm »
Hi,

I have been searching for a while for a way to enable UTF-8 encoding in EA when generating JSON files with the Schema Composer. I need Norwegian descriptions for several elements, and Norwegian contains the special characters "æ, ø, å". When I open the generated JSON file, all the special characters appear in their HTML entity form (&#2XX).

Is there anything I can do to be able to use special characters in the description of elements? How is UTF-8 enabled/disabled in EA and is there a specific type of encoding EA uses as default?

My questions may be a bit off as I'm still very new to EA - please let me know if I'm fundamentally misunderstanding something or misusing any terms! :-)

/Snow

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13489
  • Karma: +572/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: UTF-8 Encoding for JSON generation with Schema Composer
« Reply #1 on: January 21, 2019, 11:47:21 pm »
Are you using an .eapx file (Jet 4)?

Models in the .eap format (Jet 3.5) are not fully Unicode compatible, so that might be an issue.

Geert

Eve

  • EA Administrator
  • EA Guru
  • *****
  • Posts: 8097
  • Karma: +118/-20
    • View Profile
« Reply #2 on: January 22, 2019, 09:09:24 am »
I suspect that this isn't directly related to the Schema Composer (or the database format).

Internally, EA encodes characters in the extended ASCII range for the system code page by escaping them. It would appear that the Schema Composer isn't restoring the original character.

I'm not actually sure which code page the Schema Composer uses when writing files, or which options affect that.
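
As a stopgap while the generated file still contains escaped characters, the output can be post-processed outside EA. The following is only a minimal sketch, assuming the JSON strings contain numeric character references such as &#230; and that the result should be saved as UTF-8. The function name is hypothetical and this is not an EA feature.

```python
import html
import json

def unescape_json_text(raw: str) -> str:
    """Decode HTML character references inside every string of a JSON document."""
    def walk(node):
        if isinstance(node, str):
            return html.unescape(node)
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, dict):
            return {walk(key): walk(value) for key, value in node.items()}
        return node  # numbers, booleans and null pass through unchanged

    data = walk(json.loads(raw))
    # ensure_ascii=False keeps characters such as æ, ø, å as real characters
    # instead of turning them into \u escapes.
    return json.dumps(data, ensure_ascii=False, indent=2)
```

Note that html.unescape also decodes named entities such as &amp;, which may or may not be wanted. When saving the result, open the output file with encoding="utf-8".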

MrSnow

« Reply #3 on: January 23, 2019, 12:01:26 am »
Quote from: Geert Bellekens on January 21, 2019, 11:47:21 pm
Are you using an .eapx file (Jet 4)?

Models in the .eap format (Jet 3.5) are not fully Unicode compatible, so that might be an issue.

Geert

Yes, I found that Jet 4 could be a solution in some cases, but in my case the Schema Composer still generates JSON files that show only the HTML entities for special characters. I did notice that simply activating Jet 4 didn't automatically change the project suffix to .eapx, so I tried exporting and importing all packages into a new .eapx project, but this didn't help either.

Thanks anyway!

bholtzman

  • EA User
  • **
  • Posts: 93
  • Karma: +2/-0
    • View Profile
« Reply #4 on: March 14, 2020, 11:14:23 am »
Quote from: Eve on January 22, 2019, 09:09:24 am
I suspect that this isn't directly related to the Schema Composer (or the database format).

Internally, EA encodes characters in the extended ASCII range for the system code page by escaping them. It would appear that the Schema Composer isn't restoring the original character.

I'm not actually sure which code page the Schema Composer uses when writing files, or which options affect that.

Eve,
Since, as you say, "EA encodes characters in the extended ASCII range for the system code page by escaping them", how would I go about recognizing them in the database? We want to publish a model, but first we want to make sure that no non-ASCII characters got into it. Thanks!

Bill

Eve

« Reply #5 on: March 16, 2020, 08:51:01 am »
If you're searching the notes field of the database, the pattern is &#nnnn;

bholtzman

« Reply #6 on: March 20, 2020, 03:26:50 am »
Are you saying I can find non-ASCII characters with this query in the SQL Scratch Pad?
select * from t_object where instr(note, '#nnnn;') > 0

Eve

« Reply #7 on: March 20, 2020, 08:08:00 am »
No, you'll need to use a LIKE statement where 'n' is a wildcard.
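
For illustration, the same pattern can also be checked outside the database on exported note text. This is a rough Python sketch, assuming the notes have already been pulled out as (name, note) pairs. The sample rows and the function name are made up. In the query itself a LIKE pattern around '&#' would do the filtering, but the wildcard character to use depends on the database behind the model.

```python
import re

# Matches decimal escapes of the form &#nnnn; where each n is a digit.
ESCAPE = re.compile(r"&#(\d+);")

def find_escaped(rows):
    """Yield (name, position, character) for each &#nnnn; escape in the notes."""
    for name, note in rows:
        for match in ESCAPE.finditer(note or ""):
            yield name, match.start(), chr(int(match.group(1)))

# Hypothetical sample data standing in for t_object name/note values.
rows = [
    ("Customer", "Plain ASCII note"),
    ("Order", "Bestilling p&#229; nett"),
]
hits = list(find_escaped(rows))
```

Each hit gives the object name, the offset of the escape within the note, and the character it stands for.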

bholtzman

« Reply #8 on: March 20, 2020, 11:34:59 pm »
OK, but this thread is about the JSON export from Schema Composer, right? My issue is slightly different. If I paste a non-ASCII character into the Notes field of an object in EA, is it possible to detect that using an EA query? What I've done is:
- added a small Greek alpha character to the Notes field of an EA object
- opened the EAP file in MS Access
- added a new table containing all ASCII characters (except for those that can't be represented)
- written VB code to loop through every t_object.note field and every t_attribute.notes field
- looped through the text in each of these fields character by character
- verified that each character matches a character in the ASCII table
- identified any characters that do not have a match
- output the object or attribute name for any failed match, along with the position of the non-matching characters in the string
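
The steps above can be sketched without a lookup table by comparing code points directly: anything above 127 is outside ASCII. This is only an illustration in Python, assuming the note text has been exported as (name, note) pairs. The function names are hypothetical. Going by Eve's earlier reply, EA may store such characters as &#nnnn; escapes, in which case the stored text is itself plain ASCII and a scan like this would find nothing.

```python
def non_ascii_positions(text):
    """Return (index, character, code point) for every character above 127."""
    return [(i, ch, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 127]

def scan(records):
    """Map each name to its non-ASCII findings; names with none are left out."""
    report = {}
    for name, note in records:
        found = non_ascii_positions(note or "")
        if found:
            report[name] = found
    return report
```

For example, scan([("B", "angle α here")]) reports the alpha at index 6 with code point 945.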

Oddly, when I used the Asc() function in Access and it hit my alpha, it decided it was a lowercase "a", i.e. ASCII character 97.

Any thoughts? Our goal is to publish a model that does not have any non-ASCII characters.

Bill

bholtzman

« Reply #9 on: March 20, 2020, 11:56:12 pm »
Sorry, but here's one other note. I found some other characters that did not match anything in my ASCII table and seemed to be non-ASCII. When I used the Asc() function on them, the result was ASCII 9, which is a horizontal tab. That doesn't make sense, right? Does the Asc() function somehow pick the closest possible?! :-)
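
A possible explanation, offered as an assumption rather than a confirmed diagnosis: Asc() returns a code in the system ANSI code page, not a Unicode code point, so a character with no slot in that code page can come back as a look-alike. AscW() returns the Unicode value instead. Code 9 is the horizontal tab (backspace is 8), so those characters may simply have been tabs that were missing from the lookup table. The Python sketch below shows the distinction using code points. It does not reproduce the Windows substitution, and cp1252 as the code page is a guess.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name or '<control>', is_ascii) for one character."""
    return ord(ch), unicodedata.name(ch, "<control>"), ord(ch) < 128

def fits_code_page(ch, code_page="cp1252"):
    """True if the character has a slot in the given legacy code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False
```

Here describe("α") gives code point 945 and fits_code_page("α") is False, while describe("\t") gives code point 9 and counts as ASCII.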