Sparx Systems Forum

Enterprise Architect => Automation Interface, Add-Ins and Tools => Topic started by: Viking on October 09, 2020, 09:46:10 pm

Title: Decode BinContent in t_document
Post by: Viking on October 09, 2020, 09:46:10 pm
Hi,
I am trying to read BinContent in t_document. I found this very useful answer from @qwerty: “It's a base64 encoded zip with a single file str.dat as contents.” in https://www.sparxsystems.com/forums/smf/index.php/topic,42555.msg252958.html#msg252958

I tried the code below. But FromBase64String throws the exception, that “The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.”

Could somebody tell me what’s wrong with the code, please? V.

Code: [Select]
String query = "SELECT BinContent from t_document where ElementID = 'TECHNOLOGY'";
List<Document> docs = XMLService.GetXMLasObjects<Document>(Sparx.Repository.SQLQuery(query));
foreach (Document aDoc in docs)
{
  Byte[] aBinContent = aDoc.BinContent;
  // Unzip
  using (var memoryStream = new MemoryStream(aBinContent))
  {
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Read))
    {
      foreach (var entry in archive.Entries)
      {
        using (var stream = entry.Open())
        using (var reader = new StreamReader(stream))
        {
          String unzippedData = reader.ReadToEnd();
        }
      }
    }
  }
  // Decode base64
  byte[] zipBytes = Convert.FromBase64String(unzippedData);
}
Title: Re: Decode BinContent in t_document
Post by: qwerty on October 09, 2020, 09:58:15 pm
What does the bincontent result actually show? First and last chars would be enough to see.

q.
Title: Re: Decode BinContent in t_document
Post by: Geert Bellekens on October 09, 2020, 10:03:12 pm
Yeah, I don't think it's the same base64 format used in other places.
That looks more like

Code: [Select]
Qk02AwAAAAAAADYAAAAoAAAAEAAAABAAAAABABgAAAAAAAADAADEDgAAxA4AAAAAAAAAAAAA
wMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDA
wMDAwMDAQEBAQEBAQEBAQEBAQEBAwMDATn6mwMDATn6mwMDAwMDAwMDAwMDAwMDApqamQEBA

The bincontent is some kind of hex format I think, it always starts with "0x"

Code: [Select]
0x504B0304140000000800AA713C47B1E305CA66D00000464...
I don't think I've already seen a solution on how to decode that format.

Geert
Title: Re: Decode BinContent in t_document
Post by: Geert Bellekens on October 09, 2020, 10:13:15 pm
I got curious, and it's indeed a different encoding.
I was able to decode a bincontent string (of a small png image) into a file using https://tomeko.net/online_tools/hex_to_file.php?lang=en (https://tomeko.net/online_tools/hex_to_file.php?lang=en)
The downloaded file was again the classifical .zip file with str.dat as content.

I'm not sure how this website does the conversion, but it seems rather trivial.

Geert
Title: Re: Decode BinContent in t_document
Post by: Viking on October 09, 2020, 11:08:37 pm
@qwerty, @Geert: I do not get the content mentioned in the thread. Mine starts with "<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i ...". That does not really look correct :-(

Normally I would prefere using Repository.Execute(). But it does not support reading data.
Title: Re: Decode BinContent in t_document
Post by: Viking on October 09, 2020, 11:18:24 pm
I got curious, and it's indeed a different encoding.
I was able to decode a bincontent string (of a small png image) into a file using https://tomeko.net/online_tools/hex_to_file.php?lang=en (https://tomeko.net/online_tools/hex_to_file.php?lang=en)
The downloaded file was again the classifical .zip file with str.dat as content.
I'm not sure how this website does the conversion, but it seems rather trivial.
Geert

Could you share the code to get the HEX, please? I am not sure, if the sequence I implemented is correct.
Title: Re: Decode BinContent in t_document
Post by: Geert Bellekens on October 09, 2020, 11:49:38 pm
I didn't use code to get the content, just copy paste from SQL server management studio

Geert
Title: Re: Decode BinContent in t_document
Post by: qwerty on October 10, 2020, 02:06:40 am
@qwerty, @Geert: I do not get the content mentioned in the thread. Mine starts with "<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i ...". That does not really look correct :-(

Normally I would prefere using Repository.Execute(). But it does not support reading data.
That's some UTF8 text (...Vers...)

q.
Title: Re: Decode BinContent in t_document
Post by: qwerty on October 10, 2020, 02:09:59 am
Looks like TECHNOLOGY entries are plain UTF8 (or was it 16?) strings.

q.

P.S. Could you help me with creating a TECHNOLOGY entry in t_document?
Title: Re: Decode BinContent in t_document
Post by: Viking on October 12, 2020, 04:00:45 am
I am afraid that Repository.SQLQuery does some conversion. If I directly access the database I see the expected data.
Title: Re: Decode BinContent in t_document
Post by: qwerty on October 12, 2020, 06:31:27 am
Well, SQLQuery converts binary data to base64 as it creates an XML result  - and binary in an XML is not a good idea. So yes, a direct access will return the blob as it is.

q.
Title: Re: Decode BinContent in t_document
Post by: Ian Mitchell on February 22, 2024, 02:25:45 am
I have tried to make this work, but when I try to open the ZipArchiveEntry for the str.dat file, I get an error: "A local file header is corrupt".
This may sugest that Sparx have done something else on top of the (ZIP + base64 encoding) which now seems to be normal for potentially large data fields like this.

The data I  am trying to extract is a simple test case:  the translation data for a single element, where I added the translation to the 'notes' translation tab via the UI.

Has anyone else had success in extracting this data?
@Sparx - are there any instructions anywhere which say how to acess this data? Maybe an API call?
Title: Re: Decode BinContent in t_document
Post by: Ian Mitchell on February 22, 2024, 02:32:09 am
If anyone needs a test case, the data from EA was:

U1MDBBQAAAAIAH5gVVjVk/2cDgEAALICAAAHABEAc3RyLmRhdFVUDQAHk+bV
ZZPm1WWT5tVlfVJda8JAEJznQv/D4XOp+p4qRQUfiv8hNiG1TU7IJSL99c7s
lhbR60PuY2Zud7K7BZY4o0OLgBNq9Eg44IiIF0wwxzNm3APRASXRimtrfG2K
yLP4JRZ4xAMK6npTJupK3jxaIp/jgt0iGozcm9/IrUWOxDpib9S+44unynj3
JTcHrinDy99A7JafX0Vf4YOr+EQX3d0Mec1flv80nknIdVxHFljzPpD55v5p
dam5ynHLTy9UJeURFqlSzXQKdg6WWxF29pfqple2YpYn4sLUSzkcDS/5sjdU
/ZUnzYHeBGzwyv4HbH+i+nxIJz9SKPue32i416GxDHIpD8lcuYcO3vOGUQtM
M9OgOclxmqILUEsBAhcLFAAAAAgAfmBVWNWT/ZwOAQAAsgIAAAcACQAAAAAA
AAAAAACAAAAAAHN0ci5kYXRVVAUAB5Pm1WVQSwUGAAAAAAEAAQA+AAAARAEA
AAAA

which was extracted using a simple bit of SQL to read the BinContent column of t_document.
The text should decode to something like:

Dit zijn enkele aantekeningen in het Nederlands, de secundaire taal van EA. Het vertalen gebeurt nog steeds handmatig.
(Apologies to native Dutch speakers: this is aGoogle translation.)
Title: Re: Decode BinContent in t_document
Post by: Geert Bellekens on February 22, 2024, 03:03:26 am
Ian,

I'm pretty sure I saw some methods to get translations using the API
...
EA.Element.GetTXNote (string Code, long Flag) and the likes.

Geert

PS. the Dutch is pretty much OK, no apologies neede :)
Title: Re: Decode BinContent in t_document
Post by: johann on February 22, 2024, 07:25:09 pm
I hope this doesn't disclose a business secret:

Decode the BASE64 and replace the first two bytes (letters "SS") with the letters "PK" - now you have a valid PKZip archive format.

Extract the (single) entry "str.dat":

Code: [Select]
<?xml version="1.0" standalone="no" ?>
<translations><translation language="nl" nameLocked="0" aliasLocked="0" notesLocked="1" nameChecksum="0" aliasChecksum="0" notesChecksum="0" name="" alias="">Dit zijn enkele aantekeningen in het Nederlands, de secundaire taal van EA. Het vertalen gebeurt nog steeds handmatig.</translation></translations>

We sometimes also extract stuff from the t_documents table, but we have not seen this kind of obfuscation(?) before.

Why "SS"?
Maybe it's short for "Sparx Systems" - you have a "Sparx Systems" ZIP format!
Title: Re: Decode BinContent in t_document
Post by: Geert Bellekens on February 22, 2024, 07:28:16 pm
Hmm, nice catch, thanks Johann  :)

Geert
Title: Re: Decode BinContent in t_document
Post by: Paolo F Cantoni on February 22, 2024, 07:55:40 pm
Hmm, nice catch, thanks Johann  :)

Geert
Wot 'e sed!!!!!

Out of interest, how did you figure that out?  Seems a "random" change...

Paolo
Title: Re: Decode BinContent in t_document
Post by: qwerty on February 22, 2024, 09:16:54 pm
Hope it's not derived from Schutz-Staffel :-(

q.
Title: Re: Decode BinContent in t_document
Post by: Ian Mitchell on February 28, 2024, 08:12:08 pm
Pure genius Johann.

I would like to award the 2024 'Sparx Sherlock Holmes Award for Best EA Detective' to Johan!!


Thanks to Geert for seeing the element.getTXNote(...) and  element.setTXNote(..) methods.
I'm not sure which release these appeared in, but they do everything I need.
Nearly everything: the value of the languae code which these methods needs can be found in usys_system in the TranslateSecondary setting.

But Johan is still a genius.
Title: Re: Decode BinContent in t_document
Post by: Elpis on March 07, 2024, 07:18:26 am
[...] Nearly everything: the value of the languae code which these methods [Element.GetTXNote and alike] needs can be found in usys_system in the TranslateSecondary setting.
[...]

There is easier way IMHO to find these language codes. Sparx documentation says (for Element.*TX* methods): "[...] Two-letter language code (found on the 'Translations' page of the 'Manage Model Options' dialog)".