Author Topic: Decode BinContent in t_document  (Read 14070 times)

Viking

  • EA User
  • **
  • Posts: 422
  • Karma: +2/-2
    • View Profile
Decode BinContent in t_document
« on: October 09, 2020, 09:46:10 pm »
Hi,
I am trying to read BinContent in t_document. I found this very useful answer from @qwerty: “It's a base64 encoded zip with a single file str.dat as contents.” in https://www.sparxsystems.com/forums/smf/index.php/topic,42555.msg252958.html#msg252958

I tried the code below. But FromBase64String throws the exception, that “The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.”

Could somebody tell me what’s wrong with the code, please? V.

Code: [Select]
String query = "SELECT BinContent from t_document where ElementID = 'TECHNOLOGY'";
List<Document> docs = XMLService.GetXMLasObjects<Document>(Sparx.Repository.SQLQuery(query));
foreach (Document aDoc in docs)
{
  Byte[] aBinContent = aDoc.BinContent;
  // Unzip
  using (var memoryStream = new MemoryStream(aBinContent))
  {
    using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Read))
    {
      foreach (var entry in archive.Entries)
      {
        using (var stream = entry.Open())
        using (var reader = new StreamReader(stream))
        {
          String unzippedData = reader.ReadToEnd();
        }
      }
    }
  }
  // Decode base64
  byte[] zipBytes = Convert.FromBase64String(unzippedData);
}

qwerty

  • EA Guru
  • *****
  • Posts: 13584
  • Karma: +396/-301
  • I'm no guru at all
    • View Profile
Re: Decode BinContent in t_document
« Reply #1 on: October 09, 2020, 09:58:15 pm »
What does the bincontent result actually show? First and last chars would be enough to see.

q.

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13251
  • Karma: +554/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Decode BinContent in t_document
« Reply #2 on: October 09, 2020, 10:03:12 pm »
Yeah, I don't think it's the same base64 format used in other places.
That looks more like

Code: [Select]
Qk02AwAAAAAAADYAAAAoAAAAEAAAABAAAAABABgAAAAAAAADAADEDgAAxA4AAAAAAAAAAAAA
wMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDA
wMDAwMDAQEBAQEBAQEBAQEBAQEBAwMDATn6mwMDATn6mwMDAwMDAwMDAwMDAwMDApqamQEBA

The bincontent is some kind of hex format I think, it always starts with "0x"

Code: [Select]
0x504B0304140000000800AA713C47B1E305CA66D00000464...
I don't think I've already seen a solution on how to decode that format.

Geert

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13251
  • Karma: +554/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Decode BinContent in t_document
« Reply #3 on: October 09, 2020, 10:13:15 pm »
I got curious, and it's indeed a different encoding.
I was able to decode a bincontent string (of a small png image) into a file using https://tomeko.net/online_tools/hex_to_file.php?lang=en
The downloaded file was again the classifical .zip file with str.dat as content.

I'm not sure how this website does the conversion, but it seems rather trivial.

Geert

Viking

  • EA User
  • **
  • Posts: 422
  • Karma: +2/-2
    • View Profile
Re: Decode BinContent in t_document
« Reply #4 on: October 09, 2020, 11:08:37 pm »
@qwerty, @Geert: I do not get the content mentioned in the thread. Mine starts with "<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i ...". That does not really look correct :-(

Normally I would prefere using Repository.Execute(). But it does not support reading data.

Viking

  • EA User
  • **
  • Posts: 422
  • Karma: +2/-2
    • View Profile
Re: Decode BinContent in t_document
« Reply #5 on: October 09, 2020, 11:18:24 pm »
I got curious, and it's indeed a different encoding.
I was able to decode a bincontent string (of a small png image) into a file using https://tomeko.net/online_tools/hex_to_file.php?lang=en
The downloaded file was again the classifical .zip file with str.dat as content.
I'm not sure how this website does the conversion, but it seems rather trivial.
Geert

Could you share the code to get the HEX, please? I am not sure, if the sequence I implemented is correct.

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13251
  • Karma: +554/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Decode BinContent in t_document
« Reply #6 on: October 09, 2020, 11:49:38 pm »
I didn't use code to get the content, just copy paste from SQL server management studio

Geert

qwerty

  • EA Guru
  • *****
  • Posts: 13584
  • Karma: +396/-301
  • I'm no guru at all
    • View Profile
Re: Decode BinContent in t_document
« Reply #7 on: October 10, 2020, 02:06:40 am »
@qwerty, @Geert: I do not get the content mentioned in the thread. Mine starts with "<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i ...". That does not really look correct :-(

Normally I would prefere using Repository.Execute(). But it does not support reading data.
That's some UTF8 text (...Vers...)

q.

qwerty

  • EA Guru
  • *****
  • Posts: 13584
  • Karma: +396/-301
  • I'm no guru at all
    • View Profile
Re: Decode BinContent in t_document
« Reply #8 on: October 10, 2020, 02:09:59 am »
Looks like TECHNOLOGY entries are plain UTF8 (or was it 16?) strings.

q.

P.S. Could you help me with creating a TECHNOLOGY entry in t_document?

Viking

  • EA User
  • **
  • Posts: 422
  • Karma: +2/-2
    • View Profile
Re: Decode BinContent in t_document
« Reply #9 on: October 12, 2020, 04:00:45 am »
I am afraid that Repository.SQLQuery does some conversion. If I directly access the database I see the expected data.

qwerty

  • EA Guru
  • *****
  • Posts: 13584
  • Karma: +396/-301
  • I'm no guru at all
    • View Profile
Re: Decode BinContent in t_document
« Reply #10 on: October 12, 2020, 06:31:27 am »
Well, SQLQuery converts binary data to base64 as it creates an XML result  - and binary in an XML is not a good idea. So yes, a direct access will return the blob as it is.

q.

Ian Mitchell

  • EA User
  • **
  • Posts: 506
  • Karma: +22/-4
  • The eaDocX and Model Expert guy
    • View Profile
Re: Decode BinContent in t_document
« Reply #11 on: February 22, 2024, 02:25:45 am »
I have tried to make this work, but when I try to open the ZipArchiveEntry for the str.dat file, I get an error: "A local file header is corrupt".
This may sugest that Sparx have done something else on top of the (ZIP + base64 encoding) which now seems to be normal for potentially large data fields like this.

The data I  am trying to extract is a simple test case:  the translation data for a single element, where I added the translation to the 'notes' translation tab via the UI.

Has anyone else had success in extracting this data?
@Sparx - are there any instructions anywhere which say how to acess this data? Maybe an API call?
Ian Mitchell, Designer, eaDocX


www.eaDocX.com
www.theartfulmodeller.com

Ian Mitchell

  • EA User
  • **
  • Posts: 506
  • Karma: +22/-4
  • The eaDocX and Model Expert guy
    • View Profile
Re: Decode BinContent in t_document
« Reply #12 on: February 22, 2024, 02:32:09 am »
If anyone needs a test case, the data from EA was:

U1MDBBQAAAAIAH5gVVjVk/2cDgEAALICAAAHABEAc3RyLmRhdFVUDQAHk+bV
ZZPm1WWT5tVlfVJda8JAEJznQv/D4XOp+p4qRQUfiv8hNiG1TU7IJSL99c7s
lhbR60PuY2Zud7K7BZY4o0OLgBNq9Eg44IiIF0wwxzNm3APRASXRimtrfG2K
yLP4JRZ4xAMK6npTJupK3jxaIp/jgt0iGozcm9/IrUWOxDpib9S+44unynj3
JTcHrinDy99A7JafX0Vf4YOr+EQX3d0Mec1flv80nknIdVxHFljzPpD55v5p
dam5ynHLTy9UJeURFqlSzXQKdg6WWxF29pfqple2YpYn4sLUSzkcDS/5sjdU
/ZUnzYHeBGzwyv4HbH+i+nxIJz9SKPue32i416GxDHIpD8lcuYcO3vOGUQtM
M9OgOclxmqILUEsBAhcLFAAAAAgAfmBVWNWT/ZwOAQAAsgIAAAcACQAAAAAA
AAAAAACAAAAAAHN0ci5kYXRVVAUAB5Pm1WVQSwUGAAAAAAEAAQA+AAAARAEA
AAAA

which was extracted using a simple bit of SQL to read the BinContent column of t_document.
The text should decode to something like:

Dit zijn enkele aantekeningen in het Nederlands, de secundaire taal van EA. Het vertalen gebeurt nog steeds handmatig.
(Apologies to native Dutch speakers: this is aGoogle translation.)
Ian Mitchell, Designer, eaDocX


www.eaDocX.com
www.theartfulmodeller.com

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 13251
  • Karma: +554/-33
  • Make EA work for YOU!
    • View Profile
    • Enterprise Architect Consultant and Value Added Reseller
Re: Decode BinContent in t_document
« Reply #13 on: February 22, 2024, 03:03:26 am »
Ian,

I'm pretty sure I saw some methods to get translations using the API
...
EA.Element.GetTXNote (string Code, long Flag) and the likes.

Geert

PS. the Dutch is pretty much OK, no apologies neede :)

johann

  • EA User
  • **
  • Posts: 26
  • Karma: +1/-0
    • View Profile
Re: Decode BinContent in t_document
« Reply #14 on: February 22, 2024, 07:25:09 pm »
I hope this doesn't disclose a business secret:

Decode the BASE64 and replace the first two bytes (letters "SS") with the letters "PK" - now you have a valid PKZip archive format.

Extract the (single) entry "str.dat":

Code: [Select]
<?xml version="1.0" standalone="no" ?>
<translations><translation language="nl" nameLocked="0" aliasLocked="0" notesLocked="1" nameChecksum="0" aliasChecksum="0" notesChecksum="0" name="" alias="">Dit zijn enkele aantekeningen in het Nederlands, de secundaire taal van EA. Het vertalen gebeurt nog steeds handmatig.</translation></translations>

We sometimes also extract stuff from the t_documents table, but we have not seen this kind of obfuscation(?) before.

Why "SS"?
Maybe it's short for "Sparx Systems" - you have a "Sparx Systems" ZIP format!