Sparx Systems Forum
Enterprise Architect => Automation Interface, Add-Ins and Tools => Topic started by: Viking on October 09, 2020, 09:46:10 pm
-
Hi,
I am trying to read BinContent in t_document. I found this very useful answer from @qwerty: “It's a base64 encoded zip with a single file str.dat as contents.” in https://www.sparxsystems.com/forums/smf/index.php/topic,42555.msg252958.html#msg252958
I tried the code below. But FromBase64String throws the exception, that “The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.”
Could somebody tell me what’s wrong with the code, please? V.
String query = "SELECT BinContent from t_document where ElementID = 'TECHNOLOGY'";
List<Document> docs = XMLService.GetXMLasObjects<Document>(Sparx.Repository.SQLQuery(query));
foreach (Document aDoc in docs)
{
Byte[] aBinContent = aDoc.BinContent;
// Unzip
using (var memoryStream = new MemoryStream(aBinContent))
{
using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Read))
{
foreach (var entry in archive.Entries)
{
using (var stream = entry.Open())
using (var reader = new StreamReader(stream))
{
String unzippedData = reader.ReadToEnd();
}
}
}
}
// Decode base64
byte[] zipBytes = Convert.FromBase64String(unzippedData);
}
-
What does the bincontent result actually show? First and last chars would be enough to see.
q.
-
Yeah, I don't think it's the same base64 format used in other places.
That looks more like
Qk02AwAAAAAAADYAAAAoAAAAEAAAABAAAAABABgAAAAAAAADAADEDgAAxA4AAAAAAAAAAAAA
wMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDA
wMDAwMDAQEBAQEBAQEBAQEBAQEBAwMDATn6mwMDATn6mwMDAwMDAwMDAwMDAwMDApqamQEBA
The bincontent is some kind of hex format I think, it always starts with "0x"
0x504B0304140000000800AA713C47B1E305CA66D00000464...
I don't think I've already seen a solution on how to decode that format.
Geert
-
I got curious, and it's indeed a different encoding.
I was able to decode a bincontent string (of a small png image) into a file using https://tomeko.net/online_tools/hex_to_file.php?lang=en (https://tomeko.net/online_tools/hex_to_file.php?lang=en)
The downloaded file was again the classifical .zip file with str.dat as content.
I'm not sure how this website does the conversion, but it seems rather trivial.
Geert
-
@qwerty, @Geert: I do not get the content mentioned in the thread. Mine starts with "<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i ...". That does not really look correct :-(
Normally I would prefere using Repository.Execute(). But it does not support reading data.
-
I got curious, and it's indeed a different encoding.
I was able to decode a bincontent string (of a small png image) into a file using https://tomeko.net/online_tools/hex_to_file.php?lang=en (https://tomeko.net/online_tools/hex_to_file.php?lang=en)
The downloaded file was again the classifical .zip file with str.dat as content.
I'm not sure how this website does the conversion, but it seems rather trivial.
Geert
Could you share the code to get the HEX, please? I am not sure, if the sequence I implemented is correct.
-
I didn't use code to get the content, just copy paste from SQL server management studio
Geert
-
@qwerty, @Geert: I do not get the content mentioned in the thread. Mine starts with "<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i ...". That does not really look correct :-(
Normally I would prefere using Repository.Execute(). But it does not support reading data.
That's some UTF8 text (...Vers...)
q.
-
Looks like TECHNOLOGY entries are plain UTF8 (or was it 16?) strings.
q.
P.S. Could you help me with creating a TECHNOLOGY entry in t_document?
-
I am afraid that Repository.SQLQuery does some conversion. If I directly access the database I see the expected data.
-
Well, SQLQuery converts binary data to base64 as it creates an XML result - and binary in an XML is not a good idea. So yes, a direct access will return the blob as it is.
q.
-
I have tried to make this work, but when I try to open the ZipArchiveEntry for the str.dat file, I get an error: "A local file header is corrupt".
This may sugest that Sparx have done something else on top of the (ZIP + base64 encoding) which now seems to be normal for potentially large data fields like this.
The data I am trying to extract is a simple test case: the translation data for a single element, where I added the translation to the 'notes' translation tab via the UI.
Has anyone else had success in extracting this data?
@Sparx - are there any instructions anywhere which say how to acess this data? Maybe an API call?
-
If anyone needs a test case, the data from EA was:
U1MDBBQAAAAIAH5gVVjVk/2cDgEAALICAAAHABEAc3RyLmRhdFVUDQAHk+bV
ZZPm1WWT5tVlfVJda8JAEJznQv/D4XOp+p4qRQUfiv8hNiG1TU7IJSL99c7s
lhbR60PuY2Zud7K7BZY4o0OLgBNq9Eg44IiIF0wwxzNm3APRASXRimtrfG2K
yLP4JRZ4xAMK6npTJupK3jxaIp/jgt0iGozcm9/IrUWOxDpib9S+44unynj3
JTcHrinDy99A7JafX0Vf4YOr+EQX3d0Mec1flv80nknIdVxHFljzPpD55v5p
dam5ynHLTy9UJeURFqlSzXQKdg6WWxF29pfqple2YpYn4sLUSzkcDS/5sjdU
/ZUnzYHeBGzwyv4HbH+i+nxIJz9SKPue32i416GxDHIpD8lcuYcO3vOGUQtM
M9OgOclxmqILUEsBAhcLFAAAAAgAfmBVWNWT/ZwOAQAAsgIAAAcACQAAAAAA
AAAAAACAAAAAAHN0ci5kYXRVVAUAB5Pm1WVQSwUGAAAAAAEAAQA+AAAARAEA
AAAA
which was extracted using a simple bit of SQL to read the BinContent column of t_document.
The text should decode to something like:
Dit zijn enkele aantekeningen in het Nederlands, de secundaire taal van EA. Het vertalen gebeurt nog steeds handmatig.
(Apologies to native Dutch speakers: this is aGoogle translation.)
-
Ian,
I'm pretty sure I saw some methods to get translations using the API
...
EA.Element.GetTXNote (string Code, long Flag) and the likes.
Geert
PS. the Dutch is pretty much OK, no apologies neede :)
-
I hope this doesn't disclose a business secret:
Decode the BASE64 and replace the first two bytes (letters "SS") with the letters "PK" - now you have a valid PKZip archive format.
Extract the (single) entry "str.dat":
<?xml version="1.0" standalone="no" ?>
<translations><translation language="nl" nameLocked="0" aliasLocked="0" notesLocked="1" nameChecksum="0" aliasChecksum="0" notesChecksum="0" name="" alias="">Dit zijn enkele aantekeningen in het Nederlands, de secundaire taal van EA. Het vertalen gebeurt nog steeds handmatig.</translation></translations>
We sometimes also extract stuff from the t_documents table, but we have not seen this kind of obfuscation(?) before.
Why "SS"?
Maybe it's short for "Sparx Systems" - you have a "Sparx Systems" ZIP format!
-
Hmm, nice catch, thanks Johann :)
Geert
-
Hmm, nice catch, thanks Johann :)
Geert
Wot 'e sed!!!!!
Out of interest, how did you figure that out? Seems a "random" change...
Paolo
-
Hope it's not derived from Schutz-Staffel :-(
q.
-
Pure genius Johann.
I would like to award the 2024 'Sparx Sherlock Holmes Award for Best EA Detective' to Johan!!
Thanks to Geert for seeing the element.getTXNote(...) and element.setTXNote(..) methods.
I'm not sure which release these appeared in, but they do everything I need.
Nearly everything: the value of the languae code which these methods needs can be found in usys_system in the TranslateSecondary setting.
But Johan is still a genius.
-
[...] Nearly everything: the value of the languae code which these methods [Element.GetTXNote and alike] needs can be found in usys_system in the TranslateSecondary setting.
[...]
There is easier way IMHO to find these language codes. Sparx documentation says (for Element.*TX* methods): "[...] Two-letter language code (found on the 'Translations' page of the 'Manage Model Options' dialog)".