
  On 9/19/2020 at 5:48 PM, DarthParametric said:

Is there a game-specific identifier in the NCS header? I thought both were flagged as NCS V1.0B.

No, there isn't. That's why I had to write a detection algorithm.


@JCarter426

Thanks! I'll look at them later just to be sure I haven't missed anything. I was supposed to do it earlier; I just forgot about it until I was Force Persuaded to. 😋

  On 9/19/2020 at 5:26 PM, AmanoJyaku said:

Version Detection

 

The algorithm was fine; the problem was the input. I was using K2 function definitions, some of which had changed from K1. Now, NCS files are identified as K1 or K2!

 

It's probably easier and safer to just query the user on what game this script is for (via a command line option, for example) than to depend on some brittle heuristic to detect that.

 

  On 9/19/2020 at 5:26 PM, AmanoJyaku said:
  1. Evaluation of for and while loops
    1. They look identical, with the only potential difference being an incremented or decremented value at the end of a for loop
    2. Now that the algorithm described above has been completed, it should be easy to determine the rest of the code that makes a for loop unique
    3. I've temporarily given up on do-while loops because only one file has one, and I'm not entirely certain it's in the compiled NCS file

If you lack files to test this on, you could get Neverwinter Nights and install its toolset, because that came with an official BioWare NWScript compiler. Yes, it's for NWN, but the control structure stuff should be identical.

 

  On 9/19/2020 at 5:26 PM, AmanoJyaku said:
  1. Figuring out the DESTRUCT op code
    1. I've seen this used to destroy variables on the stack that aren't necessary, but prevent the top-of-stack behavior used by NCS
    2. Although this shouldn't be necessary, since the desired variable could just be copied to the top of the stack using the CPTOPSP op code...

Dunno if I already wrote that here or just in IRC, on a GitHub issue or wherever, but: the interesting thing about the DESTRUCT opcode is that it's used to single out individual struct members. I.e. whenever the NSS uses a struct member, the NCS copies the whole struct to the top of the stack and uses DESTRUCT on this new block in the stack to get rid of everything but the single struct member it's interested in.

Which means you can use the existence of DESTRUCT to identify structs.
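To make that DESTRUCT behavior concrete, here is a minimal Python sketch of its stack effect. It assumes the three operands listed in community NCS documentation (bytes to destroy, offset of the element to keep, size of the element to keep) and, as a simplification, that every basic value occupies one 4-byte stack cell. The function name destruct is hypothetical and not taken from any existing tool.

```python
CELL = 4  # assumption: each int/float/string/object value takes one 4-byte cell

def destruct(stack, size_to_remove, offset_to_keep, size_to_keep):
    """Sketch of DESTRUCT: drop the top `size_to_remove` bytes of the stack,
    except for `size_to_keep` bytes starting `offset_to_keep` bytes into that
    block. The stack is a list of cells with the top at the end."""
    for n in (size_to_remove, offset_to_keep, size_to_keep):
        if n % CELL != 0:
            raise ValueError("DESTRUCT operands must be multiples of the cell size")
    if offset_to_keep + size_to_keep > size_to_remove:
        raise ValueError("kept element lies outside the destroyed block")
    cells = size_to_remove // CELL
    if cells > len(stack):
        raise ValueError("DESTRUCT would underflow the stack")
    base = len(stack) - cells                 # index of the block's first cell
    first = base + offset_to_keep // CELL     # first cell of the kept element
    kept = stack[first:first + size_to_keep // CELL]
    return stack[:base] + kept

# Usage: 7 is an unrelated value below a copied three-int struct {10, 20, 30}.
print(destruct([7, 10, 20, 30], 12, 4, 4))  # [7, 20]
```

Keeping only the second member of the copied struct leaves just that member on top of the stack, which is the struct-member access pattern described above.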

  On 9/20/2020 at 5:52 AM, DrMcCoy said:

It's probably easier and safer to just query the user on what game this script is for (via a command line option, for example) than to depend on some brittle heuristic to detect that.

I considered that, but the differences are obvious and the detection simple. Are you aware of any pitfalls, or are you just being cautious? This is a beta release, there's plenty of time to change functionality.
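One way to combine both suggestions is sketched below in Python. It assumes, hypothetically, that TSL's engine function table is longer than K1's, so any ACTION routine index at or past the end of K1's table can only be a K2 call. Scripts that only use shared routines stay ambiguous and fall back to an explicit user choice such as a command line option. The names detect_game and k1_function_count are illustrative, and the detection described above (which also relies on definitions that changed between games) may well work differently.

```python
from typing import Iterable, Optional

def detect_game(action_ids: Iterable[int], k1_function_count: int,
                user_choice: Optional[str] = None) -> str:
    """Guess which game an NCS file targets from its ACTION routine indices.

    Hypothetical heuristic: an index that does not exist in K1's engine
    function table must be a K2-only routine. An explicit user choice wins.
    """
    if user_choice is not None:
        return user_choice                  # e.g. from a command line option
    if any(i >= k1_function_count for i in action_ids):
        return "K2"
    return "unknown"                        # only shared routines: ask the user

# Usage with a made-up table size of 100 routines for K1:
print(detect_game([4, 17, 120], k1_function_count=100))               # K2
print(detect_game([4, 17], k1_function_count=100))                    # unknown
print(detect_game([4, 17], k1_function_count=100, user_choice="K1"))  # K1
```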

  On 9/20/2020 at 5:52 AM, DrMcCoy said:

If you lack files to test this on, you could get Neverwinter Nights and install its toolset, because that came with an official BioWare NWScript compiler. Yes, it's for NWN, but the control structure stuff should be identical.

Given the fact that one file out of 2,500 has a do-while loop, it's low priority. I'll get around to it after more pressing matters are dealt with.

  On 9/20/2020 at 5:52 AM, DrMcCoy said:

Dunno if I already wrote that here or just in IRC, on a GitHub issue or wherever, but: the interesting thing about the DESTRUCT opcode is that it's used to single out individual struct members. I.e. whenever the NSS uses a struct member, the NCS copies the whole struct to the top of the stack and uses DESTRUCT on this new block in the stack to get rid of everything but the single struct member it's interested in.

Which means you can use the existence of DESTRUCT to identify structs.

Thanks, I thought as much. I first saw it in use destroying two elements of a vector, so I figured its purpose was for destroying unused elements of aggregates. I just need to identify which elements are preserved.
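Once the layout of an aggregate is known or guessed, working out which element a DESTRUCT preserves is a walk over member offsets. A minimal Python sketch follows. It assumes DESTRUCT's operands are the byte count to destroy, the offset of the element to keep, and that element's size, and that basic values are 4 bytes (so a vector is 12 bytes). The function preserved_member and the example layouts are hypothetical.

```python
from typing import List, Tuple

Layout = List[Tuple[str, int]]  # (member name, size in bytes), lowest offset first

def preserved_member(layout: Layout, size_to_remove: int,
                     offset_to_keep: int, size_to_keep: int) -> str:
    """Return the member of `layout` that a DESTRUCT with these operands keeps."""
    if sum(size for _, size in layout) != size_to_remove:
        raise ValueError("DESTRUCT size does not match this layout")
    offset = 0
    for name, size in layout:
        if offset == offset_to_keep and size == size_to_keep:
            return name
        offset += size
    raise ValueError("kept range does not line up with a single member")

VECTOR: Layout = [("x", 4), ("y", 4), ("z", 4)]
# Destroying two elements of a vector and keeping the last one:
print(preserved_member(VECTOR, 12, 8, 4))  # z
```

If no member lines up with the kept range, the guessed layout is probably wrong, which is itself a useful signal when trying to recover a struct.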


It's time for a monthly update, and I said I might have a beta ready by now. Well, there won't be a beta.


Will have more news next week.


Sorry that this is taking longer than I wanted. I was lucky enough to get some contracts, but that also means very little free time.

While a decompiler is an all-or-nothing program (hence the reason there won't be a beta), I can show off some sample output:


It's not much, but you can see the following:

  1. It identifies the NCS as a Kotor2 file, which relies on the code calling K2 engine functions
  2. There are six subroutines
    1. _start(), which is automatically included by the compiler
    2. void main()
    3. void DamagingExplosion( object oCreature, int nDelay, int nDamage )
    4. int GR_GetGrenadeDC(object oTarget)
    5. void NonDamagingExplosion(object oCreature, int nDelay)
    6. void KillCreature(object oCreature, int nDelay )
  3. The types of return values are found by examining the called subroutine
  4. The types of parameters must be found by examining caller subroutines (strange as it may seem, there's no guarantee a parameter is used)
  5. The local variables are listed in the order in which they are created
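Points 3 and 4 can be sketched as a merge step: start from what the callee's own code reveals about each parameter (None when the parameter is never used), then fill the gaps from the argument types seen at each call site. This Python sketch is hypothetical; the function name and the string type labels are illustrative only.

```python
from typing import List, Optional

ParamTypes = List[Optional[str]]  # None = type not known yet

def merge_param_types(known: ParamTypes, call_site: ParamTypes) -> ParamTypes:
    """Combine parameter types inferred so far with those seen at one call site."""
    if len(known) != len(call_site):
        raise ValueError("argument count differs between callee and caller")
    merged: ParamTypes = []
    for index, (have, seen) in enumerate(zip(known, call_site)):
        if seen is None or have == seen:
            merged.append(have)      # nothing new, or both agree
        elif have is None:
            merged.append(seen)      # caller reveals an unused parameter's type
        else:
            raise ValueError(f"parameter {index}: {have} vs {seen}")
    return merged

# The callee uses its first parameter but never touches the second.
from_callee: ParamTypes = ["object", None]
print(merge_param_types(from_callee, ["object", "int"]))  # ['object', 'int']
```

Raising on a conflict, rather than picking one type, keeps a mismatch visible instead of silently producing a wrong signature.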

As simple as this may seem, it's been hell trying to figure out how NCS works due to limited documentation, time, and mental capacity (🤪). That said, I think I now know everything there is to know about NCS, including how to deal with recursion (it took less than an hour), how to identify vectors and structs (they do disappear in the bytecode, but there are code patterns to look out for), and how to handle certain errors in the game's code.

So, the two major tasks left are:

  1. Including block scopes, e.g. if-else and while statements (I can identify them, I just haven't put them in among the locals)
  2. Handling expressions, i.e. getting operator precedence and associativity right, e.g. int d = (a + b) * c
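Since NCS is stack-based, an expression like (a + b) * c arrives in postfix order: push a, push b, add, push c, multiply. The Python sketch below rebuilds infix text with the minimum parentheses by tracking each subexpression's precedence. The tokens ADD, SUB, MUL, DIV and MOD are simplified stand-ins (real NCS opcodes also encode operand types, e.g. ADDII), and the sketch assumes the left operand is pushed before the right one.

```python
OPERATORS = {"MUL": ("*", 2), "DIV": ("/", 2), "MOD": ("%", 2),
             "ADD": ("+", 1), "SUB": ("-", 1)}
ATOM = 99  # operands bind tighter than any operator

def to_infix(tokens):
    """Turn postfix tokens (operands and simplified opcodes) into infix text."""
    stack = []  # (text, precedence) for each pending subexpression
    for token in tokens:
        if token not in OPERATORS:
            stack.append((token, ATOM))
            continue
        symbol, prec = OPERATORS[token]
        right, right_prec = stack.pop()   # pushed last, so popped first
        left, left_prec = stack.pop()
        if left_prec < prec:
            left = f"({left})"
        if right_prec <= prec:            # left-associative: guards a - (b - c)
            right = f"({right})"
        stack.append((f"{left} {symbol} {right}", prec))
    if len(stack) != 1:
        raise ValueError("tokens do not form exactly one expression")
    return stack[0][0]

print(to_infix(["a", "b", "ADD", "c", "MUL"]))  # (a + b) * c
```

Parenthesizing a right operand of equal precedence keeps a - (b - c) correct, at the cost of the occasional redundant pair such as a + (b + c).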
  On 10/20/2020 at 2:43 AM, AmanoJyaku said:

there's no guarantee a parameter is used

Interesting. I had assumed that the compiler substituted in default values when compiling the NCS, but are you saying that it's the engine that does that at runtime? Which would also mean DeNCS adds missing defaults back in when decompiling then.

  On 10/20/2020 at 3:48 AM, DarthParametric said:

Interesting. I had assumed that the compiler substituted in default values when compiling the NCS, but are you saying that it's the engine that does that at runtime? Which would also mean DeNCS adds missing defaults back in when decompiling then.

Sorry, I wasn't clear. You are correct about the compiler substituting defaults into the bytecode; the engine does not substitute them at runtime.

(I'm preparing a post to better explain NCS in case someone wants to port this to a different language, or make a better decompiler.)


This is a silly example, but it demonstrates the impossibility of identifying the type and value of an unused parameter from inside a function.

I can't say I've seen this in practice. There are 2,500 files, and I'm not looking at them all that closely. But I have to code for the possibility, or risk corrupting the stack.


Ah, you were talking about the function not using all the stated parameters. I thought you were talking about calling a function without specifying all the parameters. I suppose that sort of sloppiness is not out of the question when dealing with mod-generated scripts.


I forgot to do the monthly update. 🤯 I was all set to finish the decompiler, and of course I ran into trouble.

First, I had a family problem that has since been resolved.

The second is a mundane issue: my laptop has been suffering from BSODs for the past two months, and it's happening with increasing frequency. Sometimes when I wake the machine from sleep or hibernation it BSODs, and when I cold boot it doesn't see any storage. This only happens when it's on battery, and it seems to be when the battery is below 50%. Now that I have an idea of what triggers this I can work around it, but it was affecting my productivity.

The third is an issue with NCS files. My decompiler relied on files being written "correctly", so of course it bombed when fed a poorly written file. For example, k_inc_npckill:


Nothing wrong with the source. Let's look at the NCS:


The lines with asterisks are dead code: they never get executed, and the game is fine with that. The problem is that my decompiler was expecting the last operation of the true branch of an if statement to be part of the control path. The last op of the true branch is what tells you whether you're looking at a regular if, an if-else, or an if that exits the script as seen above. Because that op can be dead code, the decompiler was looking at the wrong thing and returning the wrong results. I've now fixed that with an additional set of evaluations.
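The idea behind that fix can be sketched in Python: classify the if by the last reachable op of its true branch instead of the last op in its byte range. Reachability is assumed to come from a separate control-flow pass, and the sketch assumes an exit shows up either as RETN or as a JMP to the subroutine's exit address; loops (where a trailing JMP could be a break or continue) are ignored. All names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Op = Tuple[str, Optional[int]]  # (opcode name, jump target address or None)

def classify_if(true_branch: List[Op], reachable: Set[int], exit_addr: int) -> str:
    """Classify an if statement by the last reachable op of its true branch.

    `reachable` holds indices into `true_branch` that control flow can reach;
    dead code (e.g. ops after an exit) is skipped instead of trusted.
    """
    live = [op for i, op in enumerate(true_branch) if i in reachable]
    if not live:
        return "if"
    name, target = live[-1]
    if name == "RETN" or (name == "JMP" and target == exit_addr):
        return "if-exit"        # true branch leaves the subroutine
    if name == "JMP":
        return "if-else"        # jump over the else branch
    return "if"                 # falls through to the code after the if

# A true branch whose final JMP (index 2) is dead code after an exit jump:
branch = [("ACTION", None), ("JMP", 100), ("JMP", 40)]
print(classify_if(branch, reachable={0, 1}, exit_addr=100))  # if-exit
```

Looking only at the last op in the range would see the dead JMP to 40 and wrongly report an if-else, which is the failure described above.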

Now I'm back to working on producing output. This is tedious as it requires keeping track of data, individual operations, and the context in which the data and operations are being used. Believe it or not, a block of code will decompile differently based on code that comes before and after it. (For example, RSADDx does not mean "create a named variable" as I incorrectly assumed months ago. It could create a named variable or a temporary variable, based on the context.)

This morning I had an idea as to the overall rule for determining the beginning and end of statements, so I'll be working on that this weekend. I hope to have a status update in a few days.


Hope everyone got their fill of roast porg yesterday!

The Gap - No, We're Not Selling Pants


 

Do You Have A Reservation?


 

The Next Problem - Because There's Always Another Problem


These are control statements, the remaining set of statements to decipher. Expect an update in the next few days.


I said there's always another problem...

TL;DR

Any scripts created by nwnnsscomp are likely to be broken. That probably means all scripts in TSLRCM. Oh, joy.

The Gory Details

  On 11/29/2020 at 3:57 AM, DarthParametric said:

[Shocked Fry GIF]

I never used nwnnsscomp until I suspected mod files have "incorrect" scripts. Have you seen other issues?

BTW, I realize my last post might be confusing to those who don't code in C and assembly. I didn't mean to alarm people by giving the impression scripts compiled with nwnnsscomp don't work. As far as I can tell, they work correctly in the game.

They just don't decompile correctly, and that makes it difficult for someone who is using them to learn how to write a decompiler. There is a certain structure that a compiled file must adhere to in order to be decompiled. I finally realized that structure is different between the files in my KoTOR installation, which has no mods, and my TSL installation, which has the TSLRCM.

I don't remember if TSLRCM saves the original scripts anywhere, so I'll reinstall a clean copy of TSL on another machine to examine the original files there.


Frankly, saying this is "incorrect" compilation is a misunderstanding of the compilation process. If you look at modern C compilers, they're doing far more transformations. That's all legal and often even wanted (to keep pipelines from stalling, etc). Don't stoop down to clickbait.

And in general, for many targets, a nested if like that is indistinguishable from an unnested if, else if, else in the final machine code anyway. This is one reason compilation-decompilation is never a lossless process: different things in the source can compile down to the same machine code.

What you have found is a "tell" in the original BioWare compiler, that lets you distinguish certain things. That nwnnsscomp doesn't have the same tell is not a fault of nwnnsscomp, it's the nature of the thing. Ask me about gcc and clang differences some day, or different styles of mangling C++ symbols in gcc and msvc.

  On 11/30/2020 at 9:52 PM, DrMcCoy said:

Frankly, saying this is "incorrect" compilation is a misunderstanding of the compilation process. If you look at modern C compilers, they're doing far more transformations. That's all legal and often even wanted (to keep pipelines from stalling, etc). Don't stoop down to clickbait.

And in general, for many targets, a nested if like that is indistinguishable from an unnested if, else if, else in the final machine code anyway. This is one reason compilation-decompilation is never a lossless process: different things in the source can compile down to the same machine code.

What you have found is a "tell" in the original BioWare compiler, that lets you distinguish certain things. That nwnnsscomp doesn't have the same tell is not a fault of nwnnsscomp, it's the nature of the thing. Ask me about gcc and clang differences some day, or different styles of mangling C++ symbols in gcc and msvc.

I'm familiar with compiler optimizations, and I considered that possibility.

However, I don't believe this is the case given the other issues, like the lack of optimizations for logical AND and logical OR, the existence of dead code, and empty functions. There is so much more that could have been optimized that I can only conclude this is sloppy coding. Additionally, it's highly unlikely the game benefits from such optimizations. The "optimized" files are larger (i.e. no cache savings), and leaving out an additional JMP instruction saves a few dozen cycles (which wasn't even a problem in 2003).

Anyway, it's not my intention to argue over this. It won't prevent me from making the decompiler, but it means it won't be possible to guarantee the decompiled output matches the original source for files compiled by nwnnsscomp. I assume those files would only be found in mods. The main purpose for making this decompiler is to access original game files, which I assume were not compiled by nwnnsscomp.


Yeah, nwnnsscomp is not optimizing. But my point is that lots of roads lead to Rome, and denoting one as more correct than others is the wrong way to think about this. That different compilers written by different people produce different bytecode that still accurately performs the same visible actions is normal, and that should be dealt with. The same goes for dead code: dead code is entirely fine and should be handled properly without throwing errors. And I feel that saying "nwnnsscomp is wrong" does it a huge disservice, because it's technically not wrong at all.

So I'm more hung up about your phrasing here. As long as the decompiler still works as such (and doesn't just outright refuse to work) for cases where different idiosyncrasies slip through, be it nwnnsscomp or maybe the BioWare compiler used for Dragon Age, everything is entirely fine with me (for what that's worth, which is maybe not a lot). Guaranteeing a 100% match with the original source in all cases is pretty much impossible anyway, and not something I expected in the first place.

  On 11/30/2020 at 9:37 PM, AmanoJyaku said:

Have you seen other issues?

Well, I can't offer any input on the whole "correct/incorrect" debate. I'm just a pleb who is completely ignorant of programming. I was just observing that it's not particularly surprising that a fan-made tool derived from reverse-engineering BioWare's compiler for a different game, which has undergone multiple revisions by different people, has a few quirks.


Time for an update!

Now that I've recovered from the nwnnsscomp-induced heart attack, I'm back to solving the last of the reverse compiler challenges. (When I complete the reverse compiler, I'll provide a detailed explanation of NCS, how NCS maps to NWScript, how the reverse compiler works, and why I'm so critical of nwnnsscomp.)

The current problem I'm working on looks rather simple:

RSADDx
JSR
JSR

We see a reservation on the stack, followed by two subroutine calls. NCS rules for subroutine calls are as follows:

  1. Optionally, reserve one or more values on the stack
  2. Optionally, place subroutine arguments on the stack
  3. Call the subroutine

You can see the disaster coming: it's those optional bits. In the example, the first subroutine might return a value. If so, that value is the argument of the second subroutine.

But, RSADDx is also used to create uninitialized named variables in NWScript, for example:

int i;		//RSADDI
float f;	//RSADDF
string s;	//RSADDS

You should never create uninitialized variables, but it was once a common practice. Numerous examples exist in NCS files, and that means the example code can also be read as an uninitialized variable, followed by calls to two subroutines that take no parameters and return void. In other words:

Sub1(Sub0());

//vs

int i;
Sub0();
Sub1();

So, how do we figure this out? Unfortunately, we have to look inside each of the JSR calls. Which means yet another rewrite of my reverse compiler. It's not the worst thing in the world since the purpose of a reverse compiler is to reconstruct JSR calls. But more work must be done to determine dependencies, change the order in which things are analyzed, and store the resulting information for later use.
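Once both JSR targets have been analyzed, the decision between the two readings could be sketched like this in Python. It assumes the signatures (return type and parameter types) of both subroutines have already been recovered, and that a non-void return from the second call would have needed its own reservation, so both readings require it to return void. Function and variable names (including var0) are hypothetical.

```python
from typing import Dict, List, Tuple

Signature = Tuple[str, List[str]]  # (return type, parameter types)

def read_rsadd_jsr_jsr(reserved: str, sub0: str, sub1: str,
                       sigs: Dict[str, Signature]) -> str:
    """Decide how `RSADDx; JSR sub0; JSR sub1` maps back to NWScript.

    `reserved` is the type RSADDx reserved (e.g. "int" for RSADDI).
    """
    ret0, params0 = sigs[sub0]
    ret1, params1 = sigs[sub1]
    if params0 or ret1 != "void":
        raise ValueError("signatures fit neither reading of this pattern")
    if ret0 == reserved and params1 == [reserved]:
        # The reservation is sub0's return slot, left on the stack as sub1's argument.
        return f"{sub1}({sub0}());"
    if ret0 == "void" and not params1:
        # The reservation is an uninitialized named variable.
        return f"{reserved} var0; {sub0}(); {sub1}();"
    raise ValueError("signatures fit neither reading of this pattern")

sigs = {"Sub0": ("int", []), "Sub1": ("void", ["int"])}
print(read_rsadd_jsr_jsr("int", "Sub0", "Sub1", sigs))  # Sub1(Sub0());
```

This is also why the order of analysis matters: the callers cannot be reconstructed until the callees' signatures are known.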

I don't work this hard in my paying job...


Happy Life Day!!!


The Last Problem


The Sub Awakens


The Rise of Subwalker


tl;dr

The reverse compiler is on its way to being complete. I was honestly about to give up, but I'm glad I didn't!

  On 12/25/2020 at 9:59 PM, AmanoJyaku said:

That means I can't test with NCS files from mods, since those are made with nwnnsscomp. Switching back to vanilla NCS files has allowed for the reverse compiler to proceed.

So what does that mean for the final product? Are scripts produced by nwnnsscomp doomed to never be decompiled properly (at least for complex ones)?

  On 12/26/2020 at 4:38 AM, DarthParametric said:

So what does that mean for the final product? Are scripts produced by nwnnsscomp doomed to never be decompiled properly (at least for complex ones)?

I'm confident they can be, and I'm currently in the process of confirming that. What I believe will happen is that vanilla game files can be reversed perfectly, but anything that was compiled by nwnnsscomp will be altered from its original source. As DrMcCoy said, the result, while syntactically different, should functionally be the same as the original source.

For example, the source of k_contain_unlock from TSLRCM contains the following:

string GetTreasureBundle (int nItemLevel, int nItemType = 0)
{
  string sTemplate = "";
 
  if (nItemType == 0) {}
  else
  {
    if((nItemType % 100) == 0) {}
    else if((nItemType % 10) == 0) {}
    else {}
  }
  
  return sTemplate;
}

But, the NCS file is written like this:

string GetTreasureBundle (int nItemLevel, int nItemType = 0)
{
  string sTemplate = "";
 
  if () {}
  else if (){}
  else if() {}
  else {}
  
  return sTemplate;
}

The number of branches is the same across both, so the two forms should behave the same when executed by the game. If this proves to be true, then it won't matter if the reverse compiler produces the second form even if the first is the original.
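One way to see why the two forms can end up identical: in a compiler that emits no code for braces, an if nested alone inside an else block lowers to exactly the same jumps as an else if chain. The toy Python lowering below uses hypothetical EVAL/JZ/JMP pseudo-ops and labels, not real NCS output or any particular compiler's code generator, applied to the GetTreasureBundle structure with empty bodies as in the excerpt.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Optional, Union

@dataclass
class Block:
    body: List["Stmt"] = field(default_factory=list)

@dataclass
class If:
    cond: str
    then: "Stmt"
    orelse: Optional["Stmt"] = None

Stmt = Union[Block, If]

def lower(stmt: Stmt, out: List[str], labels: Iterator[int]) -> List[str]:
    """Lower a statement to pseudo-ops; braces ({ ... }) add no code of their own."""
    if isinstance(stmt, Block):
        for inner in stmt.body:
            lower(inner, out, labels)
        return out
    else_label = f"L{next(labels)}"
    out += [f"EVAL {stmt.cond}", f"JZ {else_label}"]
    lower(stmt.then, out, labels)
    if stmt.orelse is None:
        out.append(f"{else_label}:")
    else:
        end_label = f"L{next(labels)}"
        out += [f"JMP {end_label}", f"{else_label}:"]
        lower(stmt.orelse, out, labels)
        out.append(f"{end_label}:")
    return out

# Source form: if {} else { if {} else if {} else {} }
inner = If("(nItemType % 100) == 0", Block(),
           If("(nItemType % 10) == 0", Block(), Block()))
nested = If("nItemType == 0", Block(), Block([inner]))
# Decompiled form: if {} else if {} else if {} else {}
chain = If("nItemType == 0", Block(), inner)

print(lower(nested, [], count()) == lower(chain, [], count()))  # True
```

This only illustrates the general point; whether a given compiler treats the braces this way is exactly what has to be confirmed against real files.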

If I understand correctly, the original intent of the reverse compiler was to retrieve the sources for vanilla game scripts. Those are not compiled with nwnnsscomp. Which means the only files that are of concern are mods. I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

  On 12/26/2020 at 6:04 AM, AmanoJyaku said:

the result, while syntactically different, should functionally be the same as the original source

That seems fine to me. I was more concerned that certain files might not be able to be decompiled at all. This is the case for DeNCS as well, based on my own experiments.

  On 12/26/2020 at 6:04 AM, AmanoJyaku said:

I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

The main rights issues with mods are rehosting a mod without the author's permission and/or reusing their assets without permission. I think scripts fall into something of a grey area, partially because they are often derived from vanilla game scripts anyway and also because access to the source is necessary for compatibility (especially for large mods like TSLRCM). It's why authors are encouraged to include their script source (thus tacitly approving modification/reuse) to sidestep the problem altogether. If you have concerns I'm sure the admins can give you an official ruling on whatever it is you are specifically worried about.

  On 12/13/2020 at 4:12 PM, AmanoJyaku said:

You should never create uninitialized variables, but it was once a common practice.

This is still common practice, at least as far as I've been taught. Yeah, there are security issues if you then try to read one, but modern compilers either won't let you do that or will initialize it with a default value, if the language itself doesn't. With that in mind, I don't see the issue.

  On 12/26/2020 at 6:04 AM, AmanoJyaku said:

the result, while syntactically different, should functionally be the same as the original source

  On 12/26/2020 at 6:14 AM, DarthParametric said:

That seems fine to me. I was more concerned that certain files might not be able to be decompiled at all. This is the case for DeNCS as well, based on my own experiments.

I agree. I'm not concerned about syntactically identical source code. The way the source code is written is... not always the best way, to put it lightly.

The issues you've mentioned before don't seem like they should cause problems to me. Like with the nested else/if vs a simple else if, that'll only be functionally different if you declare something local to the else branch's scope. And, obviously, don't do that. And I doubt the compiler would've optimized for it that way if it would produce functionally different code. It usually knows better than we do.

  On 12/26/2020 at 6:04 AM, AmanoJyaku said:

I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

I think that depends on whether you ask Oracle or Google...
