
1 minute ago, DarthParametric said:

Is there a game-specific identifier in the NCS header? I thought both were flagged as NCS V1.0B.

No, there isn't. That's why I had to write a detection algorithm.

12 hours ago, AmanoJyaku said:

Version Detection

 

The algorithm was fine, the problem was the input. I was using K2 function definitions, some of which had changed from K1. Now, NCS files are identified as K1 or K2!

 

It's probably easier and safer to just query the user about which game this script is for (via a command-line option, for example) than to depend on some brittle heuristic to detect that.
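A minimal sketch of how the two approaches could be combined, under stated assumptions: an explicit user choice always wins, and the heuristic only answers when the bytecode rules out one of the games. It assumes each ACTION instruction yields a (routine number, argument count) pair, and that `k1_argc` / `k2_argc` are hypothetical tables built from each game's nwscript.nss. None of these names come from the actual tool.

```python
def detect_game(actions, k1_argc, k2_argc, override=None):
    """Guess K1 vs K2 from ACTION instructions, unless the user already said.

    actions: iterable of (routine_number, argument_count) pairs.
    k1_argc, k2_argc: hypothetical dicts mapping routine number -> argument count,
    built from each game's nwscript.nss (default arguments count, since the
    compiler writes them into the bytecode).
    """
    if override is not None:          # e.g. a hypothetical --game command-line option
        return override
    actions = list(actions)
    fits_k1 = all(k1_argc.get(r) == n for r, n in actions)
    fits_k2 = all(k2_argc.get(r) == n for r, n in actions)
    if fits_k1 != fits_k2:
        return "K1" if fits_k1 else "K2"
    # Both fit (or neither does): the heuristic has nothing to go on
    raise ValueError("cannot tell K1 from K2; ask the user instead")


# Hypothetical tables: routine 1 takes 2 arguments in K1 but 3 in K2
K1 = {0: 1, 1: 2}
K2 = {0: 1, 1: 3, 2: 1}
print(detect_game([(0, 1), (1, 3)], K1, K2))
```

Falling back to the user on ambiguity keeps the heuristic from guessing silently, which addresses the brittleness concern.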

 

12 hours ago, AmanoJyaku said:
  1. Evaluation of for and while loops
    1. They look identical, with the only potential difference being an incremented or decremented value at the end of a for loop
    2. Now that the algorithm described above has been completed, it should be easy to determine the rest of the code that makes a for loop unique
    3. I've temporarily given up on do-while loops because only one file has one, and I'm not entirely certain it's in the compiled NCS file

If you lack files to test this on, you could get Neverwinter Nights and install its toolset, because that came with an official BioWare NWScript compiler. Yes, it's for NWN, but the control structure stuff should be identical.

 

12 hours ago, AmanoJyaku said:
  1. Figuring out the DESTRUCT op code
    1. I've seen this used to destroy variables on the stack that aren't necessary, but prevent the top-of-stack behavior used by NCS
    2. Although this shouldn't be necessary, since the desired variable could just be copied to the top of the stack using the CPTOPSP op code...

Dunno if I already wrote that here or just in IRC, on a GitHub issue or wherever, but: the interesting thing about the DESTRUCT opcode is that it's used to single out individual struct members. I.e. whenever the nss uses a struct member, the ncs copies the whole struct to the top of the stack and uses DESTRUCT on this new block in the stack to get rid of everything but the single struct member it's interested in.

Which means you can use the existence of DESTRUCT to identify structs.

9 hours ago, DrMcCoy said:

It's probably easier and safer to just query the user about which game this script is for (via a command-line option, for example) than to depend on some brittle heuristic to detect that.

I considered that, but the differences are obvious and the detection simple. Are you aware of any pitfalls, or are you just being cautious? This is a beta release, there's plenty of time to change functionality.

9 hours ago, DrMcCoy said:

If you lack files to test this on, you could get Neverwinter Nights and install its toolset, because that came with an official BioWare NWScript compiler. Yes, it's for NWN, but the control structure stuff should be identical.

Given the fact that one file out of 2,500 has a do-while loop, it's low priority. I'll get around to it after more pressing matters are dealt with.

9 hours ago, DrMcCoy said:

Dunno if I already wrote that here or just in IRC, on a GitHub issue or wherever, but: the interesting thing about the DESTRUCT opcode is that it's used to single out individual struct members. I.e. whenever the nss uses a struct member, the ncs copies the whole struct to the top of the stack and uses DESTRUCT on this new block in the stack to get rid of everything but the single struct member it's interested in.

Which means you can use the existence of DESTRUCT to identify structs.

Thanks, I thought as much. I first saw it in use destroying two elements of a vector, so I figured its purpose was for destroying unused elements of aggregates. I just need to identify which elements are preserved.
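To make "which elements are preserved" concrete, here is a rough model of DESTRUCT. It assumes the three operands DESTRUCT is usually documented with (bytes to destroy, offset of the element to keep, size of the element to keep), 4-byte stack cells, and that the keep offset is counted from the bottom of the destroyed block. That last point and the member layout are assumptions, not something confirmed in this thread.

```python
CELL = 4  # assumed size of one stack slot, in bytes


def destruct(stack, size, keep_offset, keep_size):
    """Model of DESTRUCT on a list of cells (bottom of stack at index 0).

    Removes the top `size` bytes, then puts back the `keep_size` bytes found
    `keep_offset` bytes above the bottom of the removed block (assumed layout).
    """
    count = size // CELL
    block = stack[len(stack) - count:]
    del stack[len(stack) - count:]
    first = keep_offset // CELL
    stack.extend(block[first:first + keep_size // CELL])
    return stack


# A three-member struct copied to the top of the stack, members assumed laid out bottom-up
struct_copy = ["nOther", "a", "b", "c"]
print(destruct(struct_copy, 12, 4, 4))   # keeps only member b
```

Under this model, the keep offset directly identifies the surviving member (offset 0 is the first member, 4 the second, and so on), and the same call with a 12-byte vector keeps one of x, y or z.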


It's time for a monthly update, and I said I might have a beta ready by now. Well, there won't be a beta.


I'm skipping the beta to complete the "final" release!

The to-do list is nearly complete:

  1. Evaluating subroutines - Complete
    1. Input parameters and return values have been discovered
  2. Merging the control structure algorithm with the virtual stack - Complete
    1. Subroutine scopes and variables are correctly identified
  3. Evaluation of for and while loops - Nearly complete
    1. While loops - Complete
    2. For loops - In progress
  4. Evaluation of switch statements - Nearly complete
    1. Case labels and statements - Complete
    2. Default label and statement - In progress
  5. Figuring out the DESTRUCT op code - In progress
    1. Not started, but not expecting it to take much work

Will have more news next week.


Sorry that this is taking longer than I wanted. I was lucky enough to get some contracts, but that also means very little free time.

While a decompiler is an all-or-nothing program (hence the reason there won't be a beta), I can show off some sample output:


'.\NCS Analyzer.exe' '..\..\Compiled Scripts\Kotor2\k_inc_npckill.ncs'
This is a K2 file

void Sub1()

void Sub2()
Integer
Integer
Integer

void Sub3(???, ???, ???)
Integer
Integer
Location
Float
Float
Float
Float
Location
Object

int Sub4(???)
Integer
Integer
Integer

void Sub5(???, ???)

void Sub6(???, ???)
Effect

 

It's not much, but you can see the following:

  1. It identifies the NCS as a Kotor2 file, which relies on the code calling K2 engine functions
  2. There are six subroutines
    1. _start(), which is automatically included by the compiler
    2. void main()
    3. void DamagingExplosion( object oCreature, int nDelay, int nDamage )
    4. int GR_GetGrenadeDC(object oTarget)
    5. void NonDamagingExplosion(object oCreature, int nDelay)
    6. void KillCreature(object oCreature, int nDelay )
  3. The types of return values are found by examining the called subroutine
  4. The types of parameters must be found by examining caller subroutines (strange as it may seem, there's no guarantee a parameter is used)
  5. The local variables are listed in the order in which they are created
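Point 4 can be sketched as merging the argument types seen at every call site. This is a hypothetical helper, not the analyzer's code. It assumes each call site has already been reduced to a list of argument types, with `None` where the caller's push gives no type, and it reports `???` for anything still unknown, like the output above.

```python
def infer_param_types(call_sites, param_count):
    """Merge argument types observed at each call site into one signature."""
    types = [None] * param_count
    for args in call_sites:
        if len(args) != param_count:
            raise ValueError("call site pushes the wrong number of arguments")
        for i, seen in enumerate(args):
            if seen is None:
                continue                      # this caller tells us nothing
            if types[i] is None:
                types[i] = seen
            elif types[i] != seen:
                raise ValueError(f"parameter {i} used as {types[i]} and {seen}")
    return [t if t is not None else "???" for t in types]


# Two hypothetical callers of a Sub3-like subroutine
calls = [["object", "int", None], [None, "int", "int"]]
print(infer_param_types(calls, 3))
```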

As simple as this may seem, it's been hell trying to figure out how NCS works due to limited documentation, time, and mental capacity (🤪). That said, I think I now know everything there is to know about NCS. Even how to deal with recursion (it took less than an hour), identifying vectors and structs (they do disappear in the bytecode, but there are code patterns to look out for), and handling certain errors in the game's code.

So, the two major tasks left are:

  1. Including block scopes, e.g. if-else and while statements (I can identify them, I just haven't put them in among the locals)
  2. Handling expressions, i.e. operator precedence and associativity, e.g. int d = (a + b) * c
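For the expression task, one common approach (not necessarily the one the decompiler will use) is to replay the stack operations symbolically and add parentheses only where precedence requires them. In this sketch, operand pushes are reduced to hypothetical names, because the real variable names are gone from the bytecode. Only four integer opcodes are modeled, the left operand is assumed to be pushed first, and all operators are treated as left-associative.

```python
# Precedence and symbol for the integer arithmetic opcodes modeled here
BINARY = {"ADDII": (1, "+"), "SUBII": (1, "-"), "MULII": (2, "*"), "DIVII": (2, "/")}
ATOM = 3  # operands bind tighter than any operator


def to_infix(ops):
    """ops: (opcode, name) pairs; non-arithmetic opcodes push the named operand."""
    stack = []  # entries are (text, precedence)
    for op, name in ops:
        if op in BINARY:
            prec, sym = BINARY[op]
            right, rprec = stack.pop()   # top of stack is the right operand
            left, lprec = stack.pop()
            if lprec < prec:
                left = f"({left})"
            if rprec <= prec:            # left-associative: keep a - (b - c) intact
                right = f"({right})"
            stack.append((f"{left} {sym} {right}", prec))
        else:
            stack.append((name, ATOM))
    text, _ = stack.pop()
    return text


print(to_infix([("CPTOPSP", "a"), ("CPTOPSP", "b"), ("ADDII", None),
                ("CPTOPSP", "c"), ("MULII", None)]))
```

Parenthesizing a right operand of equal precedence preserves the tree shape exactly, at the cost of occasionally printing parentheses a human would omit, such as a + (b + c).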
1 hour ago, AmanoJyaku said:

there's no guarantee a parameter is used

Interesting. I had assumed that the compiler substituted in default values when compiling the NCS, but are you saying that it's the engine that does that at runtime? Which would also mean DeNCS adds missing defaults back in when decompiling then.

11 hours ago, DarthParametric said:

Interesting. I had assumed that the compiler substituted in default values when compiling the NCS, but are you saying that it's the engine that does that at runtime? Which would also mean DeNCS adds missing defaults back in when decompiling then.

Sorry, I wasn't clear. You are correct about the compiler substituting defaults into the bytecode, the engine does not substitute at runtime.

(I'm preparing a post to better explain NCS in case someone wants to port this to a different language, or make a better decompiler.)


 Let's say you have the call Sub0(1, 2, 3):


void main()
{
	Sub0(1, 2, 3);
	return;
}

void Sub0(int a, int b, int c)
{
	b + c;
	return;
}

The NCS call in void main() looks like this:


CONSTI 3
CONSTI 2
CONSTI 1
JSR Sub0

The inside of Sub0 looks like this:


//Top of the stack is always 0
//Upon entry to the subroutine -4 is a, -8 is b, -12 is c

CPTOPSP -12 4
//Now -4 is the copy of c, -8 is a, -12 is b, -16 is c

CPTOPSP -12 4
//Now -4 is the copy of b, -8 is the copy of c, -12 is a, -16 is b, -20 is c

ADDII
//Now -4 is the result of b + c, -8 is a, -12 is b, -16 is c

MOVSP -4
//Now -4 is a, -8 is b, -12 is c

MOVSP -12
//All parameters destroyed

RETN

 

This is a silly example, but it demonstrates that the type and value of an unused parameter (here, a) can't be identified from inside the function.

I can't say I've seen this in practice; there are 2,500 files, and I'm not looking at them all that closely. But I have to code for the possibility, or risk corrupting the stack.
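The stack comments in the Sub0 example can be replayed mechanically. Below is a small model that assumes 4-byte cells and byte offsets counted from the top of the stack, as in the listing. It also assumes that JSR keeps its return address somewhere other than this stack, which is what the comments imply, since the arguments sit at -4, -8 and -12 on entry. The class and method names are made up for the sketch.

```python
CELL = 4  # one NCS stack slot, in bytes


class Stack:
    """Bottom of the stack is index 0; negative byte offsets count down from the top."""

    def __init__(self):
        self.cells = []

    def consti(self, value):
        self.cells.append(value)

    def cptopsp(self, offset, size):
        # Copy `size` bytes starting `offset` bytes below the top, onto the top
        start = len(self.cells) + offset // CELL
        self.cells.extend(self.cells[start:start + size // CELL])

    def addii(self):
        right = self.cells.pop()
        left = self.cells.pop()
        self.cells.append(left + right)

    def movsp(self, offset):
        # Assumes a negative offset: drop that many bytes from the top
        del self.cells[len(self.cells) + offset // CELL:]


def replay_sub0():
    s = Stack()
    for v in (3, 2, 1):          # caller: CONSTI 3, CONSTI 2, CONSTI 1, then JSR Sub0
        s.consti(v)
    trace = [list(s.cells)]
    s.cptopsp(-12, 4)            # copy of c
    trace.append(list(s.cells))
    s.cptopsp(-12, 4)            # copy of b
    trace.append(list(s.cells))
    s.addii()                    # b + c
    trace.append(list(s.cells))
    s.movsp(-4)                  # drop the unused result
    trace.append(list(s.cells))
    s.movsp(-12)                 # drop a, b, c
    trace.append(list(s.cells))
    return trace


for state in replay_sub0():
    print(state)
```

In the replay, a (the 1) is never copied or read. It is only ever destroyed, which is why nothing inside Sub0 reveals its type.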


Ah, you were talking about the function not using all the stated parameters. I thought you were talking about calling a function without specifying all the parameters. I suppose that sort of sloppiness is not out of the question when dealing with mod-generated scripts.


I forgot to do the monthly update. 🤯 I was all set to finish the decompiler, and of course I ran into trouble.

First, I had a family problem that has since been resolved.

The second is a mundane issue: my laptop has been suffering from BSODs for the past two months, and it's happening with increasing frequency. Sometimes when I wake the machine from sleep or hibernation it BSODs, and when I cold boot it doesn't see any storage. This only happens when it's on battery, and it seems to be when the battery is below 50%. Now that I have an idea of what triggers this I can work around it, but it was affecting my productivity.

The third is an issue with NCS files. My decompiler relied on files being written "correctly", so of course it bombed when fed a poorly written file. For example, k_inc_npckill:


void main()
{
    int nKillMode = GetScriptParameter(1);
    int nDelay = GetScriptParameter(2);
    int nDamage = GetScriptParameter(3);

    if ( nKillMode == 0 )
    {
        DamagingExplosion(OBJECT_SELF, nDelay, nDamage);
        return;
    }

    if ( nKillMode == 1 )
    {
        NonDamagingExplosion(OBJECT_SELF, nDelay);
        return;
    }

    if ( nKillMode == 2 )
    {
        KillCreature(OBJECT_SELF, nDelay);
        return;
    }
}

 

Nothing wrong with the source. Let's look at the NCS:


void main()
21      RSADDx
23      CONSTI 1
29      GetScriptParameter
34      CPDOWNSP -2 1
42      MOVSP -1
48      RSADDx
50      CONSTI 2
56      GetScriptParameter
61      CPDOWNSP -2 1
69      MOVSP -1
75      RSADDx
77      CONSTI 3
83      GetScriptParameter
88      CPDOWNSP -2 1
96      MOVSP -1
102     CPTOPSP -3 1
110     CONSTI 0
116     EQUALxx
118     JZ 170

124     CPTOPSP -1 1
132     CPTOPSP -3 1
140     CONSTO object
146     JSR 298
152     MOVSP -3
158     JMP 296

*164     JMP 170

170     CPTOPSP -3 1
178     CONSTI 1
184     EQUALxx
186     JZ 230

192     CPTOPSP -2 1
200     CONSTO object
206     JSR 1601
212     MOVSP -3
218     JMP 296

*224     JMP 230

230     CPTOPSP -3 1
238     CONSTI 2
244     EQUALxx
246     JZ 290

252     CPTOPSP -2 1
260     CONSTO object
266     JSR 1637
272     MOVSP -3
278     JMP 296

*284     JMP 290

290     MOVSP -3

296     RETN

 

The lines with asterisks are dead code: they never get executed, and the game is fine with that. The problem is that my decompiler expected the last operation of an if statement's true branch to be part of the control path. The last op of the true branch is what tells you if you're looking at a regular if, an if-else, or an if that exits the script as seen above. The decompiler was looking at the wrong thing and returning the wrong results. I've now fixed that with an additional set of evaluations.
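A standard way to flag dead code like the asterisked JMPs is a reachability walk over the control-flow graph. The sketch below is not the decompiler's actual code. It reuses the offsets from the k_inc_npckill listing but drops the straight-line prologue (offsets 21 to 96, which contain no jumps), writes EQUALxx as EQUALII, and assumes JSR returns to the next instruction.

```python
# k_inc_npckill main(), offsets 102 to 296 from the listing; operands are kept
# only for jumps. JSR targets don't matter here because calls return.
MAIN = [
    (102, "CPTOPSP", None), (110, "CONSTI", None), (116, "EQUALII", None), (118, "JZ", 170),
    (124, "CPTOPSP", None), (132, "CPTOPSP", None), (140, "CONSTO", None),
    (146, "JSR", 298), (152, "MOVSP", None), (158, "JMP", 296),
    (164, "JMP", 170),
    (170, "CPTOPSP", None), (178, "CONSTI", None), (184, "EQUALII", None), (186, "JZ", 230),
    (192, "CPTOPSP", None), (200, "CONSTO", None), (206, "JSR", 1601),
    (212, "MOVSP", None), (218, "JMP", 296),
    (224, "JMP", 230),
    (230, "CPTOPSP", None), (238, "CONSTI", None), (244, "EQUALII", None), (246, "JZ", 290),
    (252, "CPTOPSP", None), (260, "CONSTO", None), (266, "JSR", 1637),
    (272, "MOVSP", None), (278, "JMP", 296),
    (284, "JMP", 290),
    (290, "MOVSP", None), (296, "RETN", None),
]


def unreachable(listing, entry):
    """Return the offsets that no path from `entry` can reach."""
    position = {off: i for i, (off, _, _) in enumerate(listing)}
    seen, work = set(), [entry]
    while work:
        off = work.pop()
        if off in seen or off not in position:
            continue
        seen.add(off)
        i = position[off]
        _, op, target = listing[i]
        fallthrough = listing[i + 1][0] if i + 1 < len(listing) else None
        if op == "JMP":
            work.append(target)
        elif op in ("JZ", "JNZ"):
            work += [target, fallthrough]
        elif op != "RETN":
            work.append(fallthrough)   # ordinary ops and JSR continue to the next op
    return [off for off, _, _ in listing if off not in seen]


print(unreachable(MAIN, 102))
```

Flagging dead instructions up front, rather than failing on them, lets the later if/if-else checks look at the last *reachable* op of a branch.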

Now I'm back to working on producing output. This is tedious as it requires keeping track of data, individual operations, and the context in which the data and operations are being used. Believe it or not, a block of code will decompile differently based on code that comes before and after it. (For example, RSADDx does not mean "create a named variable" as I incorrectly assumed months ago. It could create a named variable or a temporary variable, based on the context.)

This morning I had an idea as to the overall rule for determining the beginning and end of statements, so I'll be working on that this weekend. I hope to have a status update in a few days.


 

[Image: roast porg]

Hope everyone got their fill of roast porg yesterday!

The Gap - No, We're Not Selling Pants


My last update touched on NWScript statements and the problem of RSADDx. Recreating the original statements is the ultimate goal of a decompiler. Yet, there is no handbook on which combinations of NCS code map to which NWScript statements. All we have is Skywing's website, which lists the 45 opcodes, the data types, and the stack operations. And even that information is incomplete or incorrect. So, I've spent the past 9 months reading through NSS and NCS files to see how opcodes are used so that I can fill in the gaps.

For example, Skywing's doc describes MOVSP like so:

Quote

MOVSP - Adjust the Stack Pointer
Add the value specified in the instruction to the stack pointer.
The value of SP is adjusted by the value specified.

Well, that's useless... If you're familiar with x86 assembly you probably know that moving the stack pointer allocates and deallocates space on the stack for variables. Enter a function with four 32-bit local variables and the stack pointer moves 4x32-bits in one direction. Exit the function and the stack pointer moves 4x32-bits in the other direction. So, you might assume the same in NCS.

And you would be wrong. I've only ever seen negative values passed to MOVSP, which means the value that is "added" is in fact subtracted. MOVSP serves one purpose only: to destroy variables. OK, so what? Well, now that we know MOVSP only destroys variables, we know what's happening whenever we see it:


21      RSADDx
23      CONSTx 0
29      CPDOWNSP -2 1
37      MOVSP -1

At offset 37 a variable is being destroyed. Which variable? Well, now we need to know which opcodes create variables.

 

Do You Have A Reservation?


 There are only four:

  1. RSADDx
  2. CPTOPSP
  3. CONSTx
  4. CPTOPBP

Which means there are two variables on the stack in the example. RSADDx simply allocates space for a variable without giving it a value. This is similar to the following C code:


//An uninitialized variable. Never do this!!!
int i; 

Similar, but not identical. More on that in a moment. CONSTx also allocates space for a variable, but it actually initializes it with a value. This is similar to the following:


0;

Also similar, but not identical. More on that in a moment, too. Back to MOVSP. MOVSP at offset 37 destroys one variable, the integer 0 created at offset 23.

The reason RSADDx and CONSTx are similar, but not identical, to the C code is that the NCS runtime doesn't understand C. What it does understand is its stack, and both RSADDx and CONSTx add to the stack. MOVSP removes from the stack. It's simple, but efficient. Too simple, since the C abstractions are lost. C abstracts variable locations by referring to variables by name. Where is the variable named i? It's at position 0 on the stack. Where is the constant 0? It's at position 4 on the stack. We program in languages like C because we don't want to be bothered with remembering such details. Especially since variables might get moved over the course of software development. Even if they aren't moved, they may get duplicated during program execution! (e.g. i++ makes a copy of i, increments i, then returns the pre-incremented copy.) Better to give it a name and let the compiler and/or linker worry about which address the variable is at.

But, what is the significance of MOVSP in the example? The variable that is destroyed is an immediate value. Named variables are destroyed at the end of their enclosing block, but immediates are destroyed at the end of a statement. Which means MOVSP signals either the end of a statement or the end of a block. So, look for MOVSP and you know you've reached the end of... something. If the MOVSP destroys a variable created by CONSTx, then we've reached the end of a statement. If MOVSP destroys more than that, then we've reached the end of a block!
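Here's a rough sketch of that rule: remember which instruction created each stack cell, then look at what a MOVSP removes. It assumes MOVSP operands are counted in stack elements, as in the four-instruction example, with the x suffixes written as I. It models only the four creating opcodes plus CPDOWNSP and MOVSP, and applies the naive rule as stated, so the RSADDx complication discussed next is not handled.

```python
CREATORS = ("RSADD", "CONST", "CPTOPSP", "CPTOPBP")  # opcode prefixes that push a cell


def movsp_report(listing):
    """listing: (offset, opcode, operand) tuples; returns one entry per MOVSP."""
    owners = []    # creating (offset, opcode) for each live stack cell, bottom first
    report = []
    for offset, op, operand in listing:
        if op.startswith(CREATORS):
            owners.append((offset, op))
        elif op == "CPDOWNSP":
            pass                                   # overwrites a cell, creates nothing
        elif op == "MOVSP":
            count = -operand
            destroyed = owners[len(owners) - count:]
            del owners[len(owners) - count:]
            temps_only = all(creator.startswith("CONST") for _, creator in destroyed)
            report.append((offset, destroyed,
                           "end of statement" if temps_only else "end of block"))
        else:
            raise NotImplementedError(f"{op} is not modeled in this sketch")
    return report


example = [(21, "RSADDI", None), (23, "CONSTI", 0), (29, "CPDOWNSP", (-2, 1)), (37, "MOVSP", -1)]
print(movsp_report(example))
```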

Unfortunately, life isn't that simple. Consider the following pseudo code:


21 RSADDx
23 JSR some_offset

We see a reservation, followed by a subroutine call. According to Skywing's doc, a subroutine that returns a value must be preceded by RSADDx. There's just one problem: RSADDx is also used to create uninitialized named variables, as seen in our first example! The first example is the C code int i = 0.

  1. An uninitialized variable is reserved on the stack
  2. A constant, or immediate value, is placed on the stack
  3. The constant is copied down to the reserved space (assignment, or = in C)
  4. The constant is destroyed because it is a temporary

Now that we know RSADDx can be either a named variable or a temporary, we're faced with a dilemma. Is our second example a subroutine that returns a value, which means the RSADDx is a temporary? Or is the second example an uninitialized variable followed by a subroutine that returns void? This is the problem I was working on as of the last update, and I believe I found the solution over the weekend. Our friend MOVSP provided the answer, and I can find statements with a large degree of accuracy.

 

The Next Problem - Because There's Always Another Problem


Since then, I've been working on another problem: what is this JZ? If you think RSADDx is a pain since it represents two possibilities, consider that JZ can be:

  1. an if()
  2. an if() else
  3. an if() else if()
  4. a while()
  5. a for ()
  6. a do-while()
  7. logical AND (&&)
  8. logical OR (||)

These are control statements, the remaining set of statements to decipher. Expect an update in the next few days.
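As a first cut at telling some of those JZ roles apart, one can inspect the instruction just before the JZ target. This sketch assumes the layout seen in the k_inc_npckill listing, where each true branch ends in a JMP, and assumes a loop body ends with a backward JMP to its condition. It only separates "loop", "if-else" and plain "if". Telling while from for, handling do-while, and handling the && / || cases are out of scope, and the loop and if-else offsets are hypothetical.

```python
def classify_jz(listing, jz_offset):
    """Guess what a JZ opens: 'loop', 'if-else' or 'if'. listing: (offset, op, arg)."""
    offsets = [off for off, _, _ in listing]
    ops = {off: (op, arg) for off, op, arg in listing}
    op, target = ops[jz_offset]
    if op != "JZ":
        raise ValueError(f"no JZ at offset {jz_offset}")
    before_target = offsets[offsets.index(target) - 1]
    prev_op, prev_arg = ops[before_target]
    if prev_op == "JMP" and prev_arg <= jz_offset:
        return "loop"        # backward jump to the condition: while/for
    if prev_op == "JMP" and prev_arg > target:
        return "if-else"     # true branch jumps over an else block
    return "if"              # includes a JMP straight to the JZ target


loop = [(0, "CPTOPSP", None), (8, "JZ", 26), (14, "JSR", 100), (20, "JMP", 0), (26, "RETN", None)]
if_else = [(0, "JZ", 18), (6, "JSR", 100), (12, "JMP", 24), (18, "JSR", 200), (24, "RETN", None)]
plain_if = [(118, "JZ", 170), (124, "JSR", 298), (158, "JMP", 296), (164, "JMP", 170),
            (170, "MOVSP", None), (296, "RETN", None)]   # trimmed from the npckill listing
print(classify_jz(loop, 8), classify_jz(if_else, 0), classify_jz(plain_if, 118))
```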


I said there's always another problem...

TL;DR

Any scripts created by nwnnsscomp are likely to be broken. That probably means all scripts in TSLRCM. Oh, joy.

The Gory Details


I found something curious in the NCS bytecode, so I decided to recompile an NSS file to see if the problem was created by nwnnsscomp. The dead code I've been complaining about? It's created by nwnnsscomp, which means nwnnsscomp isn't creating NCS files correctly!!! For example, an if-else looks like this graphically:

[Graph: control flow of an if-else]

Now, let's nest a second if-else into the false branch:


  if(true)
  {
    //something
  }
  else
  {
    if(true)
    {
      //some other thing
    }
    else
    {
      //some else thing
    }
  }

[Graph: control flow of the nested if-else]

The key here is that block c from the first graph has been replaced with blocks c, e, f and g. Block g is a separate block from d. But, that's not what nwnnsscomp does:

[Graph: control flow of the same code as compiled by nwnnsscomp]

That's this:


if (true)
{
	//something
}
else if (true)
{
	//some other thing
}
else
{
	//some else thing
}

Which is perfectly legal code, it just doesn't match what's in the source script. So, this is the reason I've been struggling to make a decompiler...

The dead code? It's probably meant to be the missing block g from the second graph.

 

On 11/28/2020 at 10:57 PM, DarthParametric said:

[Image: shocked Fry reaction GIF]

I never used nwnnsscomp until I suspected mod files have "incorrect" scripts. Have you seen other issues?

BTW, I realize my last post might be confusing to those who don't code in C and assembly. I didn't mean to alarm people by giving the impression scripts compiled with nwnnsscomp don't work. As far as I can tell, they work correctly in the game.

They just don't decompile correctly, and that makes it difficult for someone who is using them to learn how to write a decompiler. There is a certain structure that a compiled file must adhere to in order to be decompiled. I finally realized that this structure differs between the files in my KoTOR installation, which has no mods, and my TSL installation, which has TSLRCM installed.

I don't remember if TSLRCM saves the original scripts anywhere, so I'll reinstall a clean copy of TSL on another machine to examine the original files there.


Frankly, saying this is "incorrect" compilation is a misunderstanding of the compilation process. If you look at modern C compilers, they're doing far more transformations. That's all legal and often even wanted (to keep pipelines from stalling, etc). Don't stoop down to clickbait.

And in general, for many targets, a nested if like that is indistinguishable from an unnested if, else if, else in the final machine code anyway. This is one reason compilation-decompilation is never a lossless process: different things in the source can compile down to the same machine code.

What you have found is a "tell" in the original BioWare compiler, that lets you distinguish certain things. That nwnnsscomp doesn't have the same tell is not a fault of nwnnsscomp, it's the nature of the thing. Ask me about gcc and clang differences some day, or different styles of mangling C++ symbols in gcc and msvc.

45 minutes ago, DrMcCoy said:

Frankly, saying this is "incorrect" compilation is a misunderstanding of the compilation process. If you look at modern C compilers, they're doing far more transformations. That's all legal and often even wanted (to keep pipelines from stalling, etc). Don't stoop down to clickbait.

And in general, for many targets, a nested if like that is indistinguishable from an unnested if, else if, else in the final machine code anyway. This is one reason compilation-decompilation is never a lossless process: different things in the source can compile down to the same machine code.

What you have found is a "tell" in the original BioWare compiler, that lets you distinguish certain things. That nwnnsscomp doesn't have the same tell is not a fault of nwnnsscomp, it's the nature of the thing. Ask me about gcc and clang differences some day, or different styles of mangling C++ symbols in gcc and msvc.

I'm familiar with compiler optimizations, I considered that possibility.

However, I don't believe this is the case, given the other issues like the lack of optimizations for logical AND and logical OR, the existence of dead code, and empty functions. There is so much more that could have been optimized that I can only conclude this is sloppy coding. Additionally, it's highly unlikely the game benefits from such optimizations. The "optimized" files are larger (i.e. no cache savings), and the lack of an additional JMP instruction saves a few dozen cycles (which wasn't even a problem in 2003).

Anyway, it's not my intention to argue over this. It won't prevent me from making the decompiler, but it means it won't be possible to guarantee the decompiled output matches the original source for files compiled by nwnnsscomp. I assume those files would only be found in mods. The main purpose for making this decompiler is to access original game files, which I assume were not compiled by nwnnsscomp.


Yeah, nwnnsscomp is not optimizing. But my point is that lots of roads lead to Rome, and denoting one as more correct than others is the wrong way to think about this. That different compilers written by different people produce different bytecode that still accurately performs the same visible actions is normal, and that should be dealt with. Similar for dead code. Dead code is entirely fine and should be handled properly without throwing errors there. And I feel that saying "nwnnsscomp is wrong" does it a huge disservice, because it's technically not wrong at all.

So I'm more hung up about your phrasing here. As long as the decompiler still works as such (and doesn't just outright refuse to work) for cases where different idiosyncrasies slip through, be it nwnnsscomp or maybe the BioWare compiler used for Dragon Age, everything is entirely fine with me (for what that's worth, which is maybe not a lot). Guaranteeing a 100% match with the original source in all cases is pretty much impossible anyway, and not something I expected in the first place.

6 hours ago, AmanoJyaku said:

Have you seen other issues?

Well I can't offer any input on the whole "correct/incorrect" debate. I'm just a pleb who is completely ignorant of programming. I was just observing that it's not particularly surprising that a fan made tool derived from reverse-engineering Bioware's compiler for a different game that has undergone multiple revisions by different people has a few quirks.


Time for an update!

Now that I've recovered from the nwnnsscomp-induced heart attack, I'm back to solving the last of the reverse compiler challenges. (When I complete the reverse compiler, I'll provide a detailed explanation of NCS, how NCS maps to NWScript, how the reverse compiler works, and why I'm so critical of nwnnsscomp.)

The current problem I'm working on looks rather simple:

RSADDx
JSR
JSR

We see a reservation on the stack, followed by two subroutines. NCS rules for subroutines are as follows:

  1. Optionally, reserve one or more values on the stack
  2. Optionally, place subroutine arguments on the stack
  3. Call the subroutine

You can see the disaster coming: it's those optional bits. In the example, the first subroutine might return a value. If so, that value is the argument of the second subroutine.

But, RSADDx is also used to create uninitialized named variables in NWScript, for example:

int i;		//RSADDI
float f;	//RSADDF
string s;	//RSADDS

You should never create uninitialized variables, but it was once a common practice. Numerous examples exist in NCS files, and that means the example code can also be read as an uninitialized variable, followed by two subroutines that do not take parameters and return void. In other words:

Sub1(Sub0());

//vs

int i;
Sub0();
Sub1();

So, how do we figure this out? Unfortunately, we have to look inside each of the JSR calls. Which means yet another rewrite of my reverse compiler. It's not the worst thing in the world since the purpose of a reverse compiler is to reconstruct JSR calls. But more work must be done to determine dependencies, change the order in which things are analyzed, and store the resulting information for later use.
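A sketch of what "looking inside the JSRs" buys you: once each subroutine's signature is known (does it return a value, how many arguments does it take), a symbolic stack decides between the two readings. All names here are hypothetical. The declared variable gets a made-up name since NCS doesn't keep names, only RSADDI/F/S are modeled, and an unused return value is simply ignored rather than matched to the MOVSP that would remove it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    returns_value: bool
    param_count: int


RSADD_TYPES = {"RSADDI": "int", "RSADDF": "float", "RSADDS": "string"}


def reconstruct(ops, sigs):
    """ops: (opcode, operand) pairs; sigs: subroutine name -> Signature."""
    stack = []       # symbolic cells: ["reserved", type] or ["expr", source text]
    statements = []
    for op, operand in ops:
        if op in RSADD_TYPES:
            stack.append(["reserved", RSADD_TYPES[op]])
        elif op == "JSR":
            sig = sigs[operand]
            if len(stack) < sig.param_count:
                raise ValueError(f"{operand}: not enough arguments on the stack")
            args = [stack.pop() for _ in range(sig.param_count)][::-1]
            if any(kind != "expr" for kind, _ in args):
                raise ValueError(f"{operand}: an argument slot was never filled")
            call = f"{operand}({', '.join(text for _, text in args)})"
            if sig.returns_value:
                # the caller must have reserved the return slot below the arguments
                if not stack or stack[-1][0] != "reserved":
                    raise ValueError(f"{operand} returns a value but nothing was reserved")
                stack[-1] = ["expr", call]
            else:
                statements.append(call + ";")
        else:
            raise NotImplementedError(f"{op} is not modeled in this sketch")
    # Reservations nobody filled were uninitialized named variables (names are made up)
    decls = [f"{payload} var{i};" for i, (kind, payload) in enumerate(stack) if kind == "reserved"]
    return decls + statements


OPS = [("RSADDI", None), ("JSR", "Sub0"), ("JSR", "Sub1")]
print(reconstruct(OPS, {"Sub0": Signature(True, 0), "Sub1": Signature(False, 1)}))
print(reconstruct(OPS, {"Sub0": Signature(False, 0), "Sub1": Signature(False, 0)}))
```

The same three opcodes produce either reading, depending only on the signatures. This is why the signatures have to be computed before the caller is analyzed.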

I don't work this hard in my paying job...


Happy Life Day!!!


The Last Problem


The last post described the issue with RSADDx and JSR. The solution was to analyze by call dependency: we use the analysis of B to analyze A.


void subA()
{
  	subB();
}

You knew it wouldn't be that simple, right? How do we deal with recursive calls and circular references? In other words:

  1. When subroutine A calls subroutine A
  2. When subroutine A calls subroutine B, and subroutine B calls subroutine A

The above are possible due to control statements, namely if-else statements. If we can ignore the recursive calls and circular references then we can create a dependency tree. Unfortunately, we're back to the original problem of understanding the subroutine's internal structure.


void subA()
{
  if(true)
  {
    subA();
    //wtf???
  }
  else
  {
    //do stuff
  }
}
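One textbook way to get a callee-first analysis order that survives recursion is to group the call graph into strongly connected components. Tarjan's algorithm emits each group only after every group it calls. This is a sketch of that standard technique, not the approach described in this thread, and the call graph in the example is hypothetical. Subroutines that share a group, or that call themselves, are exactly the ones needing the partial analysis discussed next.

```python
def analysis_order(calls):
    """Group subroutines into strongly connected components, callees first.

    calls: dict mapping a subroutine to the subroutines it calls (JSR targets).
    """
    index, low, on_stack, stack, order = {}, {}, set(), [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in calls.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:              # back edge: recursion or a cycle
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a component
            group = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                group.append(w)
                if w == v:
                    break
            order.append(sorted(group))

    for v in sorted(calls):
        if v not in index:
            visit(v)
    return order


calls = {"main": ["subA"], "subA": ["subA", "subB"], "subB": ["subA", "subC"], "subC": []}
print(analysis_order(calls))
```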

 

The Sub Awakens


The high-level structure of a subroutine is four parts:


return_type subA(Argument_1, Argument_2...)
{
  //Part 1 - The code that does stuff
  //Part 2 - Values returned
  //Part 3 - Destruction of local variables
  //Part 4 - Destruction of arguments
}

As you can see, we need parts 2 and 4 to solve the problem of RSADDx and JSR. But, all parts are optional. A subroutine doesn't have to take arguments, return values, create local variables, or even have code. (Yes, there are empty subroutines in NCS files. Today's compilers would just omit such subroutines.) Since all parts are optional, it's necessary to analyze enough of the subroutine to know where one part ends and the next begins.

Is there a way to do a partial analysis of a subroutine while avoiding the pitfalls of recursion and circular references? Of course there is.

The Rise of Subwalker


You just have to find the child scopes and skip over them. The subroutine is the top-level scope, and most control statements create new scopes.


void subA()
{
  //A scope
  if (true)
  {
    //A second scope
  }
  else
  {
    //A third scope
  }
}
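The "skip the child scopes" idea could look roughly like this: walk the subroutine's instructions and, whenever a forward JZ opens a scope, jump straight to its target. Because the skipped instructions are never inspected, a recursive JSR inside a branch can't send the walk in circles. This sketch uses hypothetical offsets and assumes every child scope is opened by a forward JZ. Else blocks, which sit after the true branch's JMP, are therefore not skipped here.

```python
def top_level_offsets(listing):
    """Offsets of instructions in the subroutine's own scope, skipping child scopes.

    listing: (offset, opcode, target) tuples in address order.
    """
    offsets = [off for off, _, _ in listing]
    kept, i = [], 0
    while i < len(listing):
        off, op, target = listing[i]
        kept.append(off)
        if op == "JZ" and target > off:
            i = offsets.index(target)      # resume at the first instruction after the scope
        else:
            i += 1
    return kept


sub_a = [
    (0, "CONSTI", None),
    (6, "JZ", 30),            # if (...) {
    (12, "JSR", 0),           #   recursive call to subA, never followed here
    (18, "MOVSP", None),
    (24, "JMP", 30),          # }
    (30, "RETN", None),
]
print(top_level_offsets(sub_a))
```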

I tried identifying scopes months ago, but got confused as to how to find them in code. My meltdown over nwnnsscomp was due to me thinking I figured it out, then getting confused over code that didn't match what I knew. I have ample evidence NCS was designed to be reverse compiled perfectly.

That's right, you should be able to recreate the original NSS, vectors, structures, loops, branches and all.

I just didn't count on a compiler producing code that would violate this design pattern. That means I can't test with NCS files from mods, since those are made with nwnnsscomp. Switching back to vanilla NCS files has allowed for the reverse compiler to proceed.

I now have an analyzer that finds control structures perfectly, all located in part 1 of the subroutine, and can now move on to parts 2, 3 and 4.

tl;dr

The reverse compiler is on its way to being complete. I was honestly about to give up, but I'm glad I didn't!

6 hours ago, AmanoJyaku said:

That means I can't test with NCS files from mods, since those are made with nwnnsscomp. Switching back to vanilla NCS files has allowed for the reverse compiler to proceed.

So what does that mean for the final product? Are scripts produced by nwnnsscomp doomed to never being decompilable properly (at least for complex ones)?

1 hour ago, DarthParametric said:

So what does that mean for the final product? Are scripts produced by nwnnsscomp doomed to never being decompilable properly (at least for complex ones)?

I'm confident they can be, and I'm currently in the process of confirming that. What I believe will happen is that vanilla game files can be reversed perfectly, but anything that was compiled by nwnnsscomp will be altered from its original source. As DrMcCoy said, the result, while syntactically different, should functionally be the same as the original source.

For example, the source of k_contain_unlock from TSLRCM contains the following:

string GetTreasureBundle (int nItemLevel, int nItemType = 0)
{
  string sTemplate = "";
 
  if (nItemType == 0) {}
  else
  {
    if((nItemType % 100) == 0) {}
    else if((nItemType % 10) == 0) {}
    else {}
  }
  
  return sTemplate;
}

But, the NCS file is written like this:

string GetTreasureBundle (int nItemLevel, int nItemType = 0)
{
  string sTemplate = "";
 
  if () {}
  else if (){}
  else if() {}
  else {}
  
  return sTemplate;
}

The number of branches is the same across both, so the two forms should behave the same when executed by the game. If this proves to be true, then it won't matter if the reverse compiler produces the second form even if the first is the original.
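Since both shapes compile to the same branches, choosing between them becomes a pretty-printing decision. This hypothetical sketch renders a tiny if/else tree in two ways: as written (nested), or with any else that contains nothing but an if collapsed into else if, which is the form the reverse compiler would emit for nwnnsscomp output. The node type and function names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class If:
    cond: str
    then: list
    orelse: list = field(default_factory=list)


def render(stmts, indent=0, collapse=True):
    """Pretty-print statements; with collapse=True an else holding only an if becomes else if."""
    pad = "  " * indent
    out = []
    for s in stmts:
        if not isinstance(s, If):
            out.append(pad + s)
            continue
        out.append(f"{pad}if ({s.cond}) {{")
        out += render(s.then, indent + 1, collapse)
        node = s
        while collapse and len(node.orelse) == 1 and isinstance(node.orelse[0], If):
            node = node.orelse[0]
            out.append(f"{pad}}} else if ({node.cond}) {{")
            out += render(node.then, indent + 1, collapse)
        if node.orelse:
            out.append(f"{pad}}} else {{")
            out += render(node.orelse, indent + 1, collapse)
        out.append(f"{pad}}}")
    return out


tree = [If("a", ["x();"], [If("b", ["y();"], ["z();"])])]
print("\n".join(render(tree, collapse=False)))
print("\n".join(render(tree, collapse=True)))
```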

If I understand correctly, the original intent of the reverse compiler was to retrieve the sources for vanilla game scripts. Those are not compiled with nwnnsscomp. Which means the only files that are of concern are mods. I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

21 minutes ago, AmanoJyaku said:

the result, while syntactically different, should functionally be the same as the original source

That seems fine to me. I was more concerned that certain files might not be able to be decompiled at all. This is the case for DeNCS as well, based on my own experiments.

21 minutes ago, AmanoJyaku said:

I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

The main rights issues with mods are rehosting a mod without the author's permission and/or reusing their assets without permission. I think scripts fall into something of a grey area, partially because they are often derived from vanilla game scripts anyway and also because access to the source is necessary for compatibility (especially for large mods like TSLRCM). It's why authors are encouraged to include their script source (thus tacitly approving modification/reuse) to sidestep the problem altogether. If you have concerns I'm sure the admins can give you an official ruling on whatever it is you are specifically worried about.

On 12/13/2020 at 11:12 AM, AmanoJyaku said:

You should never create uninitialized variables, but it was once a common practice.

This is still common practice, at least as far as I've been taught. Yeah, there are security issues if you then try to read one, but modern compilers either won't let you do that or will initialize it with a default value, if the language itself doesn't. With that in mind, I don't see the issue.

6 hours ago, AmanoJyaku said:

the result, while syntactically different, should functionally be the same as the original source

6 hours ago, DarthParametric said:

That seems fine to me. I was more concerned that certain files might not be able to be decompiled at all. This is the case for DeNCS as well, based on my own experiments.

I agree. I'm not concerned about syntactically identical source code. The way the source code is written is... not always the best way, to put it lightly.

The issues you've mentioned before don't seem like they should cause problems to me. Like with the nested else/if vs a simple else if, that'll only be functionally different if you declare something local to the else branch's scope. And, obviously, don't do that. And I doubt the compiler would've optimized for it that way if it would produce functionally different code. It usually knows better than we do.

6 hours ago, AmanoJyaku said:

I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

I think that depends on whether you ask Oracle or Google...
