
I put my original mod on hold since someone else is making it. And, wow, does it look good!

 

So, on to the most neglected project: a new DeNCS. It's currently able to interpret op codes and arguments, so the next step is the higher-level control flow. Here's where you can help:

  • Byte Code - I obtained a list of op codes from here. I need to make sure it's complete and correct.
  • NWScript - I am compiling a list of operators, punctuators, and keywords from here, and here. I need to make sure it's complete and correct.
  • Mapping - This is the big one, making sure script converts to byte code, and vice versa. 100% identical conversion, every time. No pressure.
  • DOT Diagram - I need a program that can make diagrams from DOT.
  • DOT Generator - I can generate the DOT from my code, but it would be helpful if something could do this for me, as well.
  • Name - Right now, it's called DeNCS 2020. Narrowly edging out NewNCS. Please, give it a good name.

 

A couple of things I want to address.

DeNCS sometimes reports a partial-byte mismatch. At first, I ignored this just like every other modder. However, while testing this tool I compared the output of an NCS file straight from the game to the output DeNCS generated... Well, it looks like DeNCS attempts to convert the NCS to source code, then attempts to convert the source code back to an NCS file. Then, it compares the original NCS to the new one. And, if they don't match? Partial-byte mismatch. I don't know why DeNCS doesn't always perform a perfect conversion, but it's something to investigate. Particularly, since there are reports that even NCS files generated by Bioware and Obsidian had bugs in them. Troubleshooting the NCS files will make this longer since I have no way of knowing what the original files should have looked like. (But, I can guess just like DeNCS seems to do.)
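The round trip described above boils down to a byte-for-byte comparison of two buffers. Here is a minimal sketch in Python, assuming the decompile and recompile steps have already produced the second buffer. The function names are hypothetical and are not taken from DeNCS.

```python
def first_mismatch(original: bytes, rebuilt: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    # One buffer may simply be a prefix of the other.
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None


def round_trip_report(original: bytes, rebuilt: bytes) -> str:
    """Summarize the comparison the way a verifier might print it."""
    offset = first_mismatch(original, rebuilt)
    if offset is None:
        return "match"
    return f"mismatch at 0x{offset:08x}"
```

Reporting the first differing offset, rather than just "mismatch", would at least point at the instruction where the two files diverge.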

 

This is command-line only. There was never any intention to make this a stand-alone tool. However, development will take longer than I expected, which means a GUI is the least of my priorities. At some point, I want to merge this into a toolset, maybe even the one up above. So, no GUI. (That also means the stand-alone tool won't be around for very long.)

 

Got any feedback? Thanks!

 

Edit: I just downloaded Graphviz, so I am covered with DOT diagrams. I think the included library will allow me to generate DOT, too.

  On 2/5/2020 at 8:22 PM, AmanoJyaku said:

Byte Code - I obtained a list of op codes from here. I need to make sure it's complete and correct.

Presumably you want to ask @DrMcCoy. Xoreos has done the most recent and thorough (publicly available) work on it. I gather you have perused their repo?

  On 2/5/2020 at 8:22 PM, AmanoJyaku said:

Name - Right now, it's called DeNCS 2020. Narrowly edging out NewNCS. Please, give it a good name.

I'd advise against any incorporation of "DeNCS" in the name. Xoreos has already taken "ncsdis" and "ncsdecomp". You could perhaps go with something like "NCS2NSS".

  On 2/5/2020 at 8:22 PM, AmanoJyaku said:

This is command-line only.

That's fine. The Xoreos tools are all command-line, and real men compile directly with nwnnsscomp, which is command-line as well. Batch scripts are easy enough to write. And as with nwnnsscomp, if someone was keen enough they could always write a GUI front-end for it (ask @JCarter426 - he'll need a GUI project for his course later in the year presumably).

  On 2/5/2020 at 8:22 PM, AmanoJyaku said:

DeNCS sometimes reports a partial-byte mismatch.

Yes. It's presumably due to the way that DeNCS interprets the bytecode, since there's obviously not a 1:1 match between it and the source NSS. I would suggest you try decompiling some of the global scripts that have Bioware source available and compare DeNCS's output with the original NSS. There are certain quirks in the way DeNCS likes to format things, which, while functionally the same, may introduce the subtle differences it complains about with partial matches.


Yeah, do have a look at the nwscript directory: https://github.com/xoreos/xoreos-tools/tree/master/src/nwscript

Specifically, the instruction.h: https://github.com/xoreos/xoreos-tools/blob/master/src/nwscript/instruction.h , which contains a list of opcodes (including 2 each that were introduced by Dragon Age: Origins and Dragon Age II).

You might also want to look at https://github.com/xoreos/xoreos-tools/blob/master/src/nwscript/game_kotor.h and https://github.com/xoreos/xoreos-tools/blob/master/src/nwscript/game_kotor2.h for a list of engine functions and their signatures (return type + types of parameters). Alternatively, you can parse the nwscript.nss from the games yourself.

If you haven't already, have a read over my blog post about disassembling the bytecode: https://xoreos.org/blog/2016/01/12/disassembling-nwscript-bytecode/ . It also shows a bug in BioWare's compiler (though that particular bug is not present in the KotOR scripts, it was fixed before KotOR development started it seems).

If you want to decompile all scripts, you'll find that you're going to have problems with recursion, i.e. functions that call themselves. You need to analyze how a function leaves the stack to be able to continue with the code after the call, so you need to branch into a callee first before continuing with the caller. If the callee is the caller itself (or A calls B calls C calls A again)... that's a problem. This is essentially the halting problem, and there's no general solution, unfortunately.
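Before attempting any stack analysis, it is at least cheap to find out which subroutines sit on a call cycle. The sketch below is an illustration only, not how ncsdis does it: it assumes the call graph has already been extracted into a dict mapping each function name to the names it calls, and both helper names are made up.

```python
def reachable_from(call_graph, start):
    """Functions reachable by following one or more calls from start."""
    seen = set()
    pending = list(call_graph.get(start, ()))
    while pending:
        name = pending.pop()
        if name not in seen:
            seen.add(name)
            pending.extend(call_graph.get(name, ()))
    return seen


def recursive_functions(call_graph):
    """Functions that can reach themselves, directly or through others."""
    return {name for name in call_graph
            if name in reachable_from(call_graph, name)}


# The "A calls B calls C calls A" case from the post.
graph = {"main": ["A"], "A": ["B"], "B": ["C"], "C": ["A"], "leaf": []}
cycle = recursive_functions(graph)
```

This only flags the functions involved. It does nothing to solve the stack-effect problem for them, which still needs some other strategy.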


Interestingly though, most scripts that DeNCS chokes on do not appear to be due to recursion, at least judging by running said scripts through ncsdis, which reports the presence of recursion. In my experience, it's the use of per-planet includes that trips it up the most. The module OnEnter and OnHeartbeat scripts for Tatooine in K1 are a particularly good example of this.


As for operators, you should probably also make sure you know the precedence rules. I'm not sure NWScript follows the C rules there exactly. I haven't yet looked at that, since it's unambiguous in the bytecode.

Oh, and also, there's another bug I've found, but I'm not sure in which games it was present. It had to do with parameter shadowing. I.e.

void foobar(int blah) {
    int blah = barfoo(blah);
}

(excuse the quirky formatting, this editor is weirder than I remember)

I've seen the produced bytecode use the uninitialized value of the just-created local variable blah as an input for barfoo(), instead of the parameter of foobar().

Maybe something to have an eye on.

  On 2/6/2020 at 6:04 AM, DarthParametric said:

In my experience, it's the use of per-planet includes that trips it up the most. The module OnEnter and OnHeartbeat scripts for Tatooine in K1 are a particularly good example of this.

Hmm, depends on how DeNCS operates. Does it also try to decompile functions that are not called? I.e. if you have an include with foobar1(), foobar2() and foobar3(), and the script itself just calls foobar1() (and that doesn't call the other functions), does DeNCS also decompile foobar2() and foobar3()?

Because IIRC ncsdis completely ignores foobar2() and foobar3() in that case and wouldn't see any recursion there.
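To make the question concrete: a tool that only follows calls from the entry point never sees functions nothing calls. A rough sketch, assuming a call graph stored as a dict of caller to callees and an entry point named "main" (both are assumptions for illustration, not how ncsdis or DeNCS actually store things):

```python
def uncalled_functions(call_graph, entry="main"):
    """Functions present in the graph that are never reached from entry."""
    reached = {entry}
    pending = [entry]
    while pending:
        for callee in call_graph.get(pending.pop(), ()):
            if callee not in reached:
                reached.add(callee)
                pending.append(callee)
    return sorted(set(call_graph) - reached)


graph = {
    "main": ["foobar1"],
    "foobar1": [],
    "foobar2": ["foobar3"],
    "foobar3": [],
}
skipped = uncalled_functions(graph)
```

With the graph above, foobar2() and foobar3() end up in the skipped list, so any recursion inside them would go unnoticed by a tool working this way.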


It's been a month, so here's a progress report:

  • Completed decoding of opcodes, data types, header fields
    • Validating basic fields and generating (helpful?) error messages
  • Embedded NWScript engine calls into program code
    • No need for modders to provide nwscript.nss
  • Created call graphs of subroutines
    • Generating graph description files for Graphviz renderer
  • Created control-flow graphs of basic blocks
    • Generating graph description files for Graphviz renderer
  • Created call stack simulator
    • Keeping track of which variables are modified

To do:

  • Identifying iteration, selection and jump statements
  • Operator associativity and precedence
  • Type conversions
  • Byte code conversion to source code
  • Source code conversion to byte code
  • GUI
  • Setup new dev laptop (dropped current laptop last night, awaiting delivery of new one) 😢
  • Probably more stuff, but I don't know what I don't know, you know?

It's been quite interesting these last few weeks, eh? I hope you are all doing well.

There's been progress in analyzing the NCS byte code. Graphviz has been invaluable in producing the maps necessary to visualize the program flow. At the moment, one script has been reverse engineered. By no means is a decompiler close to being ready, but it is a significant step in analyzing the program flow. So far, several NCS code patterns have been identified that map back to NWScript:
 

Dead code (e.g. perfectly valid code that never gets called, and theoretically could be removed)

Assignment (e.g. i = 5)

Named variable declaration (e.g. int i, float f, string s, object o)
    initialized from constant (e.g. int i = 5, float f = 0.0, string s = "Jello, World!")
    initialized from engine routine (object o = GetFirstPC())
    initialized from subroutine (bool b = IsItTrue())
    initialized from named variable (int i = integer_i )

Selection
    if (GetLocalBoolean(55))
    if (!GetLocalBoolean(55))
    if (IsItTrue())
    if (!IsItTrue())
    switch (i) {case 0: break; default: break;}

Attached is a control flow diagram of the script k_sup_galaxymap.ncs. Red blocks are dead code. There are three switch statements in the code. Can you find them?

Expect more updates in the next few weeks!

 

k_sup_galaxymap.ncs.png

  On 3/19/2020 at 10:10 PM, AmanoJyaku said:

Dead code (e.g. perfectly valid code that never gets called, and theoretically could be removed)

I'm assuming these are all the debug functions? PrintString, AurPostString and the like? I was playing with those only yesterday and noticed that despite removing if (!ShipBuild()) checks, they still didn't appear to function. I assume they gutted them from the retail release altogether? While they are unnecessary for the code to run, I would argue they should remain in any decompiled scripts since, in the absence of commented source for the module scripts, the debug functions are the only thing remaining that can indicate developer intent. Additionally, the user can manually change them to SendMessageToPC for their own debugging purposes if needed.

  On 3/20/2020 at 5:57 AM, DarthParametric said:

I'm assuming these are all the debug functions? PrintString, AurPostString and the like? I was playing with those only yesterday and noticed that despite removing if (!ShipBuild()) checks, they still didn't appear to function. I assume they gutted them from the retail release altogether? While they are unnecessary for the code to run, I would argue they should remain in any decompiled scripts since, in the absence of commented source for the module scripts, the debug functions are the only thing remaining that can indicate developer intent. Additionally, the user can manually change them to SendMessageToPC for their own debugging purposes if needed.

It may be that those weren't intended to print anything in the game itself but a console window, which would be present in a working/debug version of the game. If that's the case, it's not so much that they don't function as that they have no effect.

In any case, anything that can be preserved, should be!

  On 3/20/2020 at 5:57 AM, DarthParametric said:

I'm assuming these are all the debug functions? PrintString, AurPostString and the like? I was playing with those only yesterday and noticed that despite removing if (!ShipBuild()) checks, they still didn't appear to function. I assume they gutted them from the retail release altogether? While they are unnecessary for the code to run, I would argue they should remain in any decompiled scripts since, in the absence of commented source for the module scripts, the debug functions are the only thing remaining that can indicate developer intent. Additionally, the user can manually change them to SendMessageToPC for their own debugging purposes if needed.

My apologies. As with all things STEM, terminology must be accurate. I am learning compiler design in order to create this compiler/decompiler, so I am still unfamiliar with communicating this topic effectively.

 

"Dead code" has two meanings, one more accurate than the other:

  • Instructions that compute a value that is never used
  • Unreachable code

The red blocks do not represent "instructions that compute a value that is never used". They may fit the first definition, but I am not looking for that in these diagrams.

The red blocks are "unreachable code"; they never get called, which is why they aren't preceded by another green block. Only the function entry point (the very first block in a function/subroutine) should be without a preceding block.

 

The image above has 10 functions/subroutines, each represented by a box filled with one or more colored blocks. (Accuracy alert!!! Functions and subroutines are similar, but not identical. The difference only matters to academics and compiler designers, but this entire post is about pedantry...) Thus, there should only be 10 blocks that aren't preceded by another green block: the function entry points. If a block is in red, that's because no other block explicitly jumps to it or implicitly proceeds to it. These blocks are therefore useless, and could be removed if they serve no other purpose. x86/x86-64 uses the NOP op code to allow for instruction padding to improve memory alignment, and resulting access times. To my knowledge, such a technique has no effect in the NWScript runtime. If I am correct, then the code could be stripped. My concern is that perhaps there IS a reason for these blocks, or that the existing script compilers are riddled with bugs...
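The rule for coloring a block red can be written down in a few lines. This is a sketch under stated assumptions: blocks are identified by their start address, edges are (source, target) pairs covering both explicit jumps and implicit fall-through, and the addresses below are invented for the example rather than taken from k_sup_galaxymap.ncs.

```python
def blocks_without_predecessor(blocks, edges, entry_points):
    """Blocks that nothing jumps to or falls through to, excluding function entries."""
    has_predecessor = {target for _source, target in edges}
    return sorted(b for b in blocks
                  if b not in has_predecessor and b not in entry_points)


# Made-up block addresses for illustration only.
blocks = [0x0D, 0x15, 0x2A, 0x40]
edges = [(0x0D, 0x15), (0x15, 0x40)]  # jumps and fall-through alike
entry_points = {0x0D}
red = blocks_without_predecessor(blocks, edges, entry_points)
```

Note that this follows the rule exactly as stated. A block whose only predecessor is itself unreachable would not be flagged, so a stricter check would walk forward from each entry point instead.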

 

Finally, these blocks are not the result of debug code. One of the switch statements in the diagram has a debug function under a case label in its source.

switch(nPlanet)
{
    case PLANET_PERAGUS:
    {
        AurPostString("ERROR: We should not be able to travel back to peragus.",0,10,5.0);
    }
    break;
    //Other case statements removed
}

 

The compiled case label:

case 8:
		0x00000ffb CPTOPSP -4 4
		0x00001003 CONSTI 8	//PLANET_PERAGUS defined as 8 in nwscript.nss
		0x00001009 EQUALII
		0x0000100b JNZ 0x000010dd

The target block in the compiled file:

case 8 block
0x000010dd
		0x000010dd CONSTF 0.000000
		0x000010e3 CONSTI 10
		0x000010e9 CONSTI 0
		0x000010ef CONSTS ERROR: We should not be able to travel back to peragus.
		0x0000112a ACTION 582 4	//Engine functions are zero-indexed, AurPostString() is #582, and it takes 4 parameters
		0x0000112f JMP 0x0000158e

Case label 8 jumps to the Case 8 block, and calls the debug function. No unreachable code here!
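As a sanity check on the listing, the gap between consecutive addresses gives each instruction's length in bytes. The sketch below uses only the addresses and the string shown above. Reading the leftover bytes of CONSTS as an opcode byte, a type byte, and a 2-byte length is an assumption that happens to fit the numbers, not something the listing itself states.

```python
def sizes(listing):
    """Byte length of each instruction, taken from the gap to the next address."""
    return [(name, nxt - addr)
            for (addr, name), (nxt, _) in zip(listing, listing[1:])]


# Addresses copied from the case 8 block above.
case_block = [
    (0x10DD, "CONSTF"),
    (0x10E3, "CONSTI"),
    (0x10E9, "CONSTI"),
    (0x10EF, "CONSTS"),
    (0x112A, "ACTION"),
    (0x112F, "JMP"),
]
message = "ERROR: We should not be able to travel back to peragus."

lengths = sizes(case_block)
# Bytes in the CONSTS instruction that are not part of the string itself.
overhead = dict(lengths)["CONSTS"] - len(message)
```

The last instruction (JMP) gets no length here because the listing does not show the address that follows it.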


I don't remember seeing this exact case. Then again, it's been a while since I touched that code and I'm not known to have a good memory. :P Also, my disassembler just follows the starting segment and IIRC I don't even check for unreachable blocks without an edge leading into them. That's maybe something that could use improvement. I take GitHub pull requests! ;)

However, I can say that I have found a few bugs in BioWare's script compiler (I explained two in my replies above, I think), so I wouldn't particularly rule that out.

And while I can't name any specifics right now, I also never got the feeling that their compiler was all that great in optimizing, fusing instructions, or removing outright unnecessary instructions. I don't think they focussed on that at all, rather keeping it simple and working without any surprises (bugs notwithstanding).

  On 3/20/2020 at 4:38 PM, DrMcCoy said:

I don't remember seeing this exact case. Then again, it's been a while since I touched that code and I'm not known to have a good memory. :P Also, my disassembler just follows the starting segment and IIRC I don't even check for unreachable blocks without an edge leading into them. That's maybe something that could use improvement. I take GitHub pull requests! ;)

However, I can say that I have found a few bugs in BioWare's script compiler (I explained two in my replies above, I think), so I wouldn't particularly rule that out.

Unreachable code, by definition, should have no effect on script execution. Analysis of unreachable code is simply to facilitate analysis of reachable code. For example:

0xNNNNNNNN CPTOPSP -4 4
0xNNNNNNNN CONSTI N
0xNNNNNNNN EQUALII
0xNNNNNNNN JNZ 0xNNNNNNNN

This sequence is analyzed to see if it is a switch case label. Every case label I've seen fits this pattern, and for all but the first case label the above code is the entirety of a basic block. However, the first case label has instructions prior to CPTOPSP. It's necessary to see if such code is unreachable, and therefore part of a separate basic block, or actually a functional part of the first case's basic block.

And, that's assuming it's actually a switch! It could just be an if (false) {} statement, but we have to analyze the flow to find out!
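A sketch of the pattern check, under a few assumptions: each instruction has already been decoded into a tuple of mnemonic plus operands, and a basic block is a list of such tuples. The function name is hypothetical. Matching on the last four instructions lets the first case label, which has extra instructions before CPTOPSP, match as well.

```python
def match_case_label(block):
    """Return (case_value, jump_target) if the block ends with the case-label pattern."""
    if len(block) < 4:
        return None
    copy_top, constant, equal, jump = block[-4:]
    if (copy_top == ("CPTOPSP", -4, 4)
            and constant[0] == "CONSTI" and len(constant) == 2
            and equal == ("EQUALII",)
            and jump[0] == "JNZ" and len(jump) == 2):
        return constant[1], jump[1]
    return None


# The case 8 label from the earlier listing.
label = [("CPTOPSP", -4, 4), ("CONSTI", 8), ("EQUALII",), ("JNZ", 0x10DD)]
candidate = match_case_label(label)
```

A match only marks the block as a candidate. As noted above, the surrounding flow still has to be analyzed before calling it a switch.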

  On 3/20/2020 at 4:38 PM, DrMcCoy said:

And while I can't name any specifics right now, I also never got the feeling that their compiler was all that great in optimizing, fusing instructions, or removing outright unnecessary instructions. I don't think they focussed on that at all, rather keeping it simple and working without any surprises (bugs notwithstanding).

I won't rule out optimizations, but I doubt they exist. They would be meaningless since NWScript isn't meant for high performance. We aren't using NWScript to create databases, calculate protein folds, or process bank transactions. Slow code won't be noticeable, but such optimizations are error-prone and require a lot of development effort that clearly would have been better spent elsewhere.

  On 3/20/2020 at 2:42 PM, AmanoJyaku said:

I am learning compiler design in order to create this compiler/decompiler, so I am still unfamiliar with communicating this topic effectively.

It's like I'm looking at my future, and my future is written in assembly.

 

I whipped up some tests and attached them. I don't know how much help they'll be, though, since if anything they only confirm what's already been said.

The compiler we have, at least, does

  On 2/6/2020 at 6:08 AM, DrMcCoy said:

Does it also try to decompile functions that are not called?

  On 2/6/2020 at 6:16 AM, DarthParametric said:

No, because those wouldn't be pulled into the compiled script if it didn't call them.

The preprocessor that we're using, at least, will only include code that is actually used. Which is good. My tests confirm that. Based on the garbage I've seen in decompiled game scripts, though, I think it's possible BioWare's might not have been as optimized. I often see tons of integers declared in decompiled scripts that are never used. I can't remember if I've ever seen any functions I was sure weren't used, though.

  On 3/20/2020 at 2:42 PM, AmanoJyaku said:

"Dead code" has two meanings, one more accurate than the other:

  • Instructions that compute a value that is never used
  • Unreachable code

I tried various things that should lead to dead code but I think it's all of the first variety. I tried to make an unreachable block in the 6th one, but I still see jump points in the disassembled code.

ncstest.zip

  On 3/20/2020 at 11:59 PM, JCarter426 said:

I tried various things that should lead to dead code but I think it's all of the first variety. I tried to make an unreachable block in the 6th one, but I still see jump points in the disassembled code.

Scripts 1-5 have unreachable code (remember, I'm ignoring dead/unused code for now). Script 6 is fine, all basic blocks are reachable. How did you compile this?

 

  On 3/20/2020 at 11:59 PM, JCarter426 said:

It's like I'm looking at my future, and my future is written in assembly.

*triggered*

  On 3/22/2020 at 10:17 PM, JCarter426 said:

I compiled with the version of NWNSSCOMP found here. In KOTOR 2 mode, if that matters.

Ooooooooooooh!!!

There are issues in almost all of the tools on that site. I'm sure it's the case for NWNSSCOMP, as well, although I haven't used it myself. Don't need to, either, seeing as how the scripts you gave me are compiled incorrectly. I'm particularly concerned about this:

  Quote

'Star Wars: Knights of the Old Republic' Script Compiler/Decompiler
based on 'NeverWinter Nights' Script Compiler/Decompiler
Copyright 2002-2003, Edward T. Smith
Modified by Hazard (hazard_x@gmx.net)
Modified further tk102 for stoffe -mkb- (v0.03b)

Modified? Modified how? Where's the source? Was this reverse engineered? Etc... Given that this was worked on by three different people/groups, my guess is someone screwed up while making updates. Ah, well... All the more reason to produce a new compiler.

 

Edit:

So, I just read Torlak's old site. Here's the full description of NWNSSCOMP and its potential problems. tl;dr, there most likely are bugs in NWNSSCOMP:

  Reveal hidden contents

 

 


Once your work is far enough along and you have a spare moment @AmanoJyaku, I'm curious if you can finally resolve the problem I mentioned above regarding DeNCS choking on some scripts for reasons that don't seem related to recursion (or at least ncsdis doesn't report any). I just found another one (attached if you are curious) as I was poking through various things, again from Tatooine, which seems to be heavily affected by the issue, so it reminded me. I'm extremely curious to know the whys and wherefores.

Non-decompilable_script.7z


Interesting. It's definitely not recursion. There are only four functions:

1) _start()

2) _global()

3) main()

4) sub1()

The main() function only calls sub1() once, and sub1() never calls a user-defined function. So, only an engine routine could recur. More interestingly, the file is very short and very simple.

 

I suspect I know what the problem is, because I see something in the code I've never seen anywhere else. But, I would like to withhold assumptions until I can analyze another file that doesn't decompile. Can you supply another?


Sure. Here's another short one from the very same conversation:

Non-decompiling_same_convo.7z

And here are some other ones that we came across developing K1CP v1.8. Mostly module OnEnters and OnHeartbeats that have a bit more going on, but also some smaller trigger scripts and the like. Mostly from Tatooine, but there are also a couple of Manaan ones:

Other_non-decompiling.7z


Veeeery interesting. I'm seeing the same thing in all files, so I hope it's the culprit.

Basically, every NCS file I've worked on uses global variables initialized from constants. However, the files you've given me all include a global that's initialized from an engine routine! I'm wondering if DeNCS is choking on that.

0x00000c67 RSADDS
0x00000c69 CONSTI 32289
0x00000c6f ACTION 239 1
0x00000c74 CPDOWNSP -8 4
0x00000c7c MOVSP -4

// 239: Get a string from the talk table using nStrRef.
string GetStringByStrRef(int nStrRef);
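If this is the culprit, it should be easy to scan for. The sketch below assumes the instructions of the global-initializer subroutine have already been decoded into (address, mnemonic, operands...) tuples. The helper name is hypothetical, and the sample data is just the listing above.

```python
def engine_calls_in(instructions):
    """Return (address, routine_number) for every ACTION instruction."""
    return [(addr, operands[0])
            for addr, name, *operands in instructions
            if name == "ACTION"]


# The global initialization shown in the listing above.
global_init = [
    (0x0C67, "RSADDS"),
    (0x0C69, "CONSTI", 32289),
    (0x0C6F, "ACTION", 239, 1),
    (0x0C74, "CPDOWNSP", -8, 4),
    (0x0C7C, "MOVSP", -4),
]
calls = engine_calls_in(global_init)
```

An empty result would mean every global is initialized from constants. Anything else marks a file worth testing against DeNCS.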

 


Presumably not coincidentally, both the Tatooine and Manaan module includes contain the following in their constants list:

string RACE_DEFAULT = GetStringByStrRef(32289);

And, as discussed previously, we know Bioware's compiler pulled in all constants from any listed includes, regardless of whether they were required or not. @JCarter426 also determined that the KOTOR nwnnsscomp also does this, despite the original NWN version being fixed to prevent that.
