DrMcCoy

Disassembling NWScript Bytecode


Nice, especially given that it handles DA and DA2 scripts (as far as I am aware there have never been any publicly available decompilers for those).

 

Any plans to turn it into a human-readable script format for simple peons like myself? So, for example, instead of this:

; 85 bytes, 12 instructions

; Engine types:
; E0: Effect
; E1: Event
; E2: Location
; E3: Talent

_start:
  0000000D 1E 00 00000008             JSR main
  00000013 20 00                      RETN
  -------- -------------------------- ---
main: ; void main()
  00000015 04 05 str                  CONSTS ""
  00000019 04 05 str                  CONSTS ""
  0000001D 04 05 str                  CONSTS ""
  00000021 04 05 str                  CONSTS ""
  00000025 04 05 str                  CONSTS ""
  00000029 04 05 str                  CONSTS "05r"
  00000030 04 05 str                  CONSTS "lev40_wpconfa"
  00000041 04 05 str                  CONSTS "lev_m40ac"
  0000004E 05 00 01FD 08              ACTION StartNewModule 8
  00000053 20 00                      RETN
  -------- -------------------------- ---
you get this:

void main() {
	StartNewModule("lev_m40ac", "lev40_wpconfa", "05r", "", "", "", "", "");
}
We already have DeNCS which does this for most K1/TSL scripts, but there are some it fails on.
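For anyone wondering how the listing above maps to that one-liner: NWScript runs on a stack machine, so the arguments are pushed last-to-first ("" first, "lev_m40ac" last), and the ACTION pops as many values as its argument count (the 8). Below is a minimal Python sketch of that idea. It assumes a hypothetical, simplified instruction format (tuples like `("CONSTS", "05r")`) rather than ncsdis's real data structures. It only handles string constants and ignores return values, which is fine here because StartNewModule returns nothing.

```python
def reconstruct_calls(instructions):
    """Rebuild engine calls from a hypothetical list of simplified instructions."""
    stack = []   # symbolic stack: source-code text of each pushed value
    calls = []
    for op, *operands in instructions:
        if op == "CONSTS":
            stack.append('"' + operands[0] + '"')  # string literal (no escaping)
        elif op == "ACTION":
            name, argc = operands
            # The first argument was pushed last, so popping yields them in order.
            args = [stack.pop() for _ in range(argc)]
            calls.append(f"{name}({', '.join(args)});")
        else:
            raise ValueError(f"unsupported op {op!r}")
    return calls
```

Feeding it the eight CONSTS and the ACTION from the listing yields the same StartNewModule(...) line as the hand-written version above.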

Quote: Any plans to turn it into a human-readable script format for simple peons like myself?

Yes, that would be the decompiler step I talked about. Basically, iterate over the blocks, check which control structures they belong to (which is already detected) and generate NWScript source code from that. That's something that's planned and should be reasonably possible.
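Very roughly, that codegen step could look like the following Python sketch. It assumes the control-structure detection has already produced a tree of hypothetical `Stmt`, `If` and `While` nodes (names made up here, not ncsdis's actual classes) and simply prints them back out as nested NWScript with tab indentation.

```python
from dataclasses import dataclass, field

@dataclass
class Stmt:
    text: str                 # an already-generated simple statement

@dataclass
class If:
    cond: str
    then: list
    orelse: list = field(default_factory=list)

@dataclass
class While:
    cond: str
    body: list

def emit(nodes, depth=1):
    """Turn a list of structured nodes into indented NWScript lines."""
    pad = "\t" * depth
    lines = []
    for node in nodes:
        if isinstance(node, Stmt):
            lines.append(pad + node.text)
        elif isinstance(node, If):
            lines.append(f"{pad}if ({node.cond}) {{")
            lines += emit(node.then, depth + 1)
            if node.orelse:
                lines.append(pad + "} else {")
                lines += emit(node.orelse, depth + 1)
            lines.append(pad + "}")
        elif isinstance(node, While):
            lines.append(f"{pad}while ({node.cond}) {{")
            lines += emit(node.body, depth + 1)
            lines.append(pad + "}")
        else:
            raise TypeError(f"unknown node {node!r}")
    return lines

def emit_function(signature, body):
    """Wrap the emitted body in a function definition."""
    return "\n".join([signature + " {"] + emit(body) + ["}"])
```

The hard part is of course building that tree from the basic blocks; once it exists, printing it is the easy bit.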

 

I've already experimented with that, without the control structures: https://gist.github.com/DrMcCoy/6c3ce1131707bd97ebef (though the code that produced it isn't public, as it's a big hack). Now imagine while-loops and if-conditions linking the code segments instead of gotos sprinkled everywhere, and that's the output you'd get. Somebody (or I, in the future) will "only" have to connect that codegen with the control structures and clean it up.

 

As you can see there, there are a lot of unnecessary duplicate temporary variables. The next step, then, is to eliminate those by simplifying and unifying expressions, which would be especially interesting for the conditions in ifs and whiles. Also, structs and vectors currently aren't recognized as such (instead, you get their separate members as single variables).
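One simple way to get rid of many of those temporaries is to inline a temporary into its user when it is assigned exactly once and read exactly once, in the very next statement. The Python sketch below does that on a hypothetical list of `(destination, expression)` pairs, with expressions kept as plain strings. The `tmp` naming convention and the string-based matching are assumptions for illustration only: it would, for example, be fooled by a temporary's name appearing inside a string literal, and it does not check evaluation order inside the target statement.

```python
import re

def is_atomic(expr):
    """True if expr can replace a variable without needing parentheses."""
    if re.fullmatch(r"[\w.]+", expr) or re.fullmatch(r'"[^"]*"', expr):
        return True  # identifier, number or string literal
    m = re.fullmatch(r"\w+(\(.*\))", expr)
    if not m:
        return False
    # A call like f(...) is atomic only if its first "(" closes at the very end.
    parens = m.group(1)
    depth = 0
    for k, ch in enumerate(parens):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0 and k < len(parens) - 1:
            return False
    return True

def inline_temporaries(stmts, is_temp=lambda name: name.startswith("tmp")):
    """stmts: list of (dest, expr) pairs; dest is None for a bare expression."""
    stmts = list(stmts)
    changed = True
    while changed:
        changed = False
        for i, (dest, expr) in enumerate(stmts):
            if dest is None or not is_temp(dest):
                continue
            use = re.compile(r"\b" + re.escape(dest) + r"\b")
            # One entry per occurrence, tagged with the statement index.
            reads = [j for j, (_, e) in enumerate(stmts) for _ in use.findall(e)]
            writes = sum(1 for d, _ in stmts if d == dest)
            if writes == 1 and reads == [i + 1]:
                next_dest, next_expr = stmts[i + 1]
                repl = expr if next_expr == dest or is_atomic(expr) else f"({expr})"
                stmts[i + 1] = (next_dest, use.sub(lambda _m: repl, next_expr))
                del stmts[i]
                changed = True
                break
    return stmts
```

For example, `tmp1 = GetFirstPC(); tmp2 = GetHitDice(tmp1); tmp3 = tmp2 + 1; nLevel = tmp3;` collapses into a single `nLevel = GetHitDice(GetFirstPC()) + 1;`, while a temporary that is read twice is left alone.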

 

Again, if somebody is interested in taking up these tasks, please contact me.

 

 

Quote: We already have DeNCS which does this for most K1/TSL scripts, but there are some it fails on.

 

I can say, though, that right from the get-go, my decompiler would fail on some scripts too. Namely, scripts with recursion, i.e., functions that call themselves. My approach of analyzing the stack (which is already done) fails for these scripts, because it needs to finish analyzing a function before it can continue analyzing that function's caller [1]. So when I have a function that calls itself (even over multiple levels), I'm biting my own tail, so to speak.
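The "biting my own tail" part can be made concrete with a depth-first walk over the call graph: recording each function only after all of its callees gives the callee-before-caller order that the stack analysis needs, and running into a function that is still on the current DFS path means recursion. This is a minimal Python sketch over a hypothetical call-graph dict, not the actual ncsdis code.

```python
def analysis_order(call_graph, entry):
    """Return functions callee-before-caller, or None if recursion is found."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    order = []

    def visit(fn):
        color[fn] = GREY
        for callee in call_graph.get(fn, ()):
            state = color.get(callee, WHITE)
            if state == GREY:
                return False  # back edge: fn (indirectly) calls itself
            if state == WHITE and not visit(callee):
                return False
        color[fn] = BLACK
        order.append(fn)  # all callees are already in `order`
        return True

    return order if visit(entry) else None
```

A script where `a` calls `b` and `b` calls `a` again makes this return None, which is the point where a stack analysis like the one described above has to give up.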

 

ncsdis already recognizes that and stops deeper analysis of these scripts. So you won't see function signature comments (like the "; void main()" in your example) in these scripts, only the bare disassembly.

 

There's not really anything I can do about it as such, as that's similar to the halting problem. There might be some clever approach I could switch to in that case, one that works for certain cases (maybe even most of those that occur in existing scripts?), but so far, I haven't found one.

 

Again again, anybody with a lead and/or interested in implementing anything there, please contact me. :D

 

[1] Because I need to know how the function manipulates the stack. A function pops off its arguments at the end.
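To illustrate the footnote: when walking through a caller, the stack depth after a JSR depends on how many argument slots the callee pops, so that number has to be known before the caller can be analyzed. The sketch below uses a hypothetical, much-simplified instruction set that counts abstract stack slots: `PUSH` adds one slot (in this sketch the caller also reserves the return-value slot with a PUSH), `MOVSP` adjusts the depth by its negative operand, and `JSR` subtracts the callee's argument count, looked up in a table of already-analyzed functions.

```python
def depth_after(instructions, callee_arg_slots):
    """Track stack depth (in abstract slots) through a caller's instructions."""
    depth = 0
    for op, operand in instructions:
        if op == "PUSH":
            depth += 1
        elif op == "MOVSP":
            depth += operand  # operand is negative: discard slots
        elif op == "JSR":
            if operand not in callee_arg_slots:
                raise LookupError(f"callee {operand!r} has not been analysed yet")
            depth -= callee_arg_slots[operand]  # callee pops its own arguments
        else:
            raise ValueError(f"unknown op {op!r}")
    return depth
```

With a recursive function, its own entry would be needed in `callee_arg_slots` before its body could be walked, which is exactly the chicken-and-egg problem described above.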


Didn't Torlack (?) write up some stuff on reading compiled NWScript years ago? Might be of use to you, if you haven't got it, but I'm not sure if it's still all that easily found online...


Quote: DeNCS fails when there is an included script, so I'd assume it is not the same problem.

 

There's nothing in the bytecode that says "this here is an included script". The include basically just means that the compiler copies the contents of the include into your script. The effect would be the same if you manually copy-pasted the contents of the include yourself. You couldn't tell one method apart from the other in the bytecode.
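To make the copy-paste point concrete, here's a toy Python preprocessor that replaces each `#include "name"` line with the text of that file. The `files` dict and the exact line syntax it accepts are assumptions for illustration (the real compiler's preprocessing will differ in details); the point is only that the included and the hand-pasted versions end up as identical source, and so would compile to identical bytecode.

```python
import re

INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')

def expand_includes(source, files, stack=()):
    """Textually replace each #include "name" line with that file's contents."""
    out = []
    for line in source.splitlines():
        m = INCLUDE.match(line)
        if not m:
            out.append(line)
            continue
        name = m.group(1)
        if name in stack:
            raise ValueError(f"include cycle: {' -> '.join(stack + (name,))}")
        # Recurse so that includes inside includes are expanded as well.
        out.append(expand_includes(files[name], files, stack + (name,)))
    return "\n".join(out)
```

After expansion, nothing is left that says where the included text came from.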

 

So if it fails for certain includes, that would just mean that these includes contain recursive functions. And yes, this is what I have found: some generic utility includes contain functions that use recursion to solve their task.

 

This means that the bytecode of every script that uses this utility include contains recursive functions, because the #include just means "copy what's in this file here wholesale".

 

That's still entirely consistent with my guess that this is the same issue between DeNCS and my ncsdis. :D

 

EDIT: See also this Wikipedia article on the include directive: https://en.wikipedia.org/wiki/Include_directive, especially the part about C/C++ (because NWScript takes a lot of cues from C).
