DrMcCoy, posted January 15, 2016
Over at xoreos.org I wrote a blog post about Disassembling NWScript Bytecode. This is basically a progress report of me writing an NWScript disassembler, which might be extended into a full-fledged decompiler. It might be of interest to people here.
1Leonard, posted January 15, 2016
Interesting. I respect the work you put in xoreos!
DrMcCoy, posted February 1, 2016
In case anybody wants to try this out but can't compile xoreos-tools from sources: xoreos and xoreos-tools 0.0.4 have been released, and the release includes binaries of ncsdis for GNU/Linux, Windows and Mac OS X. It is still a command line tool, though, without a GUI.
DarthParametric, posted February 2, 2016
Nice, especially given that it handles DA and DA2 scripts (as far as I am aware there have never been any publicly available decompilers for those). Any plans to turn it into a human-readable script format for simple peons like myself? So for example instead of this:

    ; 85 bytes, 12 instructions

    ; Engine types:
    ; E0: Effect
    ; E1: Event
    ; E2: Location
    ; E3: Talent

    _start:
      0000000D 1E 00 00000008        JSR main
      00000013 20 00                 RETN
      -------- -------------------------- ---
    main: ; void main()
      00000015 04 05 str             CONSTS ""
      00000019 04 05 str             CONSTS ""
      0000001D 04 05 str             CONSTS ""
      00000021 04 05 str             CONSTS ""
      00000025 04 05 str             CONSTS ""
      00000029 04 05 str             CONSTS "05r"
      00000030 04 05 str             CONSTS "lev40_wpconfa"
      00000041 04 05 str             CONSTS "lev_m40ac"
      0000004E 05 00 01FD 08         ACTION StartNewModule 8
      00000053 20 00                 RETN
      -------- -------------------------- ---

you get this:

    void main() {
        StartNewModule("lev_m40ac", "lev40_wpconfa", "05r", "", "", "", "", "");
    }

We already have DeNCS which does this for most K1/TSL scripts, but there are some it fails on.
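A minimal sketch of the mapping asked for above, written in Python purely for illustration. It is not how ncsdis or DeNCS work internally. It assumes, as the listing suggests, that the last value pushed before an ACTION is the first argument of the call, and it ignores return values and every opcode other than CONSTS and ACTION. The tuple format and the name `rebuild_calls` are made up for this example.

```python
def rebuild_calls(instructions):
    """Fold CONSTS pushes followed by an ACTION into call expressions.

    `instructions` is a hypothetical, already-parsed form of the listing:
    ("CONSTS", value) or ("ACTION", function_name, argument_count).
    """
    stack = []       # simulated value stack, holding source-level literals
    statements = []  # reconstructed call expressions, in program order
    for op, *operands in instructions:
        if op == "CONSTS":
            stack.append('"%s"' % operands[0])
        elif op == "ACTION":
            name, argc = operands
            # Assumption taken from the listing: the value pushed last
            # is the first argument, so popping yields source order.
            args = [stack.pop() for _ in range(argc)]
            statements.append("%s(%s)" % (name, ", ".join(args)))
    return statements


listing = [
    ("CONSTS", ""), ("CONSTS", ""), ("CONSTS", ""), ("CONSTS", ""),
    ("CONSTS", ""), ("CONSTS", "05r"), ("CONSTS", "lev40_wpconfa"),
    ("CONSTS", "lev_m40ac"),
    ("ACTION", "StartNewModule", 8),
]

for statement in rebuild_calls(listing):
    print(statement + ";")
```

Real scripts need far more than this (variables, jumps, engine types, return values), which is where the actual decompiler work lies.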
DrMcCoy, posted February 2, 2016
DarthParametric wrote: "Any plans to turn it into a human-readable script format for simple peons like myself?"

Yes, that would be the decompiler step I talk about. Basically: iterate over the blocks, check which control structures they belong to (which is already detected), and generate NWScript source code from that. That's planned and should be reasonably possible.

I did already experiment with that, without the control structures: https://gist.github.com/DrMcCoy/6c3ce1131707bd97ebef (though the code that did that is not public, as it's a big hack). Now think while-loops and if-conditions linking the code segments instead of gotos sprinkled everywhere, and this would be the output you get. Somebody (or I, in the future) will "only" have to connect that codegen with the control structures and clean it up.

As you can see there, there are a lot of unnecessary duplicate temporary variables. The next step then is to eliminate those, by simplifying and unifying expressions. That would be especially interesting for conditions in ifs and whiles. And currently, structs and vectors aren't recognized as such (instead, you get their separate members as single variables).

Again, if somebody is interested in taking up these tasks, please contact me.

DarthParametric wrote: "We already have DeNCS which does this for most K1/TSL scripts, but there are some it fails on."

I can say, though, that right from the get-go, my decompiler would fail on some scripts too. Namely, scripts with recursion, i.e. functions that call themselves. My approach of analyzing the stack (which is already done) fails for these scripts, because it needs to finish the analysis of a function before it can continue analyzing that function's caller [1]. So when I have a function that calls itself (even over multiple levels), I'm biting my own tail, so to speak.

ncsdis currently already recognizes that and stops deeper analysis of these scripts. So you won't see function signature comments (like that "; void main()" in your example there) in these scripts, only the bare disassembly.

There's not really anything I can do about it as such, as that's similar to the halting problem. There might be some clever approach I could switch to that works for certain cases (maybe even most that occur in existing scripts?), but so far, I haven't found one.

Again again, anybody with a lead and/or interested in implementing anything there, please contact me.

[1] Because I need to know how the function manipulates the stack. A function pops off its arguments at the end.
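To illustrate the ordering problem described in this post, here is a rough Python sketch. It is not ncsdis code. It assumes a call graph has already been extracted from the JSR instructions and is given as a plain dictionary; the function names `analysis_order` and `find_recursive` and the sample graphs are hypothetical. A callee-before-caller order exists only if no function can reach itself, directly or over several levels.

```python
def analysis_order(call_graph, entry):
    """Return functions ordered so each callee precedes its caller.

    Returns None if a call cycle (recursion) makes such an order impossible.
    """
    order, done, in_progress = [], set(), set()

    def visit(fn):
        if fn in done:
            return True
        if fn in in_progress:
            return False  # we came back to a function still being analyzed
        in_progress.add(fn)
        for callee in call_graph.get(fn, ()):
            if not visit(callee):
                return False
        in_progress.discard(fn)
        done.add(fn)
        order.append(fn)  # all callees are finished, so fn can be analyzed
        return True

    return order if visit(entry) else None


def find_recursive(call_graph):
    """Return the set of functions that can reach themselves through calls."""
    recursive = set()
    for start in call_graph:
        seen, todo = set(), list(call_graph.get(start, ()))
        while todo:
            fn = todo.pop()
            if fn == start:
                recursive.add(start)
                break
            if fn not in seen:
                seen.add(fn)
                todo.extend(call_graph.get(fn, ()))
    return recursive


# Hypothetical call graphs: one without recursion, one where a and b
# call each other (recursion over multiple levels).
plain = {"_start": ["main"], "main": ["helper"], "helper": []}
looped = {"_start": ["main"], "main": ["a"], "a": ["b"], "b": ["a"]}

print(analysis_order(plain, "_start"))
print(analysis_order(looped, "_start"))
print(sorted(find_recursive(looped)))
```

For the first graph the order comes out callee first; for the second no order exists, which corresponds to the case where deeper analysis has to stop.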
DarthParametric, posted February 2, 2016
DrMcCoy wrote: "I can say, though, that right from the get-go, my decompiler would fail on some scripts too. Namely, scripts with recursion. I.e. functions that call themselves."

Ah, so I would guess that is probably the same problem DeNCS has then.
DrMcCoy, posted February 2, 2016
That's quite likely, yes.
InSidious, posted February 8, 2016
Didn't Torlack (?) write up some stuff on reading compiled NWScript years ago? Might be of use to you, if you haven't got it, but I'm not sure if it's still all that easily found online...
DrMcCoy, posted February 8, 2016
Yeah, Torlack's docs (and Skywing's NWN2 JIT NWScript thing) provided the base for my understanding of NWScript.
DrMcCoy, posted February 10, 2016
Oh, and while it's not really on-topic for this thread, I did an interview with "The Linux Gamer" about xoreos: https://www.youtube.com/watch?v=kDEmpUJviRM
It's more a general overview of what xoreos is, the current state, etc. In my badly accented, mumbling English.
LiliArch, posted March 8, 2016
DarthParametric wrote: "Ah, so I would guess that is probably the same problem DeNCS has then."

DeNCS fails when there is an included script, so I'd assume it is not the same problem.
DarthParametric, posted March 8, 2016
I'm not sure that it precludes the same problem from occurring. There's nothing that says DeNCS can't have more than a single fail state.
DrMcCoy, posted March 8, 2016
LiliArch wrote: "DeNCS fails when there is an included script, so I'd assume it is not the same problem."

There's nothing in the bytecode that says "this here is an included script". The include basically just means that the compiler copies the contents of the include into your script. The effect would be the same if you manually copy-pasted the contents of the include yourself. You couldn't tell one method apart from the other in the bytecode.

So if it fails for certain includes, that would just mean that these includes contain recursive functions. And yes, this is what I have found: some generic utility includes contain functions that use recursion to solve their task. This means that the bytecode of every script that uses such a utility include contains recursive functions, because the #include just means "copy what's in this file here wholesale".

That's still entirely consistent with my guess that this is the same issue between DeNCS and my ncsdis.

EDIT: See also this Wikipedia article on the include directive: https://en.wikipedia.org/wiki/Include_directive, especially the part about C/C++ (because NWScript takes a lot of cues from C).
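The "copy what's in this file here wholesale" behaviour can be sketched in a few lines of Python. This is only an illustration of textual inclusion, not the real NWScript compiler: the function `expand_includes`, the include name `util_inc` and the sample script contents are all made up, and there is no protection against includes that include each other.

```python
import re

# Matches a line of the form: #include "name"
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand_includes(source, files):
    """Replace every #include line with the text of the named file.

    `files` maps include names to their source text (a stand-in for
    reading them from disk).
    """
    out = []
    for line in source.splitlines():
        match = INCLUDE.match(line)
        if match:
            # Paste the included text in place, expanding nested includes.
            out.append(expand_includes(files[match.group(1)], files))
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical utility include containing a recursive function.
files = {
    "util_inc": (
        "int Factorial(int n) {\n"
        "    if (n <= 1) return 1;\n"
        "    return n * Factorial(n - 1);\n"
        "}"
    ),
}
main_body = "void main() {\n    Factorial(5);\n}"

with_include = '#include "util_inc"\n' + main_body
pasted_by_hand = files["util_inc"] + "\n" + main_body

print(expand_includes(with_include, files) == pasted_by_hand)
```

Both versions hand the compiler identical text, so the recursive function ends up in the compiled script either way and nothing in the output marks it as having come from an include.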
LiliArch, posted March 8, 2016
It never crossed my mind to actually track down WHAT in those included scripts causes DeNCS to fail... ah well, I seem to definitely need a break.