AmanoJyaku 184 Posted September 19, 2020

1 minute ago, DarthParametric said:
> Is there a game-specific identifier in the NCS header? I thought both were flagged as NCS V1.0B.

No, there isn't. That's why I had to write a detection algorithm.
JCarter426 1,216 Posted September 19, 2020

Ah, I should've mentioned before. Not sure how much help it'll be now if you already worked through all the changes, but here are my notes on the differences between the function definitions in each game that I took back when I was making Odyssey++.
AmanoJyaku 184 Posted September 20, 2020

@JCarter426 Thanks! I'll look at them later just to be sure I haven't missed anything. I was supposed to do it earlier, I just forgot about it until I was Force Persuaded to. 😋
DrMcCoy 40 Posted September 20, 2020

12 hours ago, AmanoJyaku said:
> Version Detection
> - The algorithm was fine, the problem was the input. I was using K2 function definitions, some of which had changed from K1. Now, NCS files are identified as K1 or K2!

It's probably easier and safer to just query the user on what game this script is for (via a command line option, for example) than to depend on some brittle heuristic to detect that.

12 hours ago, AmanoJyaku said:
> Evaluation of for and while loops
> - They look identical, with the only potential difference being an incremented or decremented value at the end of a for loop
> - Now that the algorithm described above has been completed, it should be easy to determine the rest of the code that makes a for loop unique
> - I've temporarily given up on do-while loops because only one file has one, and I'm not entirely certain it's in the compiled NCS file

If you lack files to test this on, you could get Neverwinter Nights and install its toolset, because that came with an official BioWare NWScript compiler. Yes, it's for NWN, but the control structure stuff should be identical.

12 hours ago, AmanoJyaku said:
> Figuring out the DESTRUCT op code
> - I've seen this used to destroy variables on the stack that aren't necessary, but prevent the top-of-stack behavior used by NCS
> - Although this shouldn't be necessary, since the desired variable could just be copied to the top of the stack using the CPTOPSP op code...

Dunno if I already wrote that here or just in IRC, on a GitHub issue or wherever, but: the interesting thing about the DESTRUCT opcode is that it's used to single out individual struct members. I.e. whenever the nss uses a struct member, the ncs copies the whole struct to the top of the stack and uses DESTRUCT on this new block in the stack to get rid of everything but the single struct member it's interested in.

Which means you can use the existence of DESTRUCT to identify structs.
AmanoJyaku 184 Posted September 20, 2020

9 hours ago, DrMcCoy said:
> It's probably easier and safer to just query the user on what game this script is for (via a command line option, for example) than to depend on some brittle heuristic to detect that.

I considered that, but the differences are obvious and the detection simple. Are you aware of any pitfalls, or are you just being cautious? This is a beta release, so there's plenty of time to change functionality.

9 hours ago, DrMcCoy said:
> If you lack files to test this on, you could get Neverwinter Nights and install its toolset, because that came with an official BioWare NWScript compiler. Yes, it's for NWN, but the control structure stuff should be identical.

Given the fact that one file out of 2,500 has a do-while loop, it's low priority. I'll get around to it after more pressing matters are dealt with.

9 hours ago, DrMcCoy said:
> Dunno if I already wrote that here or just in IRC, on a GitHub issue or wherever, but: the interesting thing about the DESTRUCT opcode is that it's used to single out individual struct members. I.e. whenever the nss uses a struct member, the ncs copies the whole struct to the top of the stack and uses DESTRUCT on this new block in the stack to get rid of everything but the single struct member it's interested in. Which means you can use the existence of DESTRUCT to identify structs.

Thanks, I thought as much. I first saw it in use destroying two elements of a vector, so I figured its purpose was for destroying unused elements of aggregates. I just need to identify which elements are preserved.
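The struct-member pattern discussed above can be pictured on a toy stack. This is only a minimal Python sketch, not code from any tool in this thread. It assumes DESTRUCT carries three operands (the size of the block to remove, the offset of the element to keep inside that block, and the size of that element) and that the offset is counted from the bottom of the block. The `destruct` helper and its cell-based units are hypothetical.

```python
def destruct(stack, size, keep_offset, keep_size):
    """Remove the top `size` cells, keeping `keep_size` cells at `keep_offset`.

    Assumption: `keep_offset` is counted from the bottom of the removed block.
    """
    if size > len(stack) or keep_offset + keep_size > size:
        raise ValueError("DESTRUCT operands do not fit the stack")
    base = len(stack) - size
    block = stack[base:]
    kept = block[keep_offset:keep_offset + keep_size]
    return stack[:base] + kept


# One local cell, then a copy of a three-float vector (x, y, z) on top.
stack = ["local", 1.0, 2.0, 3.0]

# Keep only y: destroy 3 cells, preserve 1 cell at offset 1.
print(destruct(stack, size=3, keep_offset=1, keep_size=1))  # ['local', 2.0]
```

Under those assumptions, the preserved element falls straight out of the second and third operands, which is the information needed to tell which member of the aggregate the source referred to.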
AmanoJyaku 184 Posted October 3, 2020

It's time for a monthly update, and I said I might have a beta ready by now. Well, there won't be a beta.

Spoiler: I'm skipping the beta to complete the "final" release!

The to-do list is nearly complete:
- Evaluating subroutines - Complete
  - Input parameters and return values have been discovered
- Merging the control structure algorithm with the virtual stack - Complete
  - Subroutine scopes and variables are correctly identified
- Evaluation of for and while loops - Nearly complete
  - While loops - Complete
  - For loops - In progress
- Evaluation of switch statements - Nearly complete
  - Case labels and statements - Complete
  - Default label and statement - In progress
- Figuring out the DESTRUCT op code - In progress
  - Not started, but not expecting it to take much work

Will have more news next week.
AmanoJyaku 184 Posted October 20, 2020

Sorry that this is taking longer than I wanted. I was lucky enough to get some contracts, but that also means very little free time. While a decompiler is an all-or-nothing program (hence the reason there won't be a beta), I can show off some sample output:

    '.\NCS Analyzer.exe' '..\..\Compiled Scripts\Kotor2\k_inc_npckill.ncs'
    This is a K2 file
    void Sub1()
    void Sub2()
        Integer
        Integer
        Integer
    void Sub3(???, ???, ???)
        Integer
        Integer
        Location
        Float
        Float
        Float
        Float
        Location
        Object
    int Sub4(???)
        Integer
        Integer
        Integer
    void Sub5(???, ???)
    void Sub6(???, ???)
        Effect

It's not much, but you can see the following:
- It identifies the NCS as a Kotor2 file, which is reliant on the code including K2 engine functions
- There are six subroutines
  - _start(), which is automatically included by the compiler
  - void main()
  - void DamagingExplosion( object oCreature, int nDelay, int nDamage )
  - int GR_GetGrenadeDC(object oTarget)
  - void NonDamagingExplosion(object oCreature, int nDelay)
  - void KillCreature(object oCreature, int nDelay )
- The types of return values are found by examining the called subroutine
- The types of parameters must be found by examining caller subroutines (strange as it may seem, there's no guarantee a parameter is used)
- The local variables are listed in the order in which they are created

As simple as this may seem, it's been hell trying to figure out how NCS works due to limited documentation, time, and mental capacity (🤪). That said, I think I now know everything there is to know about NCS. Even how to deal with recursion (it took less than an hour), identifying vectors and structs (they do disappear in the bytecode, but there are code patterns to look out for), and handling certain errors in the game's code.

So, the two major tasks left are:
- Including block scopes, e.g. if-else and while statements (I can identify them, I just haven't put them in among the locals)
- Handling expressions, commonly known as operator precedence and associativity, e.g. int d = (a + b) * c
DarthParametric 3,790 Posted October 20, 2020

1 hour ago, AmanoJyaku said:
> there's no guarantee a parameter is used

Interesting. I had assumed that the compiler substituted in default values when compiling the NCS, but are you saying that it's the engine that does that at runtime? Which would also mean DeNCS adds missing defaults back in when decompiling then.
AmanoJyaku 184 Posted October 20, 2020

11 hours ago, DarthParametric said:
> Interesting. I had assumed that the compiler substituted in default values when compiling the NCS, but are you saying that it's the engine that does that at runtime? Which would also mean DeNCS adds missing defaults back in when decompiling then.

Sorry, I wasn't clear. You are correct about the compiler substituting defaults into the bytecode, the engine does not substitute at runtime. (I'm preparing a post to better explain NCS in case someone wants to port this to a different language, or make a better decompiler.)

Let's say you have the call Sub0(1, 2, 3):

    void main()
    {
        Sub0(1, 2, 3);
        return;
    }

    void Sub0(int a, int b, int c)
    {
        b + c;
        return;
    }

The NCS call in void main() looks like this:

    CONSTI 3
    CONSTI 2
    CONSTI 1
    JSR Sub0

The inside of Sub0 looks like this:

    //Top of the stack is always 0
    //Upon entry to the subroutine -4 is a, -8 is b, -12 is c
    CPTOPSP -12 4
    //Now -4 is the copy of c, -8 is a, -12 is b, -16 is c
    CPTOPSP -12 4
    //Now -4 is the copy of b, -8 is the copy of c, -12 is a, -16 is b, -20 is c
    ADDII
    //Now -4 is the result of b + c, -8 is a, -12 is b, -16 is c
    MOVSP -4
    //Now -4 is a, -8 is b, -12 is c
    MOVSP -12
    //All parameters destroyed
    RETN

This is a silly example. But, it demonstrates the impossibility of identifying the type and value of an unused parameter from inside a function. I can't say I've seen this in practice. There are 2,500 files, I'm not looking at them all that closely. But I have to code for the possibility, or risk corrupting the stack.
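The Sub0 walkthrough above can be replayed on a small virtual stack to see which parameters the body ever touches. This is a rough Python sketch under stated assumptions: each cell is 4 bytes as in the listing, and `run_sub0` with its nested helpers is a hypothetical stand-in for a real interpreter, covering only the three opcodes used here.

```python
CELL = 4  # bytes per stack cell, matching the offsets in the listing


def run_sub0(args):
    """Replay Sub0's body; return (final stack, names of parameters read)."""
    names = ["c", "b", "a"]
    # Arguments are pushed right to left, so `a` ends up on top of the stack.
    stack = list(zip(names, args[::-1]))
    reads = set()

    def cptopsp(offset, size):
        # Copy `size` bytes found `offset` bytes below the top onto the top.
        start = len(stack) + offset // CELL
        for name, value in stack[start:start + size // CELL]:
            if name in names:
                reads.add(name)
            stack.append(("copy", value))

    def addii():
        (_, top), (_, below) = stack.pop(), stack.pop()
        stack.append(("temp", below + top))

    def movsp(offset):
        # A negative offset destroys cells from the top of the stack.
        del stack[len(stack) + offset // CELL:]

    cptopsp(-12, 4)  # copy c
    cptopsp(-12, 4)  # copy b
    addii()          # b + c
    movsp(-4)        # destroy the temporary result
    movsp(-12)       # destroy the three parameters
    return stack, reads


stack, reads = run_sub0((1, 2, 3))
print(stack, sorted(reads))  # [] ['b', 'c']
```

The parameter `a` never appears in the read set, so nothing inside the subroutine reveals its type. Only the `MOVSP -12` at the end shows that three cells of arguments exist.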
DarthParametric 3,790 Posted October 20, 2020

Ah, you were talking about the function not using all the stated parameters. I thought you were talking about calling a function without specifying all the parameters. I suppose that sort of sloppiness is not out of the question when dealing with mod-generated scripts.
AmanoJyaku 184 Posted November 20, 2020

I forgot to do the monthly update. 🤯

I was all set to finish the decompiler, and of course I ran into trouble. First, I had a family problem that has since been resolved.

The second is a mundane issue: my laptop has been suffering from BSODs for the past two months, and it's happening with increasing frequency. Sometimes when I wake the machine from sleep or hibernation it BSODs, and when I cold boot it doesn't see any storage. This only happens when it's on battery, and it seems to be when the battery is below 50%. Now that I have an idea of what triggers this I can work around it, but it was affecting my productivity.

The third is an issue with NCS files. My decompiler relied on files being written "correctly", so of course it bombed when fed a poorly written file. For example, k_inc_npckill:

    void main()
    {
        int nKillMode = GetScriptParameter(1);
        int nDelay = GetScriptParameter(2);
        int nDamage = GetScriptParameter(3);

        if ( nKillMode == 0 )
        {
            DamagingExplosion(OBJECT_SELF, nDelay, nDamage);
            return;
        }
        if ( nKillMode == 1 )
        {
            NonDamagingExplosion(OBJECT_SELF, nDelay);
            return;
        }
        if ( nKillMode == 2 )
        {
            KillCreature(OBJECT_SELF, nDelay);
            return;
        }
    }

Nothing wrong with the source. Let's look at the NCS:

    void main()
       21 RSADDx
       23 CONSTI 1
       29 GetScriptParameter
       34 CPDOWNSP -2 1
       42 MOVSP -1
       48 RSADDx
       50 CONSTI 2
       56 GetScriptParameter
       61 CPDOWNSP -2 1
       69 MOVSP -1
       75 RSADDx
       77 CONSTI 3
       83 GetScriptParameter
       88 CPDOWNSP -2 1
       96 MOVSP -1
      102 CPTOPSP -3 1
      110 CONSTI 0
      116 EQUALxx
      118 JZ 170
      124 CPTOPSP -1 1
      132 CPTOPSP -3 1
      140 CONSTO object
      146 JSR 298
      152 MOVSP -3
      158 JMP 296
     *164 JMP 170
      170 CPTOPSP -3 1
      178 CONSTI 1
      184 EQUALxx
      186 JZ 230
      192 CPTOPSP -2 1
      200 CONSTO object
      206 JSR 1601
      212 MOVSP -3
      218 JMP 296
     *224 JMP 230
      230 CPTOPSP -3 1
      238 CONSTI 2
      244 EQUALxx
      246 JZ 290
      252 CPTOPSP -2 1
      260 CONSTO object
      266 JSR 1637
      272 MOVSP -3
      278 JMP 296
     *284 JMP 290
      290 MOVSP -3
      296 RETN

The lines with asterisks are dead code: they never get executed, and the game is fine with that. Problem is, my decompiler was expecting the last operation of the true branch of an if statement to be part of the control path. The last op of the true branch is what tells you if you're looking at a regular if, an if-else, or an if that exits the script as seen above. The decompiler was looking at the wrong thing, and returning the wrong results. I've now fixed that with an additional set of evaluations.

Now I'm back to working on producing output. This is tedious as it requires keeping track of data, individual operations, and the context in which the data and operations are being used. Believe it or not, a block of code will decompile differently based on code that comes before and after it. (For example, RSADDx does not mean "create a named variable" as I incorrectly assumed months ago. It could create a named variable or a temporary variable, based on the context.)

This morning I had an idea as to the overall rule for determining the beginning and end of statements, so I'll be working on that this weekend. I hope to have a status update in a few days.
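One way to flag the asterisked instructions mechanically is a plain reachability walk from the subroutine's first instruction. The Python sketch below is an illustration only, not the decompiler's actual evaluation. It assumes JMP is unconditional, JZ may either jump or fall through, RETN ends the path, and JSR simply falls through within the subroutine. The listing is trimmed to start at offset 102 and operands are dropped except for jump targets.

```python
LISTING = """
102 CPTOPSP
110 CONSTI
116 EQUALxx
118 JZ 170
124 CPTOPSP
132 CPTOPSP
140 CONSTO
146 JSR 298
152 MOVSP
158 JMP 296
164 JMP 170
170 CPTOPSP
178 CONSTI
184 EQUALxx
186 JZ 230
192 CPTOPSP
200 CONSTO
206 JSR 1601
212 MOVSP
218 JMP 296
224 JMP 230
230 CPTOPSP
238 CONSTI
244 EQUALxx
246 JZ 290
252 CPTOPSP
260 CONSTO
266 JSR 1637
272 MOVSP
278 JMP 296
284 JMP 290
290 MOVSP
296 RETN
"""


def parse(listing):
    """Return a list of (offset, opcode, jump target or None)."""
    program = []
    for line in listing.split("\n"):
        fields = line.split()
        if not fields:
            continue
        target = int(fields[2]) if fields[1] in ("JZ", "JMP") else None
        program.append((int(fields[0]), fields[1], target))
    return program


def unreachable(program):
    """Return the offsets that no control path from the first op can reach."""
    index = {offset: i for i, (offset, _, _) in enumerate(program)}
    seen, work = set(), [program[0][0]]
    while work:
        offset = work.pop()
        if offset in seen:
            continue
        seen.add(offset)
        i = index[offset]
        _, op, target = program[i]
        if op in ("JZ", "JMP"):
            work.append(target)
        # Everything except JMP and RETN can continue to the next instruction.
        if op not in ("JMP", "RETN") and i + 1 < len(program):
            work.append(program[i + 1][0])
    return [offset for offset, _, _ in program if offset not in seen]


print(unreachable(parse(LISTING)))  # [164, 224, 284]
```

Filtering these offsets out first would let a later pass treat the instruction before each one (the `JMP 296`) as the real last operation of the true branch.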
AmanoJyaku 184 Posted November 27, 2020

Hope everyone got their fill of roast porg yesterday!

The Gap - No, We're Not Selling Pants

My last update touched on NWScript statements and the problem of RSADDx. Recreating the original statements is the ultimate goal of a decompiler. Yet, there is no handbook on what combinations of NCS code map to NWScript statements. All we have is Skywing's website that lists the 45 opcodes, the data types, and the stack operation. And even that information is incomplete or incorrect. So, I've spent the past 9 months reading through NSS and NCS files to see how opcodes are used so that I can fill in the gaps. For example, Skywing's doc describes MOVSP like so:

Quote:
> MOVSP - Adjust the Stack Pointer
> Add the value specified in the instruction to the stack pointer. The value of SP is adjusted by the value specified.

Well, that's useless... If you're familiar with x86 assembly you probably know that moving the stack pointer allocates and deallocates space on the stack for variables. Enter a function with four 32-bit local variables and the stack pointer moves 4x32-bits in one direction. Exit the function and the stack pointer moves 4x32-bits in the other direction. So, you might assume the same in NCS. And you would be wrong. I've only ever seen negative values passed to MOVSP, which means the value that is "added" is in fact subtracted. MOVSP serves one purpose only: to destroy variables.

OK, so what? Well, now that we know MOVSP only destroys variables we know what's happening whenever we see it:

    21 RSADDx
    23 CONSTx 0
    29 CPDOWNSP -2 1
    37 MOVSP -1

At offset 37 a variable is being destroyed. Which variable? Well, now we need to know which opcodes create variables.

Do You Have A Reservation?

There are only four:
- RSADDx
- CPTOPSP
- CONSTx
- CPTOPBP

Which means there are two variables on the stack in the example. RSADDx simply allocates space for a variable without giving it a value. This is similar to the following C code:

    //An uninitialized variable. Never do this!!!
    int i;

Similar, but not identical. More on that in a moment. CONSTx also allocates space for a variable, but it actually initializes it with a value. This is similar to the following:

    0;

Also similar, but not identical. More on that in a moment, too. Back to MOVSP. MOVSP at offset 37 destroys one variable, the integer 0 created at offset 23.

The reason RSADDx and CONSTx are similar, but not identical, to the C code is that the NCS runtime doesn't understand C. What it does understand is its stack, and both RSADDx and CONSTx add to the stack. MOVSP removes from the stack. It's simple, but efficient. Too simple, since the C abstractions are lost.

C abstracts variable locations by referring to variables by name. Where is the variable named i? It's at position 0 on the stack. Where is the constant 0? It's at position 4 on the stack. We program in languages like C because we don't want to be bothered with remembering such details. Especially since variables might get moved over the course of software development. Even if they aren't moved, they may get duplicated during program execution! (e.g. i++ makes a copy of i, increments i, then returns the pre-incremented copy.) Better to give it a name and let the compiler and/or linker worry about which address the variable is at.

But, what is the significance of MOVSP in the example? The variable that is destroyed is an immediate value. Named variables are destroyed at the end of their enclosing block, but immediates are destroyed at the end of a statement. Which means MOVSP signals either the end of a statement or the end of a block. So, look for MOVSP and you know you've reached the end of... something. If the MOVSP destroys a variable created by CONSTx then we've reached the end of a statement. If MOVSP destroys more than that then we've reached the end of a block!

Unfortunately, life isn't that simple. Consider the following pseudo code:

    21 RSADDx
    23 JSR some_offset

We see a reservation, followed by a subroutine call. According to Skywing's doc, a subroutine that returns a value must be preceded by RSADDx. There's just one problem: RSADDx is also used to create uninitialized named variables as seen in our first example! The first example is the C code int i = 0.

1. An uninitialized variable is reserved on the stack
2. A constant, or immediate value, is placed on the stack
3. The constant is copied down to the reserved space (assignment, or = in C)
4. The constant is destroyed because it is a temporary

Now that we know RSADDx can be either a named variable or a temporary, we're faced with a dilemma. Is our second example a subroutine that returns a value, which means the RSADDx is a temporary? Or is the second example an uninitialized variable followed by a subroutine that returns void?

This is the problem I was working on as of the last update, and I believe I found the solution over the weekend. Our friend MOVSP provided the answer, and I can find statements with a large degree of accuracy.

The Next Problem - Because There's Always Another Problem

Since then, I've been working on another problem: what is this JZ? If you think RSADDx is a pain since it represents two possibilities, consider that JZ can be:
- an if()
- an if() else
- an if() else if()
- a while()
- a for ()
- a do-while()
- logical AND (&&)
- logical OR (||)

These are control statements, the remaining set of statements to decipher. Expect an update in the next few days.
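The naive MOVSP rule from this post (temporaries die at the end of a statement, named variables at the end of a block) can be sketched as a small classifier. This Python sketch is an illustration under loud assumptions: it treats every RSADDx cell as a named variable, which the post itself shows is not always true, MOVSP operands count cells as in the listing, and `classify_movsp` plus the trailing block-ending `MOVSP -1` in the sample are hypothetical additions.

```python
CREATORS = ("RSADDx", "CONSTx", "CPTOPSP", "CPTOPBP")


def classify_movsp(ops):
    """Return (index, 'statement' or 'block') for each MOVSP in `ops`.

    `ops` is a list of (opcode, operand) pairs. Naive assumption: a cell
    created by RSADDx is a named variable, anything else is a temporary.
    """
    stack = []  # the opcode that created each live cell
    results = []
    for i, (op, operand) in enumerate(ops):
        if op in CREATORS:
            stack.append(op)
        elif op == "MOVSP":
            cut = max(len(stack) + operand, 0)
            destroyed = stack[cut:]
            del stack[cut:]
            kind = "block" if "RSADDx" in destroyed else "statement"
            results.append((i, kind))
    return results


# int i = 0; followed by a hypothetical MOVSP that closes the enclosing block.
ops = [
    ("RSADDx", None),
    ("CONSTx", 0),
    ("CPDOWNSP", (-2, 1)),
    ("MOVSP", -1),  # destroys the constant: end of statement
    ("MOVSP", -1),  # destroys i: end of block
]
print(classify_movsp(ops))  # [(3, 'statement'), (4, 'block')]
```

The sketch breaks on exactly the case described above: an RSADDx that reserves a return value before a JSR would be misread as a named variable, so a real rule needs to know what the called subroutine returns.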
AmanoJyaku 184 Posted November 28, 2020

I said there's always another problem...

TL;DR

Any scripts created by nwnnsscomp are likely to be broken. That probably means all scripts in TSLRCM. Oh, joy.

The Gory Details

I found something curious in the NCS bytecode, so I decided to recompile an NSS file to see if the problem was created by nwnnsscomp. The dead code I've been complaining about? It's created by nwnnsscomp, which means nwnnsscomp isn't creating NCS files correctly!!!

For example, an if-else looks like this graphically:

[graph image not preserved]

Now, let's nest a second if-else into the false branch:

    if(true)
    {
        //something
    }
    else
    {
        if(true)
        {
            //some other thing
        }
        else
        {
            //some else thing
        }
    }

[graph image not preserved]

The key here is that block c from the first graph has been replaced with blocks c, e, f and g. Block g is a separate block from d. But, that's not what nwnnsscomp does:

[graph image not preserved]

That's this:

    if (true)
    {
        //something
    }
    else if (true)
    {
        //some other thing
    }
    else
    {
        //some else thing
    }

Which is perfectly legal code, it just doesn't match what's in the source script. So, this is the reason I've been struggling to make a decompiler...

The dead code? It's probably meant to be the missing block g from the second graph.
DarthParametric 3,790 Posted November 29, 2020

5 hours ago, AmanoJyaku said:
> nwnnsscomp isn't creating NCS files correctly!!!
AmanoJyaku 184 Posted November 30, 2020

On 11/28/2020 at 10:57 PM, DarthParametric said:

I never used nwnnsscomp until I suspected mod files have "incorrect" scripts. Have you seen other issues?

BTW, I realize my last post might be confusing to those who don't code in C and assembly. I didn't mean to alarm people by giving the impression scripts compiled with nwnnsscomp don't work. As far as I can tell, they work correctly in the game. They just don't decompile correctly, and that makes it difficult for someone who is using them to learn how to write a decompiler.

There is a certain structure that a compiled file must adhere to in order to be decompiled. I finally realized that structure is different between the files in my KoTOR installation, which has no mods, and my TSL installation, which has the TSLRCM. I don't remember if TSLRCM saves the original scripts anywhere, so I'll reinstall a clean copy of TSL on another machine to examine the original files there.
DrMcCoy 40 Posted November 30, 2020

Frankly, saying this is "incorrect" compilation is a misunderstanding of the compilation process. If you look at modern C compilers, they're doing far more transformations. That's all legal and often even wanted (to keep pipelines from stalling, etc). Don't stoop down to clickbait.

And in general, for many targets, a nested if like that is indistinguishable from an unnested if, else if, else in the final machine code anyway. This is one reason compilation-decompilation is never a lossless process: different things in the source can compile down to the same machine code.

What you have found is a "tell" in the original BioWare compiler that lets you distinguish certain things. That nwnnsscomp doesn't have the same tell is not a fault of nwnnsscomp, it's the nature of the thing.

Ask me about gcc and clang differences some day, or different styles of mangling C++ symbols in gcc and msvc.
AmanoJyaku 184 Posted November 30, 2020

45 minutes ago, DrMcCoy said:
> Frankly, saying this is "incorrect" compilation is a misunderstanding of the compilation process. If you look at modern C compilers, they're doing far more transformations. That's all legal and often even wanted (to keep pipelines from stalling, etc). Don't stoop down to clickbait. And in general, for many targets, a nested if like that is indistinguishable from an unnested if, else if, else in the final machine code anyway. This is one reason compilation-decompilation is never a lossless process: different things in the source can compile down to the same machine code. What you have found is a "tell" in the original BioWare compiler that lets you distinguish certain things. That nwnnsscomp doesn't have the same tell is not a fault of nwnnsscomp, it's the nature of the thing. Ask me about gcc and clang differences some day, or different styles of mangling C++ symbols in gcc and msvc.

I'm familiar with compiler optimizations, and I considered that possibility. However, I don't believe this is the case given the other issues like the lack of optimizations for logical AND and logical OR, the existence of dead code, and empty functions. There is so much more that could have been optimized that I can only conclude this is sloppy coding.

Additionally, it's highly unlikely the game benefits from such optimizations. The "optimized" files are larger (i.e. no cache savings), and the lack of an additional JMP instruction saves a few dozen cycles (which wasn't even a problem in 2003).

Anyway, it's not my intention to argue over this. It won't prevent me from making the decompiler, but it means it won't be possible to guarantee the decompiled output matches the original source for files compiled by nwnnsscomp. I assume those files would only be found in mods. The main purpose for making this decompiler is to access original game files, which I assume were not compiled by nwnnsscomp.
DrMcCoy 40 Posted November 30, 2020

Yeah, nwnnsscomp is not optimizing. But my point is that lots of roads lead to Rome, and denoting one as more correct than others is the wrong way to think about this. That different compilers written by different people lead to different bytecode that still accurately performs the same visible actions is normal, and that should be dealt with.

Similar for dead code. Dead code is entirely fine and should be handled properly without throwing errors there.

And I feel that saying "nwnnsscomp is wrong" does it a huge disservice, because it's technically not wrong at all. So I'm more hung up about your phrasing here.

As long as the decompiler still works as such (and doesn't just outright refuse to work) for cases where different idiosyncrasies slip through, be it nwnnsscomp or maybe the BioWare compiler used for Dragon Age, everything is entirely fine with me (for what that's worth, which is maybe not a lot). Guaranteeing a 100% match with the original source in all cases is pretty much impossible anyway, and not something I expected in the first place.
DarthParametric 3,790 Posted December 1, 2020

6 hours ago, AmanoJyaku said:
> Have you seen other issues?

Well I can't offer any input on the whole "correct/incorrect" debate. I'm just a pleb who is completely ignorant of programming. I was just observing that it's not particularly surprising that a fan made tool derived from reverse-engineering Bioware's compiler for a different game that has undergone multiple revisions by different people has a few quirks.
AmanoJyaku 184 Posted December 13, 2020

Time for an update! Now that I've recovered from the nwnnsscomp-induced heart attack, I'm back to solving the last of the reverse compiler challenges. (When I complete the reverse compiler, I'll provide a detailed explanation of NCS, how NCS maps to NWScript, how the reverse compiler works, and why I'm so critical of nwnnsscomp.)

The current problem I'm working on looks rather simple:

    RSADDx
    JSR
    JSR

We see a reservation on the stack, followed by two subroutines. NCS rules for subroutines are as follows:
- Optionally, reserve one or more values on the stack
- Optionally, place subroutine arguments on the stack
- Call the subroutine

You can see the disaster coming: it's those optional bits. In the example, the first subroutine might return a value. If so, that value is the argument of the second subroutine. But, RSADDx is also used to create uninitialized named variables in NWScript, for example:

    int i;      //RSADDI
    float f;    //RSADDF
    string s;   //RSADDS

You should never create uninitialized variables, but it was once a common practice. Numerous examples exist in NCS files, and that means the example code can also be read as an uninitialized variable, followed by two subroutines that do not take parameters and return void. In other words:

    Sub1(Sub0());
    //vs
    int i;
    Sub0();
    Sub1();

So, how do we figure this out? Unfortunately, we have to look inside each of the JSR calls. Which means yet another rewrite of my reverse compiler. It's not the worst thing in the world since the purpose of a reverse compiler is to reconstruct JSR calls. But more work must be done to determine dependencies, change the order in which things are analyzed, and store the resulting information for later use.

I don't work this hard in my paying job...
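Once each callee has been analysed, the two readings of `RSADDx JSR JSR` can be told apart from the callee signatures alone. The Python sketch below is a simplified illustration, not the reverse compiler's logic. The `Signature` record and `interpret` function are hypothetical, the reserved cell is assumed to be an int (RSADDI), and only the two readings named in the post are recognised.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    returns_value: bool  # does the subroutine write into a reserved cell?
    param_count: int     # how many argument cells it destroys on exit


def interpret(first, second):
    """Pick a reading for the sequence: RSADDx, JSR first, JSR second."""
    if (first.returns_value and first.param_count == 0
            and not second.returns_value and second.param_count == 1):
        # The reserved cell receives Sub0's result, then Sub1 consumes it.
        return "Sub1(Sub0());"
    if (not first.returns_value and first.param_count == 0
            and not second.returns_value and second.param_count == 0):
        # Nobody touches the reserved cell, so it is a named variable.
        return "int i; Sub0(); Sub1();"
    return "needs more context"


print(interpret(Signature(True, 0), Signature(False, 1)))   # Sub1(Sub0());
print(interpret(Signature(False, 0), Signature(False, 0)))  # int i; Sub0(); Sub1();
```

The catch is that both `Signature` values have to exist before the caller is read, which is why the order of analysis matters.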
AmanoJyaku 184 Posted December 25, 2020

Happy Life Day!!!

The Last Problem

The last post described the issue with RSADDx and JSR. The solution was to analyze by call dependency: we use the analysis of B to analyze A.

    void subA()
    {
        subB();
    }

You knew it wouldn't be that simple, right? How do we deal with recursive calls and circular references? In other words:
- When subroutine A calls subroutine A
- When subroutine A calls subroutine B, and subroutine B calls subroutine A

The above are possible due to control statements, namely if-else statements. If we can ignore the recursive calls and circular references then we can create a dependency tree. Unfortunately, we're back to the original problem of understanding the subroutine's internal structure.

    void subA()
    {
        if(true)
        {
            subA(); //wtf???
        }
        else
        {
            //do stuff
        }
    }

The Sub Awakens

The high-level structure of a subroutine is four parts:

    return_type subA(Argument_1, Argument_2...)
    {
        //Part 1 - The code that does stuff
        //Part 2 - Values returned
        //Part 3 - Destruction of local variables
        //Part 4 - Destruction of arguments
    }

As you can see, we need parts 2 and 4 to solve the problem of RSADDx and JSR. But, all parts are optional. A subroutine doesn't have to take arguments, return values, create local variables, or even have code. (Yes, there are empty subroutines in NCS files. Today's compilers would just omit such subroutines.)

Since all parts are optional, it's necessary to analyze enough of the subroutine to know where one part ends and the next begins. Is there a way to do a partial analysis of a subroutine while avoiding the pitfalls of recursion and circular references? Of course there is.

The Rise of Subwalker

You just have to find the child scopes and skip over them. The subroutine is the top-level scope, and most control statements create new scopes.

    void subA()
    {
        //A scope
        if (true)
        {
            //A second scope
        }
        else
        {
            //A third scope
        }
    }

I tried identifying scopes months ago, but got confused as to how to find them in code. My meltdown over nwnnsscomp was due to me thinking I figured it out, then getting confused over code that didn't match what I knew. I have ample evidence NCS was designed to be reverse compiled perfectly. That's right, you should be able to recreate the original NSS, vectors, structures, loops, branches and all. I just didn't count on a compiler producing code that would violate this design pattern.

That means I can't test with NCS files from mods, since those are made with nwnnsscomp. Switching back to vanilla NCS files has allowed for the reverse compiler to proceed. I now have an analyzer that finds control structures perfectly, all located in part 1 of the subroutine, and can now move on to parts 2, 3 and 4.

tl;dr

The reverse compiler is on its way to being complete. I was honestly about to give up, but I'm glad I didn't!
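The "ignore recursive calls and circular references" idea for building a dependency order can be sketched as a depth-first walk that skips any call back into a subroutine still being visited. This is a generic Python sketch, not the approach the post settles on (which skips child scopes inside each subroutine). The `analysis_order` function and the sample call graph are hypothetical.

```python
def analysis_order(calls, root):
    """Return subroutines ordered so that callees come before callers.

    `calls` maps a subroutine name to the names it calls. Calls into a
    subroutine that is still being visited (recursion, cycles) are skipped.
    """
    order, done, visiting = [], set(), set()

    def visit(name):
        if name in done or name in visiting:
            return
        visiting.add(name)
        for callee in calls.get(name, []):
            visit(callee)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(root)
    return order


calls = {
    "main": ["subA"],
    "subA": ["subA", "subB"],  # direct recursion
    "subB": ["subA", "subC"],  # circular reference back to subA
    "subC": [],
}
print(analysis_order(calls, "main"))  # ['subC', 'subB', 'subA', 'main']
```

A cycle still leaves one subroutine to be analysed before its callee's signature is known (subB before subA here), which is where a partial analysis of parts 2 and 4 would have to fill the gap.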
DarthParametric 3,790 Posted December 26, 2020

6 hours ago, AmanoJyaku said:
> That means I can't test with NCS files from mods, since those are made with nwnnsscomp. Switching back to vanilla NCS files has allowed for the reverse compiler to proceed.

So what does that mean for the final product? Are scripts produced by nwnnsscomp doomed to never being decompilable properly (at least for complex ones)?
AmanoJyaku 184 Posted December 26, 2020

1 hour ago, DarthParametric said:
> So what does that mean for the final product? Are scripts produced by nwnnsscomp doomed to never being decompilable properly (at least for complex ones)?

I'm confident they can be, and I'm currently in the process of confirming that. What I believe will happen is that vanilla game files can be reversed perfectly, but anything that was compiled by nwnnsscomp will be altered from its original source. As DrMcCoy said, the result, while syntactically different, should functionally be the same as the original source.

For example, the source of k_contain_unlock from TSLRCM contains the following:

    string GetTreasureBundle (int nItemLevel, int nItemType = 0)
    {
        string sTemplate = "";
        if (nItemType == 0) {}
        else
        {
            if((nItemType % 100) == 0) {}
            else if((nItemType % 10) == 0) {}
            else {}
        }
        return sTemplate;
    }

But, the NCS file is written like this:

    string GetTreasureBundle (int nItemLevel, int nItemType = 0)
    {
        string sTemplate = "";
        if () {}
        else if (){}
        else if() {}
        else {}
        return sTemplate;
    }

The number of branches is the same across both, so the two forms should behave the same when executed by the game. If this proves to be true, then it won't matter if the reverse compiler produces the second form even if the first is the original.

If I understand correctly, the original intent of the reverse compiler was to retrieve the sources for vanilla game scripts. Those are not compiled with nwnnsscomp. Which means the only files that are of concern are mods. I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?
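The claim that the nested and flattened forms pick the same branch can be checked by brute force on a model of the two shapes. This Python sketch is only a model: it assumes the three elided conditions in the NCS form are the same tests as in the source, in the same order, and the branch labels and function names are hypothetical. It only tries non-negative values, since Python's `%` differs from C-style remainder for negatives.

```python
def nested_form(n_item_type):
    """Branch structure as written in the source script."""
    if n_item_type == 0:
        return "first"
    else:
        if n_item_type % 100 == 0:
            return "second"
        elif n_item_type % 10 == 0:
            return "third"
        else:
            return "fourth"


def flattened_form(n_item_type):
    """Branch structure as it appears after compilation (assumed conditions)."""
    if n_item_type == 0:
        return "first"
    elif n_item_type % 100 == 0:
        return "second"
    elif n_item_type % 10 == 0:
        return "third"
    else:
        return "fourth"


same = all(nested_form(n) == flattened_form(n) for n in range(1000))
print(same)  # True
```

The two shapes only diverge if something is declared in the outer else block ahead of the inner if, which neither form above does.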
DarthParametric 3,790 Posted December 26, 2020

21 minutes ago, AmanoJyaku said:
> the result, while syntactically different, should functionally be the same as the original source

That seems fine to me. I was more concerned that certain files might not be able to be decompiled at all. This is the case for DeNCS as well, based on my own experiments.

21 minutes ago, AmanoJyaku said:
> I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

The main rights issues with mods are rehosting a mod without the author's permission and/or reusing their assets without permission. I think scripts fall into something of a grey area, partially because they are often derived from vanilla game scripts anyway and also because access to the source is necessary for compatibility (especially for large mods like TSLRCM). It's why authors are encouraged to include their script source (thus tacitly approving modification/reuse) to sidestep the problem altogether. If you have concerns I'm sure the admins can give you an official ruling on whatever it is you are specifically worried about.
JCarter426 1,216 Posted December 26, 2020

On 12/13/2020 at 11:12 AM, AmanoJyaku said:
> You should never create uninitialized variables, but it was once a common practice.

This is still common practice, at least as far as I've been taught. Yeah, there are security issues if you then try to read one, but modern compilers either won't let you do that or will initialize it with a default value, if the language itself doesn't. With that in mind, I don't see the issue.

6 hours ago, AmanoJyaku said:
> the result, while syntactically different, should functionally be the same as the original source

6 hours ago, DarthParametric said:
> That seems fine to me. I was more concerned that certain files might not be able to be decompiled at all. This is the case for DeNCS as well, based on my own experiments.

I agree. I'm not concerned about syntactically identical source code. The way the source code is written is... not always the best way, to put it lightly. The issues you've mentioned before don't seem like they should cause problems to me. Like with the nested else/if vs a simple else if, that'll only be functionally different if you declare something local to the else branch's scope. And, obviously, don't do that. And I doubt the compiler would've optimized for it that way if it would produce functionally different code. It usually knows better than we do.

6 hours ago, AmanoJyaku said:
> I don't know what the policy is for reverse engineering mods, but I assume that's frowned upon?

I think that depends on whether you ask Oracle or Google...