The single most common problem was overcomplication. Many projects exhibited several forms of it. Overcomplicated code takes longer to write and to debug, and is more likely to hide bugs, some of which are not exposed by the tiny provided test suite. One of the more important skills a programmer needs is the ability to sense when to say "this is more complicated than necessary; try to do this in a different, simpler way". Many people defined their own arrays to hold the strings "Registers" and "Instructions". This is totally unnecessary, since the C compiler turns the string constants themselves into unnamed arrays. Most of these people went further and used magic numbers to set the sizes of these arrays, instead of letting the C compiler deduce it from the initializer; such use of magic numbers is error prone. Some people compared the lines read in with "Registers" and "Instructions" using strcasecmp, not strcmp. This accepts input containing e.g. "ReGisTeRs" instead of "Registers", and is therefore a bad thing to do. There were many people who used plain scanf("%s", ...) to read in the lines they expected to hold those two strings, and did not check whether the lines read in actually contained those strings. This also allows incorrect input to remain undiagnosed. Some people used complicated string processing code to decode the lines that set register values, instead of a simple scanf("r%d = %x\n", ...). Some people put the register numbers into an array, and the register values into another array. With this setup, looking up the value of a register requires searching the first array, and using the index of a hit in the second array. I don't think ANYONE who used this setup got this right. The obvious, simple approach is of course to use a single array of register values indexed by the register number. Unfortunately, some people sized the array to hold 31 register values, not 32. Some people did not detect attempts to set the value of r0. A few people did not detect attempts to set the value of a register whose register number was negative. Almost everyone detected attempts to set the values of registers whose number was 32 or above, though some people's code did not then proceed to print an error message. Some people read all the instructions and put them into an array, and then processed the array elements, when it would have been simpler, more direct and less code to process each instruction as it is read in. That approach also avoids limiting the size of the input, since if you put each instruction into an array element, the input cannot have more instructions than what fits into the array. Some people extended this bad approach to store the register assignments in an array for later processing as well. The right way to read in the input is to make the structure of the code follow the structure of the data. (This rule of thumb applies in most programs, not just this one.) This yields a program structure like this: read the Registers marker read in and process all the register assignments read the Instructions marker read in and process all the instructions You can read each line with scanf (which most people did), or you can read them with something like fgets or getline (from 296) and then use sscanf to interpret their contents (which a few people did, and which is even better, since it allows better control in the presence of errors in the input.) Some people used a more complicated structure like this: loop while there is a line to read if the line is the Registers marker, read in and process all the register assignments else if the line is the Instructions marker, read in and process all the instructions else report an error Not only is this more complicated, it is also buggy, since it allows malformed input like Instructions ... some instructions ... Registers ... some register assignments ... Instructions ... some instructions ... Registers ... some register assignments ... to be accepted without comment. In many cases, the lack of a final else also allowed totally malformed lines to be accepted without comment. Some people took this wrong approach even further: loop while there is a line to read if the line is the Registers marker, record it else if the line is the Instructions marker, record it else if the most recent marker was the Registers marker, read in and process ONE register assignment else if the most recent marker was the Instructions marker, read in and process ONE instruction else report an error This is even worse, since in most cases, the variable that held the id of most recent code was initialized to indicate "Registers", which means that the code accepted input that started with register assignments WITHOUT the Registers marker. In general, you want your program to accept all input that is correctly formatted, and to reject all input that is not correctly formatted, with each rejection being accompanied by an error message that is specific to the actual form of the format violation. Some people tried to find the various fields of a MIPS instruction using the string representation of the instruction in the input (as eight hex digits encoded in ASCII). There is no way for this approach to yield anything except very complicated code, partly but not only because the opcode and register number fields straddle the hex digit boundaries. Some people tried to find the various fields of a MIPS instruction by converting the string representation of the instruction in the input (as eight hex digits encoded in ASCII) into another string representation (as 32 binary digits, also encoded in ASCII). This approach is only very slightly better than the previous one. Most people tried to find the various fields of a MIPS instruction the right way, by converting the string representation of the instruction in the input into an int or unsigned int, and then using bit operations. Most people did the conversion the correct and most direct way (scanf), though some people wrote their own (unnecessary) conversion function. Many people computed the opcode the right way, using nothing more than just instr >> 26, which does everything you need if instr is unsigned. If it is signed, you need something like (instr >> 26) & 0x3f; many people did that. Some people added another unnecessary mask, with code equivalent to ((instr & (0x3f << 26)) >> 26) & 0x3f, in which the first mask is redundant. (With unsigned instr, either one can be deleted.) Some people used (1 << OPCODE_SIZE) - 1 as the mask instead of 0x3f; that is good. However, some people built up the mask using a loop over the bit positions. In some people's code, this loop over the bits of the opcode first computed the mask and applied it, bit by bit. This approach can be made to work, but is much longer and harder to get right than the direct approach. A few people using such loops tried to calculate the value of each bit position using the pow function in the math library, which is (a) a huge overkill, and (b) fundamentally misguided, since floating point functions like pow are inherently approximate (though they will yield precise results when asked to raise 2 to any power from 1 to 32). Most people computed the values in the base register number and the offset fields using the same approach as they used to compute the opcode. There are two simple ways to sign extend the offset. One is to use a trick: if offset is signed, then (offset << 16) >> 16 does the job. The other is to directly test the sign bit, using code like this: if ((offset & (1 << 15)) != 0) { sign_extended_offset = 0xffff0000 | offset; } else { sign_extended_offset = offset; } Unfortunately, many people chose other approaches, which in many cases worked correctly only for SOME values of offset. Some projects did sign extension not on the offset, but on the sum of the offset and the register value, which is a bug, since that is now how MIPS addressing modes work. Some people added the sign extended offset not to the contents of the register indicated by the base register number field, but to the base register number itself, even though this totally removed the purpose of all the register value assignments in the input. There are two good, direct ways to decode the opcode. The first is with a switch: switch (opcode) { case 0x23: // lw ... break; case 0x21: // lh case 0x25: // lhu ... break; ... } or, equivalently, using a chain of if-then-elses. A less good variant is to have several if-then-else chains, e.g. one for the r/w distinction, and one for the access size. Some people who used this approach used range based tests like (0x20 <= opcode && opcode <= 0x25 && opcode != 0x22), which are hard to read and therefore to check. They are even harder if the opcodes are converted to decimal, since the MIPS documentation gives them in hex. The second good way, which several projects used, is to define an array of constant structures, one for each of the eight opcodes you are looking for, in which each element has (a) the opcode, (b) a char denoting load vs store ('r' vs 'w'), and (c) an int giving the access size. When processing an instruction, you just search the array for its opcode. Some projects used variations on the second approach, with separate arrays for loads and stores, and maybe extra arrays for 1, 2 and 4 byte access sizes. These approaches usually decoded the opcode twice (once for load vs store vs everything else, and once for access size) or three times. They usually had nontrivial amounts of duplicated code, and were usually full of magic numbers (since each array had a different size). Sometimes, these arrays had a field for the address, even though it does not belong there; while the r/w indication and access size is the same e.g. for every lw instruction, the address is not. Most programs had reasonable though not ideal indentation, usually with some minor problems such as some loop bodies not being indented, or different constructs at the same level being indented by different amounts, or with comments being indented differently than the code to which they apply. Some programs had totally unreasonable indentation, either with the code being smashed up against the left margin, or each line being indented almost at random. Such programs are hard to read and understand not just for the marker, but for you too. I have to look at it only for a short while, while you look at it for a lot longer, so bad indentation hurts you much more than it hurts me. Many programs had meaningless variable names. Using i as a loop counter is fine (though this project does not NEED a loop with a counter), but other single-character variable names like a, b, x, y and z are meaningless, and so are names like hex and temp, which could hold any one of several related things in the program. Some names were actively misleading; for example, one project named the array holding the register values "load". Again, the person hurt by meaningless or meaningful but misleading variable names is you, since they make the job of writing and debugging your code significantly harder. Some projects had lots of comments that just echoed the code. Since they contain no information that reader cannot get from the code, they do not help, and by distracting the reader from the code, their presence actually hurts: it makes the code HARDER to understand. Many projects did have useful comments, giving the reasons why the code did things a certain way. Unfortunately, in this project, the only things that really needed nontrivial explanations are over-complicated approaches to problems. A simpler approach that is so obviously right that it does not need comments is preferable to a more complex approach that does need comments, even if it has them. Similarly, while some over-complex approaches could benefit from defining your own functions, if you do everything the most direct way possible, then you don't need any functions other than main and a few functions from the C standard library (such as scanf, printf and strcmp). The comments in too many projects had spelling and/or grammar errors. Many submissions got warnings from gnuc. Some of these warned about bugs, and many others pointed out code that could be considerably simplified. For example, gnuc warns about using the %x format string with signed numbers, and using unsigned ints to represent the instruction read in and its components simplifies the code needed to manipulate them. You should make getting no warnings from gnuc a priority for the second project. A few people submitted memory.c files which used Dos/Windows text format, instead of Unix format. The difference is that Windows uses \r\n (CR LF) to represent the end of a line, while Unix uses just \n (LF). The Unix programs dos2unix and unix2dos can convert between the two formats. Some people forgot to answer the question asked in the project spec.