As you probably remember, the single most common problem in project 1 was overcomplication. I am please to say that only a few of the submissions for project 2 exhibited this. The only repeated problem along these lines was that a few people used a linked list instead of an array to represent the set of cache blocks in the cache. Using lists requires significantly more work and is harder to debug, and as a consequence, most such projects had bugs related to the management of the list. There was no single dominant problem exhibited by most submissions, but there were a few problems that occurred in up to about a quarter of the submissions. One was having an initialize() that looked like this: initialize(cache_params_t params) { cache_t *cache; cache->blocks = malloc(...) } When initialize() is executed, the local variable "cache" is not initialized, and its value will therefore be garbage. Accessing what it points to is therefore a bug, whether that access is a read or (as here) a write. As it happens, for most people the garbage pointer happened to point somewhere that was both accessible to the program (so that this assignment did not get a segmentation fault) and not containing anything else used by the program (so that later writes to the fields of the structure cache points to did not screw up anything else). For some people, the garbage pointer did screw up sometime else. What the effect was on you depended on things like the size of your code and data structures, and so it was effectively unpredictable. Despite warnings in lectures and tutorials, many people called malloc without checking whether it returned NULL. Some people called malloc twice or three times (for the cache, for the blocks, and maybe for the parameters), but checked for a NULL return only after the first call, and not the others. Most people did check all the return values, usually by writing and calling a checked_malloc function. For FIFO and LRU caches, many projects kept track of which blocks in the cache are in use not by using the valid bits in the blocks, but by an overall count of how many cache blocks are in use. Because such caches are always filled sequentially from block 0, this can be made to work: if the cache has N filled blocks, then blocks 0 to N-1 are valid, and blocks N to cachesize-1 are invalid. However, a significant fraction of these projects had code that looped over all blocks in the cache (0 to cachesize-1) without checking valid bits, which is a bug. This usually happened when handling writes. Some people also emitted checking the valid bits on writes in direct mapped caches. For FIFO replacement, you want to keep track of which cache block was loaded into from memory the longest time ago (effectively, the read miss the longest time ago), while for LRU, you want to keep track of which cache block was last accessed the longest time ago (the read miss, read hit or write hit the longest time ago). The cleanest design uses different fields in cache_block_t structures to contain the two different time stamps (which can be implemented trivially as access numbers). Some people used the same field in the cache_block_t structure for both counters. This can work, but it demands some documentation of the fact that the field means different things in different kinds of caches, and the name of the field should also reflect this duality. Most projects that used this scheme gave the field a misleading name and did not have the necessary documentation. A valid cache block must contain a record of WHICH memory block it holds a copy of. There are three simple ways to do this in direct mapped caches: - by storing the starting address of the memory block - by storing the memory block number of the memory block (which is the starting address divided by the block size) - by storing the tag of the memory block (which is the memory block number divided by the number of cache blocks) The first two also work for the associative caches. The supplied code had a field named memory_block_num, intended for the second form of identification. Many projects kept the field name but used it to store a starting address or a tag, which is misleading. Many other projects added a second or even a third field, for a second or third form of identification, when one is all you need. May projects had code like this: if (cache->blocks[index].modified) { record_eviction(index, start_addr, TRUE); } else { record_eviction(index, start_addr, FALSE); } The test is unnecessary; you can do record_eviction(index, start_addr, cache->blocks[i].modified); instead. For some reason, a significant number of people write code like this: while (...) { if (...) { ... do something ... } else { continue; } } The "else continue" is totally useless; the loop would continue even if it were not there. For some reason, some people used 0, 1 and 2 to refer to three kinds of caches, and 0 and 1 to refer to reads and writes, ignoring the symbolic names created for these specific purposes by the definitions of the relevant enums. This made the code sufficiently harder to understand for the authors themselves that I think these projects got a lower than average number of total marks for correctness. More people used 0 and 1 for FALSE and TRUE. This has more justification, since these are actually represented that way in hardware, but it still often led to questionable code like this: if (cache->blocks[index].valid == TRUE) { ... do something ... } else if (cache->blocks[index].valid == FALSE) { ... do something else ... } This code has two problems. First, C considers any nonzero value to be true, so the first test is error prone. Second, it implies that the program considers the only two acceptable values of the valid field to be 0 and 1, but it does not detect situations in which (due to a bug elsewhere) it contains 2, 3, ... Indentation was better in an average cache.c compared to an average memory.c, but was still not ideal. A few projects still had totally unreasonable indentation, either with the code being smashed up against the left margin, or each line being indented almost at random. The same comment applies as in project 1: such programs are hard to read and understand not just for the marker, but for you too. I have to look at it only for a short while, while you look at it for a lot longer, so bad indentation hurts you much more than it hurts me. Variable names in an average cache.c were much better than in an average memory.c. Only a few projects had meaningless variable names. Unfortunately, many projects still had lots of comments that just echoed the code. Since they contain no information that reader cannot get from the code, they do not help, and by distracting the reader from the code, their presence actually hurts: it makes the code HARDER to understand. Many submissions got warnings from gnuc. Many of these warned about bugs; for example, the problem with access through an uninitialized cache pointer in initialize() was pointed out by gnuc in almost all cases. Unfortunately, it seems that despite my advice in the feedback for project 1, many students still do not take advantage of this simple and easy way to locate bugs. A few people submitted memory.c files which used Dos/Windows text format, instead of Unix format. The difference is that Windows uses \r\n (CR LF) to represent the end of a line, while Unix uses just \n (LF). The Unix programs dos2unix and unix2dos can convert between the two formats.