Advanced Computing Assignment 7

This is a follow-up to the Basic Python 1 problem.

While you were out of town doing field work over the summer, Dr. Granger hired another student, Gregory Goyle, to help her modify your code so that it did something a bit different from the original. The new code was intended to include more size classes and to output the average GC content for each size class to a CSV file, rather than the individual-level data. Unfortunately, Greg hasn’t learned an important lesson about programming: it’s almost always better to work with existing code than to rewrite it from scratch. He figured it would be easier to just start over than to try to understand what you’d already done. Sadly, Greg isn’t quite the programmer you are, and he didn’t actually finish the project before having to stop to focus on his course work now that school is back in session (and boy does he need to focus). So, he’s committed the current version of his code to your repository. It has all of the parts in place, but isn’t exactly… well… working just yet.

You don’t want to make the same mistake that Greg did, and besides, your computer crashed over the summer and you weren’t using version control yet (it’s OK, you didn’t know better, it’s not your fault), so you’ll need to work with Greg’s code, such as it is. Find the bugs in the code and fix them. You’ll need to both read the code and use a debugger to understand what’s going on and fix the problems. Get the code cleaned up at least to the point where it actually executes. You’re welcome to find and fix/improve other issues as well, but you’ll also be writing tests later to help you track the tricky problems down, so the really important thing at this point is to get the code running so that you can actually run the tests.

Make a new branch for this problem and commit each fix individually.

The code can be downloaded, but your instructor may have already put it in your repository.

This is a follow-up to the Debugging 1 problem.

Write tests for your granger_analysis_code for the following cases and save them in a file called test_granger_analysis_code.py (remember that the file name has to start with test_ for nose to find it). Make any corrections/improvements that need to be made to the code so that all of your tests pass.

gc_content()
1. Sequence represented by upper case string
2. Sequence represented by lower case string
3. Sequence represented by mixed case string
4. Sequence represented by multiline string
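
As a rough sketch of what these four tests might look like, the block below uses a stand-in gc_content() so that it runs on its own. The stand-in is an assumption, not Greg's code: it returns the percentage of G and C bases and ignores whitespace. If your function returns a proportion instead, or handles newlines differently, the expected values will need to change. In your own test file you would import gc_content from your module rather than defining it.

```python
def gc_content(sequence):
    """Stand-in (an assumption, not the assignment's code): percent G + C."""
    # Remove all whitespace so multiline strings work, then normalize case.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    gc_count = cleaned.count("G") + cleaned.count("C")
    return 100.0 * gc_count / len(cleaned)


def test_gc_content_upper_case():
    assert gc_content("ATGC") == 50.0


def test_gc_content_lower_case():
    assert gc_content("atgc") == 50.0


def test_gc_content_mixed_case():
    assert gc_content("AtGc") == 50.0


def test_gc_content_multiline():
    # Eight bases across two lines, six of them G or C.
    assert gc_content("ATGC\nGGCC\n") == 75.0
```

The sequences are chosen so the expected values are exact in floating point; for less tidy sequences, compare with a small tolerance instead of ==.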

get_size_class()

In an email accompanying your “updated” code, Dr. Granger indicated that the specifications for the earlength size classes were:
1. extralarge: earlength >= 15
2. large: 10 <= earlength < 15
3. medium: 8 <= earlength < 10
4. small: earlength < 8
Write tests to check:
1. That each case is working when the numbers are in the range
2. The edge cases of 8, 10, and 15
3. What happens if non-numerical values are passed to the function (e.g., a string from a header row that didn’t get removed)
Modify the main code so that all of your tests pass.
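
The three kinds of checks above can be sketched as follows. The get_size_class() here is a stand-in written from the emailed specification so the block runs on its own; it is not Greg's code. Raising TypeError for non-numerical input is one possible design choice, made here as an assumption. The assignment leaves it to you to decide what should happen.

```python
def get_size_class(earlength):
    """Stand-in (an assumption) that follows the emailed specification."""
    # Assumed behavior: refuse anything that is not a plain number,
    # e.g. the string "earlength" from a header row.
    if isinstance(earlength, bool) or not isinstance(earlength, (int, float)):
        raise TypeError("earlength must be a number")
    if earlength >= 15:
        return "extralarge"
    elif earlength >= 10:
        return "large"
    elif earlength >= 8:
        return "medium"
    else:
        return "small"


def test_values_inside_each_range():
    assert get_size_class(20) == "extralarge"
    assert get_size_class(12) == "large"
    assert get_size_class(9) == "medium"
    assert get_size_class(3) == "small"


def test_edge_cases():
    # Each boundary belongs to the class whose range starts there.
    assert get_size_class(8) == "medium"
    assert get_size_class(10) == "large"
    assert get_size_class(15) == "extralarge"


def test_non_numerical_value():
    try:
        get_size_class("earlength")
    except TypeError:
        pass
    else:
        raise AssertionError("expected a TypeError for a string input")
```

Testing just below each boundary as well (for example 7.9, 9.9, and 14.9) would catch a >= that was accidentally written as >.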

This is a follow-up to the Tests 1 problem.

Now that you’ve got the code working, it’s time to deal with the fact that it’s not really well structured (I mean, has this guy not heard of NumPy or Pandas or what), but before messing with working code, let’s write a regression test. This test will make sure that the code still does the same thing it did before we started.

The problem is that all of the important code that needs refactoring is outside of functions at the bottom of the script. So,
1. Move the bulk of this code into one or more functions
2. Write a test that executes the main function using StringIO to make example data
3. Make sure it is doing what you expect
4. Refactor the portion of the code that gets the average values of GC content for each size class to use NumPy or Pandas.
5. Rerun your tests to make sure that both the regression test and the unit tests still pass
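
A minimal sketch of steps 1 to 3, under several assumptions: the function name main, the column names id, earlength and dnaseq, the output header, and the simplified two-class size rule are all hypothetical placeholders, not taken from Greg's code. The point is the shape of the test: build the input with StringIO, capture the output in another StringIO, and compare it with a string recorded while the code was known to work. It uses Python 3's io.StringIO.

```python
import csv
from io import StringIO


def size_class(earlength):
    # Hypothetical two-class rule, only to keep the sketch short.
    return "large" if earlength >= 10 else "small"


def gc_content(sequence):
    # Hypothetical helper: percent G + C of a single-line sequence.
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def main(infile, outfile):
    """Write the mean GC content of each size class to outfile as CSV."""
    totals = {}  # size class -> (sum of GC content, number of rows)
    for row in csv.DictReader(infile):
        key = size_class(float(row["earlength"]))
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + gc_content(row["dnaseq"]), count + 1)
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(["size_class", "mean_gc"])
    for key in sorted(totals):
        total, count = totals[key]
        writer.writerow([key, total / count])


def test_main_regression():
    # File-like objects stand in for real files on disk.
    infile = StringIO("id,earlength,dnaseq\na,12,GGCC\nb,11,ATAT\nc,5,ATGC\n")
    outfile = StringIO()
    main(infile, outfile)
    expected = "size_class,mean_gc\nlarge,50.0\nsmall,50.0\n"
    assert outfile.getvalue() == expected
```

Because main() takes file-like objects rather than file names, the same test can be rerun unchanged after the averaging loop is replaced with NumPy or Pandas. If the refactored version formats numbers differently, parse the output and compare the values rather than the raw text.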

Programming for Biologists
