Advanced Computing Assignment 3

The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following list. The first item in each sublist is an alphanumeric code for the site and the second value is the number of birds banded. Copy and paste the list into your assignment and then answer the following questions by printing the answers to the screen.
```
data = [['A1', 28], ['A2', 32], ['A3', 1], ['A4', 0],
        ['A5', 10], ['A6', 22], ['A7', 30], ['A8', 19],
        ['B1', 145], ['B2', 27], ['B3', 36], ['B4', 25],
        ['B5', 9], ['B6', 38], ['B7', 21], ['B8', 12],
        ['C1', 122], ['C2', 87], ['C3', 36], ['C4', 3],
        ['D1', 0], ['D2', 5], ['D3', 55], ['D4', 62],
        ['D5', 98], ['D6', 32]]
```
1. How many sites are there?
2. How many birds were counted at the 7th site?
3. How many birds were counted at the last site?
4. What is the total number of birds counted across all sites?
5. What is the average number of birds seen on a site?
6. What is the total number of birds counted on sites with codes beginning with C? (Don’t just identify these sites by eye; in the real world there could be hundreds or thousands of sites.)
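
The list operations these questions rely on can be sketched on a small made-up dataset. The `toy_counts` list and its values are hypothetical stand-ins, not the field data; the same patterns should carry over to `data`.

```python
# Hypothetical toy data in the same [site_code, count] layout as the assignment list
toy_counts = [['X1', 4], ['X2', 0], ['Y1', 7], ['Y2', 9]]

number_of_sites = len(toy_counts)       # one sublist per site
second_site_count = toy_counts[1][1]    # indexing starts at 0, so [1] is the 2nd site
last_site_count = toy_counts[-1][1]     # negative indices count from the end

# Pull the counts out of each sublist before summing or averaging
counts = [count for site, count in toy_counts]
total = sum(counts)
average = total / len(counts)

# Select sites by the first character of their code instead of by eye
y_total = sum(count for site, count in toy_counts if site.startswith('Y'))

print(number_of_sites, second_site_count, last_site_count, total, average, y_total)
```

Filtering on `site.startswith(...)` keeps working when the list grows to hundreds or thousands of sites, which is the point of question 6.
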

Dr. Granger is interested in studying the relationship between the length of house-elves’ ears and aspects of their DNA. This research is part of a larger project attempting to understand why house-elves possess such powerful magic. She has obtained DNA samples and ear measurements from a small group of house-elves to conduct a preliminary analysis (prior to submitting a grant application to the Ministry of Magic) and she would like you to conduct the analysis for her (she might know everything there is to know about magic, but she sure doesn’t know much about computers). She has placed the file on the web for you to download.

You might be able to do this analysis by hand in Excel, but counting all of those bases would be a lot of work, and besides, Dr. Granger seems to always get funded, which means that you’ll be doing this again soon with a much larger dataset. So, you decide to write a script so that it will be easy to do the analysis again.

Write a Python script that:
1. Imports the data into a data structure of your choice
2. Loops over the rows in the dataset
3. For each row in the dataset checks to see if the ear length is large (>10 cm) or small (<=10 cm) and determines the GC-content of the DNA sequence (i.e., the percentage of bases that are either G or C)
4. Stores this information in a table where the first column has the ID for the individual, the second column contains the string ‘large’ or the string ‘small’ depending on the size of the individual’s ears, and the third column contains the GC-content of the DNA sequence.
5. Prints the average GC-content for both large-eared elves and small-eared elves to the screen.
6. Exports the table of individual level GC values to a CSV (comma delimited text) file titled grangers_analysis.csv.
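
Step 3 can be sketched as two small functions. The names `get_size_class` and `get_gc_content` and the example inputs are illustrative assumptions, not part of Dr. Granger's file. The sketch also assumes sequences may contain lowercase letters.

```python
def get_size_class(ear_length):
    """Return 'large' for ears longer than 10 cm, otherwise 'small'."""
    if ear_length > 10:
        return 'large'
    return 'small'

def get_gc_content(sequence):
    """Return the percentage of bases in sequence that are G or C."""
    sequence = sequence.upper()  # assumption: treat 'g' and 'G' as the same base
    gc_count = sequence.count('G') + sequence.count('C')
    return 100 * gc_count / len(sequence)

print(get_size_class(12.5))
print(get_gc_content('ATGCGC'))  # 4 of the 6 bases are G or C
```

Values read from a CSV file arrive as strings, so an ear length would need converting with `float()` before the comparison. An empty sequence would raise a `ZeroDivisionError` in this sketch.
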

This code should use functions to break the code up into manageable pieces. For example, here’s a function for importing the data from the web:
```
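import csv
import urllib.request

def get_data_from_web(url):
    # Download the file and decode it into lines of text for csv.reader
    # (assumes the file is UTF-8 encoded)
    webpage = urllib.request.urlopen(url)
    lines = webpage.read().decode('utf-8').splitlines()
    datareader = csv.reader(lines)
    data = []
    for row in datareader:
        data.append(row)
    return data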
```
This function imports the data as a list of lists. Another good option would be to use either a Pandas data frame or a NumPy array. An example function using Pandas looks like:
```
import pandas as pd

def get_data_from_web(url):
    # read_csv accepts a URL directly and returns a data frame
    data = pd.read_csv(url)
    return data
```
Throughout the assignment feel free to use whatever data structures you prefer. Ask your instructor if you have questions about the best choices.
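
Steps 5 and 6 can be sketched with the standard library alone. The `results` rows below are made-up placeholders in the (ID, size class, GC-content) layout from step 4, and the header names written to the file are an assumption.

```python
import csv

# Hypothetical rows in the layout from step 4: [id, size class, GC-content]
results = [['elf1', 'large', 50.0],
           ['elf2', 'small', 40.0],
           ['elf3', 'large', 60.0]]

def get_mean_gc(rows, size_class):
    """Average the GC-content of the rows whose size class matches."""
    values = [gc for _, size, gc in rows if size == size_class]
    return sum(values) / len(values)

def export_to_csv(rows, filename):
    """Write a header row followed by one line per individual."""
    with open(filename, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['id', 'size', 'gc_content'])  # assumed column names
        writer.writerows(rows)

print(get_mean_gc(results, 'large'))
print(get_mean_gc(results, 'small'))
export_to_csv(results, 'grangers_analysis.csv')
```

Opening the file with `newline=''` is what the `csv` module documentation recommends so that no blank lines appear between rows on Windows.
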

The species-area relationship characterizes the relationship between the number of species observed at a site and the area being sampled. This relationship is used widely in ecology and conservation biology for tasks such as estimating the location of biodiversity hotspots to prioritize for conservation.

Unfortunately there is no consensus on the form of the equation that best describes the species-area relationship. This means that any estimate of species richness depends on the choice of model. Most of the models have roughly equivalent statistical support and we are going to be making predictions for regions where there is no data so we can’t determine the best model statistically (and besides, we don’t know how to do statistics yet in Python, so…). Instead we are going to take a consensus approach where we estimate the species richness using all possible models and then use the average prediction as our best estimate.

We are going to deal with 5 models today (which is already kind of a lot), but according to some authors there are as many as 20 reasonable models for the species-area relationship, so we’ll want to make our code easily extensible. The five models we will work with are those defined by Dengler and Oldeland (2010).
- Power: S = b0 * A^b1
- Power-quadratic: S = 10^(b0 + b1 * log(A) + b2 * (log(A))^2)
- Logarithmic: S = b0 + b1 * log(A)
- Michaelis-Menten: S = b0 * A / (b1 + A)
- Lomolino: S = b0 / (1 + b1^log(b2 / A))

All logarithms are base 10. The parameters for each model are available below, along with the areas at which we wish to predict species richness. Each sublist contains the parameters for one model in the order given above. All models contain b0 and b1, but only the Power-quadratic and Lomolino models contain the third parameter b2.
```
sar_parameters = [[20.81, 0.1896], [1.35, 0.1524, 0.0081],
                  [14.36, 21.16], [85.91, 42.57],
                  [1082.45, 1.59, 390000000]]

areas = [1, 5.2, 10.95, 152.3, 597.6, 820, 989.8, 1232.5, 15061]
```
These can be copied and pasted into your code. Alternatively, if you’re looking for a more realistic challenge you can import the related csv files for the parameters and the areas directly from the web. Extracting the data you need from a standard csv import will be a little challenging, but you’ll learn a lot (and you can always solve the main problem first and then go back and solve the import step later, which might well be what an experienced programmer would do in this situation).

Write a script that calculates the richness predicted by each model for each area, and exports the results to a csv file with the first column containing the area for the prediction and the second column containing the mean predicted richness for that area. To make this easily extensible you will want to write a function that defines each of the different species-area models (5 functions total) and then use higher order functions to call those functions. Depending on how you solve the problem you may find zip and Python’s use of asterisks handy.
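
One way the extensible design could look is sketched below for two of the five models only. The function names and the trimmed `example_parameters` and `example_areas` lists are illustrative choices; the parameter values are the Power and Logarithmic entries from `sar_parameters`.

```python
import math

def power(area, b0, b1):
    """Power model: S = b0 * A^b1."""
    return b0 * area ** b1

def logarithmic(area, b0, b1):
    """Logarithmic model: S = b0 + b1 * log10(A)."""
    return b0 + b1 * math.log10(area)

def get_mean_richness(area, models, parameters):
    """Call every model at one area and average the predictions."""
    # zip pairs each function with its parameter list;
    # the asterisk unpacks that list into b0, b1, ... as separate arguments
    predictions = [model(area, *params)
                   for model, params in zip(models, parameters)]
    return sum(predictions) / len(predictions)

example_models = [power, logarithmic]
example_parameters = [[20.81, 0.1896], [14.36, 21.16]]
example_areas = [1, 5.2, 10.95]

for area in example_areas:
    print(area, get_mean_richness(area, example_models, example_parameters))
```

Because the functions are passed around as values, adding another model under this design only means writing one more function and appending it and its parameters to the two lists. The asterisk also lets three-parameter models share the same calling code.
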

Programming for Biologists
