Advanced Computing Assignment 4

  1. There were a relatively large number of extinctions of mammalian species roughly 10,000 years ago. To help understand why these extinctions happened, scientists are interested in whether there were differences in body size between the species that went extinct and those that did not. Since we’re starting to get pretty good at this whole programming thing, let’s stop messing around with made-up datasets and do some serious analysis.

    Download the largest dataset on mammalian body size in the world. Fortunately this dataset has data on the mass of recently extinct mammals as well as extant mammals (i.e., those that are still alive today). Take a look at the metadata to understand the structure of the data. One key thing to remember is that species can occur on more than one continent, and if they do then they will occur more than once in this dataset. Also let’s ignore species that went extinct in the very recent past (designated by the word ‘historical’ in the ‘status’ column).

    Import the data into Python. If you’ve looked at a lot of data you’ll realize that this dataset is tab delimited. The special character to indicate tab in Python is \t.
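    As a rough illustration of the import step, the sketch below reads a few tab-delimited rows with pandas. The rows are made up, the column names are hypothetical and simplified, and the absence of a header row is an assumption; check all three against the metadata before using this on the real file.

```python
import io

import pandas as pd

# Hypothetical, simplified column names; take the real ones from the metadata.
columns = ["continent", "status", "family", "genus", "species", "mass"]

# Made-up rows standing in for the downloaded file: tab-delimited, no header row.
sample_file = io.StringIO(
    "AF\textant\tFamA\tAaa\tx\t10.0\n"
    "AF\textinct\tFamB\tBbb\tx\t400.0\n"
    "EA\thistorical\tFamC\tCcc\tx\t50.0\n"
)

# sep="\t" tells pandas that the fields are separated by tabs.
data = pd.read_csv(sample_file, sep="\t", header=None, names=columns)

# Set aside the very recent extinctions, as described above.
data = data[data["status"] != "historical"]
print(data)
```

    For the real dataset, the file path would take the place of sample_file.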

    To start let’s explore the data a little and then start looking at the major question.

    1. The following code will determine how many genera (plural of genus) there are in the dataset: len(data.groupby(['genus'])). Modify this code to determine the number of species. Remember that a species is uniquely defined by the combination of its genus name and its species name. Print the result to the screen. The number should be between 4000 and 5000.
    2. Find out how many of the species are extinct and how many are extant, and print the result to the screen. Hint: first separate the data into the extinct and extant components and then count the number of species.
    3. Find out how many families are present in the dataset.
    4. Now print the genus name, the species name, and the mass of the largest and smallest species (note: it is not possible for a mammal to have negative mass).
    5. Now let’s get to work. Calculate the average (i.e., mean) mass of an extinct species and the average mass of an extant species. The function mean() should help you here. It is available as both a numpy function and a Pandas DataFrame method. Don’t worry about species that occur more than once. We’ll consider the values on different continents to represent independent data points. Print out the results in the following sentence: “The average mass of extant species is X and the average mass of extinct species is Y.” with the appropriate values filled in for X and Y.
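    The steps in this problem can be sketched on a tiny made-up table. Everything here is illustrative rather than a reference solution: the column names are hypothetical, the rows are invented, and the negative mass is assumed to be a placeholder for a missing value, which is why it is filtered out before any mass is used.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the real file's.
data = pd.DataFrame({
    "continent": ["AF", "EA", "AF", "AF", "EA", "EA"],
    "status": ["extant", "extant", "extant", "extinct", "extinct", "extant"],
    "family": ["FamA", "FamA", "FamA", "FamB", "FamB", "FamC"],
    "genus": ["Aaa", "Aaa", "Aaa", "Bbb", "Bbb", "Ccc"],
    "species": ["x", "x", "y", "x", "x", "x"],
    "mass": [10.0, 30.0, 20.0, 400.0, 600.0, -999.0],
})


def count_species(frame):
    """Count unique genus/species combinations in a DataFrame."""
    return len(frame.groupby(["genus", "species"]))


num_species = count_species(data)
num_extinct = count_species(data[data["status"] == "extinct"])
num_extant = count_species(data[data["status"] == "extant"])
num_families = data["family"].nunique()
print(num_species, num_extinct, num_extant, num_families)

# Assumed: a negative mass marks a missing value, so keep positive masses only.
valid = data[data["mass"] > 0]
largest = valid.loc[valid["mass"].idxmax()]
smallest = valid.loc[valid["mass"].idxmin()]
print(largest["genus"], largest["species"], largest["mass"])
print(smallest["genus"], smallest["species"], smallest["mass"])

# Each row counts as an independent data point, even for repeated species.
mean_extant = valid.loc[valid["status"] == "extant", "mass"].mean()
mean_extinct = valid.loc[valid["status"] == "extinct", "mass"].mean()
print(
    f"The average mass of extant species is {mean_extant} "
    f"and the average mass of extinct species is {mean_extinct}."
)
```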
  2. This is a follow-up to Scientific Python 1.

    Looking at the average mass of extinct and extant species overall is useful, but there are lots of different processes that could cause size-biased extinctions, so it’s not as informative as we might like. However, if we see the exact same pattern on each of the different continents, that might really tell us something. Repeat the analysis in Scientific Python 1, but this time compare the mean masses within each of the different continents. Export your results to a CSV file where the first entry on each line is the continent, the second entry is the average mass of the extant species on that continent, the third entry is the average mass of the extinct species on that continent, and the fourth entry is the difference between the average extant and average extinct masses. Call the file continent_mass_differences.csv. If you notice anything strange, think about what’s going on and present the final data in the way that makes the most sense to you.
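    One possible shape for the per-continent comparison is sketched below on made-up rows with hypothetical column names. It groups by continent and status, spreads the two statuses into columns, and writes one line per continent. The file goes to the system temporary directory here only to keep the sketch from cluttering the working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up rows; the column names are assumptions, not the real file's.
data = pd.DataFrame({
    "continent": ["AF", "EA", "AF", "AF", "EA"],
    "status": ["extant", "extant", "extant", "extinct", "extinct"],
    "mass": [10.0, 30.0, 20.0, 400.0, 600.0],
})

# Mean mass for every continent/status pair, with one column per status.
table = data.groupby(["continent", "status"])["mass"].mean().unstack("status")

# Difference between the average extant and average extinct masses.
table["difference"] = table["extant"] - table["extinct"]

# Column order: continent (the index), extant, extinct, difference.
out_path = Path(tempfile.gettempdir()) / "continent_mass_differences.csv"
table[["extant", "extinct", "difference"]].to_csv(out_path, header=False)
print(out_path.read_text())
```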

  3. This is a follow-up to Scientific Python 2.

    We have previously compared the average masses of extant and extinct species on different continents to try to understand whether size has an influence on extinction in mammals. Looking at the averages was a good start, but we really need to look at the full distributions of masses of the two groups to get the best picture of whether or not there was a major size bias in extinctions during the late Pleistocene. Make a graph with a subplot for each continent that you think is worth visualizing. Each subplot should contain two histograms that use the same bins to display the number of extinct and extant species. Use the log(mass) rather than the mass itself so that you can see the form of the distributions more clearly. Label the axes appropriately. The optional argument alpha will allow you to make the histograms transparent, which will help with visualizing two histograms that overlap one another.

    There is a lot of work to do in this problem, so make sure to break it down into manageable pieces. Some logical chunks include:

    • Making a single graph with the histograms for extinct and extant species. This might work well as its own function.
    • Downloading/importing the data
    • Breaking the data up into separate continents
    • Breaking the data up into extinct and extant species
    • Looping over the data to make one plot of each continent
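    The chunks listed above might fit together roughly as follows. This is a sketch on randomly generated masses with hypothetical column names, and the function name plot_mass_histograms is made up for illustration. It uses a base-10 logarithm, which is one reasonable reading of log(mass); np.log would work the same way.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_mass_histograms(ax, extant_mass, extinct_mass, title):
    """Draw overlapping log10(mass) histograms that share one set of bins."""
    log_extant = np.log10(np.asarray(extant_mass, dtype=float))
    log_extinct = np.log10(np.asarray(extinct_mass, dtype=float))
    # Compute the bin edges from both groups so the histograms are comparable.
    bins = np.histogram_bin_edges(np.concatenate([log_extant, log_extinct]), bins=10)
    ax.hist(log_extant, bins=bins, alpha=0.5, label="extant")
    ax.hist(log_extinct, bins=bins, alpha=0.5, label="extinct")
    ax.set_title(title)
    ax.set_xlabel("log10(mass)")
    ax.set_ylabel("number of species")
    ax.legend()


# Randomly generated stand-in data: 20 extant and 20 extinct rows per continent.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "continent": np.repeat(["AF", "EA"], 40),
    "status": np.tile(np.repeat(["extant", "extinct"], 20), 2),
    "mass": 10 ** rng.normal(3.0, 1.0, size=80),
})

continents = sorted(data["continent"].unique())
fig, axes = plt.subplots(
    1, len(continents), figsize=(5 * len(continents), 4), squeeze=False
)

# One subplot per continent, each split into extant and extinct masses.
for ax, continent in zip(axes.ravel(), continents):
    subset = data[data["continent"] == continent]
    extant = subset.loc[subset["status"] == "extant", "mass"]
    extinct = subset.loc[subset["status"] == "extinct", "mass"]
    plot_mass_histograms(ax, extant, extinct, continent)

fig.tight_layout()
```

    Calling fig.savefig with a file name, or plt.show() with an interactive backend, would display the result.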
  4. Understanding the total amount of biomass (the total mass of all individuals) in forests is important for understanding the global carbon budget and how the earth will respond to increases in carbon dioxide emissions. Measuring the mass of entire trees is difficult, and it’s pretty much impossible to weigh an entire forest (even if we were willing to clear cut a forest for science), but fortunately we can estimate the mass of a tree based on its diameter.

    There are lots of equations for estimating the mass of a tree from its diameter, but one good option is the equation M = 0.124*D^(2.53), where M is the dry above-ground biomass in kg and D is the diameter at breast height (d.b.h.) in cm (Brown 1997). We’re going to estimate the total tree biomass for trees in a 96 hectare area of the Western Ghats in India.

    1. Write a function that takes an array/Series of tree diameters as an argument and returns an array/Series of tree masses.
    2. The raw data is available on Ecological Archives, but unfortunately, due to poor database structure, using all of the trees would be a hassle. You could try to solve this problem yourself, but it turns out that someone else has already solved it for you. Install the EcoData Retriever and use it to download and clean up this data automatically, then import it into Python. Using the command line interface, the command would be retriever install csv Ramesh2010, and the data will be stored in Ramesh2010_macroplots.csv.
    3. If you look at the file or the metadata carefully you’ll notice that the data is actually in girth (i.e., circumference, which is equal to pi * diameter) rather than diameter. Write a function that takes an array/Series of circumferences as an argument and returns an array/Series of diameters. Use the math module to get an accurate value of pi.
    4. Use the two functions you’ve written to estimate the total biomass (i.e., the sum of the masses) of trees in this dataset and print the result to the screen.
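    The two conversions and the final sum can be sketched as below. The function names are made up for illustration and the girths are invented values assumed to be in cm; with the real data, the girth column name and its units would need to be checked against the metadata first.

```python
import math

import pandas as pd


def diameter_from_circumference(circumference):
    """Convert circumferences (girths) to diameters: D = C / pi."""
    return circumference / math.pi


def mass_from_diameter(diameter):
    """Estimate dry above-ground biomass in kg from d.b.h. in cm.

    Uses M = 0.124 * D^2.53 (Brown 1997), as given in the problem.
    """
    return 0.124 * diameter ** 2.53


# Made-up girths in cm, chosen so the diameters come out to 10, 20 and 30 cm.
girths = pd.Series([10.0, 20.0, 30.0]) * math.pi

diameters = diameter_from_circumference(girths)
masses = mass_from_diameter(diameters)
total_biomass = masses.sum()
print(f"Estimated total biomass: {total_biomass} kg")
```

    Both functions rely only on arithmetic operators, so they accept a single number, a numpy array, or a pandas Series.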