Advanced Computing Assignment 6

  1. For this and many of the following problems you will create queries that retrieve the relevant information from the Portal small mammal survey database. You can create the SQLite version of this database using the EcoData Retriever by first installing the software and then running:

    retriever install sqlite PortalMammals

    from the command line. Or you can download an already assembled copy of the database.

    You will need to know some details regarding what is in this database in order to answer the questions. For example, you may need to know what species is associated with the two character species ID or you may need to know the units for the individual’s weight. This type of information associated with data is called metadata and the metadata for this dataset is available online at Ecological Archives.

    1. Write a query that displays all of the records for all of the fields in the main table. Save it as a view named All Survey Data.
    2. We want to generate data for an analysis of body size differences between males and females of each species. We have decided that we can ignore the information related to when and where the individuals were trapped. Create a query that returns all of the necessary information, but nothing else. Save this as Size Differences Among Sexes Data.
  2. A population biologist (Dr. Undómiel) who studies the population dynamics of Dipodomys spectabilis would like to use some data from Portal, but she doesn’t know how to work with large datasets. Being the kind and benevolent person that you are, write a query to extract the data that she needs. She wants only the data for her species of interest, when each individual was trapped, and what sex it was. She doesn’t care about where it was trapped within the site because she is going to analyze the entire site as a whole and she doesn’t care about the size of the individuals. Obviously she doesn’t need the species codes because you’re only providing her with the data for one species, and besides since she isn’t actually looking at the database itself the two character abbreviation would probably be confusing. Save this query as a view with the name Spectabilis Population Data.

    Scroll through the results of your query. Do you notice anything that might be an issue for the scientist to whom you are providing this data? [you should] Think about what you should do in this situtation…

    You decide that to avoid invoking her wrath, you’ll send her a short e-mail* requesting clarification regarding what she would like you to do regarding this complexity. Dr. Undómiel e-mails you back and asks that you create two additional queries so that she can decided what to do about this issue later. She would like you to add a query to the same data as above, but only for cases where the sex is known to be male, and an additional query with the same data, but only where the sex is known to be female. Save these as views with the names Spectabilis Population Data Males and Spectabilis Population Data Females.

    *Short for elven-mail

  3. The graduate students that work at the Portal site are hanging out late one evening drinking… soda pop… and they decide it would be an epically awesome idea to put together a list of the 100 largest rodents ever sampled at the site. Since you’re the resident “computer genius” they text you, and since you’re up late working and this sounds like a lot more fun that than homework you’re working on (which isn’t really saying much, if you know what I’m saying) you decide you’ll make the list of them.

    The rules that the Portal students have come up with (and they did spend a sort of disturbingly long time coming up with these rules; I guess you just had to be there) are:

    1. Data needed include the species ID, year of capture, and the mass. These columns should be presented in this order.
    2. Individuals should be sorted in descending order with respect to mass.
    3. Since individuals often have the same mass ties should be settled by sorting next by hind foot length and finally by the year of capture.

    You may find the SQL command LIMIT to be helpful. Save the final query as 100 Largest Individuals.

  4. Write a query that returns a list of the dates that mammal surveys took place at Portal (with no duplicates). Save it as Dates Sampled.

  5. Write a query that returns the number of individuals of each species captured in each year and the total biomass of those individuals (the sum of the wgt column). The units for biomass should be in kilograms. Include the year and species ID in the output. Sort the result chronologically by year and then alphabetically by species. Save as Mass-Abundance Data.

  6. The Plots table in the Portal database can be joined to the main table by joining plot to PlotID and the species table can be joined to the main table by joining species to new_code.

    The Portal mammal data include data from a number of different experimental manipulations. You want to do a time-series analysis of the natural population dynamics of all of the rodent species at the site, so write a query that returns the year, month, day, and full species name of every individual captured on the control plots. Exclude all non-rodent species (i.e., species for which the Rodent field in the species table is equal to 0) and all individuals that have not been identified to species (i.e., species for which the Unknown field in the species table is equal to 1). Save this query as Data From Controls.

  7. Using the Portal dataset write a query that returns the average number of individuals sampled on each plot type on a per plot basis. In other words, how many individuals are sampled on average on a plot of a given plot type. Save this query as Per plot individuals per plot type.

  8. Write a query that returns the year, month, day, species, and mass for every record were there is no missing data in any of these fields. Save it as No Missing Data.

  9. Start a new Python file that you will use to answer the rest of the questions for this assignment.

    Using Python create a new table in the Portal database called FieldNotes and should include information on the date of sampling that the notes apply to (designed in such a way as to allow it to be linked to the main table) and a large text field for entering notes.

    I would definitely recommend using VARCHAR for the notes field because you have no idea how extensive future notes might be and the notes will probably vary wildly in length.

  10. Add the following note to the FieldNotes table in the Portal database for the date 04/01/1963:

    “Just completed the April 1963 census of the site. The region is teeming with Dipodomys spectabilis. Using the time machine to conduct trapping prior to the start of the study is working out great!”

  11. Add the following note to the FieldNotes table in the Portal database for the date 10/1/2013:

    “Vegetation seems to have returned to normal for this time of year. The landscape isn’t exactly green, but there is a decent amount of plant activity and there should be enough food for the rodents to the winter”

    Commit this change to the database. As soon as you’ve commited it you realized that this was the update for October 2012, not October 2013 (obviously; I mean it’s not like we have a time machine…). Update the record so that it contains the appropriate value.

  12. Before starting this problem make sure that you have a backup of your database or that it has recently been committed to version control so that you can revert the changes if necessary. In fact, when developing the code it’s probably best to work on a copy of the database that is specifically just for development. Once it’s working then apply it to your actual database. Always, always, backup your databases before messing with them in this manner. Seriously.

    The Species table in the Portal database has a structural problem in that the oldcode column often contains multiple pieces of information in a single cell. This means that we can’t really run queries that use the oldcode field effectively. Think about what the best structure would be for this table. It might include splitting the table into two separate tables (wink, wink, nudge, nudge). Feel free to check with Ethan to make sure you’ve got the right idea. Using Python, restructure the database and store the new species table as PortalMammal_species(i.e., the same name it has now; you may need to learn about the DROP TABLE command in order to do this) and naming any new tables you create with easy to understand names.