Programming Assignment 2

  1. Write a series of print statements that output the following, with a blank line between each answer:

    1. Post hoc ergo propter hoc
    2. What’s up with scientists using all of this snooty latin?
    3. atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc. Do this using the * operator to make 15 copies of atgc.
    4. Darwin’s “On the origin of species” is a seminal work in biology.
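Here is a minimal sketch of the techniques item 1 relies on. It assumes Python 3, where print is a function, and it uses short made-up strings rather than the assignment's answers. An empty print() produces the blank line, and the * operator repeats a string.

```python
# In Python 3, print is a function, so its argument goes in parentheses.
print("First answer")
print()  # print() with no arguments outputs a blank line

# Double quotes let a string contain an apostrophe without escaping it.
print("It's the second answer")
print()

# The * operator repeats a string; "ac" * 3 evaluates to "acacac".
repeated = "ac" * 3
print(repeated)
```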
  2. Use functions (not methods) from the string module or from base Python to print the following.

    1. species in all capital letters
    2. gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg with every occurrence of a replaced with A
    3. “    Thank goodness it’s Friday” without the leading whitespace (i.e., without the spaces before Thank)
    4. The number of a’s in gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg.
    5. The length of the DNA sequence gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg
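Here is a sketch of the function-call style, assuming Python 3. Python 2's string module provided functions such as string.upper() and string.replace(), but Python 3 removed them. The closest equivalents are the built-in len() and the str methods called through the type, as in str.upper(s). The sequence gattaca is a made-up example, not one of the assignment's strings.

```python
seq = "gattaca"  # hypothetical example sequence

# Calling str methods as plain functions, passing the string as the first argument.
print(str.upper(seq))              # GATTACA
print(str.replace(seq, "a", "A"))  # gAttAcA
print(str.lstrip("   padded"))     # padded (leading spaces removed)
print(str.count(seq, "a"))         # 3
print(len(seq))                    # 7 (len() is a built-in function)
```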
  3. Use string methods to print the following. Remember that a method is called by appending it to the object name with a dot (.), like this:

    mystring = 'Hello World'
    print(mystring.lower())
    
    1. species in all capital letters
    2. gcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgg with every occurrence of a replaced with A
    3. “    Thank goodness it’s Friday” without the leading whitespace (i.e., without the spaces before Thank)
    4. The number of a’s in gccgatgtacatggaatatacttttcaggaaacacatatctgtggagagg.
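Here are the same operations written as methods, again on a made-up sequence and assuming Python 3. Each method call gives the same result as the corresponding str.method(seq) function call.

```python
seq = "gattaca"       # hypothetical example sequence
padded = "   padded"  # hypothetical string with leading spaces

print(seq.upper())            # GATTACA
print(seq.replace("a", "A"))  # gAttAcA
print(padded.lstrip())        # padded
print(seq.count("a"))         # 3
```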
  4. For the DNA sequence below, determine the following properties and print them to the screen. You can copy and paste the sequence into your code. It is much longer than it appears on screen, but if you select the whole thing and paste it into Python you will see all of it. Put quotes around the sequence so that Python treats it as a string.

    dna = ttcacctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgtgtgtctagctaagatgtattattctgctgtggatcccactaaagatatattcactgggcttattgggccaatgaaaatatgcaagaaaggaagtttacatgcaaatgggagacagaaagatgtagacaaggaattctatttgtttcctacagtatttgatgagaatgagagtttactcctggaagataatattagaatgtttacaactgcacctgatcaggtggataaggaagatgaagactttcaggaatctaataaaatgcactccatgaatggattcatgtatgggaatcagccgggtctcactatgtgcaaaggagattcggtcgtgtggtacttattcagcgccggaaatgaggccgatgtacatggaatatacttttcaggaaacacatatctgtggagaggagaacggagagacacagcaaacctcttccctcaaacaagtcttacgctccacatgtggcctgacacagaggggacttttaatgttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcggcagtctgaggattccaccttctacctgggagagaggacatactatatcgcagcagtggaggtggaatgggattattccccacaaagggagtgggattaggagctgcatcatttacaagagcagaatgtttcaaatgcatttttagataagggagagttttacataggctcaaagtacaagaaagttgtgtatcggcagtatactgatagcacattccgtgttccagtggagagaaaagctgaagaagaacatctgggaattctaggtccacaacttcatgcagatgttggagacaaagtcaaaattatctttaaaaacatggccacaaggccctactcaatacatgcccatggggtacaaacagagagttctacagttactccaacattaccaggtaaactctcacttacgtatggaaaatcccagaaagatctggagctggaacagaggattctgcttgtattccatgggcttattattcaactgtggatcaagttaaggacctctacagtggattaattggccccctgattgtttgtcgaagaccttacttgaaagtattcaatcccagaaggaagctggaatttgcccttctgtttctagtttttgatgagaatgaatcttggtacttagatgacaacatcaaaacatactctgatcaccccgagaaagtaaacaaagatgatgaggaattcatagaaagcaataaaatgcatgctattaatggaagaatgtttggaaacct

    1. How many times does ‘gagg’ occur in the sequence?
    2. What is the starting position of the first occurrence of ‘atta’? Report the position as a human would count it, starting from 1 for the first base (Python indexes start at 0).
    3. How long is the sequence?
    4. What is the GC content of the sequence? The GC content is the percentage of all bases that are either G or C. Print the result as “The GC content of this sequence is XX.XX%”, where XX.XX is the actual GC content. Do this using formatted strings.
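Here is a sketch of the four calculations on a short hypothetical sequence rather than the assignment's dna. It assumes Python 3, where / is true division. Note that str.count() counts non-overlapping matches. str.find() returns a 0-based index, or -1 if there is no match, so adding 1 gives the position a human would report. That is valid only when a match exists.

```python
dna = "gaggattacgagg"  # hypothetical short sequence for illustration

# 1. Number of (non-overlapping) occurrences of a motif.
gagg_count = dna.count("gagg")
print(gagg_count)  # 2

# 2. find() returns a 0-based index, so add 1 for the 1-based position.
atta_position = dna.find("atta") + 1
print(atta_position)  # 5

# 3. Sequence length in base pairs.
length = len(dna)
print(length)  # 13

# 4. GC content as a percentage, shown with two decimal places by an f-string.
gc_content = (dna.count("g") + dna.count("c")) / length * 100
print(f"The GC content of this sequence is {gc_content:.2f}%")
```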
  5. A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype=str to tell loadtxt() that the data is composed of strings.

    Calculate the GC content of each sequence. The GC content is the percentage of all bases that are either G or C. Print the result for each sequence as “The GC content of the sequence is XX.XX%”, where XX.XX is the actual GC content. Do this using formatted strings.
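Here is a sketch of the loading and looping steps. Because the colleague's file is not included here, an in-memory io.StringIO object stands in for it, since numpy.loadtxt() accepts file-like objects as well as paths. The file name sequences.txt in the comment is hypothetical. The ndmin=1 argument is an addition that keeps the result one-dimensional even if the file holds a single sequence.

```python
import io

import numpy as np

# Hypothetical stand-in for the colleague's file: one sequence per line.
# With the real file, pass its path instead, e.g. np.loadtxt("sequences.txt", dtype=str).
fake_file = io.StringIO("gattaca\nggcc\natat\n")

# dtype=str makes loadtxt return an array of strings; ndmin=1 keeps a 1-D
# array even if the file contains only one line.
sequences = np.loadtxt(fake_file, dtype=str, ndmin=1)

results = []
for seq in sequences:
    seq = seq.lower()  # count g/c regardless of letter case
    gc_content = (seq.count("g") + seq.count("c")) / len(seq) * 100
    message = f"The GC content of the sequence is {gc_content:.2f}%"
    results.append(message)
    print(message)
```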

  6. You have a data file with a single “taxonomy” column. This column contains the family, genus, and species of a single taxonomic group. You need to figure out how to split that information into separate values for family, genus, and species. To solve the basic problem, take a single example string, Ornithorhynchidae Ornithorhynchus anatinus, split it into three separate strings using a Python command, and then print the family, genus, and species, each on a separate line.
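Here is a minimal sketch using str.split(), which with no arguments splits on runs of whitespace. It is shown on a different example entry, Hominidae Homo sapiens. It assumes each entry contains exactly three whitespace-separated words. If there are more or fewer, the tuple unpacking raises a ValueError.

```python
taxonomy = "Hominidae Homo sapiens"  # example entry: family, genus, species

# split() with no arguments splits on whitespace and returns a list of words;
# unpacking assigns the three pieces to separate variables.
family, genus, species = taxonomy.split()

print(family)   # Hominidae
print(genus)    # Homo
print(species)  # sapiens
```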