Problem #1 (of 2): Fun with linked lists

Create a class called FASTAreadset_LL. The purpose of the class is to contain a FASTA read set (similar to homework #1) and all of the functions needed to operate on this set. Use the linked list data structure to store the genomic sequences of the given read dataset. Use character arrays (char[]) to store the actual sequence fragment within each node of the linked list; there is no need for a linked list of linked lists that stores one character at a time. For this assignment, you can completely disregard the headers of the sequence fragments (i.e. R0_0_1…).

At minimum, the class must contain:
• A default constructor
• At least one custom constructor (e.g. one taking a file path or ifstream as input)
• A function to read the FASTA file
• A destructor
• A copy constructor

A. Read in the entire 36 million read set and report the RAM and CPU time used to load the data into memory.

B. Implement a destructor for your class to delete / deallocate your linked list data structure. How long should it take (big O notation)? Explain why (a couple of sentences at most).

C. Implement a copy constructor and perform a deep copy of the entire FASTAreadset_LL object. How long should it take (big O notation)? Explain why.

D. Implement a search function that takes a sequence fragment (OK to assume that it will be exactly 50 characters long) and searches for this fragment within the FASTAreadset_LL object. The search function should return the pointer to the node containing a match, OR the NULL pointer value if a ‘hit’ was not found. Which of the following sequences were found in the read set?
• CTAGGTACATCCACACACAGCAGCGCATTATGTATTTATTGGATTTATTT
• GCGCGATCAGCTTCGCGCGCACCGCGAGCGCCGATTGCACGAAATGGCGC
• CGATGATCAGGGGCGTTGCGTAATAGAAACTGCGAAGCCGCTCTATCGCC
• CGTTGGGAGTGCTTGGTTTAGCGCAAATGAGTTTTCGAGGCTATCAAAAA
• ACTGTAGAAGAAAAAAGTGAGGCTGCTCTTTTACAAGAAAAAGTNNNNNN
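
The following is a minimal C++ sketch of how the pieces required in Problem #1 could fit together; it is a starting point under stated assumptions, not a reference solution. The names Node, READ_LEN, append, readFasta, size, and demoFindsRead are illustrative and not part of the assignment. The sketch assumes one sequence per line, skips header lines beginning with '>', and truncates anything past 50 characters.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

const int READ_LEN = 50;  // assumption: every read is at most 50 characters

struct Node {
    char seq[READ_LEN + 1];  // fragment plus terminating '\0'
    Node* next;
};

class FASTAreadset_LL {
public:
    // Default constructor: an empty list.
    FASTAreadset_LL() : head(nullptr), tail(nullptr), count(0) {}

    // Custom constructors: read from an open stream or from a file path.
    explicit FASTAreadset_LL(std::istream& in) : FASTAreadset_LL() { readFasta(in); }
    explicit FASTAreadset_LL(const std::string& path) : FASTAreadset_LL() {
        std::ifstream in(path);
        readFasta(in);  // reads nothing if the file could not be opened
    }

    // Copy constructor (deep copy): allocates a new node for every source node.
    FASTAreadset_LL(const FASTAreadset_LL& other) : FASTAreadset_LL() {
        for (const Node* p = other.head; p != nullptr; p = p->next) append(p->seq);
    }
    FASTAreadset_LL& operator=(const FASTAreadset_LL&) = delete;  // not needed here

    // Destructor: visits each node exactly once and frees it.
    ~FASTAreadset_LL() {
        while (head != nullptr) {
            Node* next = head->next;
            delete head;
            head = next;
        }
    }

    // Reads sequences line by line, ignoring headers and blank lines.
    void readFasta(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty() && line.back() == '\r') line.pop_back();
            if (line.empty() || line[0] == '>') continue;
            append(line.c_str());
        }
    }

    // Appends one fragment at the tail in constant time.
    void append(const char* s) {
        assert(s != nullptr);
        Node* n = new Node;
        std::strncpy(n->seq, s, READ_LEN);  // pads with '\0' if s is shorter
        n->seq[READ_LEN] = '\0';
        n->next = nullptr;
        if (tail != nullptr) tail->next = n; else head = n;
        tail = n;
        ++count;
    }

    // Linear scan: returns the first matching node, or nullptr if none matches.
    Node* search(const char* query) const {
        for (Node* p = head; p != nullptr; p = p->next)
            if (std::strncmp(p->seq, query, READ_LEN) == 0) return p;
        return nullptr;
    }

    long size() const { return count; }

private:
    Node* head;
    Node* tail;
    long count;
};

// Usage sketch on a tiny in-memory FASTA stream (two made-up reads).
inline bool demoFindsRead() {
    const std::string a(READ_LEN, 'A'), c(READ_LEN, 'C');
    std::istringstream in(">R0\n" + a + "\n>R1\n" + c + "\n");
    FASTAreadset_LL reads(in);
    FASTAreadset_LL copy(reads);  // deep copy: the nodes are duplicated
    return reads.size() == 2 && copy.search(c.c_str()) != nullptr &&
           copy.search(c.c_str()) != reads.search(c.c_str());
}
```

Keeping a tail pointer is a design choice made here so that loading the reads appends each node without re-walking the list. Measuring RAM and CPU time for part A is not covered by this sketch.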

Problem #2 (of 2): Basic search

Read in the Bacillus anthracis genome (/common/contrib/classroom/inf503/test_genome.fasta). Read in the sequence read fragments used in problem #1; you must use the FASTAreadset_LL class created in the previous problem to store your sequence reads.

A. Break down the genome sequence into all 50-character-long fragments contained within it (shifting the start location by one base each time). Store these fragments in an array or linked list data structure (your choice). How many 50-character fragments did you observe in the genome?

B. Iterate through all 50-mers found in the genome, using the search function you developed in 1D to query the read set. How many genome 50-mer fragments were found in your read set? How long did it take to complete the entire search process (all 50-mers)?

Note that part B of this problem (2B) may take a LONG time to complete. If you feel that you will not be able to complete the execution of the program in time to submit the assignment, estimate the total amount of time and the number of found 50-mers by running 1000, 10000, and 100000 searches and extrapolating the outcome from those.
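
The sliding-window count in part A and the extrapolation suggested in the note can be sketched in C++ as follows. This is an illustration only: the names K, countKmers, kmerAt, timeSample, and extrapolate are hypothetical, the genome is assumed to be already loaded into a single std::string with its header removed, and the search callable stands in for whatever search function you wrote in 1D. The sketch assumes the cost per search stays roughly constant, so totals scale linearly with the number of queries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ctime>
#include <string>

const std::size_t K = 50;  // fragment length used throughout the assignment

// Number of K-character windows in a sequence of length n, shifting by one base.
std::size_t countKmers(std::size_t n) { return n < K ? 0 : n - K + 1; }

// Copies the K-mer starting at offset i into buf (K + 1 bytes, '\0'-terminated).
void kmerAt(const std::string& genome, std::size_t i, char* buf) {
    assert(i + K <= genome.size());
    std::memcpy(buf, genome.data() + i, K);
    buf[K] = '\0';
}

// Runs the first `sample` K-mer queries through `search`, counts the hits,
// and returns the processor time used, in seconds (via std::clock).
template <typename SearchFn>
double timeSample(const std::string& genome, std::size_t sample,
                  SearchFn search, std::size_t& hits) {
    char buf[K + 1];
    hits = 0;
    const std::size_t total = countKmers(genome.size());
    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < sample && i < total; ++i) {
        kmerAt(genome, i, buf);
        if (search(buf)) ++hits;  // works for bool or pointer return values
    }
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}

// Linear extrapolation of a sampled quantity (seconds or hits) to all queries.
double extrapolate(double sampled, std::size_t sampleSize, std::size_t total) {
    return sampled * static_cast<double>(total) / static_cast<double>(sampleSize);
}
```

Running timeSample with 1000, 10000, and 100000 queries and comparing the three extrapolated totals gives a rough check on whether the linear assumption holds for your data.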