## Exercise 1: Translate genes, find longest open reading frame
1. Load the file *GeneSequences.fa*
```{r}
library("Biostrings")
genes = readDNAStringSet("GeneSequences.fa")
```
2. [Translate](https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/translate) the sequences in the 3 reading frames of the forward strand only; the loop in step 3 expects the result in a variable named `frames`
3. Find the longest ORF in each sequence: use [matchPattern](https://www.rdocumentation.org/packages/Biostrings/versions/2.40.2/topics/matchPattern) to locate the stop ("*") and start ("M") residues
```{r}
### store the longest ORF found for each sequence in this variable
longest.orf = AAStringSet(rep('', length(genes)))
names(longest.orf) = names(genes)
### loop through every translated frame (`frames` is the result of step 2)
for (nf in seq_along(frames)) {
    frm = frames[[nf]]
    nme = names(frames)[[nf]]
    #### find the position of every stop "*" in frm and loop through the results
    stops = start(matchPattern("*", frm))
    #### for each stop, search the first "M" between the previous "*" and the current one
    #### with n0 = position of M, n1 = position of *
    #### if frm[n0:n1] is longer than longest.orf[[nme]], then replace it
}
```
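The scan described in the comments above can be tried on a toy sequence first. The sketch below is only one possible approach, not the expected solution: the sequence `toy` and the helper names `three_frames` and `longest_orf` are made up for illustration, and the ORF search works on plain character strings with base R (`gregexpr`, `regexpr`) rather than `matchPattern`. As in the skeleton, an ORF is counted from an "M" to the next "*" (stop included), so a stretch that runs to the end of the frame without a stop is ignored.

```r
library("Biostrings")

### toy sequence standing in for one entry of GeneSequences.fa (hypothetical)
toy = DNAString("CCATGGCTTAAATGAAACCCGGGTAGTT")

### translate the 3 forward frames; returns a named character vector
three_frames = function(dna) {
    aa = vapply(1:3, function(k) {
        n = length(dna) - k + 1      # bases available from offset k
        usable = n - n %% 3          # drop trailing bases that do not fill a codon
        as.character(translate(subseq(dna, start = k, width = usable)))
    }, character(1))
    names(aa) = paste0("frame", 1:3)
    aa
}

### longest stretch from an "M" to the next "*" (stop included, as in n0:n1)
longest_orf = function(aa) {
    stops = gregexpr("*", aa, fixed = TRUE)[[1]]
    best = ""
    if (stops[1] == -1) return(best)                  # no stop in this frame
    prev = 0
    for (n1 in stops) {
        segment = substr(aa, prev + 1, n1)            # residues after the previous stop
        n0 = regexpr("M", segment, fixed = TRUE)[1]   # first M in that segment, or -1
        if (n0 != -1 && nchar(segment) - n0 + 1 > nchar(best)) {
            best = substr(segment, n0, nchar(segment))
        }
        prev = n1
    }
    best
}

toy.frames = three_frames(toy)
toy.orfs = vapply(toy.frames, longest_orf, character(1))
toy.best = toy.orfs[[which.max(nchar(toy.orfs))]]
print(toy.frames)
print(toy.best)
```

Trimming each frame to a whole number of codons avoids the warning that `translate` emits for incomplete trailing codons. Note that `translate` may, depending on the Biostrings version and its defaults, treat an alternative initiation codon at the very first position of a frame as "M".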
4. Save the longest ORFs as a FASTA file named *orf.fa*
```{r}
writeXStringSet(...)
```
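As a quick check of the write step, the following sketch writes a small made-up `AAStringSet` to a temporary file and reads it back. The object `toy.orf` and the temporary path are assumptions for illustration only; in the exercise the object to save is `longest.orf` and the file name is *orf.fa*.

```r
library("Biostrings")

### hypothetical ORFs, only to illustrate the call
toy.orf = AAStringSet(c(geneA = "MKPG", geneB = "MAW"))

### write to a temporary FASTA file, then read it back to confirm the round trip
out.file = tempfile(fileext = ".fa")
writeXStringSet(toy.orf, filepath = out.file, format = "fasta")
check = readAAStringSet(out.file)
print(check)
```

The names of the set become the FASTA header lines, which is why `names(longest.orf)` is set from `names(genes)` in step 3.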
## Exercise 2: Construct an HMM to find ORFs
Implement an HMM according to the schema below

* The states **S1, S2, S3** represent the 3 consecutive nucleotides of a start codon; **E1, E21, E22, E32, E33** represent the [3 possible stop codons](https://en.wikipedia.org/wiki/Stop_codon); **B** is the background state and **I[123]** (that is, I1, I2, I3) are the 3 positions of an "inner" codon.
* The emitted symbols are the nucleotides *A, C, G, T*; the background and inner-codon states emit them with uniform probabilities.
* The emission probabilities of the start and end (stop) states must be specified explicitly; they follow directly from the codon sequences.
* The transition probabilities not shown on the schema can be inferred from it. Choose the probabilities so that the 3 possible stop codons are equally likely.
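
To make the equal-probability constraint concrete, here is a small arithmetic check. It rests on an assumed reading of the state names, since the schema is not reproduced in this text: E1 emits T and then branches to E21 (emits A) with probability `p` or to E22 (emits G) with probability `1 - p`; E21 leads to E32, which emits A or G, and E22 leads to E33, which emits A. If your schema differs, the numbers change accordingly.

```r
### assumed branch probability E1 -> E21 (the remainder goes to E22)
p = 2 / 3
### assumed emissions of the third stop-codon position
e32 = c(A = 0.5, G = 0.5)    # after T-A: TAA or TAG
e33 = c(A = 1)               # after T-G: TGA only

### probability of each stop codon, given that a stop codon is being emitted
stop.codon.prob = c(
    TAA = p * e32[["A"]],
    TAG = p * e32[["G"]],
    TGA = (1 - p) * e33[["A"]]
)
print(stop.codon.prob)
### all three should be equal (1/3 each)
print(isTRUE(all.equal(unname(stop.codon.prob), rep(1 / 3, 3))))
```

The branch through E21 covers two codons and the branch through E22 only one, which is why the split at E1 is not 50/50 under this reading.
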
1. Complete the code below by filling in all matrix elements:
3. Plot the HMM schema (see [plot.HMM](https://www.rdocumentation.org/packages/aphid/versions/1.3.3/topics/plot.HMM))
4. Run the [Viterbi algorithm](https://www.rdocumentation.org/packages/aphid/versions/1.3.3/topics/Viterbi) on the segment *1501:1800* of the human gene and display the resulting states
```{r}
### plot...
### Viterbi algo
hmm.vtrb = Viterbi(....)
### for a simple visual display: concatenate all nucleotides into one string
### and show the 1st letter of each state name aligned below