Homework
1. Project Aim:
• Develop a conditional random field (CRF) model that can assess protein functionality using a protein family.
• The protein family acts as a reference database against which new protein sequences are scored for functionality.
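The following is a minimal sketch of how a protein family could act as a scoring database. It assumes (hypothetically) that the family is available as a gapless alignment of equal-length sequences and that a uniform background over the 20 standard amino acids is acceptable. It builds position-specific residue frequencies with a pseudocount and scores a new sequence by summed log-odds. This is a profile-style baseline, not a CRF. Per-position scores like these could, for example, serve as feature functions in the CRF model. The names `build_profile` and `score_sequence` and the toy family are illustrative only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_profile(family, pseudocount=1.0):
    """Position-specific residue probabilities from an aligned, gapless family.

    Hypothetical input format: every sequence in `family` has the same length.
    """
    length = len(family[0])
    if any(len(seq) != length for seq in family):
        raise ValueError("family sequences must be aligned to equal length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in family)
        # Pseudocounts keep unseen residues from getting probability zero.
        total = len(family) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def score_sequence(profile, query):
    """Sum of per-position log-odds scores against a uniform background (1/20)."""
    if len(query) != len(profile):
        raise ValueError("query must be aligned to the profile length")
    background = 1.0 / len(AMINO_ACIDS)
    return sum(math.log(profile[pos][aa] / background) for pos, aa in enumerate(query))


family = ["ACDE", "ACDE", "ACDF", "GCDE"]  # hypothetical toy alignment
profile = build_profile(family)
print(score_sequence(profile, "ACDE") > score_sequence(profile, "WWWW"))  # True
```

A higher score means the query resembles the family more closely at each aligned position. Real families contain gaps and sequences of different lengths, which this sketch does not handle.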
2. What are CRFs?
• More expressive than HMMs because they use feature functions, which can capture arbitrary, overlapping properties of the observation sequence.
• Undirected graphical model.
• Uses a single exponential model for the joint probability of the entire label sequence, conditioned on the observation sequence.
• Linear-chain CRFs, like HMMs, impose dependencies only between adjacent labels (each label depends on the previous one), whereas general CRFs can impose dependencies between arbitrary labels.
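For reference, the standard linear-chain form of this model can be written as follows. Here x is the observation sequence (for example, a protein sequence), y = (y_1, ..., y_T) is the label sequence with y_0 a fixed start label, f_k are feature functions with learned weights λ_k, and Z(x) normalizes over every possible label sequence y':

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\!\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Because each f_k receives the whole observation sequence x, a feature may inspect any position of the input, not just the current one. In a general (non-chain) CRF, the label pairs (y_{t-1}, y_t) are replaced by arbitrary cliques of the label graph.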
3. Applications of CRFs
• Natural language processing
o Part-of-speech tagging
o Named entity recognition
• Sequence prediction
o Gene prediction
4. CRF software libraries
• RNNSharp: CRFs based on recurrent neural networks
• CRF-ADF: Linear-chain CRFs with fast online ADF training
• CRFSharp: Linear-chain CRFs
• GCO: CRFs with submodular energy functions
• DGM: General CRFs
• HCRF library: Hidden-state CRFs
• PyStruct: Structured learning and prediction library in Python
5. Advantages
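• Flexible design
o Does not require the strict independence assumptions of HMMs (where each observation depends only on its own hidden state)
• Overcomes the label bias problem of MEMMs
o Normalizes the conditional probability globally over the entire output (label) sequence rather than state by state
• Computes the joint probability of the whole label sequence, conditioned on the observation sequence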
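The global normalization behind the MEMM comparison can be illustrated with a toy linear-chain CRF. In this sketch the two labels, the alphabet "AC", the indicator features, and their weights are all hypothetical. The partition function Z(x) sums the exponentiated score of every possible label sequence, so probabilities are normalized once over whole sequences rather than per state as in an MEMM. The forward algorithm computes log Z(x) efficiently, and the script checks it against brute-force enumeration.

```python
import itertools
import math

LABELS = [0, 1]  # hypothetical labels, e.g. 0 = "non-functional site", 1 = "functional site"


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(x, y, w_emit, w_trans):
    """Unnormalized log-score of labels y for observations x.

    Assumed toy features: an indicator for (label, observed symbol) and an
    indicator for (previous label, label); y[0] has no transition term.
    """
    s = sum(w_emit[(y[t], x[t])] for t in range(len(x)))
    s += sum(w_trans[(y[t - 1], y[t])] for t in range(1, len(x)))
    return s


def log_partition_forward(x, w_emit, w_trans):
    """log Z(x) via the forward algorithm: O(T * L^2) instead of O(L^T)."""
    alpha = {lab: w_emit[(lab, x[0])] for lab in LABELS}
    for t in range(1, len(x)):
        alpha = {
            cur: w_emit[(cur, x[t])]
            + logsumexp([alpha[prev] + w_trans[(prev, cur)] for prev in LABELS])
            for cur in LABELS
        }
    return logsumexp(list(alpha.values()))


def log_partition_brute(x, w_emit, w_trans):
    """log Z(x) by enumerating every label sequence (only feasible for tiny T)."""
    return logsumexp(
        [score(x, y, w_emit, w_trans) for y in itertools.product(LABELS, repeat=len(x))]
    )


x = "ACCA"  # hypothetical observation sequence
# Toy weights: label 1 prefers "C", label 0 prefers "A"; staying in a label is favored.
w_emit = {(lab, sym): (0.5 if (lab == 1) == (sym == "C") else -0.5) for lab in LABELS for sym in "AC"}
w_trans = {(a, b): (0.3 if a == b else -0.3) for a in LABELS for b in LABELS}

log_z = log_partition_forward(x, w_emit, w_trans)
best = max(itertools.product(LABELS, repeat=len(x)), key=lambda y: score(x, y, w_emit, w_trans))
print(abs(log_z - log_partition_brute(x, w_emit, w_trans)) < 1e-9)  # True
print(best, math.exp(score(x, best, w_emit, w_trans) - log_z))  # best labeling and its P(y | x)
```

Brute-force enumeration grows as L^T for L labels and T positions, so here it serves only as a correctness check. The forward algorithm's O(T * L^2) cost is what makes exact inference practical for linear chains.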
6. Disadvantages
• Training is computationally expensive, since every optimization step requires inference over every training sequence
• Difficult to retrain the model when newer data become available
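The training cost can be made concrete with the gradient of the conditional log-likelihood ℓ of a linear-chain CRF with weights λ_k and feature functions f_k(y_{t-1}, y_t, x, t), over N training pairs (x^(i), y^(i)) of length T_i. The first term is the observed feature count; the second is its expectation under the current model, which requires the pairwise label marginals:

```latex
\frac{\partial \ell}{\partial \lambda_k}
  = \sum_{i=1}^{N} \sum_{t=1}^{T_i} f_k\big(y^{(i)}_{t-1}, y^{(i)}_t, x^{(i)}, t\big)
  - \sum_{i=1}^{N} \sum_{t=1}^{T_i} \sum_{y,\, y'} f_k\big(y, y', x^{(i)}, t\big)\,
    P\big(y_{t-1} = y,\ y_t = y' \mid x^{(i)}\big)
```

Computing these marginals with the forward-backward algorithm costs O(T_i L^2) per sequence for L labels, and it must be repeated for every training sequence at every optimization iteration. For general CRFs whose graphs contain loops, exact inference can be intractable, so approximate inference is typically needed.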
Format your homework according to the following formatting requirements:
o The answer should be typed, using Times New Roman font (size 12), double spaced, with one-inch margins on all sides.
o Include a cover page containing the title of the homework, the student's name, the course title, and the date. The cover page is not included in the required page length.
o Also include a reference page. Citations and references must follow APA format. The reference page is not included in the required page length.