BERT for Sequence Classification
Transformer-based language models applied to biological sequence classification, exploring gene function prediction and regulatory element annotation.
Tags: BERT · Transformers · NLP · PyTorch · Genomics
This project leveraged transformer-based language models (BERT) to classify biological sequences, treating DNA and protein sequences as a “language” amenable to deep learning.
Approach
- Adapted the BERT architecture for biological sequence tokenization (see the tokenization sketch after this list)
- Fine-tuned pre-trained models on curated genomic datasets
- Explored applications in gene function prediction and regulatory element annotation
- Compared performance against traditional sequence classification methods (BLAST, HMMs)
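To make the tokenization step concrete: biological sequences have no natural word boundaries, so one common approach in genomic language models is to split DNA into overlapping k-mers that serve as tokens. The k-mer size and function name below are illustrative assumptions for this sketch, not the project's actual configuration.

```python
# Illustrative k-mer tokenizer for DNA sequences; k=6 is an assumed
# default, not necessarily the setting used in this project.
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]

# A short sequence becomes a "sentence" of k-mer "words" over which a
# BERT-style vocabulary can be built.
print(kmer_tokenize("ATGGCGTACGT"))
# ['ATGGCG', 'TGGCGT', 'GGCGTA', 'GCGTAC', 'CGTACG', 'GTACGT']
```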
Technical Stack
- PyTorch for model training and inference
- Hugging Face Transformers for BERT architecture
- Custom tokenizers for biological sequence encoding
- Evaluated with precision, recall, and F1 against gold-standard annotations (see the fine-tuning and evaluation sketches below)
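As a minimal sketch of the fine-tuning setup with PyTorch and Hugging Face Transformers: the checkpoint name, label semantics, and hyperparameters below are placeholders, and the toy loop stands in for training over the project's curated datasets.

```python
# Hedged fine-tuning sketch; "bert-base-uncased" and all hyperparameters
# are placeholders, not the project's actual configuration.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# k-mer "sentences" (see the tokenization sketch above), joined by spaces
texts = ["ATGGCG TGGCGT GGCGTA", "GCGTAC CGTACG GTACGT"]
labels = torch.tensor([1, 0])  # e.g. regulatory vs. non-regulatory (toy)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # tiny illustrative loop; real training iterates a DataLoader
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```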
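And a sketch of the evaluation step, computing precision, recall, and F1 against gold-standard labels. The use of scikit-learn and the toy label values here are assumptions for illustration.

```python
# Toy evaluation sketch; y_true/y_pred stand in for gold-standard
# annotations and model predictions.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1]  # gold-standard annotations (toy)
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions (toy)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```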