
Basic Python scripts for bioinformatics

Here are some basic Python scripts that you can use or modify for common bioinformatics tasks: sequence statistics, transcription and translation, pairwise alignment, FASTA parsing, and protein mass calculation.

1. Sequence Length and Composition:

  • Calculate the length of a DNA sequence and determine the GC content.
sequence = "ATGCTAGCTAGCTAGCTAGCTAGCTA"
length = len(sequence)
gc_content = (sequence.count('G') + sequence.count('C')) / length * 100
print(f"Sequence Length: {length} bases")
print(f"GC Content: {gc_content:.2f}%")
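If you need the same calculation for many sequences, it can help to wrap it in a function. The version below is only a sketch: the name gc_content is made up for this example, and it assumes that lowercase bases should count and that an empty sequence should report 0 rather than raise a division-by-zero error.

```python
def gc_content(sequence):
    """Return the GC percentage of a DNA sequence, or 0.0 for an empty one."""
    sequence = sequence.upper()  # treat lowercase bases the same as uppercase
    if not sequence:
        return 0.0  # assumption: report 0 instead of dividing by zero
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence) * 100


print(f"GC Content: {gc_content('atgcGC'):.2f}%")
```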

2. Transcription and Translation:

  • Transcribe DNA to RNA and translate RNA to protein.
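from itertools import product

dna_sequence = "ATGCTAGCTAGCTAGCTAGCTAGCTA"
rna_sequence = dna_sequence.replace("T", "U")  # treats the DNA as the coding strand
# Standard genetic code, codons ordered U, C, A, G at each position; "*" marks a stop codon
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {"".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), amino_acids)}
# Translate complete codons only, then cut the protein at the first stop codon
codons = [rna_sequence[i:i+3] for i in range(0, len(rna_sequence) - 2, 3)]
protein_sequence = "".join(codon_table[codon] for codon in codons).split("*")[0]
print(f"DNA Sequence: {dna_sequence}")
print(f"RNA Sequence: {rna_sequence}")
print(f"Protein Sequence: {protein_sequence}")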
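One detail worth watching in translation is the reading frame: a sequence whose length is not a multiple of three leaves one or two bases at the end that do not form a codon. The helper below is a small sketch of how the codons could be split for a chosen frame. The name split_codons and the frame parameter are invented for this example.

```python
def split_codons(rna, frame=0):
    """Return the complete codons of rna, starting at offset frame (0, 1, or 2)."""
    # Stopping at len(rna) - 2 drops any trailing bases that do not fill a codon
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


print(split_codons("AUGCUAGCUA"))           # frame 0
print(split_codons("AUGCUAGCUA", frame=1))  # frame 1
```

Each codon in the returned list can then be looked up in a codon table, with no risk of a lookup failing on a leftover fragment.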

3. Sequence Alignment with Biopython:

  • Use Biopython to perform a pairwise sequence alignment.
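# Note: Bio.pairwise2 is deprecated in recent Biopython releases in favor of Bio.Align.PairwiseAligner
from Bio import pairwise2
from Bio.Seq import Seq

seq1 = Seq("AGTAGTACGTAGTA")
seq2 = Seq("AGTACGTA")
# globalxx: global alignment where a match scores 1 and mismatches and gaps cost nothing
alignments = pairwise2.align.globalxx(seq1, seq2)
# Print the first of the best-scoring alignments
print(pairwise2.format_alignment(*alignments[0]))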
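To get a feel for what a global alignment score means, here is a rough pure-Python sketch of the Needleman-Wunsch scoring recurrence. It is not Biopython's implementation. The function name and the default parameters (match 1, mismatch 0, gap 0, chosen to mirror the globalxx scoring scheme) are assumptions for illustration, and it returns only the score, not the aligned strings.

```python
def global_alignment_score(a, b, match=1, mismatch=0, gap=0):
    """Return the best global alignment score of a and b (Needleman-Wunsch)."""
    # previous[j] is the best score for aligning the processed prefix of a with b[:j]
    previous = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            # Best of: pair the two characters, gap in b, gap in a
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


score = global_alignment_score("AGTAGTACGTAGTA", "AGTACGTA")
print(f"Alignment score: {score}")
```

With these defaults the score is simply the number of matched positions. Passing negative values for mismatch and gap gives a stricter scoring scheme.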

4. Basic FASTA File Parsing:

  • Read and parse a FASTA file to extract sequence data.
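def read_fasta(filename):
    """Return a dict mapping each FASTA header (without the leading '>') to its sequence."""
    sequences = {}
    with open(filename, 'r') as f:
        header, sequence = None, []
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # A new record starts: store the one collected so far
                if header is not None:
                    sequences[header] = ''.join(sequence)
                header, sequence = line[1:], []
            elif header is not None:
                sequence.append(line)
        # Store the final record
        if header is not None:
            sequences[header] = ''.join(sequence)
    return sequences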

sequences = read_fasta("sequences.fasta")
for header, sequence in sequences.items():
    print(f"Header: {header}")
    print(f"Sequence: {sequence}")
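For large files, or files where the same header appears more than once, a generator that yields one record at a time may be a better fit than a dictionary. The sketch below is one way to do that. The name iter_fasta and the two-record example text are made up for illustration, and it reads from any iterable of lines so you can try it without a file on disk.

```python
import io


def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record FASTA text, used only for illustration
example = io.StringIO(">seq1 demo\nATGC\nGGCC\n>seq2\nTTAA\n")
records = list(iter_fasta(example))
for header, sequence in records:
    print(f"{header}: {len(sequence)} bases")
```

An open file object can be passed in place of the StringIO, since iterating over a file also produces its lines.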

5. Calculate Molecular Weight of a Protein:

  • Calculate the molecular weight of a protein sequence.
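protein_sequence = "MSTHGGSPKT"
# Monoisotopic residue masses in Da (each residue is the amino acid minus one water)
amino_acid_weights = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
molecular_weight = sum(amino_acid_weights[aa] for aa in protein_sequence) + 18.01056  # add one water for the free termini
print(f"Molecular Weight (monoisotopic): {molecular_weight:.2f} Da")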

These basic Python scripts cover fundamental bioinformatics tasks such as sequence analysis, alignment, file parsing, and molecular weight calculation. You can use these as starting points and build more complex scripts or integrate them into larger bioinformatics projects as needed.
