Protein Sequence Analysis: An Essential Tool for Innovation


“Knowledge of sequences could contribute much to our understanding of living matter.”

– Frederick Sanger, Nobel Laureate & biochemist

Patenting protein sequences has been a subject of much contention. It often carries the burden of being governed by the doctrine of equivalents, despite the fact that there is an increasing number of modified proteins being created by pharmaceutical companies with important clinical advantages.

Let’s take a step back to understand this better and how protein sequence analysis plays a key role in patenting activities.

What is protein sequencing?

Protein sequencing is a technique to determine the amino acid sequence of a protein, as well as which conformation the protein adopts and the extent to which it is complexed with any non-peptide molecules.

The sequence of nucleotides, coded in triplets (codons) along the mRNA, determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
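The prediction chain described above (gene DNA to mRNA to amino acids) can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and a coding-strand DNA sequence that starts in frame; the function names `transcribe` and `translate` are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Predict the mRNA from the coding strand of a gene (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon until a stop codon or the end."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCCAAATAA")))  # MAK
```

The sketch ignores introns, alternative start sites and non-standard genetic codes, so it only illustrates the principle.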

Determining the amino acid sequence

Amino acid sequences can be determined by automated Edman degradation. In this method, the amino-terminal residue is labeled and cleaved from the peptide without disrupting the peptide bonds between the other amino acid residues.
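As a toy model only, the logic of the method can be written as a loop that removes and records the amino-terminal residue one cycle at a time. The function name `edman_cycles` and the `max_cycles` parameter are hypothetical, and the sketch models none of the underlying chemistry.

```python
def edman_cycles(peptide, max_cycles=None):
    """Toy model: each cycle identifies and removes the N-terminal residue.

    Returns the residues identified so far and the peptide that remains.
    """
    remaining = peptide
    identified = []
    cycles = len(peptide) if max_cycles is None else min(max_cycles, len(peptide))
    for _ in range(cycles):
        # The N-terminal residue is read off; the rest of the chain is untouched.
        n_terminal, remaining = remaining[0], remaining[1:]
        identified.append(n_terminal)
    return "".join(identified), remaining


print(edman_cycles("MAKQL", max_cycles=3))  # ('MAK', 'QL')
```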

Similarity in amino acid sequences

Sequence similarity is a measure of an empirical relationship between sequences. A common objective of sequence similarity calculations is to establish the likelihood of sequence homology, that is, the chance that the sequences have evolved from a common ancestor. A similarity score therefore aims to approximate the evolutionary distance between a pair of nucleotide or protein sequences. Many implementations for measuring sequence similarity exist, and a general aim is to infer structural or functional characteristics of an unannotated molecular sequence.
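One of the simplest similarity measures is percent identity over an existing alignment. The sketch below assumes the two sequences are already aligned to the same length, with "-" marking gaps; the function name `percent_identity` is hypothetical, and counting over all non-empty alignment columns is one of several possible conventions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Skip columns where both sequences have a gap; they carry no information.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("MAKQL", "MAKEL"))  # 80.0
```

Percent identity treats every mismatch alike. Scores meant to approximate evolutionary distance usually weight substitutions differently, for example with a substitution matrix.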

Patent protection for protein sequence

Under appropriate circumstances, it should be possible to obtain commercially significant patent protection for certain aspects of a protein sequence. Although case law has held that data sets per se are not patentable, they may be protected indirectly by associating the data set with a particular physical structure or use.

A new protein sequence may be patentable even if the structures of homologous proteins, such as naturally occurring isoforms or species or allelic variants, are known. However, the scope of any available patent protection may be very narrow, because the new protein structure must meet all the statutory requirements, including novelty and non-obviousness. Patent protection may also be available for a new protein structure even if another patent has already been issued with claims to the general features of the new protein structure, such as the general structural features of a protein family.

Again, the new protein structure must have novel and non-obvious structural features in order to obtain patent protection in view of the issued patent. However, the patent does not give the patent holder any right to practice the claimed invention; it only gives the patent holder the right to exclude others from practicing the claimed invention.

There are many search options and patented sequence search tools for Sequence Similarity Searching.

Sequence Similarity Searching is a method of searching a protein sequence database by aligning its entries to a query sequence. By statistically assessing how well database and query sequences match, one can infer homology and transfer information to the query sequence.

The following tools can be used for Sequence Similarity Searching:

  • NCBI BLAST

NCBI BLAST is the most commonly used sequence similarity search tool. It uses heuristics to perform fast local alignment searches.

  • FASTA

FASTA is a commonly used sequence similarity search tool that uses heuristics for fast local alignment searching.

  • SSEARCH

SSEARCH is an optimal (as opposed to heuristics-based) local alignment search tool using the Smith-Waterman algorithm. Optimal searches guarantee that you find the best alignment score for your given parameters.

  • PSI-Search

PSI-Search combines the sensitivity of the Smith-Waterman search algorithm (SSEARCH) with the PSI-BLAST profile construction strategy to find distantly related protein sequences.

  • GGSEARCH

GGSEARCH performs optimal global-global alignment searches using the Needleman-Wunsch algorithm.

  • GLSEARCH

GLSEARCH performs an optimal sequence search using alignments that are global in the query but local in the database sequence. This can be useful when you want to match all of a short query sequence to part of a larger database sequence.

  • FASTM/S/F

These specialist programs allow searches of databases using a group of short peptides as the query.

  • PSI-BLAST

PSI-BLAST allows users to construct and perform a BLAST search with a custom, position-specific scoring matrix, which can help find distant evolutionary relationships. PHI-BLAST functionality is also available to restrict results using patterns.
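To make the difference between local (Smith-Waterman) and global (Needleman-Wunsch) alignment concrete, here is a minimal scoring sketch. It is not how the tools above are implemented: the match, mismatch and linear gap values are arbitrary assumptions, whereas production tools typically use substitution matrices and more elaborate gap penalties. The names `local_score`, `global_score` and `rank_database` are hypothetical.

```python
def local_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    previous = [0] * (len(target) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def global_score(query, target, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (linear gap penalty)."""
    previous = [j * gap for j in range(len(target) + 1)]
    for i, q in enumerate(query, start=1):
        current = [i * gap]
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            # Both sequences must be aligned end to end, so gaps are always paid for.
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


def rank_database(query, database):
    """Score every database sequence against the query and sort best-first."""
    hits = [(name, local_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


toy_database = {"seq_a": "MMAKQLL", "seq_b": "GGGG"}  # made-up sequences
print(rank_database("AKQ", toy_database))  # [('seq_a', 6), ('seq_b', 0)]
```

Because both functions fill the whole scoring table, they return the optimal score for the chosen parameters. Heuristic tools such as BLAST and FASTA gain speed by not exploring the whole table.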

Similarity searches can therefore reveal interrelated and complex patenting positions, and they can also help to detect the effects of patent density, i.e. of encountering a large number of similar patents in one’s environment.


Pragya Arya

Dr. Pragya is a pharmaceutical patent expert and has extensive experience as a research scientist and Intellectual Property Specialist in generic pharmaceutical manufacturing and the chemical industry.
