Finding possible sequence liabilities

Some sequence motifs are prone to oxidation, N-glycosylation, isomerization, hydrolysis, or deamidation under certain conditions. Liabilities of this kind can cause problems for manufacturing and stability. AntPack contains a simple tool that searches for motifs that may pose a risk of one or more of these reactions. It uses the list of potentially problematic motifs from Satlawa et al., excluding low-risk motifs (e.g. [STK]N, which is low-risk for deamidation).

Note that this type of search is prone to false positives. A motif that can in principle be N-glycosylated or undergo pH-dependent hydrolysis will not always do so; whether it does depends on its context. You can think of this as casting a ‘wide net’ to find possible liabilities, which can then be narrowed down to those that should be removed.
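To make the ‘wide net’ idea concrete, here is a minimal sketch of a purely sequence-based motif scan. It is not AntPack's implementation and does not use the motif list from Satlawa et al.; the three patterns, the names EXAMPLE_MOTIFS and naive_motif_scan, and the end-exclusive position convention are all assumptions made for illustration.

```python
import re

# Illustrative patterns only; this is NOT the motif list AntPack uses.
# Each pattern sits inside a lookahead so that overlapping hits are reported.
EXAMPLE_MOTIFS = {
    "N-glycosylation (N-X-S/T, X not P)": re.compile(r"(?=(N[^P][ST]))"),
    "Deamidation (NG)": re.compile(r"(?=(NG))"),
    "Isomerization (DG)": re.compile(r"(?=(DG))"),
}


def naive_motif_scan(sequence):
    """Return a sorted list of ((start, end), label) for every motif hit.

    Positions are 0-based and, in this sketch, end-exclusive (an assumption).
    """
    hits = []
    for label, pattern in EXAMPLE_MOTIFS.items():
        for match in pattern.finditer(sequence):
            start = match.start(1)
            end = start + len(match.group(1))
            hits.append(((start, end), label))
    return sorted(hits)


if __name__ == "__main__":
    # "NGS" matches both the NG pattern and the N-X-S/T pattern.
    for span, label in naive_motif_scan("AANGSA"):
        print(span, label)
```

A scan like this flags every occurrence regardless of where it falls in the chain or whether the site is exposed, which is why its hits are only candidates for closer review.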

class antpack.LiabilitySearchTool
__init__
analyze_seq

Searches for some common motifs which may correspond to possible development liabilities. Note that a reported motif may be a false positive; the presence of a possible N-glycosylation motif, for example, does not guarantee that N-glycosylation will occur. It does, however, identify sites where there is a risk. Currently only antibodies are supported; TCRs are not.

Parameters:
  • sequence (str) – A sequence containing the usual 20 amino acids – no gaps. X is also allowed but should be used sparingly.

  • alignment (tuple) – A tuple containing (numbering, percent_identity, chain_name, error_message). This tuple is what you will get as output if you pass sequences to the analyze_seq method of SingleChainAnnotator or PairedChainAnnotator.

  • scheme (str) – The numbering scheme. One of ‘aho’, ‘imgt’, ‘kabat’ or ‘martin’. It is very important to use the same scheme that was used to number the sequence; using some other scheme may lead to incorrect motif identification.

  • cdr_scheme (str) – The scheme that is used for CDR definitions. This can be one of ‘imgt’, ‘aho’, ‘martin’, ‘kabat’, ‘north’. Note that you can use a different set of CDR definitions than the numbering scheme (e.g. number with IMGT and define CDRs using Kabat) although usually this will be the same as ‘scheme’.

Returns:

liabilities (list) – A list of tuples. The first element of each tuple is a 2-tuple of (starting position, ending position) numbered with the start of the sequence as 0, indicating the start and end of the liability. The second element of each tuple is a string describing the type of liability found. If the list is empty, no liabilities were found. If sequence numbering fails, the list contains a single tuple indicating the cause of the failure. (This can occur if the expected cysteines in the chain are not present at the expected positions, which is also a liability.)
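The following is a minimal sketch of how the pieces described above might fit together: number a chain, pass the resulting tuple to analyze_seq with the same scheme, then group the returned list by liability type. The helper names find_liabilities and group_by_liability are hypothetical. The constructor calls SingleChainAnnotator(scheme=...) and LiabilitySearchTool() are assumptions not stated on this page, so check them against the installed version. The grouping helper assumes the normal return shape of ((start, end), description); the shape of the single tuple returned when numbering fails is not specified here and is not handled.

```python
from collections import defaultdict

# Scheme names as listed in the parameter descriptions above.
NUMBERING_SCHEMES = {"aho", "imgt", "kabat", "martin"}
CDR_SCHEMES = NUMBERING_SCHEMES | {"north"}


def find_liabilities(sequence, scheme="imgt", cdr_scheme=None):
    """Number one chain and search it, using one scheme for both steps."""
    cdr_scheme = scheme if cdr_scheme is None else cdr_scheme
    if scheme not in NUMBERING_SCHEMES:
        raise ValueError(f"Unsupported numbering scheme: {scheme!r}")
    if cdr_scheme not in CDR_SCHEMES:
        raise ValueError(f"Unsupported CDR scheme: {cdr_scheme!r}")

    # Imported here so the grouping helper works without AntPack installed.
    from antpack import LiabilitySearchTool, SingleChainAnnotator

    # ASSUMPTION: these constructor arguments are not documented on this page.
    annotator = SingleChainAnnotator(scheme=scheme)
    alignment = annotator.analyze_seq(sequence)
    tool = LiabilitySearchTool()
    return tool.analyze_seq(sequence, alignment, scheme, cdr_scheme)


def group_by_liability(liabilities):
    """Map each liability description to the list of (start, end) spans."""
    grouped = defaultdict(list)
    for (start, end), description in liabilities:
        grouped[description].append((start, end))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical result list; the description strings are made up.
    sample = [((3, 5), "type A"), ((10, 12), "type B"), ((20, 22), "type A")]
    print(group_by_liability(sample))
```

Passing the same scheme value to both the annotator and the search tool inside one function is a simple way to avoid the scheme mismatch that the scheme parameter warns about.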