T cells monitor the health status of cells by identifying foreign peptides displayed on the surfaces of those cells. T-cell receptors (TCRs), protein complexes on the surface of T cells, bind to these peptides. This process, known as TCR recognition, constitutes a key step of the immune response. Optimizing TCR sequences for TCR recognition is a fundamental step towards developing personalized treatments that trigger immune responses to kill cancerous or virus-infected cells. In this paper, we
formulated the search for these optimized TCRs as a reinforcement learning (RL) problem and presented TCRPPO, a framework with a mutation policy based on proximal policy optimization (PPO). TCRPPO mutates TCRs into effective ones that can recognize
given peptides. TCRPPO leverages a reward function that combines the likelihood that a mutated sequence is a valid TCR, measured by a new scoring function based on deep autoencoders, with the probability that the sequence recognizes the given peptide, estimated by a peptide-TCR interaction predictor.
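As an illustration, the minimal sketch below shows one plausible way such a composite reward could be assembled from the two signals; the function name, threshold, and penalty scheme are assumptions made for illustration and are not taken from the paper.

def combined_reward(validity_score: float, binding_prob: float,
                    validity_threshold: float = 0.5) -> float:
    """Illustrative reward combining TCR validity and predicted binding.

    `validity_score` stands in for the autoencoder-based likelihood that the
    mutated sequence is a valid TCR; `binding_prob` stands in for the
    peptide-TCR interaction predictor's output. Both are assumed to lie in [0, 1].
    """
    if validity_score >= validity_threshold:
        # Sequence looks like a valid TCR: reward it by its predicted binding.
        return binding_prob
    # Sequence looks invalid: return a negative reward regardless of binding.
    return validity_score - 1.0

if __name__ == "__main__":
    print(combined_reward(0.9, 0.8))  # plausible TCR with strong predicted binding
    print(combined_reward(0.2, 0.8))  # implausible TCR is penalized regardless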
We compared TCRPPO with multiple baseline methods and demonstrated that TCRPPO significantly outperforms all baselines at generating valid, positive-binding TCRs. These results demonstrate the potential of TCRPPO for both
precision immunotherapy and peptide-recognizing TCR motif discovery.