Extracting information from the electrocardiography (ECG) signal is an
essential step in the design of digital health technologies in cardiology. In
recent years, several machine learning (ML) algorithms for automatic extraction
of information from ECG signals have been proposed. Supervised learning methods
have successfully been used to identify specific aspects of the signal, such as
rhythm disorders (arrhythmias). Self-supervised learning (SSL)
methods, on the other hand, can be used to extract all the features contained
in the data. The model is optimized without any task-specific goal and learns from
the data itself. By adapting state-of-the-art computer vision methodologies to
the signal processing domain, a few SSL approaches have been reported recently
for ECG processing. However, such SSL methods require either data augmentation
or negative pairs, which limits them to looking only for similarities
between two ECG inputs, either two versions of the same signal or two signals
from the same subject. This leads to models that are very effective at
extracting characteristics that are stable in a subject, such as gender or age.
They are not successful, however, at capturing changes within the ECG recording
that can explain dynamic aspects, such as different arrhythmias or sleep
stages. In this work, we introduce the first SSL method that uses neither data
augmentation nor negative pairs for understanding ECG signals and still
achieves representations of comparable quality. As a result, it is possible to
design an SSL method that not only captures similarities between two inputs but
also captures dissimilarities for a complete understanding of the data. In
addition, a model based on transformer blocks is presented, which produces
better results than a model based on convolutional layers (XResNet50) with
almost the same number of parameters.