Software development typically involves the use of multiple Third-Party Libraries
(TPLs). However, irrelevant libraries present in a software application's
distributable often lead to excessive consumption of resources such as CPU
cycles, memory, and mobile-device battery power. Therefore, the
identification and removal of unused TPLs present in an application are
desirable. We present a rapid, storage-efficient, obfuscation-resilient method
to detect irrelevant TPLs in Java and Python applications. The novel aspects of
our approach are: i) computing a vector representation of a .class file using a
model that we call Lib2Vec, which is trained using the Paragraph Vector
algorithm; ii) converting each .class file to a normalized form via
semantics-preserving transformations before using it to train the Lib2Vec
models; and iii) an eXtra Library Detector (XtraLibD), developed and tested
with 27 different language-specific Lib2Vec models. These models were trained
using different parameters on >30,000 .class and >478,000 .py files taken from
>100 different Java libraries and 43,711 Python libraries available at MavenCentral.com
and Pypi.com, respectively. XtraLibD achieves an accuracy of 99.48% with an F1
score of 0.968 and outperforms the existing tools, viz., LibScout, LiteRadar,
and LibD, with accuracy improvements of 74.5%, 30.33%, and 14.1%,
respectively. Compared with LibD, XtraLibD achieves a response-time improvement
of 61.37% and a storage reduction of 87.93% (99.85% over JIngredient). Our
program artifacts are available at this https URL
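
The detection idea summarized above (normalize a file, map it to a vector, compare that vector against vectors of known library files) can be illustrated with a minimal sketch. This is not the XtraLibD implementation: opcode-bigram counts are swapped in for the Lib2Vec (Paragraph Vector) embedding, the normalization simply drops instruction operands, and the names `normalize`, `embed`, `cosine`, `best_match`, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def normalize(listing: str) -> list[str]:
    """Keep only the opcode of each instruction line and drop operands.

    Assumption: a crude stand-in for the semantics-preserving normalization
    described in the abstract. Dropping names and constants makes the result
    insensitive to identifier renaming.
    """
    opcodes = []
    for line in listing.splitlines():
        parts = line.split()
        if parts:
            opcodes.append(parts[0].lower())
    return opcodes


def embed(tokens: list[str]) -> Counter:
    """Toy vector: counts of adjacent opcode pairs (NOT a Lib2Vec vector)."""
    return Counter(zip(tokens, tokens[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(listing: str, reference: dict[str, Counter], threshold: float = 0.9):
    """Return (library_name, score) for the closest reference vector,
    or (None, score) when no reference reaches the hypothetical threshold."""
    query = embed(normalize(listing))
    scored = [(cosine(query, vec), name) for name, vec in reference.items()]
    score, name = max(scored, default=(0.0, None))
    return (name, score) if score >= threshold else (None, score)


# Two made-up "library" listings and a renamed copy of the first one.
lib_a = "aload_0\ngetfield Foo.bar\ninvokevirtual Foo.baz\nireturn"
lib_b = "iconst_1\nistore_1\niload_1\nireturn"
renamed = "aload_0\ngetfield a.b\ninvokevirtual a.c\nireturn"

reference = {
    "lib_a": embed(normalize(lib_a)),
    "lib_b": embed(normalize(lib_b)),
}
name, score = best_match(renamed, reference)
print(name, round(score, 3))
```

In this sketch the renamed listing still matches `lib_a` because only opcodes survive normalization, which mirrors, in a very simplified way, why a normalized representation helps against identifier obfuscation. A learned embedding would replace `embed` in a realistic setting.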