Abstract:
The DBSCAN algorithm is a density-based spatial clustering approach that is
commonly used to find relationships and patterns in geographic data. Because of
its widespread application, several programming languages used in data science
provide DBSCAN as a built-in function, and researchers and data scientists
routinely use these built-in functions to cluster and analyze their study data.
All implementations require the user to specify a neighborhood radius (eps)
and the minimum number of samples that must lie within that radius for a point
to be a core point of a cluster (min_sample). Consequently, all built-in DBSCAN
functions are widely assumed to produce the same result for the same input and
parameters. However, the built-in DBSCAN function in Python yields results that
differ from those of the other programming languages analyzed in this study.
We propose a systematic method for assessing the results of built-in DBSCAN
functions and identifying inconsistencies in their output. This study reveals
several such differences and advises caution when relying on built-in
implementations.
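To make the roles of eps and min_sample concrete, the following is a minimal,
textbook-style DBSCAN sketch in Python. The names dbscan, region_query, and
label_at are hypothetical and do not belong to any library's API. The sketch
assumes that the neighbor count compared against min_sample includes the point
itself, which is the convention scikit-learn documents for its min_samples
parameter. It also illustrates one known way two otherwise correct
implementations can disagree: a border point within eps of core points from two
different clusters is assigned to whichever cluster reaches it first, so the
result can depend on input order. This is offered only as an illustration, not
as the cause of the differences reported in this study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_sample: int) -> list:
    """Textbook DBSCAN: one cluster label per point, NOISE (-1) for noise points."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_sample:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: claimed but not expanded
            if labels[j] is not None:
                continue  # already assigned, possibly to an earlier cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_sample:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


def label_at(pts: Sequence[Point], labels: list, x: float) -> int:
    """Hypothetical helper: cluster label of the point at (x, 0.0)."""
    return labels[list(pts).index((x, 0.0))]


# Hypothetical 1-D data (y = 0): two dense groups plus a point at x = 5.0 that is
# within eps of a core point in each group but is not a core point itself.
left = [(3.3, 0.0), (3.5, 0.0), (3.7, 0.0), (4.2, 0.0)]
right = [(5.8, 0.0), (6.3, 0.0), (6.5, 0.0), (6.7, 0.0)]
points = left + [(5.0, 0.0)] + right
reversed_points = points[::-1]

forward = dbscan(points, eps=1.0, min_sample=4)
backward = dbscan(reversed_points, eps=1.0, min_sample=4)

for name, pts, labels in [("input order", points, forward),
                          ("reversed", reversed_points, backward)]:
    side = "left" if label_at(pts, labels, 5.0) == label_at(pts, labels, 4.2) else "right"
    print(f"{name}: border point x=5.0 joins the {side} cluster")
```

With identical eps and min_sample, reversing the input moves the point at
x = 5.0 from the left cluster to the right one while every core point keeps its
cluster. Comparisons of DBSCAN outputs across implementations may therefore
need to account for border-point handling, not only for parameter values.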