NPTEL Data Mining Assignment 7 Week 7 Answers 2023

#### Q.1. Which of the following statements is NOT true about clustering?

- **a. It is a supervised learning technique**
- b. It is an unsupervised learning technique
- c. It is also known as exploratory data analysis
- d. It groups data into homogeneous groups

#### Q.2. Which of the following clustering techniques starts with the points as individual clusters and, at each step, merges the closest pair of clusters?

- a. K-Means clustering
- b. DBSCAN
- c. Divisive clustering
- **d. Agglomerative clustering**

#### Q.3. DBSCAN is a ___________ algorithm.

- **a. Partitional clustering**
- b. Hierarchical clustering
- c. Fuzzy clustering
- d. Complete clustering

#### Q.4. The Euclidean distance matrix between four 2-dimensional points, p1, p2, p3, and p4, is shown below. A possible set of coordinate values for these points is:

- a. p1=(0, 0), p2=(0, 1), p3=(1, 0), p4=(1, 1)
- **b. p1=(0, 0), p2=(1, 0), p3=(1, 1), p4=(0, 1)**
- c. p1=(1, 0), p2=(0, 0), p3=(1, 1), p4=(0, 1)
- d. p1=(0, 0), p2=(1, 1), p3=(1, 0), p4=(0, 1)
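
One way to check a candidate option is to compute its pairwise Euclidean distance matrix and compare it with the matrix given in the question. The following is a minimal sketch; the helper name `distance_matrix` and the rounding to three decimals are illustrative assumptions, not part of the assignment.

```python
import math

def distance_matrix(points):
    """Return the pairwise Euclidean distance matrix, rounded to 3 decimals."""
    return [[round(math.dist(a, b), 3) for b in points] for a in points]

# Coordinates from option b, in the order p1, p2, p3, p4.
option_b = [(0, 0), (1, 0), (1, 1), (0, 1)]

for row in distance_matrix(option_b):
    print(row)
```

Running the same function on each option and comparing the rows against the given matrix identifies which coordinate set is consistent with it.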

#### Q.5. The leaves of a dendrogram in hierarchical clustering represent:

- **a. Individual data points**
- b. Clusters of multiple data points
- c. Distances between data points
- d. Cluster membership of the data points

#### Q.6. Distance between two clusters in complete linkage clustering is defined as:

- a. Distance between the closest pair of points between the clusters
- **b. Distance between the furthest pair of points between the clusters**
- c. Distance between the most centrally located pair of points in the clusters
- d. None of the above

#### Q.7. Consider a set of five 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), and p5=(0, 0.5). Euclidean distance is the distance function. Single linkage clustering is used to cluster the points into two clusters. The clusters are:

- a. {p1, p2, p3} {p4, p5}
- **b. {p1, p4, p5} {p2, p3}**
- c. {p1, p2, p5} {p3, p4}
- d. {p1, p2, p4} {p3, p5}
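
The merge procedure can be sketched in a few lines of Python. This is a naive illustration, not an optimized implementation: the function name `agglomerative` and the `linkage` parameter are hypothetical, with `min` standing for single linkage (closest pair) and `max` for complete linkage (furthest pair). Ties are broken by taking the first closest pair found, which is an assumption.

```python
import math

def agglomerative(points, k, linkage=min):
    """Merge the closest pair of clusters until k clusters remain.

    points  : dict mapping a label to its (x, y) coordinates
    linkage : min for single linkage, max for complete linkage
    """
    clusters = [{name} for name in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair seen so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(math.dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

points = {"p1": (0, 0), "p2": (5, 0), "p3": (5, 1), "p4": (0, 1), "p5": (0, 0.5)}
print(agglomerative(points, 2, linkage=min))  # single linkage
```

Passing `linkage=max` instead runs the same loop with the complete linkage distance.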

#### Q.8. Consider a set of five 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), and p5=(0, 0.5). Euclidean distance is the distance function. Complete linkage clustering is used to cluster the points into two clusters. The clusters are:

#### Q.9. Consider a set of five 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), and p5=(0, 0.5). Euclidean distance is the distance function. The k-means algorithm is used to cluster the points into two clusters. The initial cluster centers are p1 and p5. The clusters after two iterations of k-means are:

- **a. {p1, p4, p5} {p2, p3}**
- b. {p1, p2, p3} {p4, p5}
- c. {p3, p4, p5} {p1, p2}
- d. {p1, p2, p4} {p3, p5}
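
The two iterations can be traced with a short sketch. It assumes that one iteration means assigning every point to its nearest center and then recomputing each center as the mean of its members, and that the reported clusters are the assignments made in the final iteration. The function name `kmeans` and its signature are illustrative.

```python
import math

def kmeans(points, centers, iterations):
    """Run a fixed number of assign-then-update iterations of k-means."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for name, p in points.items():
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(name)
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(points[n][d] for n in members) / len(members)
                  for d in range(2)) if members else c
            for members, c in zip(clusters, centers)
        ]
    return clusters

points = {"p1": (0, 0), "p2": (5, 0), "p3": (5, 1), "p4": (0, 1), "p5": (0, 0.5)}
initial_centers = [points["p1"], points["p5"]]
print(kmeans(points, initial_centers, iterations=1))
print(kmeans(points, initial_centers, iterations=2))
```

Under these assumptions the first iteration groups p2 with p1, because p2 is slightly closer to (0, 0) than to (0, 0.5), and the second iteration moves the points into the left and right groups.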

#### Q.10. Given a set of seven 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), p5=(0, 0.5), p6=(0, 9), and p7=(5.5, 1). Euclidean distance is the distance function. The DBSCAN algorithm is used to cluster the points. Epsilon = 1, and MinPts = 2 is used for DBSCAN. The clusters and outliers obtained are:

- a. Clusters: {p1, p3, p4, p5} {p2, p7}; Outlier: p6
- b. Clusters: {p1, p2, p3} {p4, p5, p6}; Outlier: p7
- **c. Clusters: {p1, p4, p5} {p2, p3, p7}; Outlier: p6**
- d. Clusters: {p1, p4, p5} {p2, p3, p6}; Outlier: p7
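
A minimal DBSCAN sketch makes the result easy to reproduce. Two conventions are assumed here because the question does not state them: a point's Epsilon-neighborhood includes the point itself, and a neighbor at distance exactly Epsilon counts as inside. The function name `dbscan` and its return shape are illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Return (clusters, outliers) for a dict of label -> (x, y)."""
    def neighbors(name):
        # Assumption: the neighborhood includes the point itself and
        # points at distance exactly eps.
        return [m for m in points
                if math.dist(points[name], points[m]) <= eps]

    labels = {}   # label -> cluster index, or None for noise
    cluster = -1
    for name in points:
        if name in labels:
            continue
        seeds = neighbors(name)
        if len(seeds) < min_pts:
            labels[name] = None  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[name] = cluster
        queue = list(seeds)
        while queue:
            q = queue.pop()
            if labels.get(q) is None:        # unvisited, or marked as noise
                unvisited = q not in labels
                labels[q] = cluster
                if unvisited:
                    q_neighbors = neighbors(q)
                    if len(q_neighbors) >= min_pts:  # q is a core point
                        queue.extend(q_neighbors)

    clusters = [[] for _ in range(cluster + 1)]
    outliers = []
    for name, label in labels.items():
        (outliers if label is None else clusters[label]).append(name)
    return [sorted(c) for c in clusters], sorted(outliers)

points = {"p1": (0, 0), "p2": (5, 0), "p3": (5, 1), "p4": (0, 1),
          "p5": (0, 0.5), "p6": (0, 9), "p7": (5.5, 1)}
clusters, outliers = dbscan(points, eps=1, min_pts=2)
print(clusters, outliers)
```

Note that p7 joins the right-hand cluster through p3 (distance 0.5), even though its distance to p2 is greater than Epsilon.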

