시계열 클러스터링

주요 개념:
- DTW
- Soft DTW
- 무게 중심
- DBA
- 시계열 클러스터링

Dynamic Time Warping

시계열 데이터는 비슷한 패턴이라도 타이밍에 차이가 있는 경우가 흔함
DTW는 서로 다른 시계열을 최대한 비슷하게 맞춰서 비교

DTW(x, y) = min_π √∑_(i,j)∈π d(x, y)²

예제

시계열 머신러닝 라이브러리 tslearn 설치
```
!pip install tslearn
```

예제 데이터

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 1.2, 1.3, 3, 1.6, 1.3])
y = np.array([1.2, 2.8, 2, 1.9, 1.7, 1.5])

plt.plot(x)
plt.plot(y)

경로와 유사도 계산

유사도만 계산

from tslearn.metrics import dtw, dtw_path
dtw_score = dtw(x, y)

경로와 유사도를 모두 계산

optimal_path, dtw_score = dtw_path(x, y)

발견한 경로에서 x와 y가 일치하는 것을 볼 수 있음

optimal_path = np.array(optimal_path)
plt.plot(x[optimal_path[:, 0]])
plt.plot(y[optimal_path[:, 1]])

시각화

optimal_path = np.array(optimal_path)
plt.plot(x)
plt.plot(y)
for i in range(len(optimal_path)):
    plt.plot(
        optimal_path[i],
        [x[optimal_path[i, 0]], y[optimal_path[i, 1]]], 
        color='gray', linestyle='dotted')

무게 중심 Barycenter

여러 개의 시계열 집합 D가 있을 때, 이 집합에 속하는 모든 시계열과 DTW가 가장 짧은 시계열 μ

min_μ ∑_x∈D DTW(μ, x)²

DTW Barycenter Averaging (DBA) 알고리즘

from tslearn.barycenters import dtw_barycenter_averaging
dataset = np.array([x, y])
b = dtw_barycenter_averaging(dataset)
plt.plot(x, linestyle='dotted')
plt.plot(y, linestyle='dotted')
plt.plot(b)

Soft-DTW

하나의 경로만 찾는 것이 아니라 여러 개의 경로를 부드럽게(soft) 합치는 방법

−γ log ∑_t exp[−aᵢ/γ]

γ가 클 수록 부드러운 형태가 됨. γ = 0이면 DTW와 같음

from tslearn.metrics import soft_dtw_alignment
alignment, score = soft_dtw_alignment(x, y, gamma=0.5)
plt.imshow(alignment, cmap='gray')

시각화

from tslearn.barycenters import softdtw_barycenter
b = softdtw_barycenter(dataset, gamma=0.5)
plt.plot(x, linestyle='dotted')
plt.plot(y, linestyle='dotted')
plt.plot(b)

클러스터링

예제 데이터 준비

from tslearn.clustering import TimeSeriesKMeans
from tslearn.datasets import CachedDatasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance, \
                                  TimeSeriesResampler

seed = 0
np.random.seed(seed)
X_train, y_train, X_test, y_test = CachedDatasets().load_dataset("Trace")
X_train = X_train[y_train < 4] # Keep first 3 classes
np.random.shuffle(X_train)
X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train[:50])
X_train = TimeSeriesResampler(sz=40).fit_transform(X_train)
sz = X_train.shape[1]

유클리드 거리로 유사성 계산

km = TimeSeriesKMeans(n_clusters=3, verbose=True, random_state=seed)
y_pred = km.fit_predict(X_train)

plt.figure()
for yi in range(3):
    plt.subplot(1, 3, yi + 1)
    for xx in X_train[y_pred == yi]:
        plt.plot(xx.ravel(), "k-", alpha=.2)
    plt.plot(km.cluster_centers_[yi].ravel(), "r-")

DTW로 유사성 계산

dba_km = TimeSeriesKMeans(n_clusters=3, n_init=2, metric="dtw", verbose=True,
                          max_iter_barycenter=10, random_state=seed)
y_pred = dba_km.fit_predict(X_train)

for yi in range(3):
    plt.subplot(1, 3, yi + 1)
    for xx in X_train[y_pred == yi]:
        plt.plot(xx.ravel(), "k-", alpha=.2)
    plt.plot(dba_km.cluster_centers_[yi].ravel(), "r-")

Soft-DTW로 유사성 계산

sdtw_km = TimeSeriesKMeans(n_clusters=3, metric="softdtw",
                           metric_params={"gamma": .01},
                           verbose=True, random_state=seed)
y_pred = sdtw_km.fit_predict(X_train)

for yi in range(3):
    plt.subplot(1, 3, yi + 1)
    for xx in X_train[y_pred == yi]:
        plt.plot(xx.ravel(), "k-", alpha=.2)
    plt.plot(sdtw_km.cluster_centers_[yi].ravel(), "r-")

시계열 클러스터링​

Dynamic Time Warping​

예제​

경로와 유사도 계산​

시각화​

무게 중심 Barycenter​

DTW Barycenter Averaging (DBA) 알고리즘​

Soft-DTW​

시각화​

클러스터링​

유클리드 거리로 유사성 계산​

DTW로 유사성 계산​

Soft-DTW로 유사성 계산​

퀴즈​

시계열 클러스터링

Dynamic Time Warping

예제

경로와 유사도 계산

시각화

무게 중심 Barycenter

DTW Barycenter Averaging (DBA) 알고리즘

Soft-DTW

시각화

클러스터링

유클리드 거리로 유사성 계산

DTW로 유사성 계산

Soft-DTW로 유사성 계산

퀴즈