Clustering
Classes
Tak(array, index_patients, dict_label_id, timescale, evt_log)
Defines the TAK object.
Initialize the Tak class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | NDArray | 1 row = 1 patient, 1 column = 1 timestamp | required |
index_patients | NDArray | patient IDs, in the same order as the rows of array | required |
dict_label_id | dict | dictionary mapping event names to their IDs | required |
timescale | int | time window size (in days); sequences may be resampled if it is not 1 | required |
evt_log | DataFrame | initial event log used by TAK | required |
Source code in opentak/clustering.py
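As an illustration of the input layout described in the parameter table, the sketch below builds a toy `array`, `index_patients` and `dict_label_id` with NumPy. The event names, IDs and patient identifiers are made up for the example, and `evt_log` is left out because its expected columns are not documented here.

```python
import numpy as np

# Hypothetical event-name -> ID mapping (the IDs used in a real TAK may differ).
dict_label_id = {"no_treatment": 0, "drug_A": 1, "drug_B": 2}

# 3 patients (rows) x 5 timestamps (columns); each cell holds an event ID.
array = np.array(
    [
        [1, 1, 2, 2, 0],
        [1, 1, 1, 2, 2],
        [0, 0, 1, 1, 1],
    ]
)

# Patient IDs, in the same order as the rows of `array` (made-up identifiers).
index_patients = np.array(["P001", "P002", "P003"])

# Sanity checks on the layout: one ID per row, and only known event IDs.
assert array.shape[0] == index_patients.shape[0]
assert set(np.unique(array).tolist()) <= set(dict_label_id.values())
```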
Functions
fit()
Fit the TAK model (to be implemented by subclasses).
Source code in opentak/clustering.py
get_list_indices_cluster(list_ids_cluster=None)
Compute the indices corresponding to the patient IDs in list_ids_cluster, one array per cluster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
list_ids_cluster | list \| None | list of patient IDs in the cluster format | None |
Returns:
Type | Description |
---|---|
list of arrays containing indices for each cluster |
Source code in opentak/clustering.py
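One way to picture this lookup, as a minimal sketch and not the opentak implementation: it assumes the "cluster format" is a list holding one array of patient IDs per cluster, and that the returned indices are row positions in `index_patients`. Both points are assumptions, as are the identifiers used.

```python
import numpy as np

index_patients = np.array(["P001", "P002", "P003", "P004"])

# Assumed "cluster format": one array of patient IDs per cluster.
list_ids_cluster = [np.array(["P003", "P001"]), np.array(["P004", "P002"])]

# Map each patient ID to its row position once, then look up each cluster.
position = {pid: i for i, pid in enumerate(index_patients.tolist())}
list_indices_cluster = [
    np.array([position[pid] for pid in ids.tolist()]) for ids in list_ids_cluster
]
```

The dictionary lookup keeps the order in which IDs appear inside each cluster, which matters if that order encodes the display order.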
get_sorted_array(list_ids_cluster=None)
Compute the sorted array corresponding to list_ids_cluster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
list_ids_cluster | list \| None | list of patient IDs in the cluster format; if None, self.list_ids_clusters is used | None |
Returns:
Type | Description |
---|---|
list of arrays containing sorted sequences for each cluster |
Source code in opentak/clustering.py
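A possible reading of "sorted sequences for each cluster" is that the rows of `array` are picked per cluster in a given order. The sketch below shows that idea with NumPy fancy indexing; the index values are hypothetical and the code is not taken from opentak.

```python
import numpy as np

array = np.array([[1, 1, 2], [0, 0, 1], [2, 2, 2], [0, 1, 1]])

# Row indices per cluster, in display order (hypothetical values).
list_indices_cluster = [np.array([2, 0]), np.array([3, 1])]

# Fancy indexing returns the rows of each cluster in the requested order.
list_sorted_array = [array[idx] for idx in list_indices_cluster]
```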
TakHca(array, index_patients, dict_label_id, timescale, evt_log)
Bases: Tak
Classic TAK using hierarchical clustering.
Initialize the TakHca class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | NDArray | 1 row = 1 patient, 1 column = 1 timestamp | required |
index_patients | NDArray | patient IDs, in the same order as the rows of the array matrix | required |
dict_label_id | dict | dictionary mapping event names to their IDs | required |
timescale | int | time window size (in days); sequences are resampled if it is not 1 | required |
evt_log | DataFrame | initial event log used by TAK | required |
Source code in opentak/clustering.py
Functions
compute_pdist(distance='hamming', subset_array=None)
Compute pairwise distance between patients' sequences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
distance | _Metric | metric used to compute the pairwise distances; defaults to "hamming" | 'hamming' |
subset_array | NDArray \| None | subset of patients; if not provided, pairwise distances are computed for all patients | None |
Returns:
Type | Description |
---|---|
Tak | instance |
Source code in opentak/clustering.py
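To make the default metric concrete: the Hamming distance between two sequences is the fraction of timestamps at which they differ. The sketch below computes it with SciPy's `pdist` on a toy array. Whether `compute_pdist` calls `pdist` internally is an assumption; the example only illustrates the metric.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

array = np.array(
    [
        [1, 1, 2, 2],
        [1, 1, 1, 2],
        [0, 0, 1, 1],
    ]
)

# Condensed distance vector: one entry per unordered pair of patients,
# in the order (0, 1), (0, 2), (1, 2).
condensed = pdist(array, metric="hamming")

# Square form (n_patients x n_patients), convenient for inspection.
square = squareform(condensed)
```

Here patients 0 and 1 differ at one timestamp out of four, so their distance is 0.25; patients 0 and 2 differ everywhere, so their distance is 1.0.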
get_clusters(n_clusters=1, method='ward', patient_ids=None, optimal_ordering=True)
Clusters patients' sequences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_clusters | int | number of clusters to create | 1 |
method | LinkageMethod | linkage method ("ward", "single", "complete", "average") | 'ward' |
patient_ids | Sequence \| None | list of patient IDs to cluster; if None, all patients are used | None |
optimal_ordering | bool | whether to reorder tree leaves (longer computation time) | True |
Returns:
Type | Description |
---|---|
tuple[ndarray, ndarray] | tuple of (cluster number for each patient, list of patient indices in optimal order) |
Source code in opentak/clustering.py
fit(n_clusters=1, method='ward', distance='hamming', optimal_ordering=True)
Cluster patients' sequences.
Shorthand for:
1. Computing pairwise distances
2. Building the linkage matrix
3. Ordering patients by dendrogram leaves
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_clusters | int | number of clusters to create | 1 |
method | LinkageMethod | linkage method ("ward", "single", "complete", "average") | 'ward' |
distance | _Metric | pairwise distance method | 'hamming' |
optimal_ordering | bool | whether to reorder tree leaves (optimal ordering) | True |
Returns:
Type | Description |
---|---|
Tak | fitted TAK instance |
Source code in opentak/clustering.py
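The three steps that fit() is shorthand for can be sketched with SciPy as follows. This is an illustrative pipeline under stated assumptions, not the opentak source: the function name `fit_sketch` is hypothetical, and cutting the tree with `fcluster(..., criterion="maxclust")` is one plausible way to obtain `n_clusters` groups.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import pdist


def fit_sketch(array, n_clusters=1, method="ward", distance="hamming",
               optimal_ordering=True):
    """Illustrative pipeline only; not the opentak implementation."""
    # 1. Pairwise distances between rows (condensed form).
    condensed = pdist(array, metric=distance)
    # 2. Linkage matrix; optimal_ordering reorders leaves so that
    #    neighbouring leaves are as close as possible (slower).
    linkage_matrix = linkage(
        condensed, method=method, optimal_ordering=optimal_ordering
    )
    # Cut the tree into at most n_clusters flat clusters (labels start at 1).
    labels = fcluster(linkage_matrix, t=n_clusters, criterion="maxclust")
    # 3. Row order given by the dendrogram leaves.
    order = leaves_list(linkage_matrix)
    return labels, order


array = np.array(
    [
        [1, 1, 2, 2],
        [1, 1, 1, 2],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
    ]
)
labels, order = fit_sketch(array, n_clusters=2)
```

With this toy input, the first two patients end up in one cluster and the last two in the other, and `order` is a permutation of the row indices that can be used to reorder `array` for display.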