Hi everyone, I recently tried out YASA's automatic sleep staging on my dataset, and the overall accuracy was good at around 82% compared to manual scoring. However, I've noticed that some predictions are strange: in some cases, YASA predicted N2 for epochs that should have been N3. I also checked the spectrograms, and those epochs show strong power in the delta band. This only occurred in a few recordings, but it includes both high- and low-accuracy cases. Figures are attached below; they are from different participants. The two vertical black lines mark lights-out and lights-on, and the accuracy rates were calculated within this window. I understand that some misclassified epochs are acceptable, but these missed N3 periods last too long, especially in the low-accuracy cases. Has anyone encountered this issue before and know why it happens? Or do you have any suggestions for how I could figure this out? Thank you so much!
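For reference, here is roughly how I ran the staging and computed the accuracy (a minimal sketch; the file path, channel names, hypnogram loading, and epoch indices are placeholders for my actual setup):

```python
import mne
import numpy as np
import yasa
from sklearn.metrics import accuracy_score

# Load one participant's PSG recording (file path and channel names are placeholders)
raw = mne.io.read_raw_edf("sub-01_psg.edf", preload=True)

# Run YASA's automatic staging on a single EEG channel (+ EOG/EMG when available)
sls = yasa.SleepStaging(raw, eeg_name="C3-A2", eog_name="EOG1-A2", emg_name="EMG1-EMG2")
hypno_pred = sls.predict()  # one label per 30-s epoch: "W", "N1", "N2", "N3", "R"

# Manual scoring aligned to the same 30-s epochs (loading step is a placeholder)
hypno_manual = np.loadtxt("sub-01_manual_hypno.txt", dtype=str)

# Accuracy computed only within the lights-out/lights-on window
# (epoch indices below are placeholders for the actual event markers)
lights_out, lights_on = 20, 980
acc = accuracy_score(hypno_manual[lights_out:lights_on],
                     hypno_pred[lights_out:lights_on])
print(f"Accuracy within lights-out/lights-on: {acc:.1%}")
```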
Hi Zy, thanks for the detailed post! After looking this over, my first thought is that YASA seems to be functioning properly. As you said, you have generally good agreement between YASA and your human scorer (82% is right around what is expected across all samples). Naturally, your 82% will include a range of good and bad agreement, and in the cases of bad agreement, confusion between N2 and N3 is highly expected. See the YASA validation paper, specifically Figure 1C, which shows that by far the most common mislabeling of "true" N3 is N2. Regarding the visual appearance of delta power, note that the algorithm considers many other features besides delta power (see Figure 1C supplement 7 for a chart of the features and their relative importance). So in brief, I think your results are consistent with how YASA generally performs. Of course, whether they are acceptable for your needs is totally up to you. But here are a few more things to consider, if you haven't already:
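For example, one quick sanity check is to look at YASA's per-epoch class probabilities around those stretches: if the algorithm labels them N2 with low confidence and N3 as a close second, that is the expected N2/N3 ambiguity rather than a malfunction. A minimal sketch (file path and channel names are placeholders for your setup):

```python
import mne
import yasa

# Same setup as in your post; file path and channel names are placeholders
raw = mne.io.read_raw_edf("sub-01_psg.edf", preload=True)
sls = yasa.SleepStaging(raw, eeg_name="C3-A2", eog_name="EOG1-A2", emg_name="EMG1-EMG2")
sls.predict()

# Per-epoch class probabilities (pandas DataFrame: one row per 30-s epoch,
# one column per stage)
proba = sls.predict_proba()

# Confidence of the winning stage for every epoch
confidence = proba.max(axis=1)
print(confidence.describe())

# Epochs labeled N2 where N3 was a close runner-up -> the expected N2/N3 ambiguity
# (the 0.3 threshold is arbitrary; adjust to taste)
ambiguous = (proba.idxmax(axis=1) == "N2") & (proba["N3"] > 0.3)
print(proba.loc[ambiguous, ["N2", "N3"]].head())

# YASA can also plot the probabilities as a stacked area chart over the night
sls.plot_predict_proba()
```

Epochs where the winning probability is low are exactly where you should expect disagreement with a human scorer, so it's worth checking whether your long missed-N3 stretches line up with low-confidence regions.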