tl;dr: I’m seeing 50-100x longer (a)syncq_wait read times for one drive in a 4-drive raidz2 vdev than for the other 3 drives. It’s always just one drive, but it’s not the same drive all the time.
I have a raidz2 pool, r2d2, set up on 4 identical drives (Exos x16 14 TB). I ran a recursive hashing task (hashdeep) on the pool and saw pretty poor performance, roughly 40 MB/s from each drive. Given that the task is largely sequential reads, I would have expected better throughput. I replicated the pool onto a single-disk pool (also an Exos x16 14 TB drive) and saw approximately 150 MB/s with the same task. Digging in deeper, I ran `zpool iostat -vyl 30 1`:
As you can see above, one member of the raidz2-0 vdev has (a)syncq_wait times that are 50x longer than those of the other 3 drives. My first thought was that the drive might be failing, but when I checked zpool iostat again some time later, a different member was exhibiting the longer wait times. I kept monitoring the output and saw that there is always exactly one member of the vdev exhibiting these extended (a)syncq_wait times, and that member changes over time.
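For reference, this is a minimal sketch of how I’ve been tracking which member is slow over time (pool name `r2d2` as above; the interval and log path are arbitrary choices, not anything ZFS-specific):

```sh
#!/bin/sh
# Append a timestamped per-vdev latency snapshot every 30 seconds.
# -v: per-vdev breakdown, -y: skip the since-boot summary, -l: latency columns.
while true; do
    date '+%Y-%m-%d %H:%M:%S' >> /tmp/r2d2-iostat.log
    zpool iostat -vyl r2d2 30 1 >> /tmp/r2d2-iostat.log
done
```

Each iteration blocks for the 30-second interval, so the log ends up being one latency table per 30 s that can be compared to see which drive is lagging in each window.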
Below is `zpool iostat -vyw 30 1` output. It tells a similar story to the output above: 3 members have q_wait times between 1 and 500 ms, while one member’s are between 500 ms and 2 s. This was run a few minutes after the output above, and you can see that a different drive (32f5 vs 2b14) is now the one exhibiting the longer wait times.
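To rule out the drives themselves, one thing I can do (these are generic Linux tools, not part of the zpool output above, and the device name is a placeholder) is watch the block-device latencies below ZFS while the hashdeep run is active:

```sh
# Extended per-device stats every 30 s: compare r_await (average read wait)
# and %util across the four pool members while the workload runs.
iostat -x 30

# Quick SMART health dump of one member (replace sdX with whichever device
# is currently showing the long queue waits in zpool iostat).
smartctl -a /dev/sdX
```

If the long waits also rotate between devices at that level, it would point away from a single failing drive; if they don’t show up there at all, it would point at queuing inside ZFS.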