@p last week we racked a server full of 12x 10TB disks and 3/4 dozen NVMe drives. I finally started working on it after getting home, and one of the drives shit out errors like crazy. it got marked as failed in the soft raid10, so i removed it via mdadm, used wipefs to clear the RAID headers off the disk, and re-added it to the array; it's rebuilding without issues
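roughly what that looked like, with /dev/md0 and /dev/sdX as stand-ins for the actual array and disk:

```
# the kernel had already flagged the disk as failed, so pull it from the array
mdadm /dev/md0 --remove /dev/sdX
# wipe the old RAID superblock/signatures off the disk
wipefs -a /dev/sdX
# re-add it and let the raid10 rebuild onto it
mdadm /dev/md0 --add /dev/sdX
# keep an eye on the resync progress
cat /proc/mdstat
```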
hoping the disk passes this recovery; it would be shitty to have to replace a drive within literally the first couple hours of using it
@graf Well, if there's a 0.1% defect rate, that's about a 1.2% chance of at least one defective drive in a batch of 12. Not completely unheard-of, but not likely either.
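(assuming independent failures, that's 1 − 0.999^12 ≈ 0.0119, so right around 1.2%.)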
@p@splitshockvirus i think the failure might have happened when i was making other raids on the same bus. the drive seems fine and passes smart tests, so i'm rebuilding on the assumption that resetting its status is enough. i don't think the drive is at fault right now, but something definitely fucked up, and in production i can't have that
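for the smart check, this is roughly it (smartmontools; /dev/sdX is a placeholder for the actual disk):

```
# run a short self-test, then dump the attributes and self-test log
smartctl -t short /dev/sdX
smartctl -a /dev/sdX
```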