{"id":469,"date":"2012-10-19T15:49:01","date_gmt":"2012-10-19T19:49:01","guid":{"rendered":"https:\/\/infotechguy.net\/?p=469"},"modified":"2025-02-22T11:29:37","modified_gmt":"2025-02-22T16:29:37","slug":"mdadm-recovering-from-drive-failure","status":"publish","type":"post","link":"https:\/\/infotechguy.net\/?p=469","title":{"rendered":"Linux &#8212; Recovering from Drive Failure with mdadm"},"content":{"rendered":"<p>So it happened. I had a drive fail on me, degrading the RAID 6 array in my media server. Luckily, I was notified by mdadm and was able to order a new one from newegg.com and rebuild it.<\/p>\n<p>I want to walk through the steps I took to get my RAID file system back up and running, starting with the notification sent to my Gmail account (which I received on my phone).<\/p>\n<p><!--more--><\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li>Screenshot of the email regarding the drive failure.<br \/>\n<a href=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/email_notification.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3570\" src=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/email_notification-300x142.png\" alt=\"\" width=\"600\" height=\"284\" srcset=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/email_notification-300x142.png 300w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/email_notification-768x364.png 768w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/email_notification.png 826w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/li>\n<li>At home, I could see the error indicator light on the chassis drive bay holding the failed hard disk. Luckily, I had RAID 6, which meant the array could still tolerate one more disk failure.<\/li>\n<li>On the server command line, I needed to inform mdadm that I was removing the drive from the array. First, make sure the logical disk representing the array is unmounted. 
In my case, that is <strong>\/dev\/md0<\/strong>.<\/li>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">umount \/dev\/md0<\/pre>\n<li>Check which drive has failed.\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">mdadm --detail \/dev\/md0<\/pre>\n<p><a href=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-onfailure.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3572\" src=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-onfailure-300x171.png\" alt=\"\" width=\"600\" height=\"342\" srcset=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-onfailure-300x171.png 300w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-onfailure-1024x585.png 1024w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-onfailure-768x439.png 768w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-onfailure.png 1280w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><br \/>\nAs you can see from the screenshot above, mdadm is aware of the drive failure and knows which drive has failed. Also notice the State line: clean, degraded. That is a good thing, because a worse state could mean the RAID is not recoverable.<\/li>\n<li>Next, we have to remove the failed drive from the array.\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">mdadm --manage \/dev\/md0 --remove \/dev\/sdj1<\/pre>\n<p>If you rerun the command from step 4, you should no longer see \/dev\/sdj1 listed in the array.<\/li>\n<li>Put the new drive in and prepare it.<br \/>\nDelete any partitions that may be on the new drive and create one primary partition that takes up all the blocks on the disk. 
Lastly, set its partition type to Linux raid auto (fd).<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">fdisk \/dev\/sdj\n......................................\nCommand (m for help):<strong>n<\/strong>\nCommand action\n\te\textended\n\tp\tprimary partition (1-4)\nPartition <strong>p<\/strong>\nPartition number (1-4): <strong>1<\/strong>\nFirst cylinder (1-182401, default 1): <strong>[blank]<\/strong>\nLast cylinder or +size or +sizeM or +sizeK (1-182401, default 182401): <strong>[blank]<\/strong>\n\nCommand (m for help): <strong>t<\/strong>\nHex code (type L to list codes): <strong>fd<\/strong>\nChanged system type of partition 1 to fd (Linux raid auto)\n\nCommand (m for help): <strong>w<\/strong>\nThe partition table has been altered!\n\nCalling ioctl() to re-read partition table.\nSyncing disks.\n\n<\/pre>\n<\/li>\n<li>Now, before we add the drive and begin the recovery process, let&#8217;s raise the RAID rebuild speed limits, as suggested by <a href=\"http:\/\/zackreed.me\/articles\/48-adding-an-extra-disk-to-an-mdadm-array\" target=\"_blank\" rel=\"noopener noreferrer\">zackreed<\/a>.\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">echo 50000 &gt; \/proc\/sys\/dev\/raid\/speed_limit_min \necho 200000 &gt; \/proc\/sys\/dev\/raid\/speed_limit_max<\/pre>\n<p>Effectively, we are raising the limits that throttle how fast <strong>mdadm<\/strong> can transfer blocks while rebuilding the RAID. The actual rate is still capped by the transfer speed of your hard disks, regardless of what you set the minimum to. In my case, the hard disks are SATA-II and average about 45MB\/s (a guess; I haven&#8217;t done a speed test in a while). 
Our first statement sets <strong>speed_limit_min<\/strong> to 50000 KiB\/s (roughly 50MB\/s). That is above what my drives can sustain, so don&#8217;t be alarmed in the next step when the transfer speed doesn&#8217;t reach 50MB\/s while rebuilding the RAID array.<\/li>\n<li>\/dev\/sdj1 should now be available and prepared for use with the RAID array. Add it to the array.\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">mdadm --manage \/dev\/md0 --add \/dev\/sdj1<\/pre>\n<p>Now you can watch the progress.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">watch cat \/proc\/mdstat<\/pre>\n<p><a href=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-recovery.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3573\" src=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-recovery-300x171.png\" alt=\"\" width=\"600\" height=\"342\" srcset=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-recovery-300x171.png 300w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-recovery-1024x585.png 1024w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-recovery-768x439.png 768w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/mdadm-recovery.png 1280w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>Also, if you rerun the <strong>mdadm --detail \/dev\/md0<\/strong> command, you will see that it reports the array as rebuilding.<br \/>\n<a href=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuilding-array.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3574\" src=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuilding-array-300x171.png\" alt=\"\" width=\"602\" height=\"343\" srcset=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuilding-array-300x171.png 300w, 
https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuilding-array-1024x585.png 1024w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuilding-array-768x439.png 768w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuilding-array.png 1280w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/a><\/li>\n<li>After a few hours, watch cat \/proc\/mdstat showed that the recovery was no longer running. Check the mdadm details.\n<pre><code>mdadm --detail \/dev\/md0<\/code><\/pre>\n<p><a href=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuild-complete.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3575\" src=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuild-complete-300x171.png\" alt=\"\" width=\"602\" height=\"343\" srcset=\"https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuild-complete-300x171.png 300w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuild-complete-1024x585.png 1024w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuild-complete-768x439.png 768w, https:\/\/infotechguy.net\/wp-content\/uploads\/2021\/03\/rebuild-complete.png 1280w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/a><\/li>\n<li>Lastly, run fsck on the array while it is still unmounted, then remount it. My array is formatted as ext4, so:\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\">fsck.ext4 -f \/dev\/md0<\/pre>\n<p>This will do a consistency check on the file system and should come back clean.<\/li>\n<\/ol>\n<p>That&#8217;s it! 
I recovered from a drive failure, and all my data on the RAID array is intact.<\/p>\n<p><b>Sources:<\/b><\/p>\n<ul>\n<li><a title=\"http:\/\/zackreed.me\/articles\/48-adding-an-extra-disk-to-an-mdadm-array\" href=\"http:\/\/zackreed.me\/articles\/48-adding-an-extra-disk-to-an-mdadm-array\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/zackreed.me\/articles\/48-adding-an-extra-disk-to-an-mdadm-array<\/a><\/li>\n<li><a title=\"http:\/\/www.jamierf.co.uk\/2009\/11\/04\/software-raid-5-using-mdadm-in-ubuntu-9-10\/\" href=\"http:\/\/www.jamierf.co.uk\/2009\/11\/04\/software-raid-5-using-mdadm-in-ubuntu-9-10\/\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/www.jamierf.co.uk\/2009\/11\/04\/software-raid-5-using-mdadm-in-ubuntu-9-10\/<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>So it happened. I had a drive fail on me, degrading the RAID 6 array in my media server. Luckily, I was notified by mdadm and was able to order a new one from newegg.com and rebuild it.&#46;&#46;&#46;<\/p>\n","protected":false},"author":2,"featured_media":4240,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[135,110],"class_list":["post-469","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-linux","tag-mdadm","tag-san"],"_links":{"self":[{"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v2\/posts\/469","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/infotechguy.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=469"}],"version-history":[{"count":1,"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v
2\/posts\/469\/revisions"}],"predecessor-version":[{"id":4182,"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v2\/posts\/469\/revisions\/4182"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infotechguy.net\/index.php?rest_route=\/wp\/v2\/media\/4240"}],"wp:attachment":[{"href":"https:\/\/infotechguy.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=469"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infotechguy.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=469"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infotechguy.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=469"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}