SVC / Storwize / FlashSystem Volume Mobility

SAN Volume Controller / Storwize / FlashSystem version 8.4.2 allows you to non-disruptively migrate a LUN between clusters. It is set up like remote copy, except that you can map the remote copy to the same host at the same time. The remote copy shows up as non-preferred paths for the same vdisk and vdisk ID. You then switch which side is primary, and finally remove the old copy.
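
A rough sketch of the CLI flow, based on how remote copy is normally driven. The volume, host, pool, and relationship names below are placeholders, and the -migration option on mkrcrelationship is my assumption about how the 8.4.2 mobility relationship gets created, so check the 8.4.2 command reference before trusting any of it. The rest (mkvdisk, mkvdiskhostmap, startrcrelationship, switchrcrelationship, rmrcrelationship) are the standard commands.

# On the target cluster: create a same-sized volume to receive the data
mkvdisk -mdiskgrp TargetPool -iogrp 0 -size 100 -unit gb -name appvol_new

# On the source cluster: create and start the relationship to the remote system
# (-migration is assumed here; confirm the exact option for volume mobility)
mkrcrelationship -master appvol -aux appvol_new -cluster TargetCluster -name appvol_move -migration
startrcrelationship appvol_move

# On the target cluster: map the auxiliary copy to the same host, so it shows up
# as extra, non-preferred paths to the same vdisk ID
mkvdiskhostmap -host myhost appvol_new

# When the copies are in sync, make the target side primary
switchrcrelationship -primary aux appvol_move

# Once the host is running only on the new paths, clean up the old side
rmrcrelationship appvol_move
rmvdisk appvol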

Here is someone who did a demo video: https://www.youtube.com/watch?v=NpcOoshkm4w


reducevg very slow

This is filed as an APAR, but it is really just a documentation clarification. Reducevg sends the equivalent of TRIM commands, and on a storage array that means writing nulls. On a big LUN, or on a busy array, this can take a long time. If space reclamation is not something you need to worry about, you can disable it.

ioo -o dk_lbp_enabled=0
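
As a quick sketch on the AIX host (real ioo flags, run as root): check the current value first, and use -p if you want the change to stick across reboots.

ioo -o dk_lbp_enabled          # show the current setting
ioo -o dk_lbp_enabled=0        # disable space reclamation for the running system
ioo -p -o dk_lbp_enabled=0     # also make it persistent across reboots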

Here is the IBM APAR about it.

 

IJ23045: REDUCEVG UNCLEAR ON DELAY WHEN WAITING FOR INFLIGHT RECLAIM REQ APPLIES TO AIX 7100-05

 

A fix is available

APAR status

  • Closed as program error.

Error description

  • reducevg may be unclear, why there is some delay
    when waiting on inflight reclaim requests.
    

Local fix

  • Disable space reclamation by running:
    ioo -o dk_lbp_enabled=0
    

Problem summary

  • reducevg may be unclear, why there is some delay
    when waiting on inflight reclaim requests.
    

Problem conclusion

  • reducevg displays message incase there are space reclamation
    IOs inflight to indicate reducevg may take some time to
    complete.

SVC, Storwize, FlashSystem, Spectrum Virtualize – replace a drive

When you replace a drive on one of these systems, the mdisk array does not automatically rebuild onto the new drive.

If the GUI fix procedures disappear, never show up, or otherwise fail to add the replacement drive into the mdisk array as a new member, you can do it manually from the CLI.

 

First, look for the candidate or spare drive you want to use.

lsdrive | grep -v member

 

Then, make sure that drive ID is a candidate:

chdrive -use candidate 72

 

Then, find the missing member:

lsarraymember mdisk1 | grep -v exact

 

Then, set the new drive to use that missing member ID:

charraymember -member 31 -newdrive 72 mdisk1

 

You can watch the progress of the rebuild:

lsarraymemberprogress mdisk1
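
If you would rather watch it from a workstation than sit in the cluster shell, a simple loop over ssh does the job (the cluster address and interval here are placeholders; lsarraymemberprogress is the real command):

while true; do
    ssh superuser@cluster-ip lsarraymemberprogress mdisk1
    sleep 60
done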


IBM SVC / Storwize DRP – Still a no-go

IBM released “Data Reduction Pools” (DRP) for their SAN Volume Controller family, which includes the Storwize and FlashSystem products, back in early 2018. The internal flag for it is “deduplication”, but it can be used for thin, compressed, or deduplicated volumes. You can put thick volumes in them too, but that is more of a “block off this space” arrangement.

DRP is a “log structured device”, meaning reads are random but writes are always sequential. When you enable DRP, it inserts into the upper cache (front-end cache). Because of this, it adds 3-4 ms of latency right off the bat. Some of the higher-end controllers can reduce that to 2-3 ms, but it never goes away. Also, this affects ALL storage pools (mdisk groups), not just the ones built with DRP enabled. IBM product engineering says this is not a defect; it’s just the way it is, because you have to twiddle the data on the CPU.

In late 2018, I found one of their code defects, which has since been resolved; however, that defect highlighted a troubling design choice. DRP is a whole-cluster thing, not a per-I/O-group thing, nor a per-pool thing. That means if you trigger a defect that kills DRP for one pool, it kills DRP for every pool. If you’re in a HyperSwap or stretched cluster, and something (a future defect, maybe) takes one site’s DRP offline while the nodes are still live, that could take out BOTH sites.

This year, I found out that their sizing partner, “Intellimagic”, does not understand DRP. When a “Disk Magic” report comes out, it can recommend DRP. It cannot tell how much cache affects things, so it just recommends maximum cache. It cannot tell how much DRP affects things, so it cannot recommend against it for latency sensitive workloads. The IBM technical sales team is trying to work around this by simply recommending the next model up when DRP is to be used.

However, that is not enough. DRP garbage collection is VERY heavy-handed, and there is no way to throttle it. It’s a log-structured array, so when you rewrite a block on a LUN, the old block gets invalidated, then the old chunk gets read, the new data gets packed in, and the result gets written to the end of the array. This is sequential for each major section of the pool, and the code adds additional latency on top. We found that on a normal system with about 25% writes, throughput would be suppressed by 80% in DRP. Latency during normal operations would fluctuate from 6 to 18 ms, and that’s as seen by the client. Any major I/O operation on the client (big rewrites, or freeing large amounts of blocks), or any vdisk (LUN) deletion, would cause latency to fluctuate from 16 to 85 ms (completely unusable).

Once DRP is enabled, it affects the whole system. You cannot turn it off for a pool; you have to delete that pool. You cannot migrate a vdisk out of or into a DRP; you have to mirror to a new pool, then delete or split the old copy. The funny thing is, that’s exactly what a migrate does at all but the lowest levels. It’s been two years; that really should have been folded into the standard migrate commands by now.
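
The workaround looks roughly like this. The vdisk and pool names are placeholders, and while addvdiskcopy, lsvdisksyncprogress, lsvdiskcopy, and rmvdiskcopy are the real commands, check which copy ID lands in which pool before removing anything:

# Add a second copy of the vdisk in the non-DRP pool
addvdiskcopy -mdiskgrp StandardPool appvol

# Wait for the new copy to show 100% synchronized
lsvdisksyncprogress appvol

# Confirm which copy sits in which pool, then drop the DRP copy
lsvdiskcopy appvol
rmvdiskcopy -copy 0 appvol     # assumes copy 0 is the DRP copy; verify first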

Once you have everything split, and all of the hosts are running on the new, non-DRP pool, that is still not enough. All of the performance problems will still be there. Only once you delete the old pool does performance return to normal. Since vdisk deletions take several hours each, and running more than four at a time causes severe, deadly latency, it’s best to just verify nothing is in use and force-delete the whole DRP-enabled pool. Risky, but otherwise you could spend weeks deleting the old LUNs by hand because of how slowly they clear.
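
A minimal sketch of that last step, assuming a pool named DRPPool (these are real commands, but double-check the host maps first, because rmmdiskgrp -force removes every vdisk in the pool):

# List every vdisk still in the pool and confirm none are mapped to hosts
lsvdisk -filtervalue mdisk_grp_name=DRPPool
lsvdiskhostmap appvol01        # repeat for each vdisk listed above

# Remove the whole pool, including its vdisks, in one shot
rmmdiskgrp -force DRPPool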

The sales materials, the technical whitepapers, and the gritty redbooks all recommend DRP as the best thing ever. Staff and books alike shy away from highlighting any of these limitations. Other products handle deduplication without this sort of negative impact, but here, even with no deduplication in use, merely having the OPTION of dedupe makes the system unusable.

Maybe there will be a “fix” or a “rewrite”, but I’ve been burned hard, with serious customer-satisfaction fallout, over two years. That erodes trust, and this is not the only IBM storage product causing me trust issues.

I expect that this is fallout from Ginni Rometty’s command, “Don’t try to protect the past”. IBM’s stated vision is “Cloud, Data, and Watson”. It will be interesting to see exactly how that translates into a future structure. IBM has been divesting a lot of pieces lately. Rometty ran services, and that’s what she knows as Chair and CEO. Anything she sees as a commodity, or low margin, will be spun off. (IBM seems to need at least a 6:1 revenue-to-cost ratio in order to operate. There are a lot of layers.) I’m starting to wonder if Storage and POWER will be sold off to Lenovo. I see hints, but no roadmaps, nothing concrete.


SVC mdisk stuck in “degraded_ports”

DS4000/5000 series array
A couple paths go offline and come back due to cabling changes.
Clear the errors from monitoring -> events
Verify the storage manager shows proper mapping, and has no alarms.
The mdisk (LUN) is still stuck at “Degraded Ports”.

The fix? Go to the CLI and run “svctask includemdisk <mdisk_id_or_name>”.
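
A minimal sketch, assuming the stuck mdisk is named mdisk3 (lsmdisk and includemdisk are real commands; the exact status string to filter on is my assumption, so check lsmdisk output on your cluster):

# Find the mdisk that is stuck in the degraded state
lsmdisk -filtervalue status=degraded_ports

# Tell the cluster to re-include it; it should return to online shortly after
svctask includemdisk mdisk3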