0x8007003B timeout copying large file to Samba server

PROBLEM: 0x8007003B timeout copying large file to Samba server

SOLUTION: This is an smb.conf issue, fixed with this line:

strict allocate = no

DESCRIPTION: I had this issue for a long time, and most of what’s on the web either mocks people, tells them to do stupid things, or is generally unhelpful: lots of “2GB limit”, “your network”, “your firewall”, “turn off DPI” and so on, none of which applied to me. I had just accepted it, but decided to dig a little deeper today.

The exact amount of data written before the failure varied, but ls always showed the full file size. On higher-performance filesystems (XFS, ext4 and JFS, all on NVMe arrays) I could get about 55GB allocated before the timeout. On spinning disk it was much less, which is probably why so many people went down the rabbit hole of claiming 2GB limits and the like.

strict allocate = yes tells Samba to allocate the whole file as soon as the client asks for it, which is what Windows does. Samba effectively says “OK, hold on” while it allocates, and the client times out. Some people used PowerShell on the client to raise the Windows SMB client timeout to 600 seconds or so (for example with Set-SmbClientConfiguration -SessionTimeout 600), but that’s not really ideal, since it does not scale.
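One rough way to see why raising the client timeout only moves the problem: if the server must finish pre-allocating the whole file before it answers, the largest file that succeeds is about allocation rate times timeout. This is a hypothetical model, not something measured from Samba or Windows; the 1 GB/s rate is invented for illustration, and 60 s and 600 s are just example timeouts.

```python
# Hypothetical model: with strict allocate = yes the server has to finish
# pre-allocating the file before replying. The allocation rate and the
# client timeout are assumed inputs, not values taken from Samba or Windows.

def max_preallocatable_bytes(alloc_rate_bytes_per_s: float, timeout_s: float) -> float:
    """Largest file that can be pre-allocated before the client gives up."""
    return alloc_rate_bytes_per_s * timeout_s


def preallocation_times_out(file_bytes: float, alloc_rate_bytes_per_s: float, timeout_s: float) -> bool:
    """True if pre-allocating file_bytes takes longer than the client timeout."""
    return file_bytes / alloc_rate_bytes_per_s > timeout_s


if __name__ == "__main__":
    GB = 10**9
    rate = 1 * GB  # hypothetical: 1 GB/s of pre-allocation throughput
    for timeout in (60, 600):
        print(f"timeout {timeout:>3}s -> max ~{max_preallocatable_bytes(rate, timeout) / GB:.0f} GB")
    print(preallocation_times_out(100 * GB, rate, 60))    # True
    print(preallocation_times_out(100 * GB, rate, 600))   # False
    print(preallocation_times_out(1000 * GB, rate, 600))  # True
```

Ten times the timeout only buys ten times the file size, so a big enough file fails again.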

strict allocate = no uses normal UNIX semantics: the file has no pre-allocated blocks, and blocks are allocated only as the data comes in. The file starts out fully sparse, and the copy status on the Windows client shows progress immediately. This is what we want for large files; with only small files, it doesn’t matter.
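The difference between the two behaviours can be seen on a local Linux filesystem without Samba at all. This is a minimal sketch, not Samba’s code: it assumes Linux (os.posix_fallocate, and st_blocks counted in 512-byte units), using truncate to stand in for the sparse, UNIX-style case and posix_fallocate for the pre-allocated, Windows-style case. Both files report the same size, just as ls did during the failing copies; only the allocated bytes differ.

```python
import os
import tempfile

SIZE = 16 * 1024 * 1024  # 16 MiB test file


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (size as ls shows it, bytes actually allocated on disk)."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of filesystem block size.
    return st.st_size, st.st_blocks * 512


def make_sparse(path: str, size: int) -> None:
    """UNIX semantics (like strict allocate = no): set the size, allocate nothing."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file with a hole, no data blocks yet


def make_preallocated(path: str, size: int) -> None:
    """Windows-like semantics (like strict allocate = yes): reserve every block now."""
    with open(path, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, size)  # Linux/Unix only


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, make in (("sparse", make_sparse), ("preallocated", make_preallocated)):
            path = os.path.join(d, name)
            make(path, SIZE)
            size, allocated = apparent_and_allocated(path)
            print(f"{name:>12}: ls size {size}, allocated {allocated}")
```

The sparse file should show close to zero bytes allocated, while the pre-allocated one shows at least the full 16 MiB.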

I made this a global change, since I don’t need fully pre-allocated, non-sparse files on my file server. Someone writing databases might need pre-allocation, and without it you’d want to make sure you don’t feed data faster than the kernel can allocate blocks. This is another of those cases where the answer may be separate shares or filesystems with different settings.
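Because strict allocate is a per-share parameter, setting it in [global] only changes the default, and a share that needs pre-allocated files (the database case above) can override it. The sketch below uses Python’s configparser on a hypothetical smb.conf (the [media] and [db] shares and their paths are made up) to show which value each share ends up with. It is a simplification: it ignores include files and section-name case, and the built-in default is passed in rather than assumed, since it may differ between Samba versions.

```python
import configparser

# Hypothetical smb.conf: sparse files everywhere by default, pre-allocation
# kept for one database share. Share names and paths are made up.
SMB_CONF = """
[global]
    strict allocate = no

[media]
    path = /srv/media

[db]
    path = /srv/db
    strict allocate = yes
"""


def _norm(key: str) -> str:
    # Samba treats "Strict Allocate", "strict allocate" and "strictallocate" alike.
    return key.replace(" ", "").lower()


def effective_strict_allocate(conf_text: str, share: str, default: bool) -> bool:
    """Share section wins, then [global], then the built-in default (passed in)."""
    # interpolation=None: smb.conf uses % macros that configparser must not expand.
    cp = configparser.ConfigParser(interpolation=None, strict=False)
    cp.read_string(conf_text)
    for section in (share, "global"):
        if not cp.has_section(section):
            continue
        for key in cp.options(section):
            if _norm(key) == "strictallocate":
                return cp.getboolean(section, key)
    return default


if __name__ == "__main__":
    for share in ("media", "db"):
        # prints "media False", then "db True"
        print(share, effective_strict_allocate(SMB_CONF, share, default=True))
```

On a real server, testparm -s shows the configuration Samba actually parsed, and testparm -v also lists the parameters left at their defaults.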

When you play with tunables, you run into problems that people don’t really know how to troubleshoot. That’s why I’m writing this up: so it shows up in web searches.


reducevg very slow

This is filed as an APAR, but the fix is really just a message explaining the delay. When space reclamation is enabled, reducevg sends the equivalent of TRIM commands for the freed space, and on a storage array this amounts to writing zeros. On a big LUN, or with a busy array, this can take a long time. If you don’t need that space reclaimed on the array, you can disable space reclamation:

ioo -o dk_lbp_enabled=0

Here is the IBM doc about it.

IJ23045: REDUCEVG UNCLEAR ON DELAY WHEN WAITING FOR INFLIGHT RECLAIM REQ APPLIES TO AIX 7100-05

A fix is available

APAR status

  • Closed as program error.

Error description

  • reducevg may be unclear, why there is some delay
    when waiting on inflight reclaim requests.

Local fix

  • Disable space reclamation by running:
    ioo -o dk_lbp_enabled=0

Problem summary

  • reducevg may be unclear, why there is some delay
    when waiting on inflight reclaim requests.

Problem conclusion

  • reducevg displays message in case there are space reclamation
    IOs inflight to indicate reducevg may take some time to
    complete.