os.path.exists() lies

Posted 2024-10-01 04:59:57


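I run many Python scripts on a Linux cluster, and the output of one job is often the input to another script, which may run on a different node. I've found that there is a noticeable delay before Python sees files that were created on other nodes: os.path.exists() returns False, and open() fails as well. I could spin in a while not os.path.exists(mypath) loop until the file shows up, but that can take a full minute or more, which is not ideal in a pipeline that has many steps and may process several datasets in parallel.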
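For reference, the polling approach described above looks roughly like the sketch below. The names wait_for_file, timeout and interval are hypothetical, and the timeout is an assumption added so the loop cannot hang forever. It does nothing about the underlying caching, so it still waits as long as the stale view of the directory persists.

```python
import os
import time


def wait_for_file(path, timeout=120.0, interval=1.0):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the file appeared, False if we gave up.
    (Hypothetical helper; the default timeout is an arbitrary choice.)
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False  # file never became visible within the timeout
        time.sleep(interval)  # back off instead of busy-looping
    return True
```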

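So far the only workaround I've found is to call subprocess.Popen("ls %s" % (pathdir), shell=True), which magically fixes the problem. I suspect this is a system issue, but is it possible that Python is causing it? Some kind of caching? So far my sysadmin hasn't been much help.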
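If you keep the ls workaround, it does not need shell=True. A minimal sketch, with list_dir_then_check as a hypothetical helper name: passing an argument list avoids starting a shell and any quoting problems with unusual paths, and subprocess.run() waits for ls to finish before the existence check, whereas Popen() returns immediately unless you call wait() on it.

```python
import os
import subprocess


def list_dir_then_check(path):
    """Run `ls` on the parent directory, then test whether `path` exists.

    Same idea as Popen("ls %s" % pathdir, shell=True), but with an
    argument list (no shell) and run() so we wait for ls to complete.
    """
    parent = os.path.dirname(os.path.abspath(path))
    try:
        subprocess.run(["ls", parent], stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    except FileNotFoundError:
        pass  # no `ls` binary available; fall back to the plain check
    return os.path.exists(path)
```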


2 answers

This has to do with the Python process running in its own shell. When you run subprocess.Popen(shell=True), you spawn a new shell, and that is what resolves the problem you are seeing.

Python isn't causing this. It comes from the combination of how NFS (the network file system) and directory listing work on Linux.
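If directory listing is what refreshes the client's view, a similar effect may be obtainable in-process with os.listdir(), which reads the directory much as ls does but without spawning a process. This is a sketch under that assumption (exists_after_listing is a hypothetical name); whether a listing actually refreshes a cached lookup depends on the kernel's NFS client and the mount options.

```python
import os


def exists_after_listing(path):
    """Read the parent directory, then stat() `path`.

    Assumption: reading the directory (as `ls` does) prompts the NFS
    client to refresh its view of it. Not guaranteed on every setup.
    """
    parent = os.path.dirname(os.path.abspath(path))
    try:
        os.listdir(parent)  # readdir on the parent; the result is discarded
    except OSError:
        pass  # listing is best effort; fall back to stat() alone
    return os.path.exists(path)
```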

os.path.exists() simply calls the C library's stat() function.
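On POSIX systems, CPython's os.path.exists() amounts to the following, which shows why it can only be as up to date as whatever stat() reports; exists_via_stat is just an illustrative name.

```python
import os


def exists_via_stat(path):
    # Equivalent in spirit to os.path.exists() on POSIX: stat() the path
    # and treat any OSError (e.g. ENOENT) or ValueError (e.g. an embedded
    # NUL byte) as "does not exist".
    try:
        os.stat(path)
    except (OSError, ValueError):
        return False
    return True
```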

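I believe you are running into a cache in the kernel's NFS implementation. Below is a link to a page that describes the problem and some ways to flush the cache.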

File Handle Caching

Directories cache the mapping from file names to file handles. The most common problems with this are:

•You have an opened file, and you need to check if the file has been replaced by a newer file. You have to flush the parent directory's file handle cache before stat() returns the new file's information and not the opened file's.

◦Actually this case has another problem: The old file may have been deleted and replaced by a new file, but both of the files may have the same inode. You can check this case by flushing the open file's attribute cache and then seeing if fstat() fails with ESTALE.

•You need to check if a file exists, for example a lock file. The kernel may have cached that the file does not exist, even if in reality it does. You have to flush the parent directory's negative file handle cache to see if the file really exists.
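The first case in the list above (checking whether an open file has been replaced) can be sketched by comparing the device and inode of the open descriptor with a fresh stat() of the path, and treating ESTALE from fstat() as "deleted on the server". This is a minimal sketch; file_was_replaced is a hypothetical name, it does not flush any cache itself (on NFS you would flush the parent directory's and the open file's caches first), and, as the list notes, a reused inode number can still fool the comparison.

```python
import errno
import os
import tempfile


def file_was_replaced(fd, path):
    """Return True if `path` no longer names the file open as `fd`."""
    try:
        open_st = os.fstat(fd)
    except OSError as exc:
        # ESTALE: the server no longer recognizes this file handle,
        # i.e. the file we have open was deleted there.
        if exc.errno == errno.ESTALE:
            return True
        raise
    try:
        path_st = os.stat(path)
    except FileNotFoundError:
        return True  # the name is gone entirely
    # Same device and inode means the name still refers to our open file.
    return (open_st.st_dev, open_st.st_ino) != (path_st.st_dev, path_st.st_ino)


if __name__ == "__main__":
    # Local demo: atomically swap a new file in under the same name.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        with open(target, "w") as f:
            f.write("old")
        with open(target) as old:
            print(file_was_replaced(old.fileno(), target))  # False
            tmp = os.path.join(d, "data.txt.new")
            with open(tmp, "w") as f:
                f.write("new")
            os.replace(tmp, target)
            print(file_was_replaced(old.fileno(), target))  # True
```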

A few ways to flush the file handle cache:

•If the parent directory's mtime changed, the file handle cache gets flushed by flushing its attribute cache. This should work quite well if the NFS server supports nanosecond or microsecond resolution.

•Linux: chown() the directory to its current owner. The file handle cache is flushed if the call returns successfully.

•Solaris 9, 10: The only way is to try to rmdir() the parent directory. ENOTEMPTY means the cache is flushed. Trying to rmdir() the current directory fails with EINVAL and doesn't flush the cache.

•FreeBSD 6.2: The only way is to try to rmdir() either the parent directory or the file under it. ENOTEMPTY, ENOTDIR and EACCES failures mean the cache is flushed, but ENOENT did not flush it. FreeBSD does not cache negative entries, so they do not have to be flushed.
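The Linux method from the list above (chown() the directory to its current owner) can be sketched as follows. flush_dir_cache_linux and exists_fresh are hypothetical names; whether the cache is really flushed depends on the kernel's NFS client, the call only succeeds if you are allowed to chown the directory (normally, if you own it), and it may update the directory's ctime as a side effect.

```python
import os


def flush_dir_cache_linux(dirpath):
    """chown() a directory to its current owner and group.

    Per the notes above, on Linux a successful chown() flushes the
    directory's file handle cache. Returns False if the call fails
    (e.g. you do not own the directory, or the filesystem is read-only).
    """
    st = os.stat(dirpath)
    try:
        os.chown(dirpath, st.st_uid, st.st_gid)  # no-op ownership change
    except OSError:
        return False
    return True


def exists_fresh(path):
    """Best-effort flush of the parent directory's cache, then stat()."""
    parent = os.path.dirname(os.path.abspath(path))
    try:
        flush_dir_cache_linux(parent)
    except OSError:
        pass  # e.g. the parent itself is missing; use the plain check
    return os.path.exists(path)
```

In a pipeline, exists_fresh() could stand in for os.path.exists() in a polling loop, so each check asks the kernel to drop a possibly stale negative entry first.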

http://web.archive.org/web/20100912144722/http://www.unixcoding.org/NFSCoding
