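<p>I also needed to simply list the contents of a bucket. Ideally I would like something similar to what tf.gfile provides, since tf.gfile can determine whether an entry is a file or a directory.</p>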
<p>I tried the various links provided by @jterrace above, but my results were not optimal. That said, the results are still worth showing.</p>
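<p>Given a bucket that contains a mix of "directories" and "files", it is hard to navigate the "filesystem" to find items of interest. I have added comments in the code describing how each variation of the request referenced above behaves.</p>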
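<p>In both cases I am using a datalab notebook, which supplies the credentials. Given the results, I will need to use string parsing to determine which files are in a particular directory. If anyone knows how to extend these methods, or knows an alternative method for walking the directories the way tf.gfile does, please reply.</p>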
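<p>As a rough illustration of the string parsing mentioned above, here is a minimal sketch that splits a flat list of object names into the immediate "subdirectories" and "files" under a given directory. The helper name <code>split_listing</code> is hypothetical (it is not part of any Google library), and the sample names are taken from the outputs shown in this answer.</p>

```python
def split_listing(names, directory):
    """Split object names under `directory` into immediate subdirs and files.

    `names` is a flat list of object names as returned by either method.
    This is a hypothetical helper, not part of any Google library.
    """
    # Normalise the directory so that it always ends with exactly one '/'.
    prefix = directory.rstrip('/') + '/' if directory else ''
    subdirs, files = set(), []
    for name in names:
        if not name.startswith(prefix) or name == prefix:
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition('/')
        if sep:
            # Something follows a '/', so `head` is a "directory".
            subdirs.add(head + '/')
        else:
            files.append(head)
    return sorted(subdirs), sorted(files)


names = [
    'UrbanSound/FREESOUNDCREDITS.txt',
    'UrbanSound/UrbanSound_README.txt',
    'UrbanSound/data/air_conditioner/100852.csv',
    'UrbanSound/data/dog_bark/100032.csv',
]
dirs, files = split_listing(names, 'UrbanSound')
print(dirs)   # ['data/']
print(files)  # ['FREESOUNDCREDITS.txt', 'UrbanSound_README.txt']
```

<p>This works on names that have already been fetched, so it costs nothing extra in API calls, but it does require listing everything under the directory first.</p>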
<h2>Method 1</h2>
<pre><code>import json

import googleapiclient.discovery

BUCKET = 'bucket-sounds'


def create_service():
    return googleapiclient.discovery.build('storage', 'v1')


def list_bucket(bucket):
    """Returns a list of metadata of the objects within the given bucket."""
    service = create_service()

    # Create a request to objects.list to retrieve a list of objects.
    fields_to_return = 'nextPageToken,items(name,size,contentType,metadata(my-key))'
    #req = service.objects().list(bucket=bucket, fields=fields_to_return)  # returns everything
    #req = service.objects().list(bucket=bucket, fields=fields_to_return, prefix='UrbanSound')  # returns everything. UrbanSound is top dir in bucket
    #req = service.objects().list(bucket=bucket, fields=fields_to_return, prefix='UrbanSound/FREE')  # returns the file FREESOUNDCREDITS.txt
    #req = service.objects().list(bucket=bucket, fields=fields_to_return, prefix='UrbanSound/FREESOUNDCREDITS.txt', delimiter='/')  # same as above
    #req = service.objects().list(bucket=bucket, fields=fields_to_return, prefix='UrbanSound/data/dog_bark', delimiter='/')  # returns nothing
    req = service.objects().list(bucket=bucket, fields=fields_to_return, prefix='UrbanSound/data/dog_bark/', delimiter='/')  # returns files in dog_bark dir

    all_objects = []
    # If you have too many items to list in one request, list_next() will
    # automatically handle paging with the pageToken.
    while req:
        resp = req.execute()
        all_objects.extend(resp.get('items', []))
        req = service.objects().list_next(req, resp)
    return all_objects


# usage
print(json.dumps(list_bucket(BUCKET), indent=2))
</code></pre>
<p>This produces results like the following:</p>
<pre><code>[
  {
    "contentType": "text/csv",
    "name": "UrbanSound/data/dog_bark/100032.csv",
    "size": "29"
  },
  {
    "contentType": "application/json",
    "name": "UrbanSound/data/dog_bark/100032.json",
    "size": "1858"
  }
  ... stuff snipped ...
]
</code></pre>
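<p>One possible way to extend Method 1: when <code>delimiter='/'</code> is set, the <code>objects.list</code> response also carries a <code>prefixes</code> array with the "subdirectories" directly under the requested prefix, provided <code>prefixes</code> is included in <code>fields</code>. The sketch below relies on that. The function name <code>list_dir_json_api</code> is hypothetical, and the <code>service</code> argument is assumed to be the object returned by <code>googleapiclient.discovery.build('storage', 'v1')</code>. I have not run this against the bucket above.</p>

```python
def list_dir_json_api(service, bucket, directory):
    """Return (subdirs, files) directly under `directory`.

    Hypothetical helper. `service` is assumed to be the result of
    googleapiclient.discovery.build('storage', 'v1').
    """
    # A trailing '/' matters: without it the delimiter listing comes back empty.
    prefix = directory.rstrip('/') + '/' if directory else ''
    req = service.objects().list(
        bucket=bucket,
        prefix=prefix,
        delimiter='/',
        # Ask for 'prefixes' as well as the items themselves.
        fields='nextPageToken,prefixes,items(name,size)',
    )
    subdirs, files = [], []
    while req:
        resp = req.execute()
        subdirs.extend(resp.get('prefixes', []))
        files.extend(item['name'] for item in resp.get('items', []))
        # list_next() returns None once there are no more pages.
        req = service.objects().list_next(req, resp)
    return subdirs, files


# Usage (needs credentials and network access, so left commented out):
# service = googleapiclient.discovery.build('storage', 'v1')
# subdirs, files = list_dir_json_api(service, 'bucket-sounds', 'UrbanSound/data')
```

<p>An entry that shows up in <code>subdirs</code> can be treated as a directory and an entry in <code>files</code> as a file, which gets close to the file/directory distinction tf.gfile offers.</p>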
<h2>Method 2</h2>
<pre><code>import sys

from google.cloud import storage

BUCKET = 'bucket-sounds'

# Create a Cloud Storage client.
gcs = storage.Client()


def my_list_bucket(bucket_name, limit=sys.maxsize):
    """Print the names of at most `limit` objects in the bucket."""
    a_bucket = gcs.lookup_bucket(bucket_name)
    bucket_iterator = a_bucket.list_blobs()
    for resource in bucket_iterator:
        print(resource.name)
        limit = limit - 1
        if limit <= 0:
            break


my_list_bucket(BUCKET, limit=5)
</code></pre>
<p>This produces output like this:</p>
<pre><code>UrbanSound/FREESOUNDCREDITS.txt
UrbanSound/UrbanSound_README.txt
UrbanSound/data/air_conditioner/100852.csv
UrbanSound/data/air_conditioner/100852.json
UrbanSound/data/air_conditioner/100852.mp3
</code></pre>
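<p>Method 2 can probably be narrowed to a single "directory" in a similar way. The sketch below assumes a reasonably recent <code>google-cloud-storage</code> release in which <code>Client.list_blobs</code> accepts <code>prefix</code> and <code>delimiter</code>, and in which the returned iterator exposes a <code>prefixes</code> set that is only filled in once the iterator has been consumed. The function name <code>list_dir_client</code> is hypothetical.</p>

```python
def list_dir_client(client, bucket_name, directory):
    """Return (subdirs, files) directly under `directory`.

    Hypothetical helper. `client` is assumed to be a
    google.cloud.storage.Client instance.
    """
    prefix = directory.rstrip('/') + '/' if directory else ''
    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter='/')
    # Consume the iterator first: `prefixes` is populated page by page.
    files = [blob.name for blob in blobs]
    subdirs = sorted(blobs.prefixes)
    return subdirs, files


# Usage (needs credentials and network access, so left commented out):
# from google.cloud import storage
# subdirs, files = list_dir_client(storage.Client(), 'bucket-sounds', 'UrbanSound')
```

<p>Reading <code>blobs.prefixes</code> before iterating would give an empty set, which is why the list comprehension comes first.</p>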