Downloading media content from the web

Downloading media content from the web is a simple process: use Requests or another library and download it just as you would HTML content.
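As a minimal sketch of that idea, the snippet below reads the raw bytes behind a URL with the standard library's urllib.request, exactly as one would for an HTML page. The helper name download_bytes is hypothetical and not part of the book's code, and the data: URL is only a stand-in so the example runs without network access.

```python
import urllib.request


def download_bytes(url):
    """Return the raw body at `url` as bytes (hypothetical helper, not from the book)."""
    # urlopen returns a file-like response; read() yields bytes whether the
    # payload is HTML, a JPEG, or any other content type.
    with urllib.request.urlopen(url) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL keeps the sketch runnable offline; substitute a real image
    # URL to download an actual media file.
    sample = "data:application/octet-stream;base64,SGVsbG8="
    payload = download_bytes(sample)
    print(len(payload))
```

Because the result is plain bytes, it can be written to disk in binary mode or handed to an image library without any further decoding.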

Getting ready

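The urls.py module in the solution's util folder contains a class named URLUtility. This class handles several of the URL download and parsing scenarios in this chapter. We will use it in this recipe and in several others. Make sure the modules folder is on your Python path. The example for this recipe is in the 04/01_download_image.py file.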

How to do it

We proceed with the example as follows:

The URLUtility class can download content from a URL. The code in the example file is as follows:

import const
from util.urls import URLUtility

util = URLUtility(const.ApodEclipseImage())
print(len(util.data))

When you run this script, you will see the following output:

Reading URL: https://apod.nasa.gov/apod/image/1709/BT5643s.jpg
Read 171014 bytes
171014

The example reads 171014 bytes of data.

How it works

The URL is defined in the const module and returned by the function const.ApodEclipseImage():

def ApodEclipseImage():
    return "https://apod.nasa.gov/apod/image/1709/BT5643s.jpg"

The constructor of the URLUtility class has the following implementation:

def __init__(self, url, readNow=True):
    """ Construct the object, parse the URL, and download now if specified"""
    self._url = url
    self._response = None
    self._parsed = urlparse(url)
    if readNow:
        self.read()

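The constructor stores the URL, parses it, and, when readNow is true, downloads the file with the read() method. Here is the code for read(), together with the data property that exposes the downloaded bytes: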

def read(self):
    self._response = urllib.request.urlopen(self._url)
    self._data = self._response.read()

@property
def data(self):
    self.ensure_response()
    return self._data

The code then simply reports the length of that data, which is 171014 bytes.
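The excerpts above do not show ensure_response(), so the following is only a sketch of how the pieces might fit together, not the book's implementation. The class name LazyDownloader, the read_now parameter spelling, and the assumed behavior of ensure_response() (download on first access if nothing has been read yet) are all hypothetical.

```python
import urllib.request
from urllib.parse import urlparse


class LazyDownloader:
    """Hypothetical stand-in for URLUtility; not the book's implementation."""

    def __init__(self, url, read_now=True):
        self._url = url
        self._response = None
        self._data = None
        self._parsed = urlparse(url)
        if read_now:
            self.read()

    def read(self):
        # Fetch the resource and keep the whole body in memory as bytes.
        with urllib.request.urlopen(self._url) as response:
            self._response = response
            self._data = response.read()

    def ensure_response(self):
        # Assumed behavior: trigger the download if it has not happened yet.
        if self._response is None:
            self.read()

    @property
    def data(self):
        self.ensure_response()
        return self._data


if __name__ == "__main__":
    # A data: URL avoids network access; a real image URL works the same way.
    util = LazyDownloader("data:text/plain;base64,SGVsbG8=", read_now=False)
    print(len(util.data))
```

With read_now=False the download is deferred until data is first accessed, which is one plausible reason for a property that calls ensure_response() before returning the bytes.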

There's more

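This class will be used for other tasks, such as determining the content type, filename, and extension of these files. Next, we will look at parsing the URL for a filename.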