This article is the 14th-day entry of the adventar Python Advent Calendar 2015. (Qiita also hosts a separate Python Advent Calendar.)
I recently had to work on some video-related code and ended up using the m3u8 package, so this time I will read through the m3u8 source code.
HTTP Live Streaming
HLS (HTTP Live Streaming) is a video streaming protocol. With an ordinary video format such as mp4, one video is stored in one file. HLS instead splits the video into pieces and plays them while downloading. Three kinds of files are involved when streaming video with HLS.
Type | Extension | Contents |
---|---|---|
ts file | ts | The actual video data. There are usually several of them. |
m3u8 file | m3u8 | Video metadata, such as the order in which the video data should be played. |
key | nothing in particular | The key used to decrypt the encrypted ts files. |
An m3u8 file can also reference other m3u8 files, but I won't cover that this time.
For example, an m3u8 file is a text file like this:
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts
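To get a feel for the format before reading the library, here is a rough sketch of how a playlist like the one above could be parsed with only the standard library. `parse_playlist` is a hypothetical helper, not part of the m3u8 package, and it only understands #EXTM3U, #EXT-X-KEY and #EXTINF; real playlists use many more tags.

```python
# Hypothetical minimal parser for the simple playlist above.
# Not the m3u8 library's parser: only #EXT-X-KEY and #EXTINF are understood.

PLAYLIST = '''#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts
'''


def parse_playlist(text):
    """Return (key_attrs, segments) for a simple media playlist."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != '#EXTM3U':
        raise ValueError('not an m3u8 playlist: missing #EXTM3U header')

    key_attrs = {}
    segments = []
    pending = None  # (duration, title) taken from the most recent #EXTINF
    for line in lines[1:]:
        if line.startswith('#EXT-X-KEY:'):
            # Naive comma split: a quoted value containing a comma would break this.
            for item in line[len('#EXT-X-KEY:'):].split(','):
                name, _, value = item.partition('=')
                key_attrs[name] = value.strip('"')
        elif line.startswith('#EXTINF:'):
            duration, _, title = line[len('#EXTINF:'):].partition(',')
            pending = (float(duration), title.strip('"'))
        elif not line.startswith('#'):
            # A non-tag line is the URI of the segment described by #EXTINF.
            duration, title = pending if pending else (None, '')
            segments.append({'uri': line, 'duration': duration, 'title': title})
            pending = None
    return key_attrs, segments
```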
Of course, you will often want to generate this file dynamically, for example to produce an m3u8 file in which the URLs of the key file and ts files are temporary URLs. The m3u8 library is useful in such cases, and this time we will read through its code.
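As an illustration of what generating a playlist dynamically might look like without any library, here is a sketch that builds a playlist whose key and ts URLs are temporary. The signing scheme (`SECRET`, `sign_url`, and the `expires`/`sig` query parameters) is entirely hypothetical; a real CDN or storage service will have its own. The key line uses METHOD=AES-128, the encryption method defined by the HLS specification.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b'change-me'  # hypothetical signing secret shared with the server


def sign_url(url, expires_in=300, now=None):
    """Return url with hypothetical `expires` and `sig` query parameters."""
    expires = int(time.time() if now is None else now) + expires_in
    message = ('%s%d' % (url, expires)).encode()
    sig = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return '%s?%s' % (url, urlencode({'expires': expires, 'sig': sig}))


def build_playlist(key_url, segment_urls, duration=2, now=None):
    """Build a media playlist whose key and ts URLs are all temporary."""
    lines = [
        '#EXTM3U',
        '#EXT-X-TARGETDURATION:%d' % duration,
        '#EXT-X-KEY:METHOD=AES-128,URI="%s"' % sign_url(key_url, now=now),
    ]
    for url in segment_urls:
        lines.append('#EXTINF:%d,' % duration)
        lines.append(sign_url(url, now=now))
    return '\n'.join(lines) + '\n'
```

Passing `now` explicitly makes the output reproducible, which is handy in tests; in production it would be left as None.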
GitHub: https://github.com/globocom/m3u8
PyPI: https://pypi.python.org/pypi/m3u8
Usage is documented properly in the README and elsewhere, so I will only touch on it briefly here.
$ pip install m3u8
It installs without any trouble.
It seems to depend on iso8601 (https://github.com/globocom/m3u8/blob/master/requirements.txt#L1). ISO 8601 is a standard notation for dates and times.
First, let's import it.
>>> import m3u8
From now on, it is assumed that m3u8 has already been imported into the Python interactive shell.
Suppose the m3u8 file above is in the current directory with the name playlist.m3u8.
$ cat playlist.m3u8
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts
$
Try loading this file in an interactive shell.
>>> playlist = m3u8.load('./playlist.m3u8')
['METHOD=AES-256', 'URI="http://example.com/keyfile"', 'IV=000000000000000']
>>> playlist
<m3u8.model.M3U8 object at 0x1024b97f0>
An m3u8.model.M3U8 object is returned.
Each video segment, along with its URL, can be accessed through .segments.
>>> playlist.segments
[<m3u8.model.Segment object at 0x10292f3c8>, <m3u8.model.Segment object at 0x10292f400>, <m3u8.model.Segment object at 0x10292f4a8>, <m3u8.model.Segment object at 0x10292f518>]
By manipulating the attributes, you can change the loaded information and write it back out.
Let's write the playlist object to a file called output.m3u8.
>>> playlist.dump('./output.m3u8')
Check the output.
$ cat output.m3u8
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts
$
It has been written out as expected.
Next, let's look at the overall layout of the repository.
$ tree
.
├── LICENSE
├── MANIFEST.in
├── README.rst
├── m3u8
│   ├── __init__.py
│   ├── model.py
│   ├── parser.py
│   └── protocol.py
├── requirements-dev.txt
├── requirements.txt
├── runtests
├── setup.py
└── tests
    ├── m3u8server.py
    ├── playlists
    │   ├── relative-playlist.m3u8
    │   └── simple-playlist.m3u8
    ├── playlists.py
    ├── test_loader.py
    ├── test_model.py
    ├── test_parser.py
    ├── test_strict_validations.py
    └── test_variant_m3u8.py

3 directories, 20 files
Hmm, it seems there are only four actual source files. Their sizes are as follows:
$ wc -l m3u8/*.py
75 m3u8/__init__.py
674 m3u8/model.py
261 m3u8/parser.py
22 m3u8/protocol.py
1032 total
That is a little over 1,000 lines in all.
load()/loads()
Let's start digging from load(), which we used earlier.
m3u8.load() https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L35-L43
if is_url(uri):
    return _load_from_uri(uri)
else:
    return _load_from_file(uri)
It checks internally whether uri is a URL and branches accordingly. The file path we passed earlier presumably went through _load_from_file(uri). Since a URL also seems to be accepted, let's try that.
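The branching can be sketched with the standard library alone. `load_text` is a hypothetical stand-in, not the library's function: it only returns the raw text, whereas m3u8.load() goes on to build an M3U8 object from it.

```python
import re
from urllib.request import urlopen


def is_url(uri):
    # Same test as m3u8.parser.is_url(): an http:// or https:// prefix.
    return re.match(r'https?://', uri) is not None


def load_text(uri, encoding='utf-8'):
    """Hypothetical helper: return the raw playlist text from a URL or a file."""
    if is_url(uri):
        # Remote playlist: fetch it over HTTP(S).
        with urlopen(uri) as response:
            return response.read().decode(encoding)
    # Anything else is treated as a local file path.
    with open(uri, encoding=encoding) as f:
        return f.read()
```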
>>> m3u8.load('https://gist.githubusercontent.com/TakesxiSximada/04189f4f191f55edae90/raw/1ecab692886508db0877c0f8531bd1f455f83795/m3u8%2520example')
<m3u8.model.M3U8 object at 0x1076b3ba8>
Oh, that works. The content appears to be fetched with urlopen(). https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L46-L53
There is also m3u8.loads(), so you can create an M3U8 object from a string as well. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L28-L33
>>> playlist_str = '#EXTM3U\n#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000\n#EXTINF:2,"aaa"\nhttp://example.com/1.ts\n#EXTINF:2,\nhttp://example.com/2.ts\n#EXTINF:2,\nhttp://example.com/3.ts\n#EXTINF:2,\nhttp://example.com/4.ts'
>>> m3u8.loads(playlist_str)
<m3u8.model.M3U8 object at 0x1076cfef0>
m3u8.model.M3U8()
load() and loads() return M3U8 objects. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L33 https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L53 https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L74
The M3U8 class is defined in m3u8.model. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L19
The M3U8 class defines dump() and dumps(), which write to a file and to a string respectively. The names resemble the json module's, but unlike json.dump() you apparently cannot pass a file object. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L217-L271
dump() creates the directory if it doesn't exist, which is a nice touch. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L260
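A minimal sketch of that directory-creating behaviour, using a hypothetical helper `dump_text` (not the library's code). As a design variation it also accepts a file object, which is what the json-style naming would lead you to expect.

```python
import os


def dump_text(text, target):
    """Write playlist text to a path or to a file-like object.

    Hypothetical helper: for a path, missing parent directories are
    created first (as described for M3U8.dump()); a file-like object
    is written to directly, the way json.dump() works.
    """
    if hasattr(target, 'write'):
        target.write(text)
        return
    dirname = os.path.dirname(target)
    if dirname:
        # exist_ok=True: no error if the directory is already there.
        os.makedirs(dirname, exist_ok=True)
    with open(target, 'w', encoding='utf-8') as f:
        f.write(text)
```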
An instance of M3U8 has the following attributes:
Attribute name | Type |
---|---|
key | m3u8.model.Key |
segments | m3u8.model.SegmentList |
media | m3u8.model.MediaList |
playlists | m3u8.model.PlaylistList (quite a name, lol) |
iframe_playlists | m3u8.model.PlaylistList (this one too...) |
Some of the names are a little awkward, but these are the main ones.
In dumps(), the code essentially goes through these attributes and writes out each one that has a value. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L222-L254
output = ['#EXTM3U']
if self.is_independent_segments:
    output.append('#EXT-X-INDEPENDENT-SEGMENTS')
if self.media_sequence > 0:
    output.append('#EXT-X-MEDIA-SEQUENCE:' + str(self.media_sequence))
if self.allow_cache:
    output.append('#EXT-X-ALLOW-CACHE:' + self.allow_cache.upper())
if self.version:
    output.append('#EXT-X-VERSION:' + self.version)
if self.key:
    output.append(str(self.key))
if self.target_duration:
    output.append('#EXT-X-TARGETDURATION:' + int_or_float_to_string(self.target_duration))
if self.program_date_time is not None:
    output.append('#EXT-X-PROGRAM-DATE-TIME:' + parser.format_date_time(self.program_date_time))
if not (self.playlist_type is None or self.playlist_type == ''):
    output.append(
        '#EXT-X-PLAYLIST-TYPE:%s' % str(self.playlist_type).upper())
if self.is_i_frames_only:
    output.append('#EXT-X-I-FRAMES-ONLY')
if self.is_variant:
    if self.media:
        output.append(str(self.media))
    output.append(str(self.playlists))
    if self.iframe_playlists:
        output.append(str(self.iframe_playlists))
Looking at this, self.key and self.playlists apparently render their m3u8 text when converted with str().
m3u8.model.Key()
This class manages the value of the EXT-X-KEY tag in the m3u8 format (iv is the initialization vector). For example, if the key is generated dynamically or switched, you can reflect it in the output m3u8 file as follows.
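The "write only what is set" pattern in dumps() is easy to reproduce. The sketch below uses a hypothetical, heavily cut-down `MiniPlaylist` class (not m3u8.model.M3U8) with just a few of the attributes above; any object whose str() yields a tag line can serve as the key.

```python
class MiniPlaylist:
    """Hypothetical cut-down model mimicking the dumps() pattern:
    each optional attribute is emitted only when it has a value."""

    def __init__(self, version=None, target_duration=None, media_sequence=0, key=None):
        self.version = version
        self.target_duration = target_duration
        self.media_sequence = media_sequence
        self.key = key        # any object whose str() is an #EXT-X-KEY line
        self.segments = []    # list of (duration, uri) pairs

    def dumps(self):
        output = ['#EXTM3U']
        if self.media_sequence > 0:
            output.append('#EXT-X-MEDIA-SEQUENCE:' + str(self.media_sequence))
        if self.version:
            output.append('#EXT-X-VERSION:' + str(self.version))
        if self.key:
            # Delegate rendering to the key object itself, as dumps() does.
            output.append(str(self.key))
        if self.target_duration:
            output.append('#EXT-X-TARGETDURATION:' + str(self.target_duration))
        for duration, uri in self.segments:
            output.append('#EXTINF:%s,' % duration)
            output.append(uri)
        return '\n'.join(output)
```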
>>> m3uobj = m3u8.model.M3U8()
>>> print(m3uobj.dumps())
#EXTM3U
>>> key = m3u8.model.Key(method='AES-256', uri='http://example.com/key.bin', base_uri='', iv='0000000')
>>> str(key)
'#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/key.bin",IV=0000000'
>>> m3uobj.key = key
>>> print(m3uobj.dumps())
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/key.bin",IV=0000000
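For reference, here is a rough sketch of how a Key-like class could render itself with str(). The `Key` class below is a hypothetical stand-in, not the library's implementation, and `random_iv()` is an extra helper for the key-switching case; it produces an IV in the form the HLS specification expects (0x followed by 32 hexadecimal digits, i.e. 128 bits).

```python
import secrets


class Key:
    """Hypothetical stand-in for m3u8.model.Key: str() renders #EXT-X-KEY."""

    def __init__(self, method, uri=None, iv=None):
        self.method = method
        self.uri = uri
        self.iv = iv

    def __str__(self):
        attrs = ['METHOD=%s' % self.method]
        if self.uri:
            attrs.append('URI="%s"' % self.uri)  # URI is a quoted string
        if self.iv:
            attrs.append('IV=%s' % self.iv)
        return '#EXT-X-KEY:' + ','.join(attrs)


def random_iv():
    """Fresh 128-bit IV as 0x-prefixed hex, e.g. when rotating keys."""
    return '0x' + secrets.token_hex(16)
```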
m3u8.model.SegmentList()
https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L386
The .segments attribute of an M3U8 instance is not a plain list but a list-like object, m3u8.model.SegmentList. This class inherits from list.
>>> type(playlist.segments)
<class 'm3u8.model.SegmentList'>
>>> isinstance(playlist.segments, list)
True
Its elements are m3u8.model.Segment objects.
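A list subclass in the spirit of SegmentList can be sketched as follows. `Segment` and `SegmentList` here are hypothetical simplifications, and the `uri` property is an assumed convenience rather than something verified in the library.

```python
class Segment:
    """Hypothetical, simplified segment: one #EXTINF line plus its URI."""

    def __init__(self, uri, duration):
        self.uri = uri
        self.duration = duration

    def __str__(self):
        return '#EXTINF:%s,\n%s' % (self.duration, self.uri)


class SegmentList(list):
    """Hypothetical sketch: a list subclass whose str() renders every element."""

    def __str__(self):
        return '\n'.join(str(segment) for segment in self)

    @property
    def uri(self):
        # Convenience accessor (an assumption here): all segment URIs in order.
        return [segment.uri for segment in self]
```

One caveat of subclassing list: slicing returns a plain list, so `segs[:1]` loses the custom `__str__`.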
m3u8.model.Segment()
https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L310-L383
This class represents a single ts file. Specify uri and base_uri, and also pass duration: the playback length of the ts file in seconds (decimal values are allowed).
>>> m3uobj = m3u8.model.M3U8()
>>> m3uobj.segments.append(m3u8.model.Segment(uri='http://example.com/1.ts', base_uri='', duration=1))
>>> m3uobj.segments.append(m3u8.model.Segment(uri='http://example.com/2.ts', base_uri='', duration=1))
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
http://example.com/1.ts
#EXTINF:1,
http://example.com/2.ts
Note that duration defaults to None in the constructor, so it can be omitted there, but calling dumps() with it omitted raises a TypeError. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L369
This is a bit of a rough edge.
So far we have been passing base_uri to the constructor without really knowing what it is for, so let's look into it.
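One way to avoid the late TypeError is to check for a missing duration explicitly and raise a clear error. `format_extinf` below is a hypothetical helper, not the library's code; it also writes whole-number durations without a trailing .0, which is what the name int_or_float_to_string in dumps() suggests the library does.

```python
import math


def format_extinf(duration, title=''):
    """Hypothetical helper: build an #EXTINF line, failing early and clearly."""
    if duration is None:
        raise ValueError('segment duration is required to write #EXTINF')
    # Whole numbers are written without ".0" (2.0 -> "2"); decimals are kept.
    if duration == math.floor(duration):
        text = str(int(duration))
    else:
        text = str(duration)
    return '#EXTINF:%s,%s' % (text, title)
```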
base_uri is used in m3u8.model.BasePathMixin.
This mixin is used by Segment, Key, Playlist, IFramePlaylist, and Media.
@property
def absolute_uri(self):
    if self.uri is None:
        return None
    if parser.is_url(self.uri):
        return self.uri
    else:
        if self.base_uri is None:
            raise ValueError('There can not be `absolute_uri` with no `base_uri` set')
        return _urijoin(self.base_uri, self.uri)
These classes therefore have an .absolute_uri property. Its behavior depends on whether self.uri is a URL: if it is, self.uri is used as is; if not (a file path, for example), self.base_uri and self.uri are joined to form absolute_uri.
Whether a value is a URL is decided by m3u8.parser.is_url(): anything starting with http:// or https:// counts as a URL.
def is_url(uri):
    return re.match(r'https?://', uri) is not None
Let's create a Segment with a non-URL value for uri and a URL for base_uri.
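The absolute_uri logic can be reproduced as a standalone function. This sketch swaps the library's `_urijoin` for the standard `urllib.parse.urljoin`, whose behaviour may differ from `_urijoin` in edge cases; `absolute_uri` as a free function is a hypothetical restructuring.

```python
import re
from urllib.parse import urljoin


def is_url(uri):
    return re.match(r'https?://', uri) is not None


def absolute_uri(uri, base_uri):
    """Sketch of BasePathMixin.absolute_uri using urljoin instead of _urijoin."""
    if uri is None:
        return None
    if is_url(uri):
        # Already absolute: returned unchanged.
        return uri
    if base_uri is None:
        raise ValueError('There can not be `absolute_uri` with no `base_uri` set')
    return urljoin(base_uri, uri)
```

With urljoin, a base_uri that has a path but no trailing slash has its last segment replaced, so 'http://example.com/videos' plus '1.ts' gives 'http://example.com/1.ts'; ending base_uri with / keeps the directory.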
>>> m3uobj = m3u8.model.M3U8()
>>> m3uobj.segments.append(m3u8.model.Segment(uri='1.ts', base_uri='http://example.com', duration=1))
Let's check the text generated by this code.
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
1.ts
...but absolute_uri is not used. Grepping the code, I couldn't find anywhere outside the tests that uses it. Perhaps it is meant as an API for external callers rather than for internal use...
Unlike base_uri, base_path is quite handy. It is also defined in m3u8.model.BasePathMixin. When you want to rewrite the uri of segments and the like, you can set base_path on the M3U8 object, and the values of its attributes such as segments and key are rewritten as well.
As an example, first call dumps() without setting base_path.
>>> m3uobj = m3u8.model.M3U8()
>>> m3uobj.segments.append(m3u8.model.Segment(uri='1.ts', base_uri='', duration=1))
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
1.ts
The output 1.ts is the uri passed when Segment() was created. Next, set m3uobj.base_path to a URL and call dumps() again.
>>> m3uobj.base_path = 'http://example.com'
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
http://example.com/1.ts
This time, 1.ts is output with base_path prepended. base_path is implemented as a property with a setter; setting it rewrites the uri attribute so that it includes base_path.
https://github.com/globocom/m3u8/blob/master/m3u8/model.py#L290-L294
@base_path.setter
def base_path(self, newbase_path):
    if not self.base_path:
        self.uri = "%s/%s" % (newbase_path, self.uri)
    self.uri = self.uri.replace(self.base_path, newbase_path)
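To see how the setter behaves, and how an M3U8-level base_path can fan out to its segments, here is a self-contained sketch. The setter body is the one quoted above; the getter returning os.path.dirname(self.uri) and the `MiniM3U8` container are assumptions made for this example, not the library's code.

```python
import os


class BasePathMixin:
    @property
    def base_path(self):
        # Assumed getter: the "directory" part of the uri.
        return os.path.dirname(self.uri)

    @base_path.setter
    def base_path(self, newbase_path):
        # Setter body as quoted above.
        if not self.base_path:
            self.uri = "%s/%s" % (newbase_path, self.uri)
        self.uri = self.uri.replace(self.base_path, newbase_path)


class Segment(BasePathMixin):
    def __init__(self, uri):
        self.uri = uri


class MiniM3U8:
    """Hypothetical container: setting base_path propagates to every segment."""

    def __init__(self, segments):
        self.segments = segments
        self._base_path = None

    @property
    def base_path(self):
        return self._base_path

    @base_path.setter
    def base_path(self, newbase_path):
        self._base_path = newbase_path
        for segment in self.segments:
            segment.base_path = newbase_path
```

Because the setter relies on str.replace(), every occurrence of the old base path in the uri is replaced, not just the prefix.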
Sorry, I'm exhausted. Still, media and playlists are built in basically the same way; the difference lies in what kind of values they generate. If I feel like it, I may write a continuation.
Tomorrow's entry is by @ininsanus. Please look forward to it! http://www.adventar.org/calendars/846#list-2015-12-15