Code reading for m3u8, a library for manipulating HLS video format m3u8 files in Python

This article is the day-14 entry of the Adventar Python Advent Calendar 2015. (Qiita also has a separate Python Advent Calendar.)

I recently had to work on some video-related code and ended up using the m3u8 package, so in this article I will read through its source code.

HTTP Live Streaming

HLS (HTTP Live Streaming) is a video streaming protocol. With an ordinary video format such as mp4, one video is stored in one file. HLS instead splits the video into pieces and plays them while downloading them. Three kinds of files are involved when streaming video with HLS.

Type       Extension          Contents
ts file    ts                 The actual video data. There are usually several.
m3u8 file  m3u8               Holds video metadata, such as the order in which the video data should be played.
key        (nothing special)  The key used to decrypt the ts files when they are encrypted.

An m3u8 file can also point to other m3u8 files (but I won't cover that this time).

m3u8 files and the m3u8 library

An m3u8 file is a text file in a format like the following.

#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts
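
To make the structure of the format concrete, here is a minimal sketch that assembles the segment part of such a file by hand. The function name build_playlist and its parameters are made up for illustration; this is not how the m3u8 library does it, and the key line is left out.

```python
def build_playlist(segment_urls, duration=2):
    """Return playlist text for the given segment URLs (hypothetical helper)."""
    lines = ['#EXTM3U']
    for url in segment_urls:
        # Each segment is an #EXTINF line (duration, optional title)
        # followed by the URI of the ts file.
        lines.append('#EXTINF:%s,' % duration)
        lines.append(url)
    return '\n'.join(lines)


urls = ['http://example.com/%d.ts' % n for n in (1, 2)]
playlist_text = build_playlist(urls)
print(playlist_text)
```

Writing this by hand gets tedious once keys, variant playlists, and other tags come into play, which is where a library helps.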

Naturally, you will want to generate such files dynamically, for example to produce an m3u8 file in which the URLs of the key file and the ts files are temporary URLs. The m3u8 library is useful in such cases, and this article reads through its code.

GitHub: https://github.com/globocom/m3u8
PyPI: https://pypi.python.org/pypi/m3u8

Installation and usage (just a quick look)

The usage is documented properly in the README and elsewhere, so I will only touch on it briefly here.

Installation

$ pip install m3u8 

It installs without any trouble.

It depends on the iso8601 package. https://github.com/globocom/m3u8/blob/master/requirements.txt#L1 ISO 8601 is a standard for date and time notation.

How to use

First, import it.

>>> import m3u8

From now on, it is assumed that m3u8 has already been imported into the Python interactive shell.

File parsing

Suppose the m3u8 file above is in the current directory with the name playlist.m3u8.

$ cat playlist.m3u8 
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts
$

Try loading this file in an interactive shell.

>>> playlist = m3u8.load('./playlist.m3u8')
['METHOD=AES-256', 'URI="http://example.com/keyfile"', 'IV=000000000000000']
>>> playlist
<m3u8.model.M3U8 object at 0x1024b97f0>

An m3u8.model.M3U8 object is returned.

Each video segment is accessible through .segments.

>>> playlist.segments
[<m3u8.model.Segment object at 0x10292f3c8>, <m3u8.model.Segment object at 0x10292f400>, <m3u8.model.Segment object at 0x10292f4a8>, <m3u8.model.Segment object at 0x10292f518>]

By manipulating these attributes you can modify the loaded information and write it back out.

Export to file

Let's write the playlist object to a file called output.m3u8.

>>> playlist.dump('./output.m3u8')

Check the output.

$ cat output.m3u8 
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000
#EXTINF:2,"aaa"
http://example.com/1.ts
#EXTINF:2,
http://example.com/2.ts
#EXTINF:2,
http://example.com/3.ts
#EXTINF:2,
http://example.com/4.ts

The playlist was written out as expected.

Code reading

Overview of the whole

First, let's check the overall layout of the repository.

$ tree
.
├── LICENSE
├── MANIFEST.in
├── README.rst
├── m3u8
│   ├── __init__.py
│   ├── model.py
│   ├── parser.py
│   └── protocol.py
├── requirements-dev.txt
├── requirements.txt
├── runtests
├── setup.py
└── tests
    ├── m3u8server.py
    ├── playlists
    │   ├── relative-playlist.m3u8
    │   └── simple-playlist.m3u8
    ├── playlists.py
    ├── test_loader.py
    ├── test_model.py
    ├── test_parser.py
    ├── test_strict_validations.py
    └── test_variant_m3u8.py

3 directories, 20 files

Hmm, there are only four actual source files. Their sizes are as follows.

$ wc -l m3u8/*.py
      75 m3u8/__init__.py
     674 m3u8/model.py
     261 m3u8/parser.py
      22 m3u8/protocol.py
    1032 total

It looks like a little over 1000 lines in all.

load()/loads()

Let's dig into the code, starting from the load() we used earlier.

m3u8.load() https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L35-L43

    if is_url(uri):
        return _load_from_uri(uri)
    else:
        return _load_from_file(uri)

Internally, it checks whether uri is a URL and branches accordingly. The call we made earlier probably went through _load_from_file(uri). Since a URL can apparently be passed as well, let's try that.

>>> m3u8.load('https://gist.githubusercontent.com/TakesxiSximada/04189f4f191f55edae90/raw/1ecab692886508db0877c0f8531bd1f455f83795/m3u8%2520example')
<m3u8.model.M3U8 object at 0x1076b3ba8>

Oh, that works too. It appears to fetch the content with urlopen(). https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L46-L53
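
The branching described here can be sketched with the standard library alone. This is only an illustration under my own assumptions: read_playlist_text is a made-up name, the text is assumed to be UTF-8, and the real _load_from_uri and _load_from_file do more than just read the text.

```python
import os
import re
import tempfile
from urllib.request import urlopen


def is_url(uri):
    # Same check as the library's is_url(): http:// or https:// prefix.
    return re.match(r'https?://', uri) is not None


def read_playlist_text(uri):
    """Hypothetical helper: read playlist text from a URL or a local file."""
    if is_url(uri):
        # Remote playlist: download it over HTTP(S).
        with urlopen(uri) as response:
            return response.read().decode('utf-8')
    # Anything else is treated as a local file path.
    with open(uri, encoding='utf-8') as fileobj:
        return fileobj.read()


# Exercise the local-file branch with a temporary file.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'playlist.m3u8')
    with open(path, 'w', encoding='utf-8') as fileobj:
        fileobj.write('#EXTM3U\n')
    local_text = read_playlist_text(path)
```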

By the way, there is also m3u8.loads(), so you can create an M3U8 object from a string as well. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L28-L33

>>> playlist_str = '#EXTM3U\n#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/keyfile",IV=000000000000000\n#EXTINF:2,"aaa"\nhttp://example.com/1.ts\n#EXTINF:2,\nhttp://example.com/2.ts\n#EXTINF:2,\nhttp://example.com/3.ts\n#EXTINF:2,\nhttp://example.com/4.ts'
>>> m3u8.loads(playlist_str)
<m3u8.model.M3U8 object at 0x1076cfef0>
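
To get a feel for what parsing such a string involves, here is a deliberately tiny sketch that only understands #EXTINF lines and segment URIs. The function parse_simple_playlist is hypothetical and is not the library's parser, which handles many more tags.

```python
def parse_simple_playlist(text):
    """Return (duration, uri) pairs from a simple playlist (sketch only)."""
    segments = []
    duration = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('#EXTINF:'):
            # "#EXTINF:<duration>,<title>": keep only the duration.
            duration = float(line[len('#EXTINF:'):].split(',', 1)[0])
        elif line and not line.startswith('#'):
            # A non-empty line that is not a tag is the URI of a segment.
            segments.append((duration, line))
            duration = None
    return segments


sample = ('#EXTM3U\n#EXTINF:2,"aaa"\nhttp://example.com/1.ts\n'
          '#EXTINF:2,\nhttp://example.com/2.ts')
parsed = parse_simple_playlist(sample)
print(parsed)
```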

m3u8.model.M3U8()

load() and loads() return M3U8 objects. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L33 https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L53 https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/__init__.py#L74

The M3U8 class is defined in m3u8.model. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L19

The M3U8 class defines dump() and dumps(), which write the playlist to a file or to a string. The function names resemble those of the json module, but unlike json.dump() it seems you cannot pass a file object. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L217-L271

dump() creates the target directory if it doesn't exist. That's rather nice. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L260
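
The "create the directory if needed, then write" behaviour can be sketched as follows. dump_text is a made-up helper that takes a file name rather than a file object, mirroring the point above; the library's own implementation may differ in detail.

```python
import os
import tempfile


def dump_text(text, filename):
    """Write text to filename, creating missing parent directories (sketch)."""
    dirname = os.path.dirname(filename)
    if dirname:
        # exist_ok avoids an error when the directory is already there.
        os.makedirs(dirname, exist_ok=True)
    with open(filename, 'w', encoding='utf-8') as fileobj:
        fileobj.write(text)


# The nested directories below do not exist before the call.
with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, 'not', 'yet', 'created', 'output.m3u8')
    dump_text('#EXTM3U\n', target)
    with open(target, encoding='utf-8') as fileobj:
        written = fileobj.read()
```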

An instance of M3U8 has the following attributes:

Attribute name    Type
key               m3u8.model.Key
segments          m3u8.model.SegmentList
media             m3u8.model.MediaList
playlists         m3u8.model.PlaylistList (what a name, lol)
iframe_playlists  m3u8.model.PlaylistList (this one too...)

Some of the naming bothers me a little, but these are the main attributes.

dumps() goes through these attributes and writes each one out only if it has a value. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L222-L254

        output = ['#EXTM3U']
        if self.is_independent_segments:
            output.append('#EXT-X-INDEPENDENT-SEGMENTS')
        if self.media_sequence > 0:
            output.append('#EXT-X-MEDIA-SEQUENCE:' + str(self.media_sequence))
        if self.allow_cache:
            output.append('#EXT-X-ALLOW-CACHE:' + self.allow_cache.upper())
        if self.version:
            output.append('#EXT-X-VERSION:' + self.version)
        if self.key:
            output.append(str(self.key))
        if self.target_duration:
            output.append('#EXT-X-TARGETDURATION:' + int_or_float_to_string(self.target_duration))
        if self.program_date_time is not None:
            output.append('#EXT-X-PROGRAM-DATE-TIME:' + parser.format_date_time(self.program_date_time))
        if not (self.playlist_type is None or self.playlist_type == ''):
            output.append(
                '#EXT-X-PLAYLIST-TYPE:%s' % str(self.playlist_type).upper())
        if self.is_i_frames_only:
            output.append('#EXT-X-I-FRAMES-ONLY')
        if self.is_variant:
            if self.media:
                output.append(str(self.media))
            output.append(str(self.playlists))
            if self.iframe_playlists:
                output.append(str(self.iframe_playlists))

Looking at this, objects such as self.key and self.playlists are converted to strings with str().

m3u8.model.Key()

This class manages the value of EXT-X-KEY in the m3u8 format (iv is the initialization vector). For example, if the key is generated dynamically or switched, assigning a Key as follows gets it reflected in the output m3u8 file.

>>> m3uobj = m3u8.model.M3U8()
>>> print(m3uobj.dumps())
#EXTM3U

>>> key = m3u8.model.Key(method='AES-256', uri='http://example.com/key.bin', base_uri='', iv='0000000')
>>> str(key)
'#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/key.bin",IV=0000000'
>>> m3uobj.key = key
>>> print(m3uobj.dumps())
#EXTM3U
#EXT-X-KEY:METHOD=AES-256,URI="http://example.com/key.bin",IV=0000000


m3u8.model.SegmentList()

https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L386

The .segments attribute of an M3U8 instance is not a plain list but a list-like object called m3u8.model.SegmentList. This class inherits from the list type.

>>> type(playlist.segments)
<class 'm3u8.model.SegmentList'>
>>> isinstance(playlist.segments, list)
True

Its elements are m3u8.model.Segment objects.
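
The pattern of subclassing list so that the container can render itself can be sketched like this. LineList is a hypothetical class written for illustration; it is not the library's SegmentList.

```python
class LineList(list):
    """Hypothetical list subclass that renders its items one per line."""

    def __str__(self):
        return '\n'.join(str(item) for item in self)


lines = LineList(['#EXTINF:1,', 'http://example.com/1.ts'])
# Ordinary list methods such as append() still work.
lines.append('#EXTINF:1,')
lines.append('http://example.com/2.ts')
rendered = str(lines)
print(rendered)
```

Because the class is still a list, callers can append, index, and iterate as usual, while str() gives the text form needed when dumping.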

m3u8.model.Segment()

https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L310-L383

This class represents a ts file. You specify its location with uri and base_uri. You also need to pass duration, which is the playback length of the ts file in seconds (fractional values are allowed).

>>> m3uobj = m3u8.model.M3U8()
>>> m3uobj.segments.append(m3u8.model.Segment(uri='http://example.com/1.ts', base_uri='', duration=1))
>>> m3uobj.segments.append(m3u8.model.Segment(uri='http://example.com/2.ts', base_uri='', duration=1))
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
http://example.com/1.ts
#EXTINF:1,
http://example.com/2.ts

By the way, duration defaults to None in the constructor and can therefore be omitted, but calling dumps() on a segment whose duration was omitted raises TypeError. https://github.com/globocom/m3u8/blob/210db9c494c1b703ab7e169d3ae4ed488ec30eac/m3u8/model.py#L369

That is not very nice.
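
One plausible way such a TypeError can arise is a number-formatting helper that receives None. The sketch below uses made-up stand-ins (format_duration, extinf_line) and is an assumption about the mechanism, not a copy of the library's code.

```python
import math


def format_duration(number):
    """Hypothetical stand-in for a number-formatting helper."""
    # math.floor() rejects None, so a missing duration fails here.
    return str(int(number)) if number == math.floor(number) else str(number)


def extinf_line(duration=None):
    # duration is optional in the signature but required when formatting.
    return '#EXTINF:%s,' % format_duration(duration)


try:
    extinf_line()
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(error_name)
print(extinf_line(1), extinf_line(1.5))
```

In practice, the simplest way to stay clear of the problem is to always pass duration when creating a segment.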

What is base_uri?

So far we have been passing base_uri to the constructors. I have been using it without really understanding what it is for, so let's look into it.

base_uri is used in m3u8.model.BasePathMixin.

This class is mixed into Segment, Key, Playlist, IFramePlaylist, and Media.

    @property
    def absolute_uri(self):
        if self.uri is None:
            return None
        if parser.is_url(self.uri):
            return self.uri
        else:
            if self.base_uri is None:
                raise ValueError('There can not be `absolute_uri` with no `base_uri` set')
            return _urijoin(self.base_uri, self.uri)

These classes therefore have an .absolute_uri property. Its behavior depends on whether self.uri has the form of a URL. If it does, self.uri is used as is; if it does not (for example, when it is a file path), self.base_uri and self.uri are joined to produce the value of absolute_uri.

Whether a value is a URL is decided by m3u8.parser.is_url(). Strings that start with http:// or https:// are judged to be URLs.

def is_url(uri):
    return re.match(r'https?://', uri) is not None
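
The following self-contained sketch shows how this check behaves on a few inputs and how the resolution rule of absolute_uri plays out. The free function absolute_uri and its simple string join are my own simplification, standing in for the property and the library's _urijoin helper.

```python
import re


def is_url(uri):
    # re.match() anchors at the start of the string.
    return re.match(r'https?://', uri) is not None


def absolute_uri(uri, base_uri):
    """Sketch of the resolution rule; the join below is a simplification."""
    if uri is None:
        return None
    if is_url(uri):
        return uri
    if base_uri is None:
        raise ValueError('There can not be `absolute_uri` with no `base_uri` set')
    # Simplified join, standing in for the library's _urijoin helper.
    return base_uri.rstrip('/') + '/' + uri.lstrip('/')


print(is_url('http://example.com/1.ts'))   # http:// prefix
print(is_url('1.ts'))                      # relative path
print(is_url('ftp://example.com/1.ts'))    # other schemes are not matched
print(absolute_uri('1.ts', 'http://example.com'))
```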

Try the case where base_uri is used

Specify a non-URL value for uri and a URL-like value for base_uri for Segment.

>>> m3uobj = m3u8.model.M3U8()
>>> m3uobj.segments.append(m3u8.model.Segment(uri='1.ts', base_uri='http://example.com', duration=1))

Let's check the text generated by this code.

>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
1.ts

... absolute_uri is not used here. Even grepping the source, I found no use of it outside the tests. Perhaps it is meant as an API for external callers rather than something used internally.

Batch-updating the paths of segments and the like with base_path

In contrast to base_uri, base_path is quite handy. It is also defined in m3u8.model.BasePathMixin. When you want to rewrite the URIs of segments and the like, setting base_path on the M3U8 object also rewrites it on the attributes the object holds, such as segments and key.

As an example, first call dumps() without setting base_path.

>>> m3uobj = m3u8.model.M3U8()
>>> m3uobj.segments.append(m3u8.model.Segment(uri='1.ts', base_uri='', duration=1))
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
1.ts

The 1.ts in the output is the uri passed when the Segment was created. Next, set a URL on m3uobj.base_path and call dumps() again.

>>> m3uobj.base_path = 'http://example.com'
>>> print(m3uobj.dumps())
#EXTM3U
#EXTINF:1,
http://example.com/1.ts

This time the output has base_path prepended to 1.ts. base_path is implemented as a property with a setter, and setting it rewrites the uri attribute to the value with base_path prepended.

https://github.com/globocom/m3u8/blob/master/m3u8/model.py#L290-L294

    @base_path.setter
    def base_path(self, newbase_path):
        if not self.base_path:
            self.uri = "%s/%s" % (newbase_path, self.uri)
        self.uri = self.uri.replace(self.base_path, newbase_path)
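
To try the setter logic in isolation, here is a standalone sketch. PathItem is a hypothetical class, and the getter that derives base_path from the directory part of uri is my assumption about how the property is defined.

```python
import posixpath


class PathItem:
    """Hypothetical stand-in for an object that mixes in base_path."""

    def __init__(self, uri):
        self.uri = uri

    @property
    def base_path(self):
        # Assumption: the base path is the directory part of the URI.
        return posixpath.dirname(self.uri)

    @base_path.setter
    def base_path(self, new_base_path):
        if not self.base_path:
            # No directory part yet: prepend the new base path.
            self.uri = '%s/%s' % (new_base_path, self.uri)
        # Replace the current directory part with the new one
        # (a no-op right after the prepend above).
        self.uri = self.uri.replace(self.base_path, new_base_path)


# Updating every item in a loop mimics the batch update described above.
items = [PathItem('1.ts'), PathItem('http://old.example.com/2.ts')]
for item in items:
    item.base_path = 'http://example.com'
uris = [item.uri for item in items]
print(uris)
```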

Summary

Sorry, I've run out of steam. That said, media and playlists have basically the same structure, and the difference lies in what kind of values they generate. If I feel like it, I may write a follow-up.

Tomorrow's entry is by @ininsanus. Over to you! http://www.adventar.org/calendars/846#list-2015-12-15
