[Python] Get one year's message history from Slack

For various reasons I needed about one year of message history from Slack, so this article describes how I fetched it in Python. I also reformatted the retrieved messages to make them easier to analyze, but that code is too messy to publish in full. If there is a part I can publish, I would like to write another article about it.

There is an official Python client, slackapi/python-slackclient, but I did not use it this time. If you want to know how to do this with python-slackclient, I recommend looking at other articles.

Environment

Language

Main libraries used

Development assistance library

Implementation

Client

The Slack token is an instance variable, so it can either be read from an environment variable or written directly in the main script. pipenv loads .env automatically, so the value set in the environment variable is used as the default. This depends on my own development environment, but I also made it work without pipenv (for cases where I don't want to set an environment variable). The request function takes a method: BaseSlackMethod argument because Slack calls each API endpoint a "method". I will explain BaseSlackMethod later; I made it a base class so that I can keep adding one class per method. This keeps the request parameters manageable in code and saves you from checking the reference every time. Nice!
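To make the "environment variable by default, explicit value if given" idea concrete, here is a minimal sketch. TokenConfig is a hypothetical stand-in for SlackClient that shows only the token handling. One difference, labeled as an assumption: the original reads SLACK_API_TOKEN once at import time (a module-level constant), while this sketch uses field(default_factory=...) so the variable is read each time an instance is created.

```python
import os
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenConfig:
    """Hypothetical stand-in for SlackClient: only the token handling is shown."""

    # default_factory reads the environment variable when an instance is created;
    # passing token=... explicitly overrides it.
    token: str = field(default_factory=lambda: os.getenv("SLACK_API_TOKEN", ""))

    def auth_headers(self) -> Dict[str, str]:
        # Same rule as SlackClient._get_headers: only send Authorization if a token is set.
        return {"Authorization": f"Bearer {self.token}"} if self.token else {}
```

With pipenv, TokenConfig() picks up the token from .env; without it, you can pass the token directly, just as SlackClient(token="...") works in the original.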

src/slack/client.py


import os
from dataclasses import dataclass
from typing import Any, ClassVar, Dict

import requests

from src.log import get_logger
from src.slack.exceptions import SlackRequestError
from src.slack.types import Headers
from src.slack.methods.base import BaseSlackMethod


SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN", "")

logger = get_logger(__name__)


@dataclass
class SlackClient:
    api_url: ClassVar[str] = "https://slack.com/api"
    token: str = SLACK_API_TOKEN

    def _get_headers(self, headers: Headers) -> Headers:
        """Get headers

        Args:
            headers (Headers)

        Returns:
            Headers
        """

        final_headers = {
            "Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
        }

        if self.token:
            final_headers["Authorization"] = f"Bearer {self.token}"

        final_headers.update(headers)

        return final_headers

    def request(
        self, method: BaseSlackMethod, headers: Dict[str, Any] = None,
    ) -> Dict[Any, Any]:
        """API request to Slack

        Args:
            method (BaseSlackMethod)
            headers (Dict[str, Any], optional): Defaults to None.

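        Raises:
            SlackRequestError: the HTTP status is not OK, or Slack returned "ok": false
            requests.RequestException: the request itself failed

        Returns:
            Dict[Any, Any]: response body
        """

        if not isinstance(headers, dict):
            headers = {}
        headers = self._get_headers(headers)

        url = f"{self.api_url}/{method.endpoint}"

        try:
            res = requests.get(url, headers=headers, params=method.params, timeout=30)

            if res.ok is False:
                raise SlackRequestError(res.text)

            body = res.json()
            # Slack reports most API errors (invalid_auth, channel_not_found, ...)
            # with HTTP 200 and "ok": false, so the body has to be checked too.
            if not body.get("ok", False):
                raise SlackRequestError(body.get("error", res.text))
        except Exception as err:
            logger.error(err)
            logger.error("Data acquisition failure from slack")
            raise
        else:
            logger.info("Data acquisition completed from slack")
            return body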

Message history method

The API method for getting message history is conversations.history. See the reference for details of the request parameters. Writing the parameters into code, as below, makes it easy to see which parameters each method accepts, and with suitable comments the code itself becomes a good reference. Here I will only explain the two parameters that matter for fetching one year of history: cursor and oldest. cursor is the pagination token used to fetch the next page of history. oldest, as the name suggests, specifies the start of the time range. Note that oldest takes a Unix timestamp.
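Because oldest (and latest) are Unix timestamps, a datetime has to be converted with datetime.timestamp(). A minimal sketch of that conversion; one_year_window is a hypothetical helper, not part of the original code:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def one_year_window(now: datetime) -> Dict[str, float]:
    """Return oldest/latest Unix timestamps covering the 365 days before `now`."""
    oldest = now - timedelta(days=365)
    # datetime.timestamp() returns seconds since the Unix epoch as a float.
    return {"oldest": oldest.timestamp(), "latest": now.timestamp()}


window = one_year_window(datetime(2021, 1, 1, tzinfo=timezone.utc))
# window["latest"] is 1609459200.0 (2021-01-01 00:00 UTC)
```

With a timezone-aware datetime the window is exactly 365 × 86,400 seconds. A naive datetime.now() is interpreted as local time, so if daylight saving time differs between the two dates the window may be off by an hour, which is usually harmless for this use.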

src/slack/methods/conversation.py


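import os
from datetime import datetime
from dataclasses import dataclass, asdict, field
from typing import ClassVar, Optional

from src.slack.methods.base import BaseSlackMethod
from src.slack.types import SlackParams


SLACK_CHANNEL_ID = os.getenv("SLACK_CHANNEL_ID", "")


@dataclass
class ConversationsHistory(BaseSlackMethod):
    endpoint: ClassVar[str] = "conversations.history"

    channel: str = SLACK_CHANNEL_ID
    # Pagination token: response_metadata.next_cursor from the previous response
    cursor: Optional[str] = None
    inclusive: bool = False
    limit: int = 100
    # Evaluate "now" per instance instead of once at import time
    latest: float = field(default_factory=lambda: datetime.now().timestamp())
    # Start of the range as a Unix timestamp (0 = no lower bound)
    oldest: float = 0

    @property
    def params(self) -> SlackParams:
        self_dict = asdict(self)

        # Do not send cursor on the first request
        if self.cursor is None:
            del self_dict["cursor"]

        return self_dict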

The base class is below.

src/slack/methods/base.py


from dataclasses import dataclass, asdict
from typing import ClassVar

from src.slack.types import SlackParams


@dataclass
class BaseSlackMethod:
    endpoint: ClassVar[str] = ""

    @property
    def params(self) -> SlackParams:
        return asdict(self)
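Since the point of the base class is to let you add one small class per Slack method, here is a sketch of a variant that moves the "don't send unset parameters" logic from ConversationsHistory into the base class. This is my variation, not the code above: it drops every parameter whose value is None, and HypotheticalMethod and its "example.method" endpoint are made up purely for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, ClassVar, Dict, Optional


@dataclass
class BaseSlackMethod:
    """Variant of the base class: unset (None) parameters are not sent."""

    endpoint: ClassVar[str] = ""

    @property
    def params(self) -> Dict[str, Any]:
        # ClassVar attributes are not dataclass fields, so asdict() skips `endpoint`.
        return {k: v for k, v in asdict(self).items() if v is not None}


@dataclass
class HypotheticalMethod(BaseSlackMethod):
    # Hypothetical endpoint and fields, for illustration only.
    endpoint: ClassVar[str] = "example.method"

    channel: str = ""
    cursor: Optional[str] = None
    limit: int = 100


method = HypotheticalMethod(channel="C123")
# method.params == {"channel": "C123", "limit": 100}
```

With this variant, a subclass only declares its endpoint and fields; there is no need to override params just to remove cursor.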

Main script

I want one year of history, so I compute the date and time one year ago with datetime.now() - timedelta(days=365). timedelta is convenient: change the minus to a plus and you get one year later instead. Thanks, timedelta :pray: Because the history has to be fetched page by page until the whole year is covered, I used a simple while loop. This is a throwaway script, so I didn't strictly need a careful check for next_cursor, but I didn't want it to end with a KeyError, so I added one.
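The loop boils down to cursor-based pagination. Slack's pagination documentation describes an empty, null, or missing next_cursor as "no more results", so it is safer to test whether the cursor is truthy rather than only whether the key exists. A minimal, self-contained sketch as a generator; iterate_pages and the fetch callback are hypothetical names, and a dict of fake pages stands in for the real API:

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iterate_pages(fetch: Callable[[Optional[str]], Page]) -> Iterator[Page]:
    """Yield pages until no further cursor is returned.

    `fetch(cursor)` performs one request; cursor is None for the first page.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch(cursor)
        yield page
        # The last page has no response_metadata, or an empty next_cursor.
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            return


# Fake paged responses keyed by cursor (None = first request).
fake_pages = {
    None: {"messages": [1, 2], "response_metadata": {"next_cursor": "c1"}},
    "c1": {"messages": [3], "response_metadata": {"next_cursor": ""}},
}
all_messages = [m for page in iterate_pages(fake_pages.__getitem__) for m in page["messages"]]
```

With the classes in this article, fetch could set method.cursor = cursor and return client.request(method); because ConversationsHistory.params removes cursor when it is None, the first request is sent without one.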

src/slack/__main__.py


from datetime import datetime, timedelta

from src.utils import save_to_file
from src.slack.client import SlackClient
from src.slack.methods.conversation import ConversationsHistory


def main() -> None:
    # Unix timestamp for 365 days ago
    oldest = (datetime.now() - timedelta(days=365)).timestamp()
    method = ConversationsHistory(inclusive=True, oldest=oldest)
    client = SlackClient()

    count = 1

    while True:
        res = client.request(method)
        save_to_file(res, f"outputs/tests/sample{count}.json")

        # On the last page, response_metadata is missing or next_cursor is ""
        next_cursor = res.get("response_metadata", {}).get("next_cursor")
        if not next_cursor:
            break

        method.cursor = next_cursor
        count += 1


if __name__ == "__main__":
    main()

Closing thoughts

When I fetched one year of history for a single channel, it produced about 200 files, each more than 2,000 lines long. Terrifying :scream:
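With around 200 page files, the first step toward analysis is usually to merge them back into one list. A minimal sketch, assuming (since src.utils.save_to_file is not shown) that each file is the raw response dict saved as JSON, with the messages under the "messages" key. load_messages is a hypothetical helper. Note that a plain sort would put sample10.json before sample2.json, so the files are ordered by their page number instead.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict, List


def load_messages(directory: str) -> List[Dict[str, Any]]:
    """Concatenate the "messages" lists from sample{N}.json files in page order."""

    def page_number(path: Path) -> int:
        # "sample12" -> 12; files without a trailing number sort first
        match = re.search(r"(\d+)$", path.stem)
        return int(match.group(1)) if match else 0

    messages: List[Dict[str, Any]] = []
    for path in sorted(Path(directory).glob("sample*.json"), key=page_number):
        with path.open(encoding="utf-8") as f:
            page = json.load(f)
        messages.extend(page.get("messages", []))
    return messages
```

If each page is newest-first, as conversations.history returns it, the combined list runs from newest to oldest.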

Reference
