Introduction

In the previous article, "Smart meter measurement logger with Worker Thread design pattern", I made long-term operation possible by restarting the object (thread) responsible for measurement whenever it stops. However, that is a last resort, and it assumes that the thread itself is basically stable. Also, restarting is no countermeasure when the thread is still alive but blocked in some processing.

In particular, communicating with a smart meter over the B route requires several steps, such as scanning and authentication, and any of these steps may fail or time out. You have to think about retrying, or resetting and starting over, but what to do changes depending on the situation. If you do not organize and implement this well, the program is likely to keep waiting forever, or to receive unexpected data and crash.

This time, I implemented B-route communication with a smart meter by managing state transitions, so I will introduce that approach.

Source code: keilog / broute.py

Write a state transition diagram

First of all, you need to organize what to do in each situation and then program it in an easy-to-understand way. The run() thread loop of the BrouteReader class roughly follows the state transitions shown below. Each blue square represents a state; depending on the result of the function called in that state, the loop either retries or moves to another state. This is not the only correct answer and other designs are possible, but it works well for the purpose of obtaining instantaneous power and integrated power.

[Figure: B-route state transition diagram]

Implementation method

The code that implements this is as follows. (The actual code is a little more complicated.)

class BrouteReader ( Worker ):
    
    # <abridgement>

    def run( self ):
        while not self.stopEvent.is_set():
            if self.state == self._STATE_INIT:
                self._open()

            elif self.state == self._STATE_OPEN:
                self._setup()

            elif self.state == self._STATE_SETUP:
                self._scan()

            elif self.state == self._STATE_SCAN:
                self._join()

            elif self.state == self._STATE_JOIN:
                if time_has_come():  # In practice, compare the last update time with the current time
                    self._sendto('Property value request message')
                dataframe = self._receive()  # Receive a telegram; _receive() times out after 1 second
                if dataframe:
                    self._accept(dataframe)  # Process the received telegram

                if long_time_has_gone():
                    logger.error("Telegram hasn't come for a long time")
                    self._term()
                    self._close()
                    self.state = self._STATE_INIT
                    time.sleep(5)

        self._term()  # Disconnect from the smart meter
        self._close()  # Close the WiSun device
        logger.info('[STOP]')

As introduced in the previous article, the loop inside run() repeats until stopEvent is set.

Inside the loop, the code for one state is executed according to the current state held in self.state. Each branch here only calls a function, but that function performs the necessary processing and rewrites self.state to update the state. In the next iteration, the code for the state it moved to is executed. This is the basic flow. Retry handling is also defined inside _scan() and _join(). The branches for each state tend to get long, so I think it is important to keep them as simple and clear as possible.
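The idea of keeping retries inside a state function can be sketched as follows. This is only a minimal illustration, not the actual keilog code: the class ScanStep, the scan_once callable, and the max_retries and retry_wait parameters are all names assumed for this example.

```python
import time

STATE_SETUP = 'SETUP'  # hypothetical state names mirroring the article's states
STATE_SCAN = 'SCAN'


class ScanStep:
    """Hypothetical stand-in for the scan step of the state machine."""

    def __init__(self, scan_once, max_retries=3, retry_wait=5.0):
        self.scan_once = scan_once      # callable returning True on success
        self.max_retries = max_retries  # upper bound on attempts per call
        self.retry_wait = retry_wait    # seconds to sleep after a failure
        self.state = STATE_SETUP

    def scan(self):
        """Try a bounded number of times; change state only on success."""
        for attempt in range(1, self.max_retries + 1):
            if self.scan_once():
                self.state = STATE_SCAN
                return attempt          # number of attempts that were needed
            time.sleep(self.retry_wait)  # never retry in a tight loop
        # All attempts failed: the state is unchanged, so the main loop
        # simply calls scan() again on its next iteration.
        return None
```

Because the state only changes on success, a failed call needs no special handling in the main loop; it just lands in the same branch again.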

I will explain the flow of operation a little. It seems to take about 1 second for the smart meter to respond to a telegram sent by _sendto(). When time_has_come() becomes true in some iteration and a message requesting a property value is sent, the response does not arrive immediately, so it is processed not in that iteration but in the next one or the one after it. Therefore, the response or notification received by _receive() does not necessarily correspond to the request sent just before, and you cannot tell what it is without looking inside. _accept() must be written so that any telegram can be handled appropriately.
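One way to handle telegrams by their content rather than by the last request is sketched below. It assumes the telegram has already been parsed into (EPC, EDT) pairs as in ECHONET Lite, where 0xE7 is the instantaneous power and 0xE0 the cumulative energy of a smart meter. The function name accept and the readings dictionary are hypothetical and are not the keilog implementation.

```python
import struct

EPC_CUMULATIVE_ENERGY = 0xE0  # cumulative energy (raw counter value)
EPC_INSTANT_POWER = 0xE7      # instantaneous power in watts (signed)


def accept(properties, readings):
    """Store every property we understand, whatever request preceded it.

    properties: list of (epc, edt) pairs, edt being the raw value bytes.
    readings:   dict that is updated in place and returned.
    """
    for epc, edt in properties:
        if epc == EPC_INSTANT_POWER and len(edt) == 4:
            readings['power_w'] = struct.unpack('>i', edt)[0]
        elif epc == EPC_CUMULATIVE_ENERGY and len(edt) == 4:
            # Raw counter; converting to kWh needs the unit and
            # coefficient properties, which this sketch leaves out.
            readings['energy_raw'] = struct.unpack('>I', edt)[0]
        # Unknown or oddly sized properties are skipped, not treated as errors.
    return readings
```

With this shape, a cumulative-energy notification that arrives right after an instantaneous-power request is simply stored under its own key instead of being misread.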

Also, keep in mind that every function must be written so that it times out. Otherwise, the loop may block there and never proceed. A major source of blocking is serial.readline(). You need to set a timeout (about 1 second) for this function and write the subsequent processing on that assumption. On top of that, give retry loops and code that just waits for "OK" their own time limit so that they never wait forever.
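A wait for "OK" with an overall deadline might look like the sketch below. The function wait_for_ok and its parameters are hypothetical. It assumes readline is something like the readline method of a pyserial port opened with serial.Serial(port, baudrate, timeout=1), which returns whatever was read (possibly b'') when the timeout expires. Treating a line that starts with "FAIL" as a failure is also an assumption about the device's response format.

```python
import time


def wait_for_ok(readline, overall_timeout=10.0, clock=time.monotonic):
    """Read lines until b'OK' arrives or overall_timeout seconds elapse.

    readline must itself time out (e.g. after 1 second); otherwise a
    single call could block forever and the deadline would never be checked.
    """
    deadline = clock() + overall_timeout
    while clock() < deadline:
        line = readline().strip()
        if line == b'OK':
            return True
        if line.startswith(b'FAIL'):  # assumed failure response
            return False
        # Empty line (timeout) or an unrelated line: keep waiting.
    return False
```

The per-read timeout bounds each iteration, and the deadline bounds the loop as a whole, so the caller always gets control back.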

Below are examples of _open() and _setup().

    def _open( self ):
        if self.wisundev.open():
            self.state = self._STATE_OPEN
        else:
            # On error, sleep for 5 seconds so that a fast retry loop does not hog the CPU. (Same below)
            time.sleep(5)

    def _setup( self ):
        # Reset the device
        if not self.wisundev.reset():
            time.sleep(5)
            return False

        # Set the B-route ID and password on the device (stored in its registers)
        if self.wisundev.setup( self.broute_id, self.broute_pwd ):
            self.state = self._STATE_SETUP
        else:
            time.sleep(5)

Each of these functions calls the WiSun device driver function of the same name and, when the call succeeds, performs the state transition at the same time. Whenever an error occurs, the function sleeps before returning. Otherwise, when the device is disconnected, for example, the loop would retry at high speed, keep logging errors, and hog the CPU. Similarly, be aware that setting the serial.readline() timeout mentioned above too short can also make the loop spin quickly.

To improve stability

By managing state transitions to control execution, I think it becomes clearer what to do in each situation, and logic mistakes become less likely. Even when a problem does occur, the code is easy to correct. Stable operation still requires making each function more robust, but because problems are easy to isolate, I think countermeasures can be targeted precisely.

In terms of robustness, you need to guard sufficiently against unintended events and telegrams when processing what is received. If you access array elements at fixed indexes without considering that other events or telegrams may arrive, the risk of errors increases. Also, since most of the data to be processed is hexadecimal character strings, the code checks that each one is valid hexadecimal. That said, depending on the environment, this kind of error does not seem to occur very often.
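The checks described above can be sketched as follows, assuming the payload is an ECHONET Lite frame carried as a hexadecimal string (header bytes 0x10 0x81, then TID, SEOJ, DEOJ, ESV, OPC, and a list of EPC/PDC/EDT properties). The function name parse_echonet_frame is hypothetical, and this is not the parsing code used in keilog.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_echonet_frame(hex_text):
    """Return (esv, [(epc, edt), ...]), or None if the text is malformed."""
    # Validate before converting: even length, hexadecimal digits only.
    if len(hex_text) % 2 != 0 or not set(hex_text) <= HEX_DIGITS:
        return None
    data = bytes.fromhex(hex_text)
    # EHD1 EHD2 TID(2) SEOJ(3) DEOJ(3) ESV OPC = 12 bytes of fixed header.
    if len(data) < 12 or data[0] != 0x10 or data[1] != 0x81:
        return None
    esv, opc = data[10], data[11]
    properties = []
    pos = 12
    for _ in range(opc):
        if pos + 2 > len(data):        # EPC and PDC must both be present
            return None
        epc, pdc = data[pos], data[pos + 1]
        pos += 2
        if pos + pdc > len(data):      # EDT must not run past the end
            return None
        properties.append((epc, data[pos:pos + pdc]))
        pos += pdc
    return esv, properties
```

Every index is checked against the actual length before it is used, so a truncated or unrelated line yields None instead of an IndexError inside the thread loop.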

On the other hand, losing communication is relatively common. The end of the STATE_JOIN branch in run() above deals with this problem: if no telegram is received for a certain period of time, it returns to INIT. The PANA session has an expiration time and requires periodic re-authentication to refresh the key, but the RL7023 used here re-authenticates automatically. The log showed evidence of this, and the re-authentication succeeded, yet there were cases where communication was interrupted afterwards. Beyond this case, communication may well be interrupted for some other reason, so this measure is essential. I do not know how other WiSun devices behave, but if you need to re-authenticate yourself, you could run the join step again periodically.
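The placeholders time_has_come() and long_time_has_gone() in the loop above could be backed by a small timer like the one sketched here. The class IntervalTimer and the interval values mentioned afterwards are assumptions for illustration, not the keilog implementation.

```python
import time


class IntervalTimer:
    """Reports whether `interval` seconds have passed since the last reset."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock        # monotonic, so wall-clock changes do not matter
        self.last = clock()

    def reset(self):
        """Record 'now' as the time of the last event."""
        self.last = self.clock()

    def expired(self):
        """True once at least `interval` seconds have passed since reset()."""
        return self.clock() - self.last >= self.interval
```

For example, one timer with an interval of 10 seconds could decide when to send the next request, and another with an interval of 300 seconds could be reset each time a telegram is received, so that it expires only after a long silence and triggers the return to INIT.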

Conclusion

This time, in writing a program that manages state transitions, I found several points that need attention, such as setting appropriate timeouts and avoiding high-speed infinite retry loops. If a bug or a failure causes radio waves to be transmitted in rapid succession, it will disturb nearby devices using the same band, and in the worst case it may violate the Radio Law. I realized I have to be careful not to emit radio waves carelessly. I would appreciate it if you could point out anything else to watch for.

I also had unexpected trouble naming the states. In the end I gave them the same names as the functions, but please let me know if there is a better naming convention.

Reference

- Smart meter measurement logger with Worker Thread design pattern