Audio/video desync recording from RTSP & my experience dealing with it

Player701

Hello everyone! (Sorry if this is not the proper section for posting this, it's my first time on these forums. Please move to the appropriate section if necessary, thank you very much.)

I've recently encountered a very peculiar issue with one of my IP cameras, and would like to tell other people how I managed to deal with it, in case someone encounters a similar problem.

Synopsis: I have several IP cameras at home, and also a low-power server (really just an old PC), which continuously records footage from the cameras, storing it in 10-minute segments. Footage older than 24 hours is automatically deleted. Recording is done via FFmpeg, without re-encoding audio or video, which saves CPU resources. All cameras support RTSP, so the command line to record from each camera is similar and takes the following form:

Code:
ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 \
-i rtsp://user:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy \
-map 0:a -c:a copy \
-f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
This used to work flawlessly until I got another camera, which is a different make and model than the rest. Previously, I only used Hikvision cameras, but this one came from AliExpress (China), and doesn't have a particular brand. From the device info page in the web UI it appears to be called "F8/IPG-9280PGS-AI".

After some hours of recording raw footage from this Chinese camera, I saw that the audio and the video were gradually going out of sync with each other, the lag between them constantly increasing and eventually reaching enormous levels (up to tens of minutes). Segments recorded at a later time had many minutes of black screen inserted in the beginning, with just the audio. Something was going terribly wrong, and I started to experiment in order to find the cause of the problem.

Eventually, I discovered that the timestamps in the video stream from the camera exhibit a constant forward drift. For example, suppose the video has 25 frames per second, and timestamps are measured in 1/1000ths of a second. Then, assuming that no frames are skipped, a smooth video will have a sequence of timestamps that looks like this:

Code:
0, 40, 80, 120, 160, 200, 240, 280, 320, ...

But the timestamps produced by this particular camera may look like this:

Code:
0, 40, 81, 121, 163, 203, 248, 328, ...
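
To make the drift concrete, here is a minimal Python sketch (not part of the recording setup) that compares a list of timestamps against the ideal 40 ms grid. The function name drift_per_frame is made up for illustration, and the input is just the example sequence above:

```python
FRAME_INTERVAL_MS = 40  # 25 fps with timestamps in 1/1000ths of a second


def drift_per_frame(timestamps_ms, interval_ms=FRAME_INTERVAL_MS):
    """Return how far each timestamp is ahead of an evenly spaced grid."""
    return [ts - i * interval_ms for i, ts in enumerate(timestamps_ms)]


# Example sequence from the post (illustrative values, not a real capture).
observed = [0, 40, 81, 121, 163, 203, 248, 328]
print(drift_per_frame(observed))  # [0, 0, 1, 1, 3, 3, 8, 48]
```

A drift that never returns to zero and keeps growing is what distinguishes this problem from ordinary network jitter.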

This drift does not correspond to the actual frame timing, even when the frame rate stays constant. One of my experiments confirmed this: I recorded video without audio in segments for several hours. One such 10-minute segment was reported as 10 minutes and 12 seconds long, yet it still had 15000 frames (as expected), and the camera clock advanced by exactly 10 minutes. The timestamps are clearly incorrect, because during real-time recording they will eventually appear to come from the future. Unfortunately, FFmpeg seems to have no means of detecting this. (If it actually does, please let me know!)
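
The check described above boils down to simple arithmetic: the duration implied by the frame count versus the duration implied by the timestamps. A rough Python sketch, assuming the frame count and the reported duration were already read from the file by some other means (for instance with ffprobe); the function names are hypothetical:

```python
def expected_duration_s(frame_count, fps):
    """Duration the segment should have if every frame lasts 1/fps seconds."""
    return frame_count / fps


def timestamp_drift_s(frame_count, fps, reported_duration_s):
    """Seconds by which the container timestamps overshoot the real duration."""
    return reported_duration_s - expected_duration_s(frame_count, fps)


# Numbers from the experiment: 15000 frames at 25 fps, reported as 10 min 12 s.
drift = timestamp_drift_s(15000, 25, 612.0)
percent = 100.0 * drift / expected_duration_s(15000, 25)
print(drift, percent)  # 12.0 2.0
```

A positive result means the timestamps run ahead of real time, which is the case for this camera.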

However, while recording with audio, the audio stream does not seem to exhibit any timestamp drift. Therefore, when matching video and audio timestamps for the output, a delay between audio and video is gradually introduced.

To mitigate this problem, I first tried to adjust the timestamps with the "setts" bitstream filter (see the FFmpeg bitstream filters documentation). But no matter what I did with them, the video still went out of sync with the audio in the long run. I analyzed a few more recordings and found out that the drift offset can vary wildly: for example, over the course of several minutes, it can increase from a mere 2 to more than 1000 milliseconds. The only solution I found was to not trust these timestamps at all and generate new ones instead. FFmpeg has an option called "-use_wallclock_as_timestamps" to do just that. However, with this particular camera, it also resulted in constant stuttering in the recorded video. I had to spend many more hours finding a way to fix it, and the "setts" filter actually helped me this time. The filter I had to apply looked like this:

Code:
setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))'

This filter rounds the timestamps down to multiples of 40, and then ensures that they are strictly increasing by taking the greater of two values: the rounded timestamp, or the previous output timestamp plus 40. To ensure smooth playback, the filter has to be applied to both the video and the audio streams. Additionally, a constant frame rate has to be assumed for the input stream with the option "-r 25", because the camera will sometimes lower the FPS (usually just before switching to night mode), and this will cause the filters to produce weird results (e.g. fluctuating playback speed). The resulting command line looks like this:

Code:
ffmpeg -nostdin -flags low_delay -fflags +nobuffer+discardcorrupt \
-rtsp_transport tcp -timeout 3000000 \
-r 25 -use_wallclock_as_timestamps 1 -i rtsp://user:password@ip.ad.dre.ss:554/url \
-map 0:v -c:v copy -bsf:v setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))' \
-map 0:a -c:a copy -bsf:a setts='max(floor(PTS/40)*40,if(N,PREV_OUTPTS+40))' \
-f segment -strftime 1 -reset_timestamps 1 -segment_atclocktime 1 -segment_time 600 "%Y-%m-%dT%H-%M-%S.mkv"
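
To see what the setts expression does to jittery wall-clock timestamps, it can be modelled in a few lines of Python. This is only a sketch of the arithmetic, assuming timestamp units of 1/1000ths of a second as in the earlier example; setts_sim and the sample input values are made up for illustration:

```python
import math

STEP = 40  # frame interval in timestamp units, matching the expression


def setts_sim(input_pts, step=STEP):
    """Model of max(floor(PTS/40)*40, if(N, PREV_OUTPTS+40))."""
    out = []
    for n, pts in enumerate(input_pts):
        snapped = math.floor(pts / step) * step   # floor(PTS/40)*40
        lower_bound = out[-1] + step if n else 0  # if(N, ...) is 0 for N == 0
        out.append(max(snapped, lower_bound))
    return out


# Hypothetical arrival times with jitter and one late gap before the last frame.
jittery = [3, 47, 79, 131, 158, 250]
print(setts_sim(jittery))  # [0, 40, 80, 120, 160, 240]
```

Frames that arrive slightly early or late are snapped onto the 40-unit grid, while a real gap (the jump to 250) is preserved rather than compressed.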

The video recorded with these filters is smooth as butter, and always appears to be in perfect sync with the audio. However, I'm still wondering whether this is a "proper" solution, because even though it does work, it looks extremely complicated. Not to mention that it took me several days of researching and analyzing the video files to implement a fix for something that at first looks like a minor problem. Also, the live feed in the camera's web UI does not exhibit any sync issues, and viewing the RTSP stream with VLC also appears to be fine (although I haven't run the latter for long enough).

I hope this information will be useful to anyone who might run into the same problem. In case someone has dealt with something like this before, I'd appreciate any extra comments.

Thank you very much.
 