Experimentation with snapshot-based video compression

bp2008

Staff member
Mar 10, 2014
I warn you, this is a very nerdy post with little practical purpose.

I recently got this idea in my head that I could create a simple video compression algorithm using JPEG compression, but more efficiently than Motion JPEG (MJPEG) which is found on many IP cameras (especially older ones). MJPEG is one of the least efficient encodings ever devised in terms of bandwidth usage. I figured I could improve upon it with relatively little effort, and I was right.

See, MJPEG does not utilize the previous video frame when it is encoding the next one. This is the first and most obvious thing that a video encoding algorithm can do to save space. In most videos, there is relatively little difference from one frame to the next, and if you encode only the changes between frames then you will save space.

So that is what I set out to do. My encoding function works a lot like any other: you feed it a video frame from your source, and it gives you back a blob of data that the decoding function can convert back into the original frame (with some quality loss). Internally, it subtracts the pixels of the previous frame from the pixels of the new frame, producing what I call a "diff frame". The diff frame is itself an image that can be compressed as a JPEG just like any other, except it is relatively low-contrast, so it compresses a lot better than the original.
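The diff-frame step can be sketched in a few lines (illustrative Python operating on raw grayscale bytes; the actual implementation uses .NET Bitmaps, and the function name here is made up):

```python
def make_diff_frame(prev, curr):
    """Subtract the previous frame from the current one, biased by 128
    so the result still fits into 0-255 pixel values.

    With little motion, curr is nearly equal to prev, so most output
    pixels land near 128: a flat, low-contrast gray image that JPEG
    compresses far better than the original frame.
    """
    # Clamp instead of wrapping: differences beyond +/-127 are rare,
    # and saturating keeps the diff image well-behaved (at some cost
    # in accuracy for large changes).
    return bytes(max(0, min(255, c - p + 128)) for p, c in zip(prev, curr))
```

An unchanged pixel produces 128 (mid-gray), which is why a correct diff frame with no motion should look mostly flat gray.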

My encoding routine is actually broken right now, but I'm still seeing file size / bandwidth savings of up to about 40% compared to simply recompressing the original frames with the same JPEG compression ratio that I use to compress the "diff frames".

Original frame:
[image: 527f57tm.jpg]


Broken diff frame: (this should be a mostly flat gray image, but something is not working quite right)
[image: dLgBDS8m.jpg]


If my encoding were working correctly, a diff frame like the one shown above would indicate that all the bright parts of the image got brighter and all the dark parts got darker; a very rare occurrence in the real world...

I suspect if I can figure out what I did wrong and get the diff frames to look the way they are supposed to, then this algorithm would result in space savings closer to 60% if not higher.
 
so you re-invented MPEG? :laugh:

props for the effort; nerdy indeed.. why not just pipe the MJPEG through, oh say, mplayer or VLC and have it transcode into h264 in real time?
 
Sounds like MPEG, but I think it is way cool that one can do this type of stuff. Practical experimentation & learning. It is hard to invent something completely new, but this may evolve into something practical.
 
This is really nothing like MPEG-1 at least. It is much simpler in concept and implementation, and not very fast (when implemented using .NET Bitmaps anyway).

I fixed my bug, so now diff frames look like this when there is no motion:
[image: 7u5c1w2m.jpg]


And they compress a lot better. Now my compressed "diff frames" are shaving 70-85% on average off the file size compared to normal MJPEG. This is much more worthwhile, even if it doesn't come close to the savings of a true video compression algorithm like VP8 or h264 which are likely to compress frames like these to less than 10 KB each.

There is quality loss; it is about like having each frame compressed twice with a lossy JPEG encoder. Not bad looking, all things considered. I may put together a demo app later.
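The matching decode step just undoes the bias and adds the difference back onto the previously decoded frame (again an illustrative Python sketch, not the actual .NET or JavaScript code). Note that the reference has to be the previously *decoded* frame, since that is all the client has; this is also why mismatched JPEG decoders cause the drift described later in the thread.

```python
def apply_diff_frame(prev_decoded, diff):
    """Reverse of the encoding step: remove the 128 bias from the diff
    and add the result onto the previously decoded frame."""
    return bytes(max(0, min(255, p + d - 128))
                 for p, d in zip(prev_decoded, diff))
```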

Here in this spoiler block are some of the file sizes I'm seeing (in bytes). The "Source" column on the left is the size of the source frame as it arrived at the encoder. The "Diff Frame" column on the right is the size of the diff frame (JPEG compressed). The "Reference" column exists to show how large each frame would be if I was to simply re-compress the "Source" frame using the same JPEG compression level used to create the "Diff Frame". So you could consider the "Reference" column to be the size of normal MJPEG.

Code:
Time: "Source" "Reference" "Diff Frame"
...
8:36 AM:  696153-> 527704-> 116793
8:36 AM:  695162-> 527469-> 110076
8:36 AM:  694620-> 367199->  48062
8:36 AM:  694018-> 367115->  52244
8:36 AM:  697244-> 368843->  78080
8:36 AM:  696200-> 368735->  62073
8:36 AM:  695347-> 368732->  59379
8:36 AM:  694482-> 368633->  59165
8:36 AM:  694247-> 368562->  58114
8:36 AM:  695995-> 367634->  82667
8:36 AM:  695362-> 368218->  64245
8:36 AM:  694423-> 368284->  61514
8:36 AM:  694144-> 368413->  59342
8:36 AM:  693808-> 368365->  59778
8:36 AM:  697239-> 368750->  83988
8:36 AM:  695923-> 368702->  64613
8:36 AM:  694918-> 368642->  59996
8:36 AM:  694170-> 368535->  59866
8:36 AM:  693830-> 368571->  59107
8:36 AM:  696013-> 367922->  85795
8:36 AM:  695017-> 368103->  64126
8:36 AM:  694300-> 368211->  61668
8:36 AM:  693705-> 368272->  61323
8:36 AM:  693208-> 368144->  60964
8:36 AM:  698152-> 369154->  85841
8:36 AM:  696312-> 368908->  63868
8:36 AM:  695112-> 368630->  61527
8:36 AM:  694245-> 368529->  61585
8:36 AM:  693893-> 368531->  60293
8:36 AM:  697367-> 369177->  86001
8:36 AM:  695678-> 368892->  63365
8:36 AM:  694795-> 368816->  62326
8:36 AM:  694161-> 368624->  62443
8:36 AM:  697631-> 369031->  86352
8:36 AM:  696541-> 369134->  65053
8:36 AM:  695218-> 368923->  63152
8:36 AM:  694735-> 368763->  61478
8:36 AM:  694551-> 368949->  61761
8:36 AM:  697202-> 368468->  86718
8:36 AM:  696228-> 368759->  65622
8:36 AM:  694882-> 368563->  63012
8:36 AM:  694015-> 368429->  61975
8:36 AM:  694060-> 368602->  60999
8:36 AM:  697641-> 368971->  86795
8:36 AM:  695255-> 368774->  66338
8:36 AM:  694545-> 368779->  64062
8:36 AM:  694099-> 368756->  62002
8:36 AM:  697371-> 589371->  86733
8:36 AM:  696497-> 588242-> 193320
8:36 AM:  694944-> 563270-> 107336
8:36 AM:  694437-> 563120-> 105294
8:36 AM:  693928-> 562851-> 102118
8:37 AM:  696645-> 564780-> 165563
8:37 AM:  695192-> 563709-> 115976
8:37 AM:  694378-> 563250-> 108614
8:37 AM:  693972-> 562996-> 107113
8:37 AM:  691790-> 561342-> 141195
8:37 AM:  696070-> 564208-> 130983
8:37 AM:  694942-> 563531-> 105755
8:37 AM:  694090-> 563064-> 105845
8:37 AM:  693729-> 562923-> 105532
8:37 AM:  697501-> 565404-> 165610
8:37 AM:  696571-> 564472-> 111243
8:37 AM:  695457-> 563640-> 106557
8:37 AM:  694801-> 563497-> 104494
8:37 AM:  694182-> 563081->  98494
8:37 AM:  697548-> 589487-> 164883
8:37 AM:  695775-> 587732-> 135467
8:37 AM:  694641-> 563438-> 123025
8:37 AM:  693942-> 562763-> 102649
8:37 AM:  693699-> 562897->  94557
8:37 AM:  697343-> 565221-> 165009
8:37 AM:  695422-> 563786-> 113209
8:37 AM:  694338-> 699741-> 346689
8:37 AM:  693957-> 693992-> 196020
8:37 AM:  693342-> 693150-> 228977
8:37 AM:  696888-> 696879-> 270632
8:37 AM:  695784-> 695757-> 195052
8:37 AM:  694492-> 694424-> 197202
8:37 AM:  694095-> 694073-> 190958
8:37 AM:  695710-> 695667-> 314932
8:37 AM:  695660-> 695631-> 200996
8:37 AM:  695013-> 694917-> 198063
8:37 AM:  694188-> 694163-> 195649
8:37 AM:  693607-> 693559-> 189106
8:37 AM:  695832-> 695830-> 315534
8:37 AM:  694736-> 694805-> 212137
8:37 AM:  694056-> 694071-> 196254
8:37 AM:  693358-> 693276-> 194212
8:37 AM:  693139-> 693103-> 183534
8:37 AM:  696447-> 696473-> 314207
8:37 AM:  694898-> 694817-> 216005
8:37 AM:  694069-> 694057-> 196709
8:37 AM:  693436-> 693406-> 197353
8:37 AM:  697138-> 697083-> 315002
8:37 AM:  696391-> 696291-> 206033
8:37 AM:  694847-> 694858-> 198489
8:37 AM:  694289-> 694267-> 195170
8:37 AM:  693608-> 693582-> 186272
8:37 AM:  697119-> 697024-> 313009
8:37 AM:  695624-> 695607-> 205500
8:37 AM:  694446-> 694429-> 198363
8:37 AM:  693494-> 693542-> 194121
8:37 AM:  693284-> 693256-> 183107

Throughout execution I changed the compression level a few times. It starts at about 40% compression quality, then goes to 60%, then 80% at the end. It should also be noted that any motion in the frame greatly increases the size of the diff frame, to the point that it could actually end up larger than the source frame. So if this format were streamed over a slow network, frame rate would suffer whenever there was motion.
 
I have experimented further with this technique in the last month by integrating it into my UI2 page for Blue Iris to handle streaming video. I wrote a decoder in Javascript that uses HTML5 canvas elements to manipulate the raw image data. It succeeded in reducing bandwidth usage (by perhaps 50% in the best case), but unfortunately there are many problems which will prevent me from releasing it as a UI2 feature.

  • To perform this special encoding, a special service has to run on the Blue Iris machine, consuming significant CPU time and creating a maintenance hassle.
  • It turns out that different JPEG decoders produce slightly different outputs. So when streaming this format of video to a web browser, the video quality continuously degrades as the server and client drift further out of sync with each other. I worked around this by adding a concept of I-frames to the streaming protocol, but the quality degradation still exists between I-frames.
  • The re-encoding process adds hundreds of milliseconds to the processing time of each frame, which negates the efficiency gain; the frame rate of remote viewing actually goes down compared to streaming JPEG video the normal way.
  • In the end it is more effective just to ask Blue Iris to encode jpeg images at a lower quality, as this achieves similar efficiency gains without the added complexity.
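The I-frame workaround mentioned above amounts to periodically sending a full frame so both sides can reset their reference. A minimal sketch, with illustrative names (a real protocol would also JPEG-compress each payload, and would diff against the decoded previous frame rather than the source, as done here, to avoid drift within a group):

```python
def encode_stream(frames, iframe_interval=30):
    """Yield ('I', full_frame) or ('P', diff_frame) pairs.

    Every iframe_interval frames a full frame is emitted, so any drift
    between the server's and the client's JPEG decoders is wiped out at
    the next I-frame instead of accumulating forever.
    """
    prev = None
    for i, frame in enumerate(frames):
        if prev is None or i % iframe_interval == 0:
            yield ('I', frame)
        else:
            # Same biased subtraction as the diff-frame encoder above.
            yield ('P', bytes(max(0, min(255, c - p + 128))
                              for p, c in zip(prev, frame)))
        prev = frame
```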