Streaming Video to a Pico 1 with MicroPython
I've been doing a bit of graphics experimentation recently with a Pimoroni Pico Display Pack 2.0 attached to a Raspberry Pi Pico W (with the RP2040 microcontroller). I got interested in rendering sprites, and decided to see how quickly you can actually get pixels on screen. This led me to explore the different framebuffer formats, and how to get bytes from the network onto the screen with MicroPython.
Scroll to the bottom for the result - streaming video at 10 FPS over the network to the Pico!
Pico Graphics
Pimoroni offer a basic 2D drawing library called Pico Graphics. It includes drivers for their various screens, along with 2D drawing primitives and text support. There are a few ways to initialize the screen, each with a tradeoff between memory and color palette:
Pen | Palette | Colors | Depth |
---|---|---|---|
PEN_P4 | Manual | 16 | 4 bits |
PEN_P8 | Manual | 256 | 8 bits |
PEN_RGB332 | Preset | 256 | 8 bits |
PEN_RGB565 | Preset | 64K | 16 bits |
PEN_RGB888 | Preset | 16M | 24 bits |
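The pen type is chosen when constructing the display object. As a minimal sketch (the constructor call mirrors the snippets later in this post), swapping formats is just a matter of changing one argument:

```python
import picographics

# The pen type picked here determines the framebuffer format, and therefore
# how much RAM the display object allocates.
display = picographics.PicoGraphics(
    display=picographics.DISPLAY_PICO_DISPLAY_2,
    pen_type=picographics.PEN_RGB332,  # e.g. PEN_RGB565 or PEN_P8 to trade color depth for memory
)
```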
You can calculate the memory usage based on the required size of the frame buffer. In my case, with a 320 x 240 pixel screen:
Pen | Formula | Buffer |
---|---|---|
PEN_P4 | 320 * 240 * 4 / 8 / 1000 | 38.4 kilobytes |
PEN_P8 | 320 * 240 * 8 / 8 / 1000 | 76.8 kilobytes |
PEN_RGB332 | 320 * 240 * 8 / 8 / 1000 | 76.8 kilobytes |
PEN_RGB565 | 320 * 240 * 16 / 8 / 1000 | 153.6 kilobytes |
PEN_RGB888 | 320 * 240 * 32 / 8 / 1000 | 307.2 kilobytes |
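The arithmetic behind the table, as a quick Python sketch:

```python
def framebuffer_kb(width, height, bits_per_pixel):
    # width * height pixels, bits -> bytes, bytes -> kilobytes
    return width * height * bits_per_pixel / 8 / 1000

print(framebuffer_kb(320, 240, 8))   # PEN_RGB332 -> 76.8
print(framebuffer_kb(320, 240, 16))  # PEN_RGB565 -> 153.6
```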
It's worth noting that the RP2040 only has 264 KB of SRAM, meaning PEN_RGB888 is out of the question (its pixels are stored as 32-bit words, which is why the formula above uses 32 rather than 24). The manually defined palettes (PEN_P4 and PEN_P8) are interesting but not easy to work with if we're talking about video, unless you want to analyse each frame and tailor a new palette to it. Perhaps that's another post.
Drawing Pixels
What's the fastest way to change the pixels for the entire screen? Let's analyse some different techniques. I'll be using the following very simple stopwatch class to measure performance:
```python
import utime

class Stopwatch:
    def restart(self):
        self.start_time = utime.ticks_us()

    def elapsed_ms(self):
        return utime.ticks_diff(utime.ticks_us(), self.start_time) / 1000
```
Baseline Update
Let's start by testing how fast the framebuffer can be presented to the screen; everything else will be slower than this because it will involve more work.
```python
import picographics
import random
from stopwatch import Stopwatch

display = picographics.PicoGraphics(display=picographics.DISPLAY_PICO_DISPLAY_2, pen_type=picographics.PEN_RGB332)
display_width, display_height = display.get_bounds()

sw = Stopwatch()
while True:
    sw.restart()
    display.update()
    print(sw.elapsed_ms())
```
With `PEN_RGB332` this takes 34 milliseconds, and with `PEN_RGB565` it takes 24 milliseconds. I think these (on the face of it counter-intuitive) results are because the screen itself uses the 24-bit `RGB888` format, so more processing is needed to get from 8 or 16 bits up to 24 bits (however, I did not have time to prove or disprove this hypothesis).
Using Clear
Here we'll set the color at random, and tell the screen to clear. This isn't useful for rendering video because we can only use the same color for every pixel, but it does give us an idea of how efficiently native code can update all pixels in the framebuffer.
```python
import picographics
import random
from stopwatch import Stopwatch

display = picographics.PicoGraphics(display=picographics.DISPLAY_PICO_DISPLAY_2, pen_type=picographics.PEN_RGB332)
display_width, display_height = display.get_bounds()

sw = Stopwatch()
while True:
    sw.restart()
    display.set_pen(display.create_pen(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)))
    display.clear()
    display.update()
    print(sw.elapsed_ms())
```
For `PEN_RGB332` on an RP2040, the above code reports around 40 ms (25 FPS), and for `PEN_RGB565` it reports 30 ms (33 FPS). The overhead for setting the pen is negligible: in general around 5 ms goes to the `clear` method, and the remaining time is spent on `update`. Replacing the call to `clear` with `rectangle` yields exactly the same results, as you might expect:
```python
# display.clear()
display.rectangle(0, 0, display_width, display_height)
```
Setting Pixels with Pico Graphics
What happens if we iterate all pixels on the screen and set them individually? The documentation states that this is slow - but how does it compare? We'll change our render loop to the following:
```python
while True:
    sw.restart()
    display.set_pen(display.create_pen(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)))
    for x in range(0, display_width):
        for y in range(0, display_height):
            display.pixel(x, y)
    display.update()
    print(sw.elapsed_ms())
```
For `PEN_RGB332` on an RP2040, the above code reports around 1.63 seconds to render a frame, and for `PEN_RGB565` it reports 1.62 seconds. We're able to set each pixel independently, but it's far too slow for watching video.
Setting Pixels in the Framebuffer
Pico Graphics allows us to access the framebuffer by wrapping the Pico Graphics instance in a `memoryview`. This can be treated as a byte array and manipulated directly. Let's see what happens if we access the framebuffer directly, and manipulate each byte in the same manner:
```python
fb = memoryview(display)

while True:
    sw.restart()
    byte = random.randint(0, 255)
    for i in range(0, len(fb)):
        fb[i] = byte
    display.update()
    print(sw.elapsed_ms())
```
For `PEN_RGB332` on an RP2040, the above code reports around 996 milliseconds to render a frame, and for `PEN_RGB565` it reports 1.94 seconds. Since we're operating directly on bytes in the framebuffer, it makes sense that double the bytes means double the time.
We're getting warmer: it's clear that setting pixels is too slow, however we do it. So, what if we were to copy a pre-computed buffer directly?
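As a quick sketch of that idea before moving on to files (not benchmarked, and assuming your MicroPython build supports slice assignment on a `memoryview`, which recent rp2 firmware does as far as I know), a pre-built buffer can be copied into the framebuffer in a single native operation:

```python
# Hypothetical pre-computed frame: one solid RGB332 color for the whole screen.
# Note this holds a second full-size copy of the frame in RAM.
frame = bytes([0b11100000]) * len(fb)

while True:
    sw.restart()
    fb[:] = frame  # one slice assignment instead of a Python loop over every byte
    display.update()
    print(sw.elapsed_ms())
```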
Reading from a File
The following ffmpeg command will generate a buffer from an image with the pixel format we specify:
```
ffmpeg -i <input image> -vf "scale=320:240" -f rawvideo -pix_fmt <pixel format> frame.bin
```
Pico Graphics Pen | ffmpeg -pix_fmt |
---|---|
PEN_RGB332 | rgb8 |
PEN_RGB565 | rgb565be |
PEN_RGB888 | rgb24 |
The full set of pixel formats supported by ffmpeg can be shown with `ffmpeg -pix_fmts`. Here's how we tweak the render loop to read directly from a file into the framebuffer:
```python
fb = memoryview(display)
f = open('frame.bin', 'rb')

while True:
    sw.restart()
    f.seek(0)
    f.readinto(fb)
    display.update()
    print(sw.elapsed_ms())
```
With `PEN_RGB332` we get around 45 ms to perform the update and, interestingly, we see the same time for a buffer twice the size in `PEN_RGB565` format. It seems that the I/O is fast and the bottleneck is the drawing. So, what if instead we do the same thing using a TCP stream, over the network?
Reading from a TCP Socket
The following ffmpeg command will open a TCP socket and stream the specified video at 10 FPS:
```
ffmpeg -re -i <input video> -vf "scale=320:240,fps=10:round=up" -f rawvideo -pix_fmt rgb8 tcp://0.0.0.0:2000\?listen
```
Here's the amended code, which reads the TCP stream directly into the framebuffer:
```python
import picographics
import socket

display = picographics.PicoGraphics(display=picographics.DISPLAY_PICO_DISPLAY_2, pen_type=picographics.PEN_RGB332)

sockaddr = socket.getaddrinfo('<ip address>', 2000)[0][-1]
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(sockaddr)

fb = memoryview(display)
while True:
    sock.readinto(fb)
    display.update()
```
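One thing the snippet above glosses over: the Pico W needs to be on the network before the socket can connect. A minimal sketch of joining WiFi first (the SSID and password are placeholders, not from the original setup):

```python
import network
import time

# Connect to WiFi before opening the TCP socket; credentials are placeholders.
wlan = network.WLAN(network.STA_IF)
wlan.active(True)
wlan.connect('<ssid>', '<password>')
while not wlan.isconnected():
    time.sleep(0.1)
```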
Preview
Below is the 10 FPS result with Big Buck Bunny, filmed on a phone (which makes it look a lot worse than it actually is).
The limiting factor here is the available WiFi bandwidth, considering we're transmitting raw video frames at about 6 Mbit/s. However, compression adds compute requirements which are hard to meet on a microcontroller. Still, this is pretty good for a quick view of something like a CCTV or doorbell camera: since the stream in this case is being generated by ffmpeg, you can provide whatever you like as a source.