Chris Double: Playing Ogg files with audio and video in sync
My last post in this series had Vorbis audio playing but with Theora video out of sync. This post will go through an approach to keeping the video in sync with the audio. To get video in sync with the audio we need a timer incrementing from when we start playback. We can't use the system clock for this as it is not necessarily keeping the same time as the audio or video being played. The system clock can drift slightly and over time this causes the audio and video to get out of sync. The audio library I'm using, libsydneyaudio, has an API call that allows getting the playback position of the sound sample being played by the audio system. This is a value in bytes. Since we know the sample rate and number of channels of the audio stream we can compute a time value from this. Synchronisation becomes a matter of continuously feeding the audio to libsydneyaudio, querying the current position, converting it to a time value, and displaying the frame for that time. The time for a particular frame is returned by the call to th_decode_packetin. The last parameter is a pointer to hold the 'granulepos' of the decoded frame. The Theora spec explains that the granulepos can be used to compute the time that this frame should be displayed up to. That is, when this time is exceeded this frame should no longer be displayed. It also enables computing the location of the keyframe that this frame depends on - I'll cover what this means when I write about how to do seeking. The libtheora API th_granule_time converts a 'granulepos' to an absolute time in seconds. So decoding a frame gives us 'granulepos'. We store this so we know when to stop displaying the frame. We track the audio position and convert it to a time. If it exceeds the stored value we decode the next frame and display that. Here's a breakdown of the steps:
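The conversion from a byte position to a time value, and the comparison against the frame's end time, can be sketched in a few lines. This is only an illustration and not the code from plogg: the names position_to_seconds and frame_expired are hypothetical, and it assumes interleaved signed 16-bit samples (two bytes per sample per channel).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libsydneyaudio): converts a playback
// position in bytes into seconds. Assumes interleaved signed 16-bit samples.
double position_to_seconds(std::int64_t position_bytes, int rate, int channels) {
    assert(rate > 0 && channels > 0);
    const std::int64_t bytes_per_second =
        static_cast<std::int64_t>(rate) * channels *
        static_cast<std::int64_t>(sizeof(std::int16_t));
    return static_cast<double>(position_bytes) /
           static_cast<double>(bytes_per_second);
}

// Hypothetical check: true once the audio clock has passed the time up to
// which the current frame should be displayed (for example a value obtained
// from th_granule_time for the stored granulepos).
bool frame_expired(double audio_time, double frame_end_time) {
    return audio_time > frame_end_time;
}
```

For 44100 Hz stereo audio one second is 176400 bytes, so position_to_seconds(176400, 44100, 2) gives 1.0. When frame_expired returns true for that time, the next frame would be decoded and displayed.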
Notice that the structure of the program is different to the last few articles. We no longer read all packets from the stream, processing them as we get them. Instead we specifically process the audio packets and only handle the video when it's time to display them. Since we are driving our a/v sync off the audio clock we must continuously feed the audio data. I think it tends to be a better user experience to have flawless audio with video frame skipping rather than skipping audio but smooth video. Worst of all, of course, is to have both skipping. The example code for this article is in the 'part4_avsync' branch on github. This example takes a slightly different approach to reading headers. I use ogg_stream_packetpeek to peek ahead in the bitstream for a packet and do the header processing on the peeked packet. If it is a header I then consume the packet. This is done so I don't consume the first data packet when reading the headers. I want the data packets to be consumed in a particular order (audio, followed by video when needed).
// Process all available header packets in the stream. When we hit
To read packets for a particular stream I use a 'read_packet' function that operates on a stream passed as a parameter:
bool OggDecoder::read_packet(istream& is,
If we need to read a new page (to be able to get more packets) we check which stream the newly read page belongs to, and if it is not for the stream we want we store it in the bitstream for that page so it can be retrieved later. I've added an 'active' flag to the streams so we can ignore streams that we aren't interested in. We don't want to continuously buffer data for alternative audio tracks we aren't playing, for example. The streams are marked inactive when the headers are finished reading. The code that does the checking to see if it's time to display a frame is:
The code for decoding and displaying the Theora video is similar to the Theora decoding article. The main difference is we store the granulepos in mGranulepos so we know when to stop displaying the frame. This version of 'plogg' should play Ogg files with a Theora and Vorbis track in sync. You can test it on the transformers trailer for example. It does not play Theora files with no audio track - we can't synchronise to the audio clock if there is no audio. This can be worked around by falling back to delaying for the required framerate as the previous Theora example did. The a/v sync is not perfect however. If the video is large and decoding keyframes takes a while then we can fall behind in displaying the video and go out of sync. This is because we only play one frame when we check the time. One approach to fixing this is to decode, but not display, all frames up until the audio time rather than just the next frame. The other issue is that the API call we are using to write to the audio hardware is blocking. This is using up valuable time that we could be using to decode a frame. When the write to the sound hardware returns we have very little time to decode a frame before glitches start appearing in the audio due to buffer underruns. Try playing a larger video (like the Ghostbusters HD Trailer) and the audio and video will skip (depending on the speed of your hardware). This isn't a pleasant experience. Because of the blocking audio writes we can't skip more than one frame: the extra decoding time would take too long and cause the audio to skip. The fixes for these aren't too complex and I'll go through them in my next article.
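The "decode, but not display, all frames up until the audio time" idea can be illustrated with a small, self-contained sketch. It is not the plogg implementation: the CatchUp struct and the catch_up function are hypothetical, and the frame end times are passed in as a plain list instead of being obtained from th_granule_time while decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of a catch-up pass.
struct CatchUp {
    std::size_t next_index;  // first frame that is still due for display
    std::size_t skipped;     // frames decoded but never displayed
};

// frame_end_times[i] is the time in seconds up to which frame i should be
// displayed, in ascending order. Starting at 'current', step past every frame
// whose end time the audio clock has already exceeded. In a real decoder each
// skipped frame would still be passed to the decoder, because later frames
// may depend on it; only the display step is skipped.
CatchUp catch_up(const std::vector<double>& frame_end_times,
                 std::size_t current, double audio_time) {
    CatchUp result{current, 0};
    while (result.next_index < frame_end_times.size() &&
           audio_time > frame_end_times[result.next_index]) {
        ++result.next_index;
        ++result.skipped;
    }
    return result;
}
```

With end times of 0.04, 0.08, 0.12 and 0.16 seconds and an audio clock at 0.10 seconds, the first two frames are skipped and the third frame is the one to display.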
The basic approach is to move to an asynchronous method of writing the audio, skip displaying frames when needed (to reduce the cost of the YUV decoding), skip decoding frames if possible (depending on location of keyframes we can do this), and to check how much audio data we have queued before decoding to always ensure we won't drop audio while decoding. With these fixes in place I can play the 1080p Ogg version of Big Buck Bunny on a Macbook laptop (running Arch Linux) with no audio interruption and with a/v syncing correctly. There is a fair amount of frame skipping, but it's a lot more watchable than if you try playing it without these modifications in place. And better than watching with the video lagging further and further behind the longer you watch it. Further improvements can be made to reduce the frame skipping by utilising threads to take advantage of extra cores on the PC. After the followup article on improving the a/v sync I'll look at covering seeking.
Categories: ogg, theora, vorbis, firefox
John Benediktsson: IPInfoDB
Providing location-based services is all the rage these days. On the newer mobile devices, there are services that use GPS and nearby cellular towers to improve your user experience with the knowledge of where you are at any moment.
http://ipinfodb.com/ip_query.php?ip=74.125.45.100
Curious to see how easy it would be to use this API from Factor, I wrote a vocabulary for accessing IPInfoDB. First, we define the characteristics of an ip-info tuple.
TUPLE: ip-info ip country-code country-name
The API returns an XML response, which we need to convert into an ip-info object. For this, we use the excellent XML parsing vocabulary that has seen a lot of improvements recently.
: find-tag ( tag name -- value )
With that done, it is very easy to make an http request to retrieve one's own location:
: locate-my-ip ( -- info )
Or determine the location of any arbitrary IP:
: locate-ip ( ip -- info )
The example I showed earlier would be:
( scratchpad ) "74.125.45.100" locate-ip .
Presumably, if this code were to be used more frequently, it would need some error handling for when the http server is not available, or non-responsive. Anyway, this vocabulary is available on my GitHub
Chris Double: Decoding Vorbis files with libvorbis
Decoding Vorbis streams requires a very similar approach to that used when decoding Theora streams. The public interface to the libvorbis library is very similar to that used by libtheora. Unfortunately the libvorbis documentation doesn't contain an API reference that I could find so I'm following the approach used by the example programs. Assuming we have already obtained an ogg_packet, the general steps to follow to decode and play Vorbis streams are:
In the example code in the github repository I create a VorbisDecode object that holds the objects needed for decoding. This is similar to the TheoraDecode object mentioned in my Theora post:
class VorbisDecode {
I added a TYPE_VORBIS value to the StreamType enum and the stream is set to this type when a Vorbis header is successfully decoded:
int ret = vorbis_synthesis_headerin(&stream->mVorbis.mInfo,
The example program uses libsydneyaudio for audio output. This requires sound samples to be written as signed short values. When I get the floating point data from Vorbis I convert this to signed short and send it to libsydneyaudio:
int ret = 0;
A couple of minor changes were also made to the example program:
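As an illustration of the float to signed short conversion mentioned above, here is a minimal sketch. It is not the code from the repository: the function name to_int16_interleaved is hypothetical, and it assumes the decoded audio arrives as one float array per channel with values nominally between -1.0 and 1.0, which is the layout libvorbis hands back from vorbis_synthesis_pcmout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleaves per-channel float samples into signed
// 16-bit samples, clamping values that fall outside the 16-bit range.
std::vector<std::int16_t> to_int16_interleaved(const float* const* pcm,
                                               int channels, int samples) {
    assert(channels > 0 && samples >= 0);
    std::vector<std::int16_t> out;
    out.reserve(static_cast<std::size_t>(channels) *
                static_cast<std::size_t>(samples));
    for (int i = 0; i < samples; ++i) {
        for (int c = 0; c < channels; ++c) {
            float v = pcm[c][i] * 32767.0f;
            if (v > 32767.0f) v = 32767.0f;    // clamp positive overshoot
            if (v < -32768.0f) v = -32768.0f;  // clamp negative overshoot
            out.push_back(static_cast<std::int16_t>(v));
        }
    }
    return out;
}
```

The resulting buffer holds the samples in left, right, left, right order for a stereo stream, which is the form an audio output API taking signed short samples would typically expect. Clamping matters because decoded Vorbis samples can slightly exceed the nominal range.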
The code for this example is in the 'part3_vorbis' branch of the github repository. This also includes the Theora code but does not do any a/v synchronisation. Files containing Theora streams will show the video data but it will not play smoothly and will not be synchronised with the audio. Fixing that is the topic of the next post in this series.
Categories: firefox, ogg, theora, vorbis
Chris Double: Decoding Theora files using libtheora
My last post covered reading Ogg files using libogg. The resulting program didn't do much but it covered the basic steps needed to get an ogg_packet which we need to decode the data in the stream. The next step I want to cover is decoding Theora streams using libtheora. In the previous post I stored a count of the number of packets in the OggStream object. For Theora decoding we need a number of different objects to be stored. I encapsulate this in a TheoraDecode structure:
class TheoraDecode {
th_info, th_comment and th_setup_info contain data read from the Theora headers. The Theora stream contains three header packets. These are the info, comment and setup headers. There is one object for holding each of these as we read the headers. The th_dec_ctx object holds information that the decoder requires to keep track of the decoding process. th_info and th_comment need to be initialized using th_info_init and th_comment_init. Notice that th_setup_info is a pointer. This needs to be free'd when we're finished with it using th_setup_free. The decoder context object also needs to be free'd. Use th_decode_free. A convenient place to do this is in the TheoraDecode constructor and destructor:
class TheoraDecode {
The TheoraDecode object is stored in the OggStream structure. The OggStream structure also gets a field holding the type of the stream (Theora, Vorbis, Unknown, etc) and a boolean indicating whether the headers have been read:
class OggStream
Once we get the ogg_packet from an Ogg stream we need to find out if it is a Theora stream. The approach I'm using to do this is to attempt to extract a Theora header from it. If this succeeds, it's a Theora stream. th_decode_headerin will attempt to decode a header packet. A return value of '0' indicates that we got a Theora data packet (presumably the headers have been read already).
This function gets passed the info, comment, and setup objects and it will populate them with data as it reads the headers:
ogg_packet* packet = ...got this previously...;
In this example code we attempt to decode the header. If it fails it bails out, possibly to try decoding the packet using libvorbis or some other means. If it succeeds the stream is marked as type TYPE_THEORA so we can handle it specially later. If all header packets are read and we got the first data packet then we call th_decode_alloc to get a decode context to decode the data. Once the headers are all read, the next step is to decode each Theora data packet. To do this we first call th_decode_packetin. This adds the packet to the decoder. A return value of '0' means we can get a decoded frame as a result of adding the packet. A call to th_decode_ycbcr_out gets the decoded YUV data, stored in a th_ycbcr_buffer object. This is basically an array of the YUV data.
ogg_int64_t granulepos = -1;
The 'granulepos' returned by the th_decode_packetin call holds information regarding the presentation time of this frame, and what frame contains the keyframe that is needed for this frame if it is not a keyframe. I'll write more about this in a future post when I cover synchronising the audio and video. For now it's going to be ignored. Once we have the YUV data I use SDL to create a surface, and a YUV overlay. This allows SDL to do the YUV to RGB conversion for me. I won't copy the code for this since it's not particularly relevant to using the libtheora API - you can see it in the github repository. Once the YUV data is blit to the screen the final step is to sleep for the period of one frame so the video can play back at approximately the right framerate. The framerate of the video is stored in the th_info object that we got from the headers.
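To make the sleep-per-frame step concrete, here is a small sketch of turning a frame rate stored as a fraction into a delay. The FrameRate struct is a hypothetical stand-in for the fps_numerator and fps_denominator fields of th_info, and both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the frame rate fields of th_info
// (fps_numerator and fps_denominator).
struct FrameRate {
    std::uint32_t numerator;
    std::uint32_t denominator;
};

// Frames per second as a floating point value.
double frames_per_second(const FrameRate& rate) {
    assert(rate.denominator != 0);
    return static_cast<double>(rate.numerator) /
           static_cast<double>(rate.denominator);
}

// Whole milliseconds to wait between frames, truncated towards zero.
std::uint32_t frame_delay_ms(const FrameRate& rate) {
    assert(rate.numerator != 0);
    return static_cast<std::uint32_t>(
        1000.0 * static_cast<double>(rate.denominator) /
        static_cast<double>(rate.numerator));
}
```

A 25/1 stream gives a delay of 40 milliseconds per frame, while a 30000/1001 stream gives 33 after truncation. Sleeping for a fixed delay like this ignores the time spent decoding and drawing, which is why it only approximates the right framerate.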
The framerate is represented as the fraction of two numbers:
float framerate =
With all that in place, running the program with an Ogg file containing a Theora stream should play the video at the right framerate. Adding Vorbis playback is almost as easy - the main difficulty is synchronising the audio and video. I'll cover these topics in a later post.
Categories: firefox, theora, ogg
Chris Double: Reading Ogg files using libogg
Reading data from an Ogg file is relatively simple. The file format is well documented in RFC 3533. I showed how to read the format using JavaScript in a previous post. For C and C++ programs it's easier to use the xiph.org libraries. There are libraries for decoding specific formats (libvorbis, libtheora) and there is a library for reading data from Ogg files (libogg). I'm prototyping some approaches to improve the performance of the Firefox Ogg video playback and while I'm at it I'll write some posts on using these libraries to decode/play Ogg files. Hopefully it'll prove useful to others using them and I can get some feedback on usage. All the code for this is in the plogg git repository on github. The 'master' branch contains the work in progress player that I'll describe in a series of posts, and there are branches specific to the examples in each post. The libogg documentation describes the API that I'll be using in this post. All that this example will do is read an Ogg file, read each stream in the file and count the number of packets for that stream. It prints the number of packets. It doesn't decode the data or do anything really useful. That'll come later. You can think of an Ogg file as containing logical streams of data. Each stream has a serial number that is unique within the file to identify it. A file containing Vorbis and Theora data will have two streams. A Vorbis stream and a Theora stream. Each stream is split up into packets. The packets contain the raw data for the stream. The process of decoding a stream involves getting a packet from it, decoding that data, doing something with it, and repeating. That describes the logical format. The physical format of the Ogg file is split into pages of data. Each physical page contains some part of the data for one stream. The process of reading and decoding an Ogg file is to read pages from the file, associating them with the streams they belong to. 
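libogg hides the physical page layout, but a rough sketch of reading the fixed part of a page header shows how a page is tied to its logical stream. The field offsets follow RFC 3533; the PageHeader struct and parse_page_header function are hypothetical names for illustration, and in practice ogg_page_serialno and ogg_page_bos provide this information.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical summary of the fixed 27-byte Ogg page header (RFC 3533).
struct PageHeader {
    bool valid;            // starts with the "OggS" capture pattern
    bool begins_stream;    // header type flag 0x02: first page of a stream
    bool ends_stream;      // header type flag 0x04: last page of a stream
    std::uint32_t serial;  // serial number of the logical stream
    std::uint8_t segments; // number of entries in the segment table
};

PageHeader parse_page_header(const unsigned char* data, std::size_t size) {
    PageHeader header{false, false, false, 0, 0};
    if (size < 27 || std::memcmp(data, "OggS", 4) != 0) {
        return header;  // not enough data, or not positioned at a page start
    }
    header.valid = true;
    header.begins_stream = (data[5] & 0x02) != 0;
    header.ends_stream = (data[5] & 0x04) != 0;
    // The serial number is stored little-endian at byte offset 14.
    header.serial = static_cast<std::uint32_t>(data[14]) |
                    (static_cast<std::uint32_t>(data[15]) << 8) |
                    (static_cast<std::uint32_t>(data[16]) << 16) |
                    (static_cast<std::uint32_t>(data[17]) << 24);
    header.segments = data[26];
    return header;
}
```

Pages with the same serial number belong to the same logical stream, which is why the serial number makes a natural key for a map of stream objects.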
At some point we then go through the pages held in the stream and obtain the packets from it. This is the process the code in this example follows. The first thing we need to do when reading an Ogg file is find the first page of data. We use an ogg_sync_state structure to keep track of the search for the page data. This needs to be initialized with ogg_sync_init and later cleaned up with ogg_sync_clear:
ifstream file("foo.ogg", ios::in | ios::binary);
Note that the libogg functions return an error code which should be checked. A result of '0' generally indicates success. We want to obtain a complete page of Ogg data. This is held in an ogg_page structure. The process of obtaining this structure is to do the following steps:
Here's the code to do this:
ogg_page page;
We need to keep track of the logical streams within the file. These are identified by serial number and this number is obtained from the page. I create a C++ map to associate the serial number with an OggStream object which holds information I want associated with the stream. In later examples this will hold data needed for the Theora and Vorbis decoding process.
class OggStream
Each stream has an ogg_stream_state object that is used to keep track of the data read that belongs to the stream. We're storing this in the OggStream object that we associated with the stream serial number. Once we've read a page as described above we need to tell libogg to add this page of data to the stream.
StreamMap streams;
This code uses ogg_page_serialno to get the serial number of the page we just read. If it is the beginning of the stream (ogg_page_bos) then we create a new OggStream object, initialize the stream's state with ogg_stream_init, and store it in our streams map. If it's not the beginning of the stream we just get our existing entry in the map. The final call to ogg_stream_pagein inserts the page of data into the stream's state object. Once this is done we can start looking for completed packets of data and decode them. To decode the data from a stream we need to retrieve a packet from it. The steps for doing this are:
while (..read a page...) {
That's all there is to reading an Ogg file. There are more libogg functions to get data out of the stream, identify end of stream, and various other useful functions but this covers the basics. Try out the example program in the github repository for more information. Note that the libogg functions don't require reading from a file. You can use these routines with any data you've obtained. From a socket, from memory, etc. In the next post about reading Ogg files I'll go through using libtheora to decode the video data and display it.
Categories: theora, ogg, firefox
John Benediktsson: Brainf*ck
The Brainfuck programming language is a curious invention. Seemingly useful only for proving oneself as a True Geek at a party, it could also be useful for educational purposes.
++++++++++[>+++++++>++++++++++>+++>+<<<<-]
For fun, I thought I would build a Brainfuck compiler for Factor.
(scratchpad) USE: brainfuck
Behind the scenes, the Brainfuck code is being compiled into proper Factor using a macro that parses the Brainfuck code string. When translated into Factor, the "Hello, world!" example becomes:
<brainfuck>
I made only a slight optimization, which you might notice above, to collapse a series of identical operators together into a single call to the operator word, while staying true to the original set of Brainfuck operators. Some fun examples of Brainfuck in the brainfuck-tests.factor unit tests include addition, multiplication, division, uppercase, and a cat utility. It is available on my Github, and hopefully will be pulled into the main repository soon.
Chris Double: Reading Ogg files with JavaScript
On tinyvid.tv I do quite a bit of server side reading of Ogg files to get things like duration and bitrate information when serving information about the media. I wondered if it would be possible to do this sort of thing using JavaScript running in the browser.
Categories: tinyvid, javascript, theora, firefox
Chris Double: Video for Everybody - HTML 5 video fallback
Kroc Camen has made available Video for Everybody, an HTML snippet that uses HTML 5 video if it's available in the browser, otherwise falling back to different video playback options.
Categories: video, firefox
Blogroll
planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.