Video and audio skills
From WickedSim
Setting up a Video System
This is a non-trivial exercise, and can cost a fortune; however it is possible to set up a good system for surprisingly little if you have video savvy on both the hardware and software side of things. As is often the case if you want to produce quality work and are using computers, the software can cost more than the hardware. In the Internet age, it is possible for the geek with a small (or absent) wallet to nevertheless produce surprising quality by largely using well-chosen freeware!
Hardware and logistic considerations
The following useful points come from an SSH discussion list[1]:
- Get to know your system, and make sure it is (a) easy to use, (b) reliable and (c) good in performance and handling. Make sure you can live with the system.
- Look at the specifications. Even mid-range pan/tilt/zoom cameras are expensive; costs go up once you're looking at the matrix/switcher, controllers, mixers, servers, and so forth. You can pay US$300 000 -- 600 000 or more for a simple setup with four rooms, each with three cameras, all linked to a central control room with 'picture in picture';
- You can set up a 20 room 2000 square metre centre with two computers and 2 or 3 cameras per room, microphones, wiring and installation, server storage and still get some change out of US$75 000. (Delia Anderson; software interest with WebSP declared).
- Ultimately you may well get what you pay for. Consider "Expertise? Listening? System Design? Product Selection? Installation and Troubleshooting? Training? Follow-up Support? Performance Guarantees?" (Richard Kyle)
- Some may only ever use a particular video once, or not even use it for powerful feedback;
- In some facilities, 99% of activity is not recorded (Michael Goodrow)
- DV quality is good enough, don't go for HD (Valeriy Kozmenko);
- You might be able to justify four or five cameras per room if you have specific needs (Consider for example the need to focus in on a barcode-swiping system, ...) (Michael Seropian).
- Consider a portable solution. It facilitates in-situ simulation.
- Consider capturing all video to the server, without a complex gatekeeper functionality;
- Get all the different companies to have a look, and try to get them to work together in the long term, as they have different and often complementary capabilities. Company names mentioned were EMS (good on training management), B-Line (good for training automation, particularly with METI), and KB-port (good ergonomics).
- Digital may be less expensive than analogue.
Remember that one size does not fit all. Consider the following quote from Bruce Nappi:
For example, where do you fall on the following "equipment continuum": virtual reality simulator; standardized patients; mixed standardized patient and manikin; manikin alone; manikin hybrid with animal or cadaver tissue; animal or cadaver alone. Where do you fall on the following "customer vs. size chart": small class of med students; small class of attending physicians; medium size class of residents; medium size class of disaster first responders; medium size class of physicians who fly in for special training; large size class of nurses doing annual skills training (1100+ RNs, for example); large size class of regional first responders (1000+ police, fire fighters and EMTs, for example). While these may seem like extreme examples, we've faced them all here at CSESaR.
References
Confusing video standards and stuff
Video can be confusing, especially the plethora of standards and abbreviations. Here are a few ideas for the uninitiated. A lot of the following information has simply been gleaned from the Internet. There are particularly good (but long) articles in Wikipedia on many of the following topics --- here's our redux!
Analogue versus digital
Analogue is fairly easy --- the basic idea is that an electronic signal is used to carry information, and often the information is 'encoded' using the principle that a stronger signal means a more intense component to the video. You can see the problem with this --- especially if you're feeding your information over long cables or through a noisy environment, the signal can easily become degraded, and it may not be easy to pull out the signal from all of the other rubbish. An example of an analogue transmission standard is the signal coming into an ordinary, old-fashioned VGA computer monitor.
Using digital technology, information is represented using numbers, which are entirely represented as just zeroes and ones (lots of them). Here it is possible to set aside some of the numbers as checks on the integrity of the information. Even a weak signal can be correctly interpreted (it's either a zero or a one!) and when there is catastrophic failure of information transfer, the problem can be detected and sometimes even corrected. In addition, depending on the technology employed, higher resolution pictures can be transferred. (Although this last point might seem to be the most important reason to 'go digital' it's actually the least significant, as to most viewers what counts is the absolute size of the image, rather than being able to see every blemish on a person's face). DVI-D is an example of a digital transmission standard.
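The idea of setting aside some bits as integrity checks can be sketched in a few lines of Python. This is a toy illustration only: the even-parity bit and the three-fold repetition code below are textbook schemes chosen for brevity, not what any particular video standard uses, and all the function names are hypothetical.

```python
def add_parity(bits):
    """Append one even-parity bit so the total number of ones is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word):
    """True if the word still has an even number of ones."""
    return sum(word) % 2 == 0

def repeat3(bits):
    """Send every bit three times."""
    return [b for b in bits for _ in range(3)]

def majority3(received):
    """Recover each bit by majority vote over its three copies."""
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

data = [1, 0, 1, 1]

# Detection: a single flipped bit breaks the parity check.
sent = add_parity(data)
damaged = sent.copy()
damaged[2] ^= 1
print(parity_ok(sent), parity_ok(damaged))

# Correction: a single flipped copy is outvoted by the other two.
noisy = repeat3(data)
noisy[4] ^= 1
print(majority3(noisy) == data)
```

Parity only tells you that something went wrong; the repetition code can also repair it, at the price of sending three times as many bits. Real systems use far more economical codes, but the trade-off is the same.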
All video standards are rather complex and difficult to understand for the newcomer to video technology.
DVI
DVI stands for Digital Visual Interface (sometimes given as Digital Video Interface) and is commonly used to transfer information to flat panel LCD monitors (and a few other types of monitor). Some high-end video displays for TV also used it, but most of these now work on 'HDMI' which can be seen as the next step up from DVI.
It's confusing that there are three DVI formats:
- DVI-analogue (DVI-A);
- DVI-digital (DVI-D);
- DVI-integrated (DVI-I).
DVI-A uses a DVI cable to transfer a (high quality) analogue signal to a monitor. More exciting is the digital DVI-D, which transports the information as a digital signal. The two are not compatible, but integrated DVI-I cables allow transmission of both formats using the same cable.
DVI-D is quite sexy and can transmit a display resolution of up to 1920x1080 (single link) or even 2048x1536 (using a special dual-link) along a cable of up to five metres or perhaps more. For longer cable lengths, DVI boosters are available. Cable length is a major limitation, as cables use 'twisted pairs' which are less immune to noise than coaxial cable, and the DVI standard doesn't include error correction. For more details on cabling see this link, which also has pictures of the various connectors, and how to convert from DVI-A to DVI-D. You may wish to skip the following paragraph, as it's a little technical!
The bandwidth (ability of the link to transfer information) for a single link DVI cable is 4.95 billion bits per second (4.95 Gbps). The maximum number of picture elements which can be transmitted is 165 million pixels per second, because each picture element (pixel) in DVI-D needs thirty binary digits (bits), corresponding to 24 bit colour (eight bits per colour). The reason why ten bits are needed to represent each eight bits of colour is because colour information is encoded using a fancy mechanism called TMDS, short for transition minimised differential signalling. What TMDS does is to minimise the number of transitions (changes from 0 to 1 or 1 to 0) in every ten bits of encoded colour information. This fancy scheme allows for more reliable transmission of information. It also allows reliable recognition of video synchronisation signals, so even if other information is lost, these signals can be detected! The use of twisted pairs is a key component of TMDS --- a differential (complementary) signal is sent on each of the two, twisted wires, so that environmental noise can relatively easily be filtered out of the signal. When transmitting TMDS it's apparently important to know that the signal is DC coupled.
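For the curious, the transition-minimising step can be sketched in Python. This is a minimal sketch of only the first stage of TMDS as it is commonly described (eight data bits become nine, by chaining the bits together with XOR or XNOR); the second stage, which produces the tenth bit, is omitted. The function names are hypothetical and the bit lists are written least significant bit first.

```python
def tmds_stage1(byte):
    """Transition-minimise one 8-bit value into 9 bits (list, LSB first)."""
    d = [(byte >> i) & 1 for i in range(8)]
    ones = sum(d)
    # Use XNOR chaining when the byte is mostly ones, XOR otherwise.
    use_xnor = ones > 4 or (ones == 4 and d[0] == 0)
    q = [d[0]]
    for i in range(1, 8):
        bit = q[i - 1] ^ d[i]
        q.append(bit ^ 1 if use_xnor else bit)
    q.append(0 if use_xnor else 1)  # ninth bit records which chaining was used
    return q

def tmds_stage1_decode(q):
    """Invert tmds_stage1 and return the original byte."""
    d = [q[0]]
    for i in range(1, 8):
        bit = q[i] ^ q[i - 1]
        d.append(bit ^ 1 if q[8] == 0 else bit)
    return sum(b << i for i, b in enumerate(d))

def transitions(bits):
    """Count 0->1 and 1->0 changes between neighbouring bits."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

# Compare the worst case over all 256 byte values, before and after encoding.
worst_raw = max(transitions([(v >> i) & 1 for i in range(8)]) for v in range(256))
worst_encoded = max(transitions(tmds_stage1(v)[:8]) for v in range(256))
print(worst_raw, worst_encoded)
```

A raw byte such as 01010101 changes value between every pair of neighbouring bits; after this stage the eight data-carrying bits of any encoded byte change far less often, and the ninth bit tells the receiver how to undo the chaining.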
HDMI and HDCP
HDMI (the 'high-definition multimedia interface') is backward-compatible with DVI, with several flashy features and one or two 'features' which will cause you extreme irritation. HDMI will (or should) work with high-end digital televisions. Note that there are three different HDMI connectors, but it's not that bad --- you won't encounter the 29-pin Type B connector too often, and the 19-pin Type C connector for portable devices is similar to the standard 19-pin Type A connector.
There are different versions of HDMI (currently from 1.0 -- 1.3), the later versions supporting higher bandwidth. Version 1.0 supports transfer of 165 million pixels per second, as does DVI, whereas the 'pixel clock rate' is 340 MHz (340 million pixels per second) in version 1.3.
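The arithmetic linking pixel clock to link bandwidth is easy to check. The sketch below assumes that HDMI puts the same thirty bits on the wire per pixel as described for DVI (three colours, ten encoded bits each); the function names are made up for the example.

```python
BITS_ON_WIRE_PER_PIXEL = 30   # three colours x ten encoded bits each

def link_rate_gbps(pixel_clock_mhz):
    """Raw bit rate in Gbps for a given pixel clock in MHz."""
    return pixel_clock_mhz * 1e6 * BITS_ON_WIRE_PER_PIXEL / 1e9

def active_pixels_per_second(width, height, refresh_hz):
    """Visible pixels drawn per second, ignoring blanking intervals."""
    return width * height * refresh_hz

print(link_rate_gbps(165))   # single-link DVI and HDMI 1.0
print(link_rate_gbps(340))   # HDMI 1.3
print(active_pixels_per_second(1920, 1080, 60) / 1e6)  # millions of pixels/s
```

Note that a real video signal also spends time in blanking intervals, so the pixel clock actually needed for a given picture is somewhat higher than the visible pixel count alone suggests.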
Additional features of HDMI include:
- 8 channel uncompressed digital audio (24 bits per sample at 192 kHz sampling rate);
- Certain compressed audio formats (Dolby Digital, DTS);
- Super Audio CD (SACD) audio support;
- Remote control features.
The primary reason why industry has been pushing HDMI so hard is its most irritating feature: HDCP. HDCP (High-bandwidth Digital Content Protection) is a mandatory component of HDMI. HDCP was invented by Intel to prevent piracy of fancy new digital media. Everyone implementing HDMI (because it's based on HDCP) has to pay Intel and limit the capability of their machines. If the display doesn't support HDCP, the machine cannot output high quality signals to that display. Such content cannot be copied.
High-definition content transmitted across any HDCP-based channel is heavily encrypted to prevent eavesdropping. If a vendor violates the (malevolent) spirit of HDCP then the sword of Damocles will fall --- the keys of that vendor will be revoked, and devices manufactured by that vendor will stop playing HDCP content.
Each HDCP device contains a unique set of forty keys (each key contains 56 bits). Before content is played, the devices exchange information using a protocol based on a cryptographic process called Blom's scheme. The HDCP encoding scheme is widely regarded as being fatally flawed; anyone with the keys from 39 devices can compute the secret master matrix and this gives them total control over HDCP content. HDCP stripper boxes are now available but apart from their questionable legality, once a box has been identified, its keys can be revoked, rendering it inoperative. It is also conceivable that the Digital Millennium Copyright Act (in the US) and the EU Copyright Directive (in the EU) will be used as big sticks to whack users or vendors of such devices. Here's a good exposition of HDCP. HD-DVD and Blu-ray media use data encryption based on the Advanced Access Content System (AACS).
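A toy version of a Blom-style key agreement shows both why it works and why it is fragile. This is a sketch under stated assumptions rather than the real HDCP protocol: the master matrix here is random, the public "selection vector" with twenty ones out of forty is an assumption about how devices identify themselves, and every name in the code is hypothetical.

```python
import random

KEYS = 40            # keys per device, as in the text
MOD = 2 ** 56        # each key is 56 bits

def make_master(rng):
    """Secret symmetric KEYS x KEYS matrix held by the licensing authority."""
    m = [[0] * KEYS for _ in range(KEYS)]
    for i in range(KEYS):
        for j in range(i, KEYS):
            m[i][j] = m[j][i] = rng.randrange(MOD)
    return m

def make_device(master, rng):
    """Return (public selection vector, private keys) for one device."""
    ones = set(rng.sample(range(KEYS), KEYS // 2))
    public = [1 if i in ones else 0 for i in range(KEYS)]
    private = [sum(master[i][j] * public[j] for j in range(KEYS)) % MOD
               for i in range(KEYS)]
    return public, private

def shared_secret(own_private, other_public):
    """Add up the private keys selected by the other device's public vector."""
    return sum(k for k, bit in zip(own_private, other_public) if bit) % MOD

rng = random.Random(1)   # fixed seed so the demo is repeatable
master = make_master(rng)
pub_a, priv_a = make_device(master, rng)
pub_b, priv_b = make_device(master, rng)
print(shared_secret(priv_a, pub_b) == shared_secret(priv_b, pub_a))
```

Both devices arrive at the same number because the master matrix is symmetric. The weakness is visible in make_device: each private key is just a sum of master matrix entries, so anyone who collects enough devices' keys has enough linear equations to solve for the matrix itself.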
Other problems with HDMI abound --- weak connectors which break easily, problems of interoperability, and trouble with closed captioning. It's likely that in the next few years all television signals will be HDMI-compliant, as will many monitors.
What are CSS and Macrovision?
Content Scramble System (CSS) is the rather pathetic 40-bit encryption scheme used on almost all DVDs. The algorithm has been reverse-engineered and is susceptible to a brute-force attack. Macrovision is an even sillier analogue copy protection, where extra signals are implanted in video --- pulses not normally displayed onscreen (during the vertical blanking interval) which mess around with the automatic gain control on a recording VCR, making the picture unwatchable. Macrovision interferes with the picture in some setups even when the user is going about their normal business of not pirating things, but is now enforced by the Digital Millennium Copyright Act, to the great profit of Macrovision Inc. Software for decrypting CSS abounds (formerly DVD Decrypter, acquired by Macrovision and expunged in 2005--2006; now software like DVD43).
Composite video (YUV)
This is the good old (and possibly fast-disappearing) standard TV picture, without the sound. It's also called 'CVBS', which stands for Composite Video, Blanking and Synch. The origins of composite video date back to really old-fashioned black and white TV, which was basically an analogue 'luminance' (brightness) signal dressed up a little with 'synchronisation pulses'. When colour came along, smart engineers added two colour components to the luminance signal. The luminance signal is usually referred to simply as 'Y', and the two colour components are 'U' and 'V'; these last two are also called the chrominance.
YUV refers to the composite luma and chroma signal. The components of the 'colour space' used in television are often referred to as 'YPbPr' for analogue components, or YCbCr for digital ones. This is because the U component is a 'blue' signal (actually the difference between the blue and the luma), and V is a 'red' signal (the difference between red and luma), so b is for blue and r is for red. It would seem that green is not represented, but the 'green' component can easily be determined if we know the total (luma), Pb and Pr. In fact, green is over-represented in the actual encoding of information, and it's blue that suffers! This is because luma is 30% red, 59% green and just 11% blue. An excellent page at answers.com provides lots more detail and also explains how to convert between YUV and the RGB (red/green/blue) signals used in computer and many high-end displays.
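The claim that green can be recovered from luma and the two colour-difference signals is easy to demonstrate. The sketch below uses the rounded luma weights quoted above and plain, unscaled differences (blue minus luma, red minus luma); real standards apply additional scaling factors to the difference signals, which are left out here, and the function names are hypothetical.

```python
KR, KG, KB = 0.30, 0.59, 0.11   # luma weights quoted in the text

def rgb_to_luma_chroma(r, g, b):
    """Return (luma, blue difference, red difference) for an RGB triple."""
    y = KR * r + KG * g + KB * b
    return y, b - y, r - y

def luma_chroma_to_rgb(y, blue_diff, red_diff):
    """Rebuild RGB; green is whatever is left of the luma."""
    b = y + blue_diff
    r = y + red_diff
    g = (y - KR * r - KB * b) / KG
    return r, g, b

original = (0.2, 0.7, 0.4)
encoded = rgb_to_luma_chroma(*original)
print(encoded)
print(luma_chroma_to_rgb(*encoded))
```

Feeding in a grey (equal red, green and blue) gives colour differences of essentially zero, which is why a black and white set can simply ignore the chrominance.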
Composite video is easily transmitted by modulating a radio frequency carrier and easily stored on a VCR with minimal modification. It's common to use a yellow RCA jack to connect composite video devices (with red and white connectors being used respectively for right and left audio channels --- Red is for Right).
It's not quite as simple as you might imagine, as different methods are used to encode colour. The three methods are:
- PAL
- NTSC
- SECAM
PAL is phase alternating line, a European standard used in most of the world. It was developed to overcome shortcomings of the North American NTSC standard. PAL is actually very similar to NTSC but because it alternates the phase for every second line displayed, colours are more reliable. (NTSC sets need a tint control so that the colour can be adjusted manually to compensate for this weakness.) Vertical colour resolution of PAL is half that of NTSC (because it combines information from two lines in generating colour information), but this doesn't matter much as the colour resolution of the human visual apparatus isn't that hot.
PAL and NTSC use an extra signal to transmit chrominance (UV). This added signal is called a subcarrier. U and V information is carried using quadrature amplitude modulation --- there are actually two carrier waves which are out of phase by 90 degrees, hence the word 'quadrature'.
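Quadrature amplitude modulation sounds fancier than it is. The following sketch puts U on a sine wave and V on a cosine wave of the same frequency, adds them, and then recovers both by multiplying by each carrier in turn and averaging over whole cycles. The sample counts are arbitrary, the choice of which signal rides on which carrier is just a convention for this example, and the function names are hypothetical; a real decoder also has to lock onto the carrier phase, which is skipped here.

```python
import math

SAMPLES_PER_CYCLE = 16   # arbitrary choice for the sketch
CYCLES = 8

def modulate(u, v):
    """Combine two values onto carriers that are 90 degrees apart."""
    signal = []
    for k in range(SAMPLES_PER_CYCLE * CYCLES):
        phase = 2 * math.pi * k / SAMPLES_PER_CYCLE
        signal.append(u * math.sin(phase) + v * math.cos(phase))
    return signal

def demodulate(signal):
    """Recover (u, v) by correlating with each carrier over whole cycles."""
    u_sum = v_sum = 0.0
    for k, sample in enumerate(signal):
        phase = 2 * math.pi * k / SAMPLES_PER_CYCLE
        u_sum += sample * math.sin(phase)
        v_sum += sample * math.cos(phase)
    n = len(signal)
    return 2 * u_sum / n, 2 * v_sum / n

print(demodulate(modulate(0.3, -0.2)))
```

The two values come back separately because, averaged over a whole cycle, a sine multiplied by a cosine of the same frequency cancels to zero.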
SECAM uses a different (French) scheme from PAL/NTSC. It too is backward-compatible with black and white TV, but uses frequency modulation to send colour information, only transmitting one of the two colour signals at a time. This approach requires storing the colour information in memory, but removes a lot of artefacts induced by 'cross talk' between the U and V channels in PAL/NTSC. SECAM is used in France, Eastern Europe and the former USSR, and many other countries besides, predominantly French-speaking ones.
S-Video, Component video, RGB and Scart!
There are various refinements to composite video. S-Video separates the luminance and chrominance signals (discussed above) resulting in slightly improved picture quality, at least over short distances. Component video takes this a step further with three separate coaxial cables. SCART (Peritel, IEC 933-1) is a French standard where multiple pins are used to carry composite video, unidirectional RGB (separate red, green and blue) components, stereo audio input, and digital signalling. The signal to go for is the (excellent quality) RGB component video. Here's a rather good review of component video. At the end of the day, simply remember that RGB component video is really just VGA under another name. By the way, there are several issues with SCART cabling and connectors.
Interlacing and frame rates
An image on a video screen persists for some time for two reasons --- the screen phosphor continues to emit light for some time, and the human eye doesn't perceive flickering if images are projected at more than about 40 times per second (although far faster flickering may be perceived in very bright conditions).
Early creators of TV sets rapidly worked out that this persistence of vision could be used to good advantage. Rather than sending signals in 405 line detail (which required expensive components) they alternated odd and even lines. We call the half a frame which results from sending only the odd or the even lines a field. By sending 202.5 lines per field, and relying on persistence of vision (and the phosphor) to complete the effect, the required bandwidth was halved!
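Splitting a frame into two fields, and weaving them back together, can be sketched as follows. A frame is modelled here simply as a list of scan lines; with whole lines a 405-line frame splits into 203 and 202 lines rather than two lots of 202.5, since the half-line trick of real broadcast systems is not modelled. The function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two fields."""
    return frame[0::2], frame[1::2]   # lines 1, 3, 5, ... and lines 2, 4, 6, ...

def weave(first_field, second_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for i, line in enumerate(first_field):
        frame.append(line)
        if i < len(second_field):
            frame.append(second_field[i])
    return frame

frame = [f"line {n}" for n in range(1, 406)]   # a 405-line picture
odd_field, even_field = split_fields(frame)
print(len(odd_field), len(even_field))
print(weave(odd_field, even_field) == frame)
```

Weaving only gives back a clean picture if nothing moved between the two fields, which is why de-interlacing real footage needs something cleverer.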
If your TV set works at 60 Hz (North America) or 50 Hz (most of the rest of the world) then individual lines will only be refreshed 30 times per second in North America (or 25 times per second elsewhere).
With better quality displays and the ever-decreasing cost of components, just over 400 lines doesn't cut it, but even higher quality displays have until recently often relied on interlacing. NTSC uses a 525 line interlaced system, and the European standard became 625 lines (both PAL and SECAM use 625 lines interlaced).
Fancier systems now use progressive scan which does provide better video quality. This approach started on very basic computer systems (does anyone remember the old IBM CGA with its 640 by 200, yes two hundred line display?) but Video Graphics Array appeared in 1987 with its magnificent ;-) 640 x 480 resolution, rapidly followed by higher-resolution devices, all using progressive scan.
It's now easy to determine whether a particular device uses interlaced or progressive scan. Just look at the specifications, and see whether there's an 'i' or a 'p' after the number specifying the vertical resolution. For example 1080i is the interlaced HDTV standard, while 1080p is what everyone wants for good quality progressive scan.
Frame rates and refresh rates
Clearly the number of frames per second varies from device to device. For example, commercial movies are almost always filmed at 24 frames per second (24 fps), and as already mentioned television images are refreshed at 60 Hz (North America) or 50 Hz (elsewhere); because these pictures are interlaced, that means 60 or 50 fields per second, which is 30 or 25 complete frames per second. The vertical resolution and fps rate are often combined in stating a device's specifications, for example 1080p24.
The obvious question is how commercial movies get away with a flicker frequency of 24 fps when most humans can perceive flicker at under 40 Hz. Trickery is involved --- the projector has a shutter which interrupts the light twice (or three times) for every frame, elevating the flicker rate above the 'flicker fusion frequency' of the human visual system! So the movie has a frame rate of 24 fps, but a refresh rate of 48 or 72 Hz.
Liquid crystal (LCD) or thin-film transistor (TFT) displays work in a different fashion from CRT monitors as they limit transmission of light. A refresh rate for these monitors of about 60Hz is usually quite acceptable, while a large CRT monitor may require rates of over 80Hz to prevent an irritating flicker.
Deinterlacing, timing, pull-down and conversions
When you wish to convert an interlaced movie (for example, taken with a video camera) to a progressive scan image for display on a computer CRT, the first step is to de-interlace the movie. All good video-editing software will perform this with ease. You don't need to fork out money for this, as basic but still good software is available; for example, the freeware VirtualDub is excellent at the task.
Converting films (shot at 24 fps) to NTSC video (at very nearly 30fps) also requires some fancy footwork. The required trickery is referred to as '2-3 pulldown'. The movie is interlaced, and in addition, twelve fields (effectively six frames) are inserted within every 24 frames to beef up the rate to nearly 30 fps. Two frames are converted to four fields, and then the last field is duplicated to form a fifth field; the process is then repeated for the following two frames.
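The pulldown pattern is easier to follow when written out. The sketch below uses the usual textbook form, in which film frames alternately contribute two fields and three fields, with top and bottom fields strictly alternating; frames are just labels here and the function name is hypothetical.

```python
def pulldown_2_3(film_frames):
    """Turn film frames into a field sequence using a 2-3 cadence."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 2 if index % 2 == 0 else 3
        for _ in range(repeats):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields

fields = pulldown_2_3(["A", "B", "C", "D"])
# Pair consecutive fields to see which film frames make up each video frame.
video_frames = [fields[i][0] + fields[i + 1][0] for i in range(0, len(fields), 2)]
print(video_frames)
print(len(pulldown_2_3(list(range(24)))))
```

Four film frames become ten fields, or five video frames, two of which mix fields from different film frames. Over a second of film, 24 frames become 60 fields, matching the twelve extra fields described above.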
Because NTSC actually runs at 30000/1001 (approximately 29.97003) fps and not 30 fps, there are audio implications of such conversions. You might ask why the NTSC rate is ~29.97 fps and not exactly thirty. The answer is apparently that this minute change was made when colour television was introduced. The frequency spectrum of the sound carrier used to transfer the TV audio signal was very similar to that of the colour carrier, potentially resulting in interference at 920 kHz, which would be visible. The solution was a tiny shift in the NTSC frame rate to almost 29.97 fps, which is what we're stuck with! Note that when converting 24 fps video to ~29.97 NTSC, engineers play back the film slowed down by 0.1 percent, to accommodate the difference between 30 fps and the actual NTSC standard.
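The numbers involved can be checked with exact fractions. The sketch assumes the NTSC frame rate is exactly 30000/1001 fps, which is the commonly quoted definition; the variable and function names are made up for the example.

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)   # assumed exact NTSC frame rate
FILM_FPS = Fraction(24)

# Film is slowed by the same factor of 1000/1001 before pulldown.
slowed_film_fps = FILM_FPS * Fraction(1000, 1001)
slowdown_percent = float((FILM_FPS - slowed_film_fps) / FILM_FPS * 100)

# Video fields available for every (slowed) film frame.
fields_per_film_frame = (2 * NTSC_FPS) / slowed_film_fps

def ntsc_running_time_minutes(film_minutes):
    """Running time after the slow-down, for a film of the given length."""
    return float(film_minutes * Fraction(1001, 1000))

print(float(NTSC_FPS))
print(float(slowed_film_fps))
print(slowdown_percent)
print(fields_per_film_frame)
print(ntsc_running_time_minutes(120))
```

With the slow-down applied, each film frame gets exactly two and a half fields on average, so the 2-3 cadence fits perfectly; the audio has to be slowed by the same tiny amount to stay in step.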
In converting from 24 fps to PAL (25 fps) it's common to simply play the movie a little faster, resulting in a slightly shorter movie, with the sound at a higher pitch.
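How much shorter, and how much higher? A quick calculation, assuming the film is simply played at 25/24 of its normal speed with no pitch correction applied to the sound (the function name is hypothetical):

```python
import math

SPEEDUP = 25 / 24   # PAL frame rate divided by film frame rate

def pal_running_time_minutes(film_minutes):
    """Running time once the film is played at 25 fps instead of 24."""
    return film_minutes / SPEEDUP

speedup_percent = (SPEEDUP - 1) * 100
pitch_shift_semitones = 12 * math.log2(SPEEDUP)

print(speedup_percent)
print(pal_running_time_minutes(120))
print(pitch_shift_semitones)
```

So a two-hour film loses nearly five minutes, and the sound rises by rather less than a semitone unless it is corrected.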
It's clear that timing is pretty important in making video. We've only scratched the surface. Here are details of time codes established by the Society of Motion Picture and Television Engineers (SMPTE). This digital encoding allocates a unique timestamp to each video frame!
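The idea of a per-frame timestamp can be sketched as a conversion between a running frame count and an hours:minutes:seconds:frames label. This is a simplified sketch for whole-number frame rates such as 25 fps only; the 'drop-frame' counting used with ~29.97 fps NTSC, and the way the code is actually recorded alongside the picture, are not covered, and the function names are hypothetical.

```python
def frame_to_timecode(frame_number, fps=25):
    """Label a frame count as HH:MM:SS:FF (non-drop-frame counting)."""
    frames = frame_number % fps
    total_seconds = frame_number // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_to_frame(timecode, fps=25):
    """Convert an HH:MM:SS:FF label back to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

label = frame_to_timecode(90137)
print(label)
print(timecode_to_frame(label))
```

Because every frame gets its own label, two recordings made with synchronised timecode can later be lined up frame for frame.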
Overlays and genlocks
In displaying and editing video you will often want to combine video signals. This may be as 'picture in picture' (or even displaying four or more pictures on the same screen), or overlaying one video on top of another. An example of overlaying in a simulation setting would be videoing the participants in a scenario and overlaying the monitor display, with the ECG, saturation and arterial traces.
There are many ways of combining signals. If you aren't constrained to real time and have time to sit and play, it's relatively easy to digitise a video signal (many camcorders allow export to a PC via a USB or FireWire connection; more generally you can use a video capture card) and then edit the video. A most powerful tool for non-linear editing on a PC is called AviSynth (and it's free) --- the minor disadvantage is that you have to sit down and learn how to write AviSynth scripts to do things like overlays or picture-in-picture. There is a host of other non-linear editing software, much of which is rather pricey. For simple (freeware) linear editing, VirtualDub is fine, but for more complex tasks you need a non-linear editor.
If you want realtime overlay of monitor data onto a video, options include:
- A genlock overlay which takes the VGA signal from the monitor, subtracts the black, and overlays the signal on a composite video signal. Many are available, some (such as the Globalmediapro P-203) at a ridiculously low price; we've had good reports of the somewhat more expensive Coriogen Eclipse. The DeltaScan/Multigen RC01 is in a nasty plastic box and breaks easily.
- Turning the VGA signal into a composite video signal (devices which do this abound; many are inexpensive yet still provide a good signal), and then using a vision mixer (like the Edirol V4, which you can get for under $2K) to mix the various video streams.
What does the term 'genlock' mean? It stands for 'generator lock' which refers to the synchronisation of video signals --- if you wish to overlay one video on top of another, clearly the frames must be in synch, and this is what a genlock achieves. Genlocking involves synchronising not just the horizontal and vertical timing of the picture signals, but also the phase of the chrominance subcarriers (discussed above) of the two sources.
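The 'subtract the black and overlay' idea can be sketched in software, even though the devices above do it in hardware and in real time. The sketch assumes the two pictures are already the same size and perfectly in step (which is exactly the job a genlock does), treats each picture as a list of rows of (red, green, blue) values from 0 to 255, and uses an arbitrary threshold for what counts as black. All names are hypothetical.

```python
BLACK_THRESHOLD = 16   # arbitrary cut-off for "black" in this sketch

def overlay(camera, monitor, threshold=BLACK_THRESHOLD):
    """Show the monitor picture over the camera picture, except where it is black."""
    result = []
    for camera_row, monitor_row in zip(camera, monitor):
        row = []
        for camera_pixel, monitor_pixel in zip(camera_row, monitor_row):
            is_black = max(monitor_pixel) <= threshold
            row.append(camera_pixel if is_black else monitor_pixel)
        result.append(row)
    return result

room = (90, 80, 70)        # a pixel from the camera
trace = (0, 255, 0)        # a bright green ECG trace
black = (0, 0, 0)          # monitor background

camera = [[room, room, room],
          [room, room, room]]
monitor = [[black, trace, black],
           [trace, black, black]]

for row in overlay(camera, monitor):
    print(row)
```

The traces from the patient monitor end up drawn over the room picture, while the monitor's black background lets the camera image show through.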
Some video and audio resources
Here are a few useful links:
- Littmann Heart And Lung Sounds
- Website "Auscultation Assistant" has great heart and lung sounds

