Batch Convert
| · | Convert from: lets you select the file family, type and sample rate of the input files. If this information is stored in the files themselves (in the file header, as in ".wav" files), Vox Studio will find it automatically. Many other file formats (especially the telephony formats) do not contain this information, so you have to supply it because the program has no other way of finding out what the input is. All input source files should have the same format. All converted output files will have the same format.
|
| The browse button allows you to navigate to and select multiple files (even thousands of files) or a complete directory to convert. You can use the standard Windows file selection techniques to indicate which files need converting: click, Shift-click and Ctrl-click all work as usual. See your Windows manual on how to select multiple files in a selection box. The good old *.* wildcard from DOS times works here too and selects all the files in the folder.
| |
| · | Convert to: you always have to fully describe the sound family and sound type you want to convert to; Vox Studio cannot guess this. If you don't feel like entering this every time, you can preset it as a default using the Tools/Defaults menu. The extension of the output filenames can be kept the same as that of the input files, removed, or replaced by another one.
|
| · | Sampling frequency: lets you change the sample rate of the file, i.e. the number of signal measurements taken per second. According to the Nyquist theorem, a sampled file cannot represent frequency components above 1/2 the frequency at which it is sampled. So, when a file is down-sampled (resampled at a lower sampling rate), Vox Studio automatically filters the file to remove the components above 1/2 the new sampling frequency before doing the re-sampling. Naturally, when a file is up-sampled (resampled at a higher sampling rate), there is no way Vox Studio can add frequency components that are not present in the file (or that have been removed by previous down-sampling). In other words, down-sampling necessarily results in a file with a lower bandwidth, but up-sampling does not increase the file's frequency content. Thus, down-sampling a file and then up-sampling it back to the original rate does not restore the original. If you want the original back, reload it from disk instead. Down-sampling irreversibly degrades a file and should always be the last thing you do to it. You can select any of the "standard" telephony or multimedia sample rates from a list, but you can also enter a custom "non-standard" sample rate if you need to.
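The filter-then-resample order described above can be illustrated with a minimal Python sketch. It is not Vox Studio's actual algorithm: the windowed-sinc filter, the integer down-sampling factor, the 101-tap length and the function names `lowpass_taps` and `downsample` are all assumptions made for illustration.

```python
import math

def lowpass_taps(cutoff, num_taps=101):
    """Hamming-windowed sinc low-pass filter.

    cutoff is in cycles per input sample (0 < cutoff < 0.5).
    The taps are scaled so that the gain at 0 Hz is 1.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def downsample(samples, factor, num_taps=101):
    """Low-pass below the new Nyquist limit, then keep every factor-th sample."""
    # Cutoff slightly below 1/2 of the new rate (an illustrative safety margin).
    taps = lowpass_taps(0.45 / factor, num_taps)
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # samples outside the file count as zero
                acc += t * samples[j]
        out.append(acc)
    return out
```

Halving an 8000 Hz file this way keeps a 500 Hz tone almost unchanged and strongly attenuates a 3000 Hz tone, which lies above the new 2000 Hz limit. Up-sampling the result afterwards cannot bring the 3000 Hz component back.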
|
| While converting you can also adjust leading and trailing silence, improve intelligibility, normalize the sound amplitude, or filter the sound. These options can be selected using a tabbed section of the dialog box.
| |
| · | Trim leaders and trailers: allows easy production of prompt files with uniform leading and trailing blanks. What used to be a time-consuming manual editing task has now become as easy as selecting a button with your mouse. The rest is done for you automatically.
|
| Automatic silence adjustment is a threshold-activated process and thus requires spotless, clean recordings. If you have background noise in your recordings, Vox Studio may incorrectly detect the beginning and end of sound in your file.
| |
| The dialog box allows you to select the threshold level Vox Studio uses to detect the difference between silence and non-silence. The current file will be scanned for sound level and Vox Studio will automatically detect the beginning and end of sound in the file based on the threshold you selected. It will then adjust the length of leading and trailing silence to a fixed number of milliseconds, which can be the same for all your files. The threshold level can be set either as a percentage of the maximum on-screen amplitude or as a threshold value in dB, dBm or dBV.
| |
| The silence duration itself is defined in the Tools/Defaults/Convert Active menu. A reasonable typical value would be 300 milliseconds. Note that the Leader/Trailer option is actually capable of adding silence to your file if you request more leading or trailing silence than your file actually contains!
| |
| If the voice files you are trimming were not recorded under optimal conditions, it may be very useful to perform a "Center" or a "Normalize" operation on your file before you apply the "Leader/Trailer" command. This "flattens" the low-frequency or DC signal background onto the zero line and "pumps up" the signal, which makes it much easier to perform a good threshold detection on your file.
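The threshold detection and silence adjustment described for the Leader/Trailer option can be sketched in Python as follows. This is only an illustration under stated assumptions: samples are floats between -1.0 and 1.0, the threshold is given as a percentage of full scale, and the function name `trim_silence` and its parameters are hypothetical, not Vox Studio settings.

```python
def trim_silence(samples, sample_rate, threshold_percent, silence_ms):
    """Give a file fixed-length leading and trailing silence.

    Sound starts at the first sample whose magnitude reaches the threshold
    and ends at the last one. Existing quiet samples are kept where
    available; zeros are added if the file has too little silence.
    """
    threshold = threshold_percent / 100.0
    pad = round(silence_ms * sample_rate / 1000)

    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return list(samples)  # nothing above threshold: leave the file alone

    start, end = loud[0], loud[-1] + 1
    lead = list(samples[max(0, start - pad):start])
    trail = list(samples[end:end + pad])
    lead = [0.0] * (pad - len(lead)) + lead      # add silence if too short
    trail = trail + [0.0] * (pad - len(trail))   # add silence if too short
    return lead + list(samples[start:end]) + trail
```

A single noise sample above the threshold anywhere in the leader or trailer moves the detected start or end of sound, which is why this kind of detection needs clean recordings.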
| |
| · | Normalize sound volume: allows easy and rapid production of prompt files with equal (and, for most telephony applications, preferably reasonably high) sound volume levels.
|
| The dialog box allows you to select the desired maximum sound energy, expressed as a percentage of the maximum on-screen amplitude, in dB, or in dBm/dBV. The current file is scanned for its maximum sound energy level and the whole file is multiplied by the factor that brings this maximum to the desired level. Vox Studio does not measure peak AMPLITUDE; it measures peak ENERGY over a duration of several milliseconds (about the duration of a spoken syllable). Vox Studio levels are internally calibrated for steady pure sine waves at 800 Hz: on such a sine wave, a normalization to 100% produces a signal that just reaches the floor and the ceiling of the display. This may be confusing at first, because the best setting in Vox Studio is usually about 65% of maximum energy, not 100%. That is simply because the human voice is neither steady nor a pure sine wave. As a result, if you normalize recorded speech to a maximum energy level of, say, 70%, you can get sound spikes whose amplitude is very briefly clipped at 100%. For voice recordings this happens on very short sounds like "T", "P" and "K", which are explosive sounds anyway and where amplitude saturation does not matter very much. This technique allows recording at as high a level as possible, with the best possible telephony quality. A similar technique is used with a recording device that has a level meter: you adjust the level to remain below 0 dB most of the time (green lights) but allow some very short spikes to exceed 0 dB (red lights flash briefly).
| |
| The best practical approach is to calibrate your Vox Studio recording and playback setup once and for all by fine-tuning your recording settings. You can use the monitor function and the graphical display function to do that. Your signal should occasionally reach the red range on the Monitor VU-meter and should use at least 75% of the available amplitude range shown on the graphical waveform display. It does not matter too much if signal peaks sometimes, and only briefly, hit the ceiling or the floor. This usually happens for utterances with T, P and K, where a little clipping does not matter so much. Select a normalization level that gives the best volume and quality when played back on the specific telephony card you use. Once that is set, your settings should rarely need to change unless you change the target telephony system. Remember that the Normalize option is there to correct minor variations between recordings; it is not there to correct grossly clipped or grossly under-recorded signals. You should regularly check that you are using nearly the full available dynamic range by looking at your recorded messages in the graphical display window. Note: when Normalize is selected, a Center function is in fact performed automatically on the signal prior to amplitude normalization.
| |
| Finally, as described in the chapter on the calibration tool, Vox Studio's dBV or dBm display can be calibrated so that the dB levels you see on-screen match the dB levels that the same signal will generate on the target telephone line.
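The difference between peak-energy and peak-amplitude normalization described for the Normalize option can be sketched in Python. This is an illustration, not Vox Studio's implementation: the half-overlapping windows, the window length in samples and the function name `normalize_energy` are assumptions. The sketch centers the signal first, treats 100% as the energy of a full-scale sine wave, and clips brief spikes at full scale.

```python
import math

def normalize_energy(samples, target_percent, window):
    """Scale samples so the loudest short window reaches target_percent.

    Samples are floats between -1.0 and 1.0. A target of 100% corresponds
    to the RMS level of a sine wave that just touches -1.0 and +1.0.
    """
    # Center first, as the Normalize option is described as doing.
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]

    # Find the highest RMS level over short, half-overlapping windows.
    window = min(window, len(centered))
    hop = max(1, window // 2)
    peak_rms = 0.0
    for start in range(0, len(centered) - window + 1, hop):
        chunk = centered[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        peak_rms = max(peak_rms, rms)
    if peak_rms == 0.0:
        return centered  # pure silence: nothing to scale

    full_scale_sine_rms = 1.0 / math.sqrt(2.0)
    gain = (target_percent / 100.0) * full_scale_sine_rms / peak_rms
    # Short spikes may exceed full scale after scaling and are clipped.
    return [max(-1.0, min(1.0, s * gain)) for s in centered]
```

With a steady sine wave as input, a 100% target brings the waveform up to the display limits. With a signal that contains one short spike, the same call scales by the energy of the loudest window, so the spike itself can end up clipped.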
| |
| · | Center around zero: automatically re-centers the sound file around the zero baseline to eliminate DC offsets caused by your sound card, and possibly by very-low-frequency interference. Note: when a Normalize command is issued, a Center command is, in fact, performed automatically on the file prior to amplitude normalization.
|
| Depending on the quality of your sound card, and your recording setup, it may also be very advantageous to perform a centering operation before doing leader/trailer work (which requires rather precise threshold detection). Vox Studio does not automatically perform a centering operation before every leader/trailer command but you may select this option.
| |
| You can perform a quick visual check using the graphical waveform display in the main window. Record a file with nothing but silence using your onboard sound card. Save the file and then reload it into Vox Studio. You should see a clean horizontal line exactly superimposed on the zero baseline. If you see two lines your hardware does generate significant offset. If you see only one line the offset generated is small enough that you can disregard it. If you see green rubbish (grass) on the baseline your recording setup picks up unwanted noise (which you will certainly hear in the final recording).
| |
| If you manipulate files that were recorded on another workstation, it is a good idea to load the files first and visually check them for a possible DC shift.
| |
| DC offsets are very disturbing whenever you need to amplify a signal or do threshold detection. Also, too much DC offset can generate audible clicks at the beginning and end of a file.
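The effect of a DC offset on threshold detection can be shown with a short Python sketch. It assumes the simplest form of centering, subtracting the average sample value, which removes a constant offset but not very-low-frequency interference. The function names are hypothetical.

```python
def dc_offset(samples):
    """Average sample value; zero for a perfectly centered file."""
    return sum(samples) / len(samples)

def center(samples):
    """Shift the whole file so that its average sits on the zero baseline."""
    offset = dc_offset(samples)
    return [s - offset for s in samples]

def count_above(samples, threshold):
    """Number of samples a threshold detector would treat as sound."""
    return sum(1 for s in samples if abs(s) >= threshold)

# "Silence" recorded through hardware that adds a constant offset of 0.05.
silence_with_offset = [0.05] * 100
before = count_above(silence_with_offset, 0.04)          # every sample counts as sound
after = count_above(center(silence_with_offset), 0.04)   # none do after centering
```

Before centering, the detector mistakes the entire silent file for sound. After centering, the same threshold finds nothing.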
| |
| · | Intelligibility filter: consists of a Clarity filter which corrects the muffled sound effect often obtained when down-sampling voice files and a Boost option which produces variable signal amplification in order to increase the perceived voice energy content of the recorded file. The Intelligibility Filter can be selected only while doing a Convert Active, Convert Current or a Batch Conversion operation. Intelligibility Filter options are set in the defaults section for the conversion functions. Usually a "weak" clarity filter and "no boost" is the best choice. Use trial-and-error to select the options that suit your voice files best.
|
| · | Low- and high-pass filter: allows you to individually select a high-pass (low-cut) or a low-pass (high-cut) filter. For both filters the -3 dB cutoff frequency can be chosen with a 1 Hz resolution. According to the Nyquist theorem, a sampled file cannot represent frequency components above 1/2 the frequency at which it is sampled. Audible "aliasing" errors occur whenever this limit is ignored. It therefore makes little sense to select a cutoff frequency above 1/2 the sampling frequency of the file.
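The aliasing mentioned above can be checked numerically. The following Python sketch, with hypothetical function names, computes the frequency at which a tone above the Nyquist limit reappears after sampling. It then confirms that the samples of a 5000 Hz tone taken at 8000 Hz cannot be told apart from those of a 3000 Hz tone, except for a sign flip.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone appears once sampled at sample_rate_hz."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

def sampled_tone(freq_hz, sample_rate_hz, count):
    """First `count` samples of a unit sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

fs = 8000
above_nyquist = sampled_tone(5000, fs, 16)
alias = sampled_tone(alias_frequency(5000, fs), fs, 16)

# Sample by sample, the 5000 Hz tone equals the negated 3000 Hz tone.
indistinguishable = all(abs(a + b) < 1e-9
                        for a, b in zip(above_nyquist, alias))
```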
|
| · | DTMF filter: removes signal components at and near the frequencies that correspond to one group of the DTMF frequencies. This harsh treatment effectively removes talk-off problems, but also degrades the sound quality somewhat. Use this only on files that really do cause talk-off problems and cannot be rerecorded. The DTMF filter has 3 strengths you can choose from: weak attenuates 10 dB, medium attenuates 20 dB, and strong attenuates 30 dB. Use the weakest filter that solves your talk-off problem.
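To get a feel for the three filter strengths, the attenuation figures can be converted from dB to linear amplitude factors. The Python sketch below uses the standard 20·log10 amplitude convention and lists the standard DTMF frequency groups for reference. The text above does not say which group Vox Studio filters, and the names in the code are illustrative only.

```python
# Standard DTMF tone frequencies, for reference.
DTMF_LOW_GROUP_HZ = (697, 770, 852, 941)
DTMF_HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)

# Attenuation of each filter strength, as listed in the text above.
FILTER_ATTENUATION_DB = {"weak": 10, "medium": 20, "strong": 30}

def amplitude_factor(attenuation_db):
    """Linear amplitude multiplier equivalent to an attenuation in dB."""
    return 10 ** (-attenuation_db / 20)

factors = {name: amplitude_factor(db)
           for name, db in FILTER_ATTENUATION_DB.items()}
```

The weak filter leaves roughly 32% of the amplitude at the filtered frequencies, the medium filter 10% and the strong filter about 3%.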
|