vobsub to srt

Transforming vobsub subtitles (bitmap graphics) into srt (text) at the command line is not straightforward. Because they are graphics, you must use OCR to turn them into text.


When working with mkv files, it's best to use mkvtoolnix, because libav throws a lot of buffer underrun errors when extracting vobsub (these may or may not matter).

sudo apt-get install tesseract-ocr
sudo apt-get -y --force-yes install vobsub2srt
sudo apt-get install mkvtoolnix

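The basic pattern is: 1) look at the file to see which track holds the vobsub, 2) extract that track and 3) convert it to srt format. mkvinfo reports both a track number (counted from 1) and a track ID for mkvextract (counted from 0); mkvextract expects the ID. vobsub2srt takes the base name of the .idx/.sub pair, without an extension.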

mkvinfo someFile.mkv
mkvextract tracks someFile.mkv someNumber:someFile
vobsub2srt someFile
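
For step 1, mkvmerge -i (also part of mkvtoolnix) prints one summary line per track, which is easier to filter than the mkvinfo output. The following is a minimal sketch, assuming the English output format "Track ID N: subtitles (VobSub)"; the wording may differ between versions or locales, and the helper name vobsub_track_ids is made up for illustration.

```shell
# Read `mkvmerge -i` output on stdin and print the ID of every VobSub track,
# one per line. Assumes lines of the form "Track ID 2: subtitles (VobSub)".
vobsub_track_ids() {
  sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (VobSub).*$/\1/p'
}

# Usage (needs mkvmerge installed):
#   mkvmerge -i someFile.mkv | vobsub_track_ids
```

The printed ID is the number to put in front of the colon in the mkvextract call.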

You can script this with a loop. If the subtitle track has the same ID in every file, you can extract it like so (ID 2, i.e. the 3rd track, in this case). mkvextract replaces the extension of the output name and writes a .idx and a .sub file.

# Globbing plus quoting copes with spaces in file names, so IFS can stay as it is.
for X in *.mkv; do mkvextract tracks "$X" 2:"$X"; done

But if the subtitle track's ID varies from file to file, you'll have to run the extraction once per candidate ID until you've covered all the possibilities. A peculiarity of the tool helps here: vobsub tracks are always written as .idx and .sub, whatever extension you ask for, while every other track type ends up in the .temp file, which you can then discard.

for X in *.mkv; do
  Y="${X%.mkv}.temp"
  # Try IDs 0, 1 and 2; non-vobsub tracks just overwrite the same .temp file.
  for ID in 0 1 2; do mkvextract tracks "$X" "$ID:$Y"; done
done

rm -f -- *.temp
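
As an alternative to trying every ID, the track can be looked up per file before extracting. This is only a sketch under assumptions: it relies on mkvmerge -i printing lines like "Track ID N: subtitles (VobSub)" (the wording may vary by version or locale), it takes only the first vobsub track of each file, and the function name extract_first_vobsub is hypothetical.

```shell
# Extract the first VobSub track of one Matroska file, if it has one.
extract_first_vobsub() {
  file=$1
  # mkvmerge -i prints one "Track ID N: type (codec)" line per track.
  id=$(mkvmerge -i "$file" |
       sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (VobSub).*$/\1/p' |
       head -n 1)
  if [ -z "$id" ]; then
    echo "no VobSub track found in $file" >&2
    return 1
  fi
  # Same mkvextract call as before; it writes the .idx/.sub pair.
  mkvextract tracks "$file" "$id:$file"
}

for X in *.mkv; do
  [ -e "$X" ] || continue   # skip the literal pattern when nothing matches
  extract_first_vobsub "$X"
done
```

This leaves no .temp files behind, at the cost of depending on the mkvmerge output format.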




Then you can convert the extracted subtitles to srt:

# vobsub2srt wants the base name without the .idx extension.
for X in *.idx; do vobsub2srt "${X%.idx}"; done
rm -- *.idx *.sub
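
Because OCR can fail, it may be safer to delete each .idx/.sub pair only once a non-empty .srt exists next to it. A minimal sketch, assuming vobsub2srt writes its result to the base name plus .srt; convert_pair is a made-up helper name.

```shell
# Convert one .idx/.sub pair and delete it only if a non-empty .srt was produced.
convert_pair() {
  base=${1%.idx}
  vobsub2srt "$base" || return 1
  if [ -s "$base.srt" ]; then
    rm -f -- "$base.idx" "$base.sub"
  else
    echo "no subtitles recognised for $base; keeping .idx/.sub" >&2
    return 1
  fi
}

# Usage:
#   for X in *.idx; do convert_pair "$X"; done
```

Pairs that fail to convert stay on disk, so they can be retried or inspected by hand.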
