摘自网络:
根据PDF文档参考手册,PDF的书签只能用PDFDocEncoding或者UTF-16BE来进行编码。在用LaTeX的hyperref宏包生成PDF文档的过程中,hyperref使用源文档的编码生成书签,结果导致在PDF阅读器中书签不能正确显示。
有两个解决办法,一个是采用\texorpdfstring{}{}命令进行两种不同的编码,一个针对书签,一个针对文本,这种方法使用起来比较繁琐。另一种是让hyperref按照文档编码产生书签,然后将书签编码转换成UTF-16BE编码。用LaTeX生成的书签文件以.out做为扩展名,因此只需转换.out文件即可。
用perl编写编码转换程序,然后写了一个脚本以简化生成书签的过程,其中gbkbm和utf8bm分别用来将GBK和UTF8编码的书签转换成UTD-16BE。用法如下:
$ latex gbkbkmark
$ gbkbm gbkbkmark
$ latex gbkbkmark
$ dvipdfm gbkbkmark
如果使用pdflatex直接产生PDF文档,那么需要将gbkbkmark.tex中hyperref的PDF驱动程序由dvipdfm 改为pdftex,编译过程如下代码:
$ pdflatex gbkbkmark
$ gbkbm gbkbkmark
$ pdflatex gbkbkmark
上面运行过程中的gbkbm gbkbkmark命令的作用是转换书签的编码,其他过程与LaTeX编译过程的标准步骤相同。
utf8bm的作用是将UTF8编码的书签转换成UTF-16BE编码,用法与gbkbm相同。
几个脚本的源代码如下:
======= gbkbm =======
#!/bin/sh
#
# $Id: gbkbm, v 0.90 2009/10/25 23:03:15 $
#
# Convert Hyperref Bookmark Encoding from GBK to UTF-16BE
#
# According to the PDF Reference, outlines are text strings and can therefore only be encoded
# in either PDFDocEncoding or Unicode character encoding (UTF-16BE). This script assumes that
# the bookmark file (.out) is encoded in GBK/GB2312, then convert to UTF-16BE.
#
# Author: Chunhua Li, 2009/10/25 <
#
# Usage: gbkbm example[.out]
#
# Examples of use for latex/pdftex:
#
# 1. Assume that the current encoding is GBK, and ‘dvipdfm’ is to used for PDF driver.
# latex example
# latex example
# gbkbm example
# latex example
# dvipdfm example
#
# 2. Assume that the current encoding is GBK, and ‘pdftex’ is to used for PDF driver.
# pdflatex example
# pdflatex example
# gbkbm example
# pdflatex example
#
GBK2UNI=/usr/local/bin/gbk2utf16be.pl
# Extract the filename and ignore the extention (.out)
FILE=${1%.out}
TMPFILE=`mktemp /tmp/example.XXXXXX` || exit 1
# call utf2utf16be.pl to do the actual conversion
$GBK2UNI $TMPFILE
# Modified the bookmark output file
mv -f $TMPFILE ${FILE}.out
======= gbk2utf16be.pl =======
#!/usr/bin/perl
use Encode;
sub octal {
my ($t) = (@_);
sprintf "\\%03o", ord($t);
}
sub convert {
my ($t) = (@_);
if ($t =~ /[\x80-\xFF]/) {
Encode::from_to($t, "GBK", "UTF-16BE");
$t =~ s/(.)/${\octal($1)}/g;
$t = "\\376\\377" . $t;
}
$t;
}
while () {
$_ =~ s/([^}]*}{)([^}]*)/$1${\convert($2)}/;
print $_;
}
======= utf8bm =========
#!/bin/sh
#
# $Id: utf8bm, v 0.90 2009/10/25 23:03:15 $
#
# Convert Hyperref Bookmark Encoding from UTF8 to UTF-16BE
#
# According to the PDF Reference, outlines are text strings and can therefore only be encoded
# in either PDFDocEncoding or Unicode character encoding (UTF-16BE). This script assumes that
# the bookmark file (.out) is encoded in UTF8, then convert to UTF-16BE.
#
# Author: Chunhua Li, 2009/10/25 <
#
# Usage: utf8bm example[.out]
#
# Examples of use for latex/pdftex:
#
# 1. Assume that the current encoding is UTF8, and ‘dvipdfm’ is to used for PDF driver.
# latex example
# latex example
# utf8bm example
# latex example
# dvipdfm example
#
# 2. Assume that the current encoding is UTF8, and ‘pdftex’ is to used for PDF driver.
# pdflatex example
# pdflatex example
# utf8bm example
# pdflatex example
#
UTF2UNI=/usr/local/bin/utf8to16be.pl
# Extract the filename and ignore the extention (.out)
FILE=${1%.out}
TMPFILE=`mktemp /tmp/example.XXXXXX` || exit 1
# call utf2utf16be.pl to do the actual conversion
$UTF2UNI $TMPFILE
# Modified the bookmark output file
mv -f $TMPFILE ${FILE}.out
========= utf8to16be.pl ==========
#!/usr/bin/perl
use Encode;
sub octal {
my ($t) = (@_);
sprintf "\\%03o", ord($t);
}
sub convert {
my ($t) = (@_);
if ($t =~ /[\x80-\xFF]/) {
Encode::from_to($t, "UTF8", "UTF-16BE");
$t =~ s/(.)/${\octal($1)}/g;
$t = "\\376\\377" . $t;
}
$t;
}
while () {
$_ =~ s/([^}]*}{)([^}]*)/$1${\convert($2)}/;
print $_;
}