Wednesday, December 24, 2008

fox2html: foxmarks.json to HTML converter

This script converts "foxmarks.json" saved by the Firefox plugin, Foxmarks, on a self hosted server to a styled HTML file with interactive folders. An example output is here.

Thursday, September 25, 2008

Duplication elimination

In engineering a computation project, one desires a single point of information entry. It's elegant to come up with coding structures that will facilitate this requirement, that is, a single place in the codes pertinent to the to-be-computed problems. However, while the programming language is an interface between human and machine. It's not generally friendly to the users. The ultimate way of choice is to have human readable/editable files and make programs to generate corresponding codes when needed.

Thursday, August 14, 2008

Pay attention

If you don't see people, you don't see intention.

Sunday, June 1, 2008

adding OCR to djvu file

For each page in the file "cake.djvu", we can use the "tesseract" to process the page image:
djvused -e "select ${page};save-page-with \"cake_page.djvu\"" cake.djvu
convert cake_page.djvu cake.tif
tesseract cake.tif cake_box batch.nochop makebox
tesseract cake.tif cake_txt batch.nochop
This produces the information for the text structure (lines and words) and positioning (coordinate for each character). To convert this information to the hidden-text format for use with djvused, use
perl<<'EOL'>cake_text.txt
open TXT, "<:utf8", "cake_txt.txt";
open BOX, "<:utf8", "cake_box.txt";
$pxn = 1000000;
$pxx = 0;
$pyn = 1000000;
$pyx = 0;
$pagebuf = "";
while ($line = <TXT>) {
chop $line;
@words = split /\s+/, $line;
next if $#words < 0;
$lxn = 1000000;
$lxx = 0;
$lyn = 1000000;
$lyx = 0;
$linebuf = "";
foreach $word (@words) {
$xmin = 1000000;
$xmax = 0;
$ymin = 1000000;
$ymax = 0;
$w = "";
for ($i = 0; $i < length($word); $i ++) {
$c = substr($word, $i, 1);
do {
$cline = <BOX>;
} while (substr($cline, 0, 1) ne $c);
($xn, $yn, $xx, $yx) = substr($cline, 2) =~ /\S+/g;
$w = $w . '\\' if $c eq '"';
$w = $w . '\\' if $c eq '\\';
$w = $w . substr($cline, 0, 1);
$xmin = $xn if ($xmin > $xn);
$xmax = $xx if ($xmax < $xx);
$ymin = $yn if ($ymin > $yn);
$ymax = $yx if ($ymax < $yx);
}
$wline = '(word ' . $xmin . ' ' . $ymin . ' ' . $xmax . ' ' . $ymax . ' "' . $w . '")';
$linebuf = $linebuf . "\n  " . $wline;
$lxn = $xmin if ($lxn > $xmin);
$lxx = $xmax if ($lxx < $xmax);
$lyn = $ymin if ($lyn > $ymin);
$lyx = $ymax if ($lyx < $ymax);
}
$pagebuf = $pagebuf . "\n (line $xmin $ymin $xmax $ymax" . $linebuf . ')';
$pxn = $lxn if ($pxn > $lxn);
$pxx = $lxx if ($pxx < $lxx);
$pyn = $lyn if ($pyn > $lyn);
$pyx = $lyx if ($pyx < $lyx);
}
close BOX;
close TXT;
binmode(STDOUT, ":utf8");
print "(page $pxn $pyn $pxx $pyx", $pagebuf, ')', "\n";
EOL
which generates "cake_text.txt" in the accordant format. The hidden text can be saved back to the djvu file with
djvused -e "select ${page};set-txt \"cake_text.txt\";save" cake.djvu
We just need to repeat this for all the desired pages.

Sunday, May 4, 2008

Information overflow

The growing world of connected is increasing all we have available to take in. It's bound to exceed ones capacity without some form of aids. But, in raw, where is the limit? And, how are we going to cook it? In person, where is the limit? And, how are we going to team up? Any tool building helps?

Thursday, April 24, 2008

Competition

Life is like a game. Those who enjoy the most win.

Wednesday, April 23, 2008

Live like what and when?

People have said things like: "Live like you'll die tomorrow." Maybe, this is for just so that we don't put off things that's really important. Well, whenever and whatever we are doing, we better pause when we can and think: Is there a better thing to do than this?