Wednesday, December 24, 2008
fox2html: foxmarks.json to HTML converter
This script converts the "foxmarks.json" file that the Firefox plugin Foxmarks saves on a self-hosted server into a styled HTML file with interactive folders. An example output is here.
Thursday, September 25, 2008
Duplication elimination
When engineering a computational project, one wants a single point of information entry. It is elegant to devise code structures that meet this requirement, that is, a single place in the code that holds everything specific to the problem being computed. However, although a programming language is an interface between human and machine, it is generally not friendly to users. The ultimate solution is to keep human-readable, editable files and write programs that generate the corresponding code when needed.
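As a rough illustration of that last idea, here is a minimal sketch that turns a plain-text parameter file into C "#define" lines. The file layout (one "name = value" entry per line, "#" for comments), the function name gen_header, and the choice of C as the target are all assumptions made for this example, not something taken from an actual project.

```shell
#!/bin/sh
# Hypothetical generator: reads "name = value" lines on stdin and
# prints one C #define per entry. Blank lines and lines starting
# with "#" are skipped. The input format is an assumption.
gen_header() {
    while IFS='=' read -r name value; do
        # Remove all blanks from the name, trim blanks around the value.
        name=$(printf '%s' "$name" | tr -d ' \t')
        value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        case $name in
            ''|'#'*) continue ;;
        esac
        # Upper-case the name, as is customary for C macros.
        macro=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
        printf '#define %s %s\n' "$macro" "$value"
    done
}
```

With a hypothetical "params.txt" as the single place where values are edited, running gen_header < params.txt > params.h would regenerate the header whenever the parameters change.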
Thursday, August 14, 2008
Sunday, June 1, 2008
adding OCR to djvu file
For each page in the file "cake.djvu", we can use "tesseract" to process the page image:
djvused -e "select ${page};save-page-with \"cake_page.djvu\"" cake.djvu
convert cake_page.djvu cake.tif
tesseract cake.tif cake_box batch.nochop makebox
tesseract cake.tif cake_txt batch.nochop
This produces the information for the text structure (lines and words) and the positioning (coordinates of each character). To convert this information to the hidden-text format for use with djvused, use
perl<<'EOL'>cake_text.txt
# Merge the text output (lines and words) with the box output
# (one character and its coordinates per line) into hidden text.
open TXT, "<:utf8", "cake_txt.txt";
open BOX, "<:utf8", "cake_box.txt";
# Bounding box of the page.
$pxn = 1000000; $pxx = 0; $pyn = 1000000; $pyx = 0;
$pagebuf = "";
while ($line = <TXT>) {
    chomp $line;
    @words = split ' ', $line;
    next if $#words < 0;
    # Bounding box of the current line.
    $lxn = 1000000; $lxx = 0; $lyn = 1000000; $lyx = 0;
    $linebuf = "";
    foreach $word (@words) {
        # Bounding box of the current word.
        $xmin = 1000000; $xmax = 0; $ymin = 1000000; $ymax = 0;
        $w = "";
        for ($i = 0; $i < length($word); $i ++) {
            $c = substr($word, $i, 1);
            # Skip box entries until the one for this character.
            do {
                $cline = <BOX>;
                die "box file ended early\n" unless defined $cline;
            } while (substr($cline, 0, 1) ne $c);
            ($xn, $yn, $xx, $yx) = substr($cline, 2) =~ /\S+/g;
            # Escape double quotes and backslashes.
            $w = $w . '\\' if $c eq '"';
            $w = $w . '\\' if $c eq '\\';
            $w = $w . substr($cline, 0, 1);
            $xmin = $xn if ($xmin > $xn);
            $xmax = $xx if ($xmax < $xx);
            $ymin = $yn if ($ymin > $yn);
            $ymax = $yx if ($ymax < $yx);
        }
        $wline = '(word ' . $xmin . ' ' . $ymin . ' ' . $xmax . ' ' . $ymax . ' "' . $w . '")';
        $linebuf = $linebuf . "\n " . $wline;
        $lxn = $xmin if ($lxn > $xmin);
        $lxx = $xmax if ($lxx < $xmax);
        $lyn = $ymin if ($lyn > $ymin);
        $lyx = $ymax if ($lyx < $ymax);
    }
    # The line entry carries the line's bounding box.
    $pagebuf = $pagebuf . "\n (line $lxn $lyn $lxx $lyx" . $linebuf . ')';
    $pxn = $lxn if ($pxn > $lxn);
    $pxx = $lxx if ($pxx < $lxx);
    $pyn = $lyn if ($pyn > $lyn);
    $pyx = $lyx if ($pyx < $lyx);
}
close BOX;
close TXT;
binmode(STDOUT, ":utf8");
print "(page $pxn $pyn $pxx $pyx", $pagebuf, ')', "\n";
EOL
which generates "cake_text.txt" in the required format. The hidden text can be saved back to the djvu file with
djvused -e "select ${page};set-txt \"cake_text.txt\";save" cake.djvu
We just need to repeat this for all the desired pages.
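Repeating the steps for every page could be sketched as a small shell loop. This is only a sketch under a few assumptions: the Perl script above has been saved to a file, here given the hypothetical name "box2txt.pl"; the page count is read with the djvused "n" command; and the helper function names are made up for this example.

```shell
#!/bin/sh
# Print the page numbers 1..N, one per line.
page_list() {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

# djvused command string that extracts one page.
extract_cmd() {
    printf 'select %s;save-page-with "cake_page.djvu"' "$1"
}

# djvused command string that stores the hidden text of one page.
settxt_cmd() {
    printf 'select %s;set-txt "cake_text.txt";save' "$1"
}

# Run the per-page steps on every page of the given document.
ocr_all_pages() {
    doc=$1
    total=$(djvused -e n "$doc") || return 1
    for page in $(page_list "$total"); do
        djvused -e "$(extract_cmd "$page")" "$doc" &&
        convert cake_page.djvu cake.tif &&
        tesseract cake.tif cake_box batch.nochop makebox &&
        tesseract cake.tif cake_txt batch.nochop &&
        # box2txt.pl is a hypothetical file holding the Perl script.
        perl box2txt.pl > cake_text.txt &&
        djvused -e "$(settxt_cmd "$page")" "$doc" || return 1
    done
}
```

Calling ocr_all_pages cake.djvu would then process the whole file; to handle only selected pages, replace the page_list call with an explicit list of page numbers.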
Sunday, May 4, 2008
Information overflow
The growing connected world keeps increasing what is available for us to take in. It is bound to exceed one's capacity without some form of aid. But for the raw information, where is the limit, and how are we going to cook it? For a single person, where is the limit, and how are we going to team up? Would building tools help?
Thursday, April 24, 2008
Wednesday, April 23, 2008
Live like what and when?
People have said things like: "Live like you'll die tomorrow." Maybe this is just so that we don't put off things that are really important. Well, whenever and whatever we are doing, we had better pause when we can and think: is there a better thing to do than this?