RSS Feed for personal
Convert Docx to LaTeX!
Just stumbled across an interesting link that has info on converting a Microsoft Docx file into a LaTeX file! Harri Kiiskinen over at http://pastcounts.wordpress.com/ wrote up an XSL stylesheet that matches elements in Microsoft’s OOXML format and prints out the corresponding LaTeX formatting.
The full write-up on how to do this is located here: http://pastcounts.wordpress.com/2011/03/22/using-xsl-to-convert-docx-to-latex/
First, you need to break open the .docx file. It basically is a simple zipped archive, so an ‘unzip testdoc.docx’ should do the trick; you’ll end up with several files and sub-directories, of which only the directory called ‘word’ is necessary for this test.
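If you'd rather script the unzip step, here's a minimal sketch in Python using the standard-library zipfile module. The helper name extract_word_dir is made up for illustration and isn't part of Harri's write-up; it pulls out only the 'word' directory, since that's all this test needs.

```python
import zipfile
from pathlib import Path


def extract_word_dir(docx_path, out_dir):
    """Extract only the 'word/' members of a .docx archive into out_dir.

    Hypothetical helper for illustration; returns the extracted member names.
    """
    out = Path(out_dir)
    with zipfile.ZipFile(docx_path) as archive:
        # A .docx is an ordinary zip archive; keep just the word/ subtree.
        members = [n for n in archive.namelist() if n.startswith("word/")]
        archive.extractall(out, members=members)
    return members


# Example usage (assumes testdoc.docx sits in the current directory):
# extract_word_dir("testdoc.docx", ".")
```

After this, word/document.xml (and word/footnotes.xml, if the document has footnotes) should be sitting under the output directory.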
Second, here’s the XSL transformation to save in a file:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<!-- Emit plain text so no XML declaration or entity escaping ends up in the LaTeX -->
<xsl:output method="text" encoding="UTF-8"/>
<!-- Document root: LaTeX preamble -->
<xsl:template match="/w:document">
\documentclass{article}
<xsl:apply-templates/>
</xsl:template>
<!-- Body: wrap everything in the document environment -->
<xsl:template match="w:body">
\begin{document}
<xsl:apply-templates/>
\end{document}
</xsl:template>
<!-- Paragraph: follow with a blank line unless it is the last node being processed -->
<xsl:template match="w:p">
<xsl:apply-templates/><xsl:if test="position()!=last()"><xsl:text>&#10;&#10;</xsl:text></xsl:if>
</xsl:template>
<!-- Run: footnote (looked up by this run's own footnote id), then bold around italics around text -->
<xsl:template match="w:r">
<xsl:if test="w:footnoteReference"><xsl:text>\footnote{</xsl:text>
<xsl:call-template name="footnote">
<xsl:with-param name="fid"><xsl:value-of select="w:footnoteReference/@w:id"/></xsl:with-param>
</xsl:call-template>
<xsl:text>}</xsl:text>
</xsl:if>
<xsl:if test="w:rPr/w:b"><xsl:text>\textbf{</xsl:text></xsl:if>
<xsl:call-template name="pastb"/>
<xsl:if test="w:rPr/w:b"><xsl:text>}</xsl:text></xsl:if>
</xsl:template>
<!-- Past bold: handle italics -->
<xsl:template name="pastb">
<xsl:if test="w:rPr/w:i"><xsl:text>\textit{</xsl:text></xsl:if>
<xsl:call-template name="pasti"/>
<xsl:if test="w:rPr/w:i"><xsl:text>}</xsl:text></xsl:if>
</xsl:template>
<!-- Past italics: emit the run's text -->
<xsl:template name="pasti">
<xsl:apply-templates select="w:t"/>
</xsl:template>
<!-- Look up a footnote body by id in footnotes.xml (same directory as this stylesheet) -->
<xsl:template name="footnote">
<xsl:param name="fid"/>
<xsl:apply-templates select="document('footnotes.xml')/w:footnotes/w:footnote[@w:id=$fid]"/>
</xsl:template>
<xsl:template match="w:footnote">
<xsl:apply-templates select="w:p"/>
</xsl:template>
</xsl:stylesheet>
You can save that in a file called docxtolatex.xsl in the ‘word’ directory. Then, in that directory, run ‘xsltproc docxtolatex.xsl document.xml’, and you’ll have your screen full of the document, in LaTeX markup. You’ll notice that this XSLT only converts bold, italics and footnotes. But then again, that’s often all I need to convert…
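To make the bold/italics mapping a little more concrete, here's a rough sketch of the same idea in Python with xml.etree.ElementTree. To be clear, this is not Harri's method (that's the XSLT above); it's a swapped-in illustration, the function names are made up, and it skips footnotes entirely. It assumes, like the stylesheet, that the mere presence of w:b or w:i in a run's w:rPr means the run is bold or italic.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_to_latex(run):
    """Render one w:r element: its text, wrapped in textit and/or textbf."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    # Italics go on the inside, bold on the outside, mirroring the stylesheet.
    if run.find("w:rPr/w:i", NS) is not None:
        text = "\\textit{" + text + "}"
    if run.find("w:rPr/w:b", NS) is not None:
        text = "\\textbf{" + text + "}"
    return text


def paragraphs_to_latex(document_xml):
    """Join the runs of each w:p and separate paragraphs with a blank line."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter("{%s}p" % W):
        # Only direct w:r children are considered in this sketch.
        paragraphs.append("".join(run_to_latex(r) for r in p.findall("w:r", NS)))
    return "\n\n".join(paragraphs)
```

Feeding it the contents of word/document.xml as a string should give back the paragraph text with \textbf and \textit applied, minus the \documentclass wrapper that the stylesheet adds.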
So yeah... I’ll definitely use this to convert some Word docs I have that I’ve been wanting to push into LaTeX format. I also think I might do some additional research into tweaking this XSL so that *.docx files could potentially be converted to LaTeX in their entirety!
Also, in order to successfully post a copy of the XSL stylesheet above, I found myself needing a script to safely escape all the XML entities. If you’re interested, here’s the script I just slapped together for doing this:
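#!/usr/bin/env php
<?php
// Bail out with a usage message if no input file was given
if ($argc < 2) {
fwrite(STDERR, "Usage: {$argv[0]} <file>\n");
exit(1);
}
$handle = @fopen($argv[1], "r");
if (!$handle) {
fwrite(STDERR, "Error: cannot open {$argv[1]}\n");
exit(1);
}
// Read whole lines (no length cap, so multibyte characters are never split)
// and print each one with its markup characters escaped
while (($buffer = fgets($handle)) !== false) {
echo htmlentities($buffer);
}
if (!feof($handle)) {
fwrite(STDERR, "Error: unexpected fgets() fail\n");
}
fclose($handle);
?>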
Simply copy the above script into a PHP file, make it executable, and run it with an input file as an argument; it’ll spit out the encoded version of whatever XML markup you give it.
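For anyone who doesn't have PHP handy, roughly the same thing can be sketched in Python with the standard-library html module. The function names here are made up for illustration, and note one difference: html.escape only touches &, <, > and quotes, whereas PHP's htmlentities also converts other characters that have named HTML entities.

```python
import html
import sys


def escape_markup(text):
    """Escape &, <, > and quotes so markup can be shown verbatim in HTML."""
    return html.escape(text, quote=True)


def escape_file(path, out=sys.stdout):
    """Write the escaped contents of the file at path to out, line by line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            out.write(escape_markup(line))


# Example usage (assumes the stylesheet was saved as docxtolatex.xsl):
# escape_file("docxtolatex.xsl")
```

That should be enough for pasting a stylesheet into a post, since the angle brackets and ampersands are the characters that actually break the markup.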
My xmonad.hs
Seeing as I haven’t written in a while… I figured I’d at least drop a link to my xmonad.hs file that I’ve put together for my own setup. For those of you who don’t know what XMonad is, it’s a tiling window manager. You should check it out!
I’ve been meaning to write a more meaningful post on how I put together my current XMonad setup and whatnot. Perhaps in the near future!
Busy busy busy and new sites!
Well… I don’t have a lot of time to pour into this post, unfortunately, but I have just enough time to mention that I’m still plowing through my capstone proposal so I can finally start working on the final project for my master’s degree.
You can get a glance at my work on the proposal here –> http://capstone.geoffreyanderson.net/
As for the other site, I finally got my partner to start designing a better portfolio site for her work. We set up a basic sub-domain for her here –> http://cullenillustration.geoffreyanderson.net/
Check out her stuff, pretty good art!
Welcome back to RIT!
Well, RIT’s officially alive and classes are back on track and going full swing. I decided that there’s plenty of time in a day for me to do various kinds of work: working on my capstone, finishing my last online class, TA’ing Fundamentals of DBMS Architecture (and seeing the coursework I’ve created get used!), working on the databases for various other classes, and sitting in on a class or two!
Phew, it all seems like so much, yet I’m still finding that I’ve got an awful lot of free time (y’know… since I don’t have 16 hours’ worth of real classes). Suffice it to say, I’m excited for all the things I’ll get to work on this year, and especially for getting to spend a lot more time with my partner now that we’ve moved in together at the new apartments across from campus!
So, to break down everything I’m working on this school term (and year):
- Need to (start and) finish my capstone proposal for a potential course in Business Intelligence that could be offered in the IST department at RIT
- Want an ‘A’ in Data Architecture and Management — I should probably get started on the paper we have due next week
- I need to create some more in-class exercises for the Database course I’m TA’ing so our students can get a better grasp of the material. This is also because the course format changed to a “studio style” class (students meet 3x a week for 2 hours) from its old lecture format (1.5 hour lecture 2x a week and a 2 hour lab 1x a week). I also need to get the new coursework to our database tutors for this year.
- I need to rebuild the virtual machine for our Information Assurance course at some point and also tweak the database for our Database Performance and Tuning class.
- Try and keep up with the homework assignments for Native Mobile Application Development (since I’m “sitting in” and not auditing/registered for the class)
Beyond that, I’ll try to snap some pictures of the changes happening around campus to share!