Deploy image files to Amazon Web Services
I just pushed a copy of the script I’ve been working on for a couple of weeks to GitHub. The script pushes an image file containing an operating system to Amazon Web Services’ Elastic Compute Cloud (EC2) so you can run the image through Amazon’s service. Much of the development was based on existing documentation and blog posts written by other people, so I finally synthesized all of that information into a “simple” Bash shell script that automates the process. Take a look at it at the following link!
https://github.com/geoffreyanderson/linuxImage2AWS-EBS
I’ll be sure to beef this entry up with more about the information I referenced to develop the script, and probably more on how to use it!
XSL to extract DOCX comments into plain text
So, this was an impromptu project I slapped together in about 20 minutes to extract comments from a DOCX file. I had stored answers to lab questions as comments in a DOCX file, and one of the graders I work with needed the comments in plain text. So I recalled the XSL for converting DOCX to LaTeX from my last post and wrote a new stylesheet to extract the comments. Here it is!
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" version="1.0"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<xd:doc scope="stylesheet">
<xd:desc>
<xd:p><xd:b>Created on:</xd:b> Apr 25, 2011</xd:p>
<xd:p><xd:b>Author:</xd:b> Geoffrey Anderson</xd:p>
<xd:p><xd:b>E-mail:</xd:b> geoff@geoffreyanderson.net</xd:p>
<xd:p><xd:b>Website:</xd:b> http://geoffreyanderson.net</xd:p>
</xd:desc>
</xd:doc>
<xsl:output method="text" encoding="UTF-8"/>
<xsl:variable name="newline">
<xsl:text>
</xsl:text>
</xsl:variable>
<xsl:template match="/">
<xsl:for-each select="/w:comments/w:comment">
################
# Comment #<xsl:number value="position()" format="1"/> #
################
<xsl:for-each select="w:p">
<xsl:value-of select="$newline" />
<xsl:for-each select="w:r">
<xsl:value-of select="w:t"/>
</xsl:for-each>
</xsl:for-each>
----
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The bad indenting is intentional so that you get output without weird tabbing/formatting. To use this (under Ubuntu, at least) simply unzip the DOCX file:
$ unzip someWordDoc.docx -d someWordDocDir/
And run the above XSL against the comments.xml file under the “word” directory:
$ xsltproc convertDocxCommentsToPlainText.xsl someWordDocDir/word/comments.xml
By doing this, you’ll get output similar to the following:
################
# Comment #1 #
################
text of the first comment
----
################
# Comment #2 #
################
text of the second comment
----
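If you would rather skip the unzip step, the same extraction can be sketched in Python using only the standard library. This is a rough equivalent of the stylesheet above, not a replacement for it: the function name `comments_to_text` is made up for illustration, and it assumes the DOCX actually contains a `word/comments.xml` member (a document with no comments will not have one, and the read will raise `KeyError`).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, the same one the stylesheet binds to "w".
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def comments_to_text(docx):
    """Return the comments of a DOCX (path or file-like object) as plain text."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/comments.xml"))

    lines = []
    for number, comment in enumerate(root.findall("w:comment", NS), start=1):
        lines.append("################")
        lines.append(f"# Comment #{number} #")
        lines.append("################")
        for paragraph in comment.findall("w:p", NS):
            # Like the stylesheet, take the first w:t of every run (w:r)
            # and join the runs of one paragraph onto a single line.
            runs = paragraph.findall("w:r", NS)
            lines.append(
                "".join(r.findtext("w:t", default="", namespaces=NS) for r in runs)
            )
        lines.append("----")
    return "\n".join(lines)
```

Calling `print(comments_to_text("someWordDoc.docx"))` should print the comments in the same banner layout as the xsltproc output.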
Cheers!
Convert Docx to LaTeX!
I just stumbled across an interesting link with info on converting a Microsoft DOCX file into a LaTeX file! Harri Kiiskinen over at http://pastcounts.wordpress.com/ wrote an XSL stylesheet that matches elements in Microsoft’s OOXML format and prints out the corresponding LaTeX formatting.
The actual instructions for doing all of this are located here: http://pastcounts.wordpress.com/2011/03/22/using-xsl-to-convert-docx-to-latex/
First, you need to break open the .docx file. It basically is a simple zipped archive, so an ‘unzip testdoc.docx’ should do the trick; you’ll end up with several files and sub-directories, of which only the directory called ‘word’ is necessary for this test.
Second, here’s the XSL transformation to save in a file:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><xsl:template match="/w:document">
\documentclass{article}
<xsl:apply-templates/>
</xsl:template><xsl:template match="w:body">
\begin{document}
<xsl:apply-templates/>
\end{document}
</xsl:template><xsl:template match="w:p">
<xsl:apply-templates/><xsl:if test="position()!=last()"><xsl:text></xsl:text></xsl:if>
</xsl:template><xsl:template match="w:r">
<xsl:if test="w:footnoteReference"><xsl:text>\footnote{</xsl:text>
<xsl:call-template name="footnote">
<xsl:with-param name="fid"><xsl:value-of select="w:footnoteReference/@w:id"/></xsl:with-param>
</xsl:call-template>
<xsl:text>}</xsl:text>
</xsl:if>
<xsl:if test="w:rPr/w:b"><xsl:text>\textbf{</xsl:text></xsl:if>
<xsl:call-template name="pastb"/>
<xsl:if test="w:rPr/w:b"><xsl:text>}</xsl:text></xsl:if>
</xsl:template><xsl:template name="pastb">
<xsl:if test="w:rPr/w:i"><xsl:text>\textit{</xsl:text></xsl:if>
<xsl:call-template name="pasti"/>
<xsl:if test="w:rPr/w:i"><xsl:text>}</xsl:text></xsl:if>
</xsl:template><xsl:template name="pasti">
<xsl:apply-templates select="w:t"/>
</xsl:template><xsl:template name="footnote">
<xsl:param name="fid"/>
<xsl:apply-templates select="document('footnotes.xml')/w:footnotes/w:footnote[@w:id=$fid]"/>
</xsl:template><xsl:template match="//w:footnote">
<xsl:apply-templates select="w:p"/>
</xsl:template></xsl:stylesheet>
You can save that in a file called docxtolatex.xsl in the ‘word’ directory. Then, in that directory, run ‘xsltproc docxtolatex.xsl document.xml’, and you’ll have your screen full of the document, in LaTeX markup. You’ll notice that this XSLT only converts bold, italics and footnotes. But then again, that’s often all I need to convert…
So yeah, I’ll definitely use this to convert some Word docs that I’ve been wanting to push into LaTeX format. I also think I might do some additional research into tweaking this XSL so that *.docx files could potentially be converted to LaTeX in their entirety!
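To make the bold/italic mapping easier to follow, here is a minimal Python sketch of the same idea, using only the standard library. It is an illustration rather than a port of the stylesheet: the function names `run_to_latex` and `document_to_latex` are invented here, footnotes are left out entirely, LaTeX special characters are not escaped, and a `w:b` or `w:i` element is treated as "on" merely because it is present (as in the XSLT).

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace, the same one the stylesheet binds to "w".
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_to_latex(run):
    """Convert one w:r element to LaTeX, wrapping italic inside bold."""
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    if run.find("w:rPr/w:i", NS) is not None:
        text = "\\textit{" + text + "}"
    if run.find("w:rPr/w:b", NS) is not None:
        text = "\\textbf{" + text + "}"
    return text


def document_to_latex(document_xml):
    """Convert the contents of word/document.xml (a string) to LaTeX."""
    root = ET.fromstring(document_xml)  # root element is w:document
    paragraphs = [
        "".join(run_to_latex(r) for r in p.findall("w:r", NS))
        for p in root.findall("w:body/w:p", NS)
    ]
    # A blank line separates paragraphs in LaTeX.
    body = "\n\n".join(paragraphs)
    return (
        "\\documentclass{article}\n\\begin{document}\n"
        + body
        + "\n\\end{document}\n"
    )
```

Passing the text of word/document.xml to `document_to_latex` returns the LaTeX source as a single string. The nesting order (bold outside, italic inside) follows the order in which the stylesheet's templates call each other.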
Also, in order to successfully post a copy of the XSL stylesheet above, I needed a script to safely escape all the XML entities. If you’re interested, here’s the script I just slapped together for doing this:
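#!/usr/bin/env php
<?php
// Expect exactly one argument: the file whose markup should be escaped.
if ($argc < 2) {
    fwrite(STDERR, "Usage: {$argv[0]} <file>\n");
    exit(1);
}
$handle = @fopen($argv[1], "r");
if ($handle === false) {
    fwrite(STDERR, "Error: could not open {$argv[1]}\n");
    exit(1);
}
// Escape the file line by line so the markup can be pasted into an HTML page.
while (($buffer = fgets($handle, 4096)) !== false) {
    echo htmlentities($buffer);
}
// fgets() returned false before end of file: something went wrong mid-read.
if (!feof($handle)) {
    fwrite(STDERR, "Error: unexpected fgets() fail\n");
}
fclose($handle);
?>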
Simply copy the above script into a PHP file, make it executable, and run it with an input file as an argument; it’ll spit out an HTML-encoded version of whatever XML markup you give it.
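For anyone without PHP handy, roughly the same escaping can be sketched in Python with the standard `html` module. The function names `escape_markup` and `escape_file` are made up for this example, and the behavior is not identical to `htmlentities()`: `html.escape` only rewrites `&`, `<`, `>` and quote characters, and it leaves accented or other non-ASCII characters alone, which is normally enough for pasting XML into a post.

```python
import html


def escape_markup(text):
    """Escape &, <, > and quotes so the markup displays literally in HTML."""
    return html.escape(text, quote=True)


def escape_file(path):
    """Read a file line by line and return its HTML-escaped contents."""
    with open(path, encoding="utf-8") as handle:
        return "".join(escape_markup(line) for line in handle)
```

For example, `print(escape_file("docxtolatex.xsl"))` would print the stylesheet from the previous section in a form that is safe to paste into a blog post.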