Scav List LaTeX Info

You will probably need to run this process at least twice: once for the main list, once for ScavOlympics, and possibly more depending on the list.  You should be familiar with LaTeX, Python, and MySQL (or whatever database software you use), and you should be comfortable editing code.

The scripts you need are attached below, released under a Modified BSD License.  What does that mean for you?  You can do pretty much whatever you want with them.  I expect you will need to edit the files, may not want to redistribute them, and will not want to change any other licenses you have for your software.

Here's the overview of what I (Samuel Friedman) did.

  1. Insert some additional header / footer information to the raw LaTeX
    1. Depending on the setup, you might be able to skip this step.
    2. I had to change the class from "lists" to "enumerate" to get the numbers to appear for the 2010 list.
  2. Convert the LaTeX list to ASCII. 
    1. Run the list through latex2rtf.  You can use whatever process you want, but I liked the results from this code.
    2. Open the rtf file in OpenOffice Writer or whatever else you want to use.  MS Word can work too if you want.
    3. You will need to remove by hand all the information about the rules and any other part of the list you don't want.  For the main list, we keep only the items.
    4. Save the rtf file as a text file.  You may need to play around with this step depending on your character encoding.
    5. If you use a technique other than the one above, you may have to modify the regular expression in extract_item_data.py.
  3. GASH uses a MySQL database to track items.  We also track each item's page number.  The ASCII file will most likely not contain page numbers, so we deal with page numbers first.
    1. Generate the items in the database with the appropriate script, generate_item_insert.py.
    2. Edit the script so that the variable first_numbers contains the first item number on each page as well as the last item number.
    3. Run the script.  It prints to the screen, so either redirect the output to a file or change the script to write to your desired file.
    4. Run the commands in that file against the database.
  4. Now that the database contains the basic framework for items and page numbers, we need to populate the database with item descriptions.
    1. Edit the Python file (extract_item_data.py) so that it opens the ASCII file generated in your LaTeX to ASCII step.
    2. Run the script.  It prints to the screen, so either redirect the output to a file or change the script to write to your desired file.
    3. Run the commands in that file against the database.
  5. At this point, your database should contain the full item text for the part of the list you imported.  We found this worked with 90+% accuracy and took less than 20 minutes.
  6. At GASH, we use a special page number and special item numbers for our ScavOlympics items.  You can simply edit the scripts to change the page/item numbers generated.
  7. At the moment we do not include a regexp for extracting the point total, because the lovely Judges like saying "[x points where x = (# of hamsters - # of people)/(# of phone booths)]."  We might later add support for integer point totals.
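
To make step 3 concrete, here is a minimal sketch of how a list like first_numbers can be turned into INSERT statements with page numbers.  This is not the attached generate_item_insert.py.  The table name, the column names, and the example values are hypothetical, and it assumes that the last entry of first_numbers is the last item number on the list.

```python
from bisect import bisect_right

# Hypothetical values: the first item number on each page, followed by
# the last item number on the list (assumed to be the final entry).
first_numbers = [1, 8, 15, 20]


def page_for_item(item_number, first_numbers):
    """Return the 1-based page whose first item is <= item_number."""
    page_starts = first_numbers[:-1]  # drop the trailing last-item entry
    return bisect_right(page_starts, item_number)


def generate_item_inserts(first_numbers, table="items"):
    """Yield one INSERT statement per item, from the first item to the last."""
    first_item, last_item = first_numbers[0], first_numbers[-1]
    for item_number in range(first_item, last_item + 1):
        page = page_for_item(item_number, first_numbers)
        # Table and column names are placeholders; match them to your schema.
        yield (
            f"INSERT INTO {table} (item_number, page_number) "
            f"VALUES ({item_number}, {page});"
        )


if __name__ == "__main__":
    for statement in generate_item_inserts(first_numbers):
        print(statement)
```

Redirecting the printed output to a file gives you the commands to run against the database.  For ScavOlympics, the same idea applies with whatever special page and item numbers you use.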
Want to discuss Scav codes?  http://groups.google.com/group/scav-code-discuss
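
For steps 4 and 7, the sketch below shows one way to pull item numbers and descriptions out of the ASCII file and to pick up integer point totals where they exist.  It is not the attached extract_item_data.py.  It assumes each item starts a line with its number followed by a period, that point totals appear in square brackets, and that the table and column names are placeholders.  The SQL quoting is simplified, so check the output before running it.

```python
import re

# Assumed layout: each item starts a line with "<number>. " and may wrap
# onto following lines; the item runs until the next such line or the end.
ITEM_PATTERN = re.compile(
    r"^\s*(\d+)\.\s+(.*?)(?=^\s*\d+\.\s|\Z)",
    re.MULTILINE | re.DOTALL,
)

# Matches only plain integer totals such as "[10 points]" or "[1 point]".
POINTS_PATTERN = re.compile(r"\[\s*(\d+)\s+points?\s*\]", re.IGNORECASE)


def extract_items(text):
    """Return (item_number, description) pairs with whitespace collapsed."""
    items = []
    for match in ITEM_PATTERN.finditer(text):
        description = " ".join(match.group(2).split())
        items.append((int(match.group(1)), description))
    return items


def extract_integer_points(description):
    """Return the integer point total, or None for formula-style totals."""
    match = POINTS_PATTERN.search(description)
    return int(match.group(1)) if match else None


def sql_quote(value):
    """Quote a string for a MySQL statement (simplified escaping)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def generate_updates(items, table="items"):
    """Yield one UPDATE statement per extracted item."""
    for number, description in items:
        yield (
            f"UPDATE {table} SET description = {sql_quote(description)} "
            f"WHERE item_number = {number};"
        )


if __name__ == "__main__":
    sample = (
        "1. A hamster in a phone booth. [10 points]\n"
        "2. The Judges' favorite\n"
        "   sandwich. [x points where x = (# of hamsters)]\n"
    )
    for statement in generate_updates(extract_items(sample)):
        print(statement)
```

A wrapped description line that happens to begin with a number and a period will be mistaken for a new item, which is one reason to review the results by hand.  When reading your real file, pass the encoding you saved it with to open().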

extract_item_data.py (3k), Samuel Friedman, May 3, 2011, 8:10 AM
generate_item_insert.py (3k), Samuel Friedman, May 3, 2011, 8:10 AM