Jack Naylor | Automating paper repos

When new fields emerge, there’s often an explosion of papers that quickly becomes hard to track. In engineering, robotics, computer vision and related fields in particular, research groups like to maintain publicly accessible repositories of papers that other researchers can contribute to.

The Problem

A great example is the Event-based Vision Resources repo from the UZH/ETHZ Robotics and Perception Group. The rendered page looks very well formatted, but the raw markdown behind it quickly becomes a nightmare, with every citation formatted by hand.

When we cite papers, we usually rely on .bib files, which are rendered through a bibliography style. The style formats every citation uniformly and removes the need for formatting by hand, and the .bib files themselves are portable and can be shared across papers. What if we could leverage this to present these repos better and clean up the markdown behind them?

The Fix

GitHub Actions lets users run automated scripts whenever a repo is updated. In effect, it provides a server that can take new information and update a page accordingly.

This repo demonstrates one way to automate the process, broken down into the following steps:

Check out the repo on the remote machine: this allows its contents to be read and updated.
Set up pandoc: pandoc is used to generate markdown from the .bib files.
Pull a citation style (CSL) file: in this case, IEEE (ieee.csl).
Make the bash scripts executable.
Generate markdown from the .bib files: a user-generated CSV file sets the order of the document, and the following command is run for each .bib/.md file pair:

pandoc -t markdown_strict --citeproc --csl "./ieee.csl" "./src/$filename.md" -o "./src/gen/$filename-output.md" --bibliography "./src/$filename.bib"

Build the completed document: append the frontmatter, the contents and each generated section, in order, to the repo’s README.md file.
Clean up temporary files, such as the downloaded CSL file.
Push the new README.md back to the repo.
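
The pull, generate, build and clean-up steps can be sketched as a few bash functions. This is a hypothetical sketch, not the repo’s own scripts under src: the CSV layout (one section name per line in the first column, no header row), the src/frontmatter.md and src/contents.md file names, and the download URL for ieee.csl (the official citation-style-language styles repository) are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pull, generate, build and clean-up steps.
# Assumed layout (not taken from the repo): src/order.csv lists one section
# name per line in its first column (no header row), and src/frontmatter.md
# and src/contents.md hold the fixed parts of the README.
set -euo pipefail

# Pull the IEEE citation style (assumed source: the official CSL styles repo).
fetch_csl() {
  curl -fsSL -o ./ieee.csl \
    "https://raw.githubusercontent.com/citation-style-language/styles/master/ieee.csl"
}

# Run pandoc once per .bib/.md pair, in the order given by the CSV.
generate_sections() {
  local order_file="$1" filename
  mkdir -p ./src/gen
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS=, read -r filename _ || [ -n "$filename" ]; do
    if [ -z "$filename" ]; then continue; fi   # skip blank lines
    pandoc -t markdown_strict --citeproc --csl "./ieee.csl" \
      "./src/$filename.md" -o "./src/gen/$filename-output.md" \
      --bibliography "./src/$filename.bib"
  done < "$order_file"
}

# Concatenate frontmatter, contents and each generated section into README.md.
build_readme() {
  local order_file="$1" filename
  cat ./src/frontmatter.md ./src/contents.md > README.md
  while IFS=, read -r filename _ || [ -n "$filename" ]; do
    if [ -z "$filename" ]; then continue; fi   # skip blank lines
    cat "./src/gen/$filename-output.md" >> README.md
  done < "$order_file"
}

# Remove temporary files: the downloaded style and the generated sections.
clean_up() {
  rm -f ./ieee.csl
  rm -rf ./src/gen
}
```

A workflow step could call fetch_csl, then generate_sections src/order.csv and build_readme src/order.csv, and finally clean_up. Because clean_up deletes the generated sections, it has to run after the README is built. Reading the order from the CSV in both loops keeps the section order defined in one place.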

The bash scripts are under src, and the corresponding GitHub Actions workflow is at .github/workflows/pandoc_fun.yml.
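
The final push can be sketched as a single bash function run from the workflow. It is a hypothetical example rather than the repo’s actual step: it assumes the checkout’s remote is named origin, HEAD is on a branch, and the workflow token is allowed to push (for example through a permissions: contents: write block). It commits as the github-actions[bot] account and skips the commit when the regenerated README.md is unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the final "push" step. Assumptions: it runs inside a
# clone whose remote is named "origin", HEAD is on a branch, and the workflow
# token is allowed to push.
set -euo pipefail

push_readme() {
  # Attribute the commit to the github-actions[bot] account (local repo config).
  git config user.name "github-actions[bot]"
  git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
  git add README.md
  # Skip the commit when the regenerated README matches what is already staged/committed.
  if git diff --cached --quiet; then
    echo "README.md unchanged; nothing to push"
    return 0
  fi
  git commit -q -m "Update README from .bib sources"
  git push -q origin HEAD
}
```

Skipping unchanged READMEs avoids empty commits. Assuming the push uses the default GITHUB_TOKEN, it will not start a new workflow run, so the workflow’s own push should not re-trigger it.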

To see it in action, we use it in this repo, which tracks neural fields with robotics applications.