Storing Structured Documents in Socrata

Structured Document Lightning Talk

Or how to unlock your data from a PDF

Robert Harker's notes and ideas for converting documents normally published as PDF files into structured documents stored on an Open Data Portal.

My words and terms may be wrong, but I believe the concepts are sound. These ideas come from using FrameMaker to create the slides for my classes and talks over more than 10 years of teaching. They are based on what I learned back in the 1990s about Standard Generalized Markup Language (SGML) and the Object Identifiers (OIDs) used in the SNMP Internet protocol.

How to store documents in a machine-readable form is such an obvious problem that it must have been extensively studied. Many parts of the problem probably have well-defined standards to solve them, and there are probably multiple open source tool suites that implement those standards.

What I am seeking is someone whose eyes light up when I say "structured documents".

Displaying A Structured Document In HTML

A mock-up of how data stored in a structured document can be displayed in HTML.

Displaying A Structured Document Table In HTML

A mock-up of how data stored in a table embedded in a structured document can be displayed in an HTML table or output as a CSV file.
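As a rough illustration of the idea, the Python sketch below takes one embedded table, already pulled out of the structured document as a header list plus a list of rows, and emits it either as CSV or as an HTML table. The function names `table_to_csv` and `table_to_html` and the sample data are hypothetical and are not taken from the mock-up itself.

```python
import csv
import io
from html import escape


def table_to_csv(header, rows):
    """Serialize a header and data rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def table_to_html(header, rows):
    """Render the same header and rows as a minimal HTML table."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Hypothetical sample data, for illustration only.
header = ["Department", "Budget"]
rows = [["Parks, Recreation", "1200"], ["Library <Main>", "800"]]

csv_text = table_to_csv(header, rows)
html_text = table_to_html(header, rows)
```

Because both outputs are produced from the same stored rows, the HTML view and the CSV download cannot drift apart, which is the main advantage over a table locked inside a PDF.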

Schema Outline For Structured Text Document

This is a mock-up schema definition to define how different types of content in a document can be stored in tabular form in a structured document.
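One possible shape for such a schema is sketched below in Python. It assumes each row of the table holds one piece of content, addressed by a dotted, OID-style section identifier such as `1.2.3`. The column names (`doc_id`, `section_id`, `content_type`, `content`) are assumptions made for this sketch and are not the columns of the actual mock-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocRow:
    """One row of a structured document stored in tabular form (hypothetical columns)."""
    doc_id: str        # which document the row belongs to
    section_id: str    # dotted, OID-style position, e.g. "1.2.3"
    content_type: str  # e.g. "heading", "paragraph", "table_row"
    content: str       # the text itself


def section_key(section_id):
    """Turn "1.10.2" into (1, 10, 2) so rows sort numerically, not as strings."""
    return tuple(int(part) for part in section_id.split("."))


def in_document_order(rows):
    """Return rows sorted into reading order by their section identifiers."""
    return sorted(rows, key=lambda row: section_key(row.section_id))


# Hypothetical sample rows, deliberately out of order.
rows = [
    DocRow("budget-2015", "1.10", "paragraph", "Tenth subsection."),
    DocRow("budget-2015", "1", "heading", "Introduction"),
    DocRow("budget-2015", "1.2", "paragraph", "Second subsection."),
]
ordered = [row.section_id for row in in_document_order(rows)]
```

Sorting on the numeric tuple rather than the raw string matters because a plain string sort would place `1.10` before `1.2`.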

structdoc.php: Generate HTML document from a structured document stored in a Socrata table

An outline of a program to retrieve sections of a structured document and render them as HTML.
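The rendering step might look something like the following. This is a Python sketch of the idea, not the PHP program itself, and it assumes the rows have already been fetched from the portal as a list of dictionaries. The keys `section_id`, `content_type`, and `content` are hypothetical, as is the rule that the depth of the section identifier selects the heading level.

```python
from html import escape


def section_key(section_id):
    """Turn "1.10.2" into (1, 10, 2) so sections sort in reading order."""
    return tuple(int(part) for part in section_id.split("."))


def render_html(rows):
    """Render already-fetched document rows as a simple HTML fragment."""
    parts = []
    for row in sorted(rows, key=lambda r: section_key(r["section_id"])):
        text = escape(row["content"])
        if row["content_type"] == "heading":
            # Depth of the dotted identifier picks the heading level (h1..h6).
            level = min(len(row["section_id"].split(".")), 6)
            parts.append(f"<h{level}>{text}</h{level}>")
        else:
            # Anything that is not a heading is treated as a paragraph here.
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


# Hypothetical rows, as they might arrive after retrieval.
rows = [
    {"section_id": "1.1", "content_type": "paragraph", "content": "Revenue & expenses."},
    {"section_id": "1", "content_type": "heading", "content": "Summary"},
]
html_fragment = render_html(rows)
```

Retrieval is left out on purpose: the sketch only shows that once the rows are in hand, producing HTML is a sort followed by a per-row template.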