CPRA Request For hosting E911 Realtime Data On data.smcgov.org
Technical Issues

Robert Harker, Open Data Evangelist
September 11th, 2015

===== E911 call data =====

E911 calls are 911 emergency calls not routed to law enforcement

The county already publishes realtime E911 call data to the web at:

Data published includes:
  Incident type
  Incident number
    Medical aid calls are anonymised to the street level
    Other aid calls include exact street addresses
  Responding organization
  Unit responding(?)
  Other information field (multiple units responding?)
This information is published to the public Internet using a contracted service provided by San Mateo Regional Network, Inc.:
They also provide the actual E911 call center software as managed software as a service (SaaS).

===== Data pipeline =====

Standard Jenkins build job:
  Data submitted -> archive -> pre-process/validate -> upload to Socrata
      -> verify data by downloading -> archive download results
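
The stages above can be sketched as a single function. This is only an illustration of the ordering, not the actual Jenkins job: the function name run_pipeline and all five callables are hypothetical, and the Socrata upload and download steps are passed in as plain functions rather than real API calls. Comparing hashes assumes the portal returns the data byte for byte, which may not hold in practice; a real check would more likely compare parsed rows.

```python
import hashlib
from typing import Callable


def run_pipeline(
    submitted: bytes,
    archive: Callable[[str, bytes], None],
    validate: Callable[[bytes], bytes],
    upload: Callable[[bytes], None],
    download: Callable[[], bytes],
) -> bool:
    """Run the stages in order; return True if the downloaded copy matches."""
    archive("submitted", submitted)   # keep the raw submission
    cleaned = validate(submitted)     # pre-process / validate
    upload(cleaned)                   # push to the data portal
    fetched = download()              # read the data back
    archive("downloaded", fetched)    # keep the verification copy
    # Assumption: a byte-identical round trip counts as "verified".
    return hashlib.sha256(cleaned).digest() == hashlib.sha256(fetched).digest()
```

Passing the stages in as functions keeps the sketch testable without network access; each one could be replaced by a real archive or portal client.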

===== Proposed Realtime Data Feed Solution =====

Need for County relay between data source and Socrata

===== E911 Realtime Call Feed =====

The firedispatch site receives its data as a ??? feed from the central E911 call center software.

The data feed is:
  Data stream:
  Data source:
  Data destination:
  Protocol used:
  Format of data stream:

===== Data Structure Of firedispatch Web Site =====

What is the name of the table(s) that contains the data:
Are any of them small static translation tables for data in columns:
   Fire station ID number to fire station name, unit type ID to unit type name

What are the column names for the data that is published on firedispatch.com:
Type of incident:
Incident number:
Equipment dispatched:
Department responding:
Other data: looks like additional units dispatched:
X,Y location: can be inferred from map location of a marker

Does this make sense?  Overkill?  Under kill?

===== Data Upgrade Request =====

If possible, addresses that are currently anonymised to the street level should be upgraded to anonymised block-level addresses.
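
As a rough illustration of what block-level anonymisation could look like, the sketch below rounds a leading house number down to its hundred block. The function name to_block_level and the "N BLOCK" output format are assumptions made for this example; the County may prefer a different convention.

```python
import re


def to_block_level(address: str) -> str:
    """Replace a leading house number with its hundred block.

    Addresses with no leading number (already street level) are
    returned unchanged apart from trimmed whitespace.
    """
    match = re.match(r"^\s*(\d+)\s+(.*)$", address)
    if not match:
        return address.strip()
    number = int(match.group(1))
    street = match.group(2)
    block = (number // 100) * 100  # e.g. 2036 -> 2000
    return f"{block} BLOCK {street}"
```

With this rule "2036 HARDING AV" becomes "2000 BLOCK HARDING AV". House numbers below 100 map to "0 BLOCK", which may need special handling.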

===== Data that is published on firedispatch =====

Here is what I found so far:
Date,Time,Response time(?),Incident Number,Fire Department,
    Incident Type,Street Address,City,"Units Dispatched"
1/1/2015,12:00:00 AM,(25 min),CCF150010001,Central County Fire,
    Medical aid,LAKEVIEW DR,HIL,"E33"
1/1/2015,1:31:11 AM,(12 min),SMF150010004,San Mateo Fire,
    Full assignment response,2036 HARDING AV,SMO,"BC5,E21,E24,E26,PT21"
1/1/2015,1:51:13 AM,(28 min),MNF150010006,Menlo Park Fire,
    Fire alarm - smoke detector,430 E O KEEFE ST,EPA,"E1"
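
A minimal sketch of reading records in this layout with Python's csv module. It assumes each record sits on a single line (the sample above is wrapped for readability), that "Units Dispatched" is a quoted, comma-separated field, and it drops the "(?)" from the response time header. SAMPLE and parse_calls are names made up for this example.

```python
import csv
import io

# Two of the sample records above, unwrapped to one line each.
SAMPLE = (
    "Date,Time,Response time,Incident Number,Fire Department,"
    "Incident Type,Street Address,City,Units Dispatched\n"
    "1/1/2015,12:00:00 AM,(25 min),CCF150010001,Central County Fire,"
    'Medical aid,LAKEVIEW DR,HIL,"E33"\n'
    "1/1/2015,1:51:13 AM,(28 min),MNF150010006,Menlo Park Fire,"
    'Fire alarm - smoke detector,430 E O KEEFE ST,EPA,"E1"\n'
)


def parse_calls(text: str) -> list:
    """Return one dict per record, with the units split into a list."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # The quoted field may hold several units separated by commas.
        row["Units Dispatched"] = row["Units Dispatched"].split(",")
        rows.append(row)
    return rows
```

Quoting the units field is what lets a multi-unit response stay in one column; without the quotes the extra commas would shift every later column.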

For Medical Aid calls, the location is published to the block level as data points on the realtime map.
The text address is only published to the street level.
  Not very useful for a map

===== Source Database Information Security =====

Basic philosophy:
Make all db queries read only of a minimal set of columns from an unprivileged db account.  The DB admin controls access.

To limit the data that my SQLtoREST program can access:
Create a new unprivileged read-only Microsoft SQL Server account.
Explicitly grant read-only access to exactly the columns in the table(s) that are already published.
Perform any address obfuscation, such as street-level or block-level addresses, on the SQL Server side before the data is returned to my SQLtoREST program.
  Could this be a stored procedure on the server?
Limit on the server side the age of records to less than 14 days old.
For additional safety use a TLS or ssh authenticated (shared public keys) tunnel to only allow access from a County designated data translation host.
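
The idea of exposing only published columns and only recent records can be sketched with a database view. The example below uses Python's built-in sqlite3 module purely as a stand-in: the real source is Microsoft SQL Server, SQLite has no accounts or GRANT statements, and the table name calls, the view name published_calls, the caller_phone column, and the demo rows are all invented for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table plus a view limited to published data."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE calls ("
        " incident_number TEXT, incident_type TEXT,"
        " street_address TEXT, caller_phone TEXT, received_at TEXT)"
    )
    now = datetime.now(timezone.utc)
    fmt = "%Y-%m-%d %H:%M:%S"  # same text format SQLite's datetime() returns
    db.executemany(
        "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
        [   # made-up demo rows, not real call data
            ("DEMO-1", "Medical aid", "EXAMPLE ST", "555-0100",
             (now - timedelta(days=1)).strftime(fmt)),
            ("DEMO-2", "Fire alarm", "EXAMPLE AV", "555-0101",
             (now - timedelta(days=30)).strftime(fmt)),
        ],
    )
    # The view omits caller_phone and hides rows older than 14 days.
    db.execute(
        "CREATE VIEW published_calls AS "
        "SELECT incident_number, incident_type, street_address, received_at "
        "FROM calls WHERE received_at >= datetime('now', '-14 days')"
    )
    return db


def fetch_published(db: sqlite3.Connection) -> list:
    """Read through the view only, as the relay program would."""
    cur = db.execute("SELECT * FROM published_calls ORDER BY received_at")
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur]
```

On SQL Server the equivalent would presumably be a view or stored procedure owned by the DB admin, with the unprivileged account granted access to that object only.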

===== Previous year of historical E911 call data =====

The previous year of E911 call data is requested to provide a useful dataset for the County and the public to experiment (play) with.

Additionally metadata (information) about how the request was satisfied will be included.

Fulfillment metadata:
  Name of dataset
  Description of dataset
  Date generated
  Internal dataset name
  SQL query or script used to generate the data
  Employee/title legally responsible for the data
  Employee/title actually generating the data
  Any additional information about the dataset the department thinks is useful
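
One possible way to carry the fulfillment metadata alongside the dataset is a small record serialised to JSON. The class name FulfillmentMetadata and the field names below are assumptions that simply mirror the list above; no particular format is required by this request.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FulfillmentMetadata:
    """One field per item in the fulfillment metadata list."""
    dataset_name: str
    description: str
    date_generated: str
    internal_dataset_name: str
    query_or_script: str
    responsible_employee: str   # employee/title legally responsible
    generating_employee: str    # employee/title who generated the data
    notes: str = ""             # any additional information

    def to_json(self) -> str:
        """Serialise the record so it can be published with the data."""
        return json.dumps(asdict(self), indent=2)
```

Publishing this record next to the dataset would let anyone see how the export was produced and by whom.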