Table Sources
Tip
For a list of all table sources currently in the registry, see table_sources.csv on Github.
Table sources are a JSON representations of each CSV dataset within OEPS--one JSON file per CSV. The structure is based on the Tabular Data Resource from Frictionless Data. However, where a schema
property would typically define a primary key, foreign key (for joins), and a list of all fields, all of this information is inferred or standardized elsewhere and need not be stored in these files.
Characteristics of table source CSVs:
- Only have data for one geography level (state, county, tract, or zcta)
- Only have data for one year
- Named with the format
{geography}-{year}
, for example,county-2020
- Has a
HEROP_ID
column as primary key that joins each row to a geography unit. - Has column names that match (exactly) with variable names already defined in the registry.
Each table source is defined by the following attributes:
Property | Description | Comment |
---|---|---|
name |
ID of table source | Will always be in {geography}-{year} format |
title |
Human-readable title | Currently not used anywhere, and set to match name |
description |
Short description | Will always be "This CSV aggregates all OEPS data values from {year} at the {geography} level. |
path |
Path to CSV | Relative to data directory, this path will always be tables/{name}.csv , i.e. tables/{geography}-{year}.csv |
format |
Will always be csv |
|
mediatype |
Will always be text/csv |
|
year |
Year of the data in this CSV | |
bq_dataset_name |
Target dataset during BigQuery upload | To be deprecated. |
bq_table_name |
Target table during BigQuery upload | To be deprecated. |
geodata_source |
Name of geodata source this CSV will join to | Geodata source must already exist in the registry. Importantly, the year of the CSV data may not match the geodata source, as 2015 data should be joined to 2010 geographies (for example). |
Future simplification
Much of the content stored in the attributes described above can be inferred from other information, or is always the same across all table sources, so it's possible that some of these will be removed in the future.
Example table_source
{
"bq_dataset_name": "tabular",
"bq_table_name": "county-2020",
"name": "county-2020",
"path": "tables/county-2020.csv",
"format": "csv",
"mediatype": "text/csv",
"title": "county-2020",
"description": "This CSV aggregates all OEPS data values from 2020 at the county level.",
"year": "2020",
"geodata_source": "counties-2018"
}