| Title: | Explore Socrata Data with Ease |
|---|---|
| Description: | Provides an interface to search, read, query, and retrieve metadata for datasets hosted on 'Socrata' open data portals. Supports all 'Socrata' data types, including spatial data returned as 'sf' objects. |
| Authors: | Ryan Zomorrodi [aut, cre] |
| Maintainer: | Ryan Zomorrodi <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1.9001 |
| Built: | 2026-05-29 19:34:44 UTC |
| Source: | https://github.com/ryanzomorrodi/socratadata |
Provides access to the Socrata Discovery API, allowing you to search tens of thousands of government datasets and assets published on the Socrata platform. Governments at all levels publish data on topics including crime, permits, finance, healthcare, research, and performance.
soc_discover( attribution = NULL, categories = NULL, domain_category = NULL, domains = NULL, ids = NULL, names = NULL, only = "dataset", provenance = NULL, query = NULL, tags = NULL, domain_tags = NULL, location = "us", limit = 10000 )
attribution |
string; Filter by the attribution or publisher. |
categories |
character vector; Filter by categories. |
domain_category |
string; Filter by domain category (requires a specified domain). |
domains |
character vector; Filter to domains. |
ids |
character vector; Filter by asset IDs. |
names |
character vector; Filter by asset names. |
only |
character vector; Filter to specific asset types. Must be one or more of: |
provenance |
string; Filter by provenance: |
query |
character string; Filter using a token that matches one from an asset's name, description, category, tags, column names, column fieldnames, column descriptions, or attribution. |
tags |
character vector; Filter by tags associated with the assets. |
domain_tags |
string; Filter by domain tags associated with the assets (requires a specified domain). |
location |
string; Regional API domain: |
limit |
whole number; Maximum number of results (cannot exceed 10,000). |
A tibble containing metadata for each discovered asset. Columns include:
Asset identifier (four-by-four ID).
Asset parent identifiers.
Asset name.
Attribution or publisher of the asset.
Link to attribution.
Email to contact asset owner.
Type of resource: api, calendar, chart, dataset, federated_href, file, filter, form, href, link, map, measure, story, visualization.
Owner:
Owner ID.
Display name of owner.
Creator:
Creator ID.
Display name of creator.
Provenance of asset (official or community).
Textual description of the asset.
Date asset was created.
Date asset was last updated.
Date asset was published (if published).
Date asset data was last updated.
Date asset metadata was last updated.
Category labels assigned to the asset.
Tags associated with the asset.
Category label assigned by the domain.
Tags applied by the domain.
Metadata associated with the asset assigned by the domain.
A dataframe with the following columns:
Names of asset columns.
Labels of asset columns.
Description of asset columns.
Datatypes of asset columns.
Permanent URL where the asset can be accessed.
Direct asset link.
Domain of the asset.
License associated with the asset.
Page views in the last week.
Page views in the last month.
Total page views.
Total number of downloads.
https://dev.socrata.com/docs/other/discovery
results <- soc_discover( query = "crime", categories = "Public Safety", only = "dataset" ) # Search for crime-related datasets in the Public Safety category
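The filters above can be combined, for example to restrict a keyword search to a single portal and cap the number of results. The following is a minimal sketch rather than an official example: the domain `data.cityofchicago.org` and the `run_live` switch are illustrative assumptions, and the call is only made when the package is installed and `run_live` is set to `TRUE`, because it needs network access.

```r
# Arguments for a narrower search. The domain is an assumption chosen for
# illustration; substitute any Socrata portal domain.
discover_args <- list(
  query   = "crime",
  domains = "data.cityofchicago.org",
  only    = "dataset",
  limit   = 50  # the documented ceiling is 10,000
)

# Hypothetical switch so the sketch can be sourced without touching the network.
run_live <- FALSE

if (run_live && requireNamespace("socratadata", quietly = TRUE)) {
  results <- do.call(socratadata::soc_discover, discover_args)
  print(nrow(results))   # number of assets found
  print(names(results))  # metadata columns described above
}
```

Keeping the arguments in a list makes it easy to check them (for instance, that `limit` stays within the documented maximum) before any request is sent.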
Retrieves metadata either from a tibble returned by soc_read() or directly from a dataset URL, including
dataset-level information and column-level descriptions.
soc_metadata(dataset)
dataset |
A tibble returned by |
This function pulls out descriptive metadata such as the dataset's ID, title, attribution, category, creation and update timestamps, description, any domain-specific fields, and field descriptions defined by the data provider.
An object of class soc_meta, which includes:
Asset identifier (four-by-four ID).
Asset name.
Attribution or publisher of the asset.
Link to attribution.
Type of resource: api, calendar, chart, dataset, federated_href, file, filter, form, href, link, map, measure, story, visualization.
Owner:
Owner ID.
Display name of owner.
Provenance of asset (official or community).
Textual description of the asset.
Date asset was created.
Date asset was published (if published).
Date asset data was last updated.
Date asset metadata was last updated.
Category label assigned by the domain.
Tags applied by the domain.
Metadata associated with the asset assigned by the domain.
A dataframe with the following columns:
Names of asset columns.
Labels of asset columns.
Description of asset columns.
Datatypes of asset columns.
Permanent URL where the asset can be accessed.
License associated with the asset.
url <- "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/"
data <- soc_read(url, soc_query(limit = 1000L))
metadata <- soc_metadata(data)
print(metadata)
metadata <- soc_metadata(url)
print(metadata)
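The "four-by-four ID" listed among the metadata fields is the `3wfw-mdbc` segment of the example URL. As an illustration of that format, here is a minimal base R sketch that pulls the ID out of a URL. It assumes an ID is two groups of four lowercase letters or digits joined by a hyphen and occupying its own path segment; `extract_four_by_four()` is a hypothetical helper, not a function from this package.

```r
# Hypothetical helper: extract a Socrata four-by-four ID from a single URL.
# Assumes the ID is a whole path segment, optionally followed by "/", ".", "?"
# or the end of the string. Returns NA if nothing matches.
extract_four_by_four <- function(url) {
  pattern <- "(?<=/)[a-z0-9]{4}-[a-z0-9]{4}(?=[/.?]|$)"
  hit <- regmatches(url, regexpr(pattern, url, perl = TRUE))
  if (length(hit) == 0) NA_character_ else hit
}

dataset_url <- "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/"
dataset_id <- extract_four_by_four(dataset_url)
print(dataset_id)
```

The lookbehind on `/` matters here: without it, the pattern would also match `akes-2012` inside the human-readable dataset title.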
Constructs a structured representation of a Socrata Query Language (SoQL) query that can be used with Socrata API endpoints. This function does not execute the query; it creates an object that can be passed to request functions or printed for inspection.
soc_query( select = "*", where = NULL, group_by = NULL, having = NULL, order_by = NULL, limit = NULL )
select |
string; Columns to retrieve. |
where |
string; Filter conditions. |
group_by |
string; Fields to group by. |
having |
string; Conditions to apply to grouped records. |
order_by |
string; Sort order. |
limit |
whole number; The maximum number of records to return. |
An object of class soc_query, which prints in a readable format and can be used to build query URLs.
Use this with a function that executes Socrata requests, e.g., soc_read(url, query = soc_query(...)).
query <- soc_query( select = "region, avg(magnitude) as avg_magnitude, count(*) as count", group_by = "region", having = "count >= 5", order_by = "avg_magnitude DESC" )
print(query)
earthquakes_by_region <- soc_read( "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/", query = query )
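To make the link between the `soc_query()` arguments and SoQL concrete, the sketch below maps each clause onto the corresponding `$`-prefixed URL parameter of the Socrata SODA API (`$select`, `$where`, `$group`, `$having`, `$order`, `$limit`). It is an illustration only and not the package's implementation: `build_soql_url()` is a hypothetical helper, and the `/resource/<id>.json` endpoint form is assumed.

```r
# Hypothetical helper: turn SoQL clauses into a SODA-style query URL.
build_soql_url <- function(resource_url, select = "*", where = NULL,
                           group_by = NULL, having = NULL,
                           order_by = NULL, limit = NULL) {
  clauses <- list(
    `$select` = select,
    `$where`  = where,
    `$group`  = group_by,
    `$having` = having,
    `$order`  = order_by,
    `$limit`  = limit
  )
  # Drop clauses that were not supplied.
  clauses <- Filter(Negate(is.null), clauses)

  # Percent-encode each value; numbers are formatted without scientific notation.
  encode <- function(x) {
    if (is.numeric(x)) x <- format(x, scientific = FALSE)
    utils::URLencode(as.character(x), reserved = TRUE)
  }
  encoded <- vapply(clauses, encode, character(1))

  paste0(resource_url, "?", paste0(names(encoded), "=", encoded, collapse = "&"))
}

soql_url <- build_soql_url(
  "https://soda.demo.socrata.com/resource/3wfw-mdbc.json",
  select = "region",
  where = "magnitude > 5",
  limit = 5
)
print(soql_url)
```

In practice `soc_read()` handles this step for you; the sketch is only meant to show what each argument corresponds to on the wire.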
Downloads and parses a dataset from a Socrata open data portal URL, returning it as a tibble or sf object.
Metadata is also returned as attributes on the returned object.
soc_read( url, query = soc_query(), alias = "label", page_size = 10000, include_synthetic_cols = TRUE, api_key_id = NULL, api_key_secret = NULL, timezone = Sys.timezone() )
url |
string; URL of the Socrata dataset. |
query |
string or |
alias |
string; Use of field alias values. There are three options:
|
page_size |
whole number; Maximum number of rows returned per request. |
include_synthetic_cols |
logical; Should synthetic columns be included? |
api_key_id |
string; API key ID to authenticate requests. (Can also be stored as |
api_key_secret |
string; API key secret to authenticate requests. (Can also be stored as |
timezone |
string; Timezone to set floating_timestamps to. |
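The `timezone` argument matters because a floating timestamp carries no UTC offset of its own, so the same text denotes different instants depending on the zone it is read in. The base R sketch below illustrates this with a made-up value; it shows the general idea only and makes no claim about how the package parses timestamps internally.

```r
# A floating timestamp as it might appear in a response: no offset attached.
# The value itself is invented for illustration.
raw_value <- "2012-11-08T01:30:00.000"
fmt <- "%Y-%m-%dT%H:%M:%OS"

# Interpret the same wall-clock text in two different timezones.
in_chicago <- as.POSIXct(raw_value, format = fmt, tz = "America/Chicago")
in_utc     <- as.POSIXct(raw_value, format = fmt, tz = "UTC")

# The two results are different instants, separated by the zone's UTC offset.
hours_apart <- as.numeric(difftime(in_chicago, in_utc, units = "hours"))
print(hours_apart)
```

If a portal records local times for a city other than the one your session runs in, passing that city's timezone explicitly avoids this kind of silent shift.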
A tibble with an additional soc_meta attribute storing metadata.
If the dataset contains a single non-nested geospatial field, it will be returned as an sf object.
soc_read( "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/" )
soc_read( "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/", soc_query( select = "region, avg(magnitude) as avg_magnitude, count(*) as count", group_by = "region", having = "count >= 5", order_by = "avg_magnitude DESC" ) )
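Since `page_size` caps the rows returned per request, a large dataset has to be fetched in several requests. The sketch below shows one plausible way to lay out those requests using the SODA `$limit` and `$offset` parameters. It is an assumption about the general paging pattern, not a description of what `soc_read()` does internally, and `page_offsets()` is a hypothetical helper.

```r
# Hypothetical helper: starting offsets needed to cover `total_rows` rows
# when each request returns at most `page_size` rows.
page_offsets <- function(total_rows, page_size = 10000) {
  if (total_rows <= 0) {
    return(numeric(0))
  }
  seq(0, total_rows - 1, by = page_size)
}

# A dataset of 25,000 rows with the default page size needs three requests.
offsets <- page_offsets(25000, page_size = 10000)
paging_params <- paste0("$limit=10000&$offset=", format(offsets, scientific = FALSE, trim = TRUE))
print(paging_params)
```

A smaller `page_size` means more requests but smaller responses, which can help on slow or unreliable connections.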