Structured data on the web

Build community indexes from structured data to address FAIR data goals

Build search indexes for your community or domain of interest, focused and functional enough to address your specific needs. Gleaner is open source, written in Go, and easy to deploy. It is one part of the GleanerIO search architecture; details are below.



Gleaner is a tool for extracting JSON-LD from web pages. You give Gleaner a list of sites to index, and it retrieves pages based on the sitemap.xml of each domain. Gleaner can then check the documents for well-formed and valid structure and process the JSON-LD data graphs into a form that can drive a search interface. It is one component of a larger tool chain, described under Big Picture.
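
The flow described above (read a sitemap, visit pages, pull out JSON-LD) can be sketched in a few lines of Go. This is only an illustration using the standard library, not Gleaner's actual implementation: the names `sitemapLocs` and `extractJSONLD` are hypothetical, the regular expression is a simplification of real HTML parsing, and `json.Valid` only checks that a block is syntactically valid JSON, not that it is valid JSON-LD.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// urlSet mirrors the minimal shape of a sitemap.xml document.
type urlSet struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// sitemapLocs (hypothetical helper) returns the page URLs listed in a sitemap body.
func sitemapLocs(sitemap []byte) ([]string, error) {
	var set urlSet
	if err := xml.Unmarshal(sitemap, &set); err != nil {
		return nil, err
	}
	locs := make([]string, 0, len(set.URLs))
	for _, u := range set.URLs {
		locs = append(locs, strings.TrimSpace(u.Loc))
	}
	return locs, nil
}

// jsonLDScript matches <script type="application/ld+json"> blocks.
// Assumption: a regexp is good enough for a sketch; a production tool
// would use a real HTML parser.
var jsonLDScript = regexp.MustCompile(
	`(?is)<script[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD (hypothetical helper) returns the syntactically valid
// JSON blocks found in JSON-LD script tags of an HTML page.
func extractJSONLD(html string) []string {
	var docs []string
	for _, m := range jsonLDScript.FindAllStringSubmatch(html, -1) {
		body := strings.TrimSpace(m[1])
		if json.Valid([]byte(body)) {
			docs = append(docs, body)
		}
	}
	return docs
}

func main() {
	// Sample inputs stand in for a fetched sitemap and page.
	sitemap := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
</urlset>`)
	locs, err := sitemapLocs(sitemap)
	if err != nil {
		fmt.Println("sitemap error:", err)
		return
	}
	fmt.Println("pages to visit:", locs)

	page := `<html><head><script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
</script></head></html>`
	for _, doc := range extractJSONLD(page) {
		fmt.Println("found JSON-LD:", doc)
	}
}
```

In a real crawl the sample strings would be replaced by HTTP responses, and the extracted documents would be passed on to a JSON-LD processor and stored for indexing.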


Open Foundation

Communities of practice can leverage open schema along with web architecture approaches to build domain search portals, then enhance and extend them with community vocabularies to address specific domain needs. This foundation is also leveraged by Google Dataset Search and is complementary to that service. Using web architecture as the foundation lets a community provide a more detailed community experience while still leveraging the global reach of commercial search indexes.

Big Picture

Gleaner is just one part of a tool chain that addresses this goal. You also need storage and a way to search the graph Gleaner collects. A basic approach is described, along with alternatives that may be more native or familiar to you. See: The Big Picture


Where to Get Engaged

Get engaged with RDA, EarthCube and the ESIP Science on Schema group!