🔗 Live Demo: https://proto-data-market-aggregator.chrisns.workers.dev/
A Cloudflare Worker that aggregates and searches across multiple data sources including Snowflake, Databricks, ONS, Defra, Agrimetrics, AWS Marketplace, data.gov.uk, and Datarade. This service provides a unified search interface for government and commercial data marketplaces.
- Unified search interface across multiple data sources
- Real-time data aggregation with parallel processing
- Interleaved results from all sources
- Responsive HTML output with GOV.UK Design System
- Error handling and graceful degradation
- Optimized performance with concurrent API requests
- Intelligent caching strategy for improved response times
- TypeScript-based implementation with strict type checking
The service is optimized for speed and efficiency:
- All data source requests are executed in parallel using Promise.all
- No sequential waiting between API calls
- Reduced total response time
- Graceful handling of slow or failed requests
- Results are interleaved as they become available
- Consistent date formatting across all sources
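The parallel pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Listing` and `Source` types and the `searchAll` and `interleave` names are hypothetical, and the sketch merges results once every source has settled.

```typescript
// Minimal shape of a normalized result; only the fields this sketch needs.
interface Listing {
  title: string;
  source: string;
}

// Hypothetical type: a source is any async function from a search term to listings.
type Source = (term: string) => Promise<Listing[]>;

// Start every source at once. A failing source contributes an empty list
// instead of rejecting the whole Promise.all.
async function searchAll(sources: Source[], term: string): Promise<Listing[][]> {
  return Promise.all(
    sources.map((source) =>
      source(term).catch((err: unknown) => {
        console.error("source failed", err);
        return [] as Listing[];
      }),
    ),
  );
}

// Round-robin merge: the first item of each source, then the second of each, and so on.
function interleave<T>(lists: T[][]): T[] {
  const merged: T[] = [];
  const longest = Math.max(0, ...lists.map((list) => list.length));
  for (let i = 0; i < longest; i++) {
    for (const list of lists) {
      const item = list[i];
      if (item !== undefined) merged.push(item);
    }
  }
  return merged;
}
```

Catching inside each source call is what keeps `Promise.all` from failing fast when one upstream API is down.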
The service implements an intelligent caching strategy using Cloudflare's caching:
- Aggressive caching enabled for all requests (`cacheEverything: true`)
- Custom cache keys based on search parameters for efficient cache hits
- Custom cache durations based on response status:
  - Successful responses (200-299) are cached for 2 weeks
  - Not Found responses (404) are cached for 1 second
  - Server errors (500-599) are not cached
- Cache is applied consistently across all data sources
- Improves response times for repeated searches
- Reduces load on upstream APIs
- Provides resilience against temporary API outages
- Caches all content types, not just static assets
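As a rough sketch, the durations above could be expressed with the `cacheEverything`, `cacheKey`, and `cacheTtlByStatus` options that Cloudflare documents for `fetch` inside a Worker. Whether the project sets exactly these options is an assumption; the `CacheOptions` interface and `cacheOptions` helper are hypothetical names used only for illustration.

```typescript
// Local declaration of the Cloudflare fetch options used here, so the sketch
// type-checks without @cloudflare/workers-types installed.
interface CacheOptions {
  cacheEverything: boolean;
  cacheKey?: string;
  cacheTtlByStatus: Record<string, number>;
}

const TWO_WEEKS_IN_SECONDS = 14 * 24 * 60 * 60; // 1,209,600 seconds

// Hypothetical helper: builds the cache settings for one upstream request.
function cacheOptions(cacheKey?: string): CacheOptions {
  return {
    cacheEverything: true,
    ...(cacheKey ? { cacheKey } : {}),
    cacheTtlByStatus: {
      "200-299": TWO_WEEKS_IN_SECONDS, // successful responses: 2 weeks
      "404": 1, // Not Found: 1 second
      "500-599": 0, // server errors: a TTL of 0 expires immediately (assumed mapping of "not cached")
    },
  };
}

// In a Worker this would be passed as: fetch(url, { cf: cacheOptions("ons-soil") })
```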
Each data source uses a unique cache key format:
- Snowflake: `snowflake-{search_term}`
- ONS: `ons-{search_term}`
- Defra: `defra-{search_term}`
- Agrimetrics: `agrimetrics-{search_term}`
- Databricks: Uses default URL-based caching
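A small helper along these lines could produce the keys listed above. The `cacheKeyFor` name is hypothetical, and the sketch uses the search term as given because the README does not say whether terms are trimmed or lower-cased first.

```typescript
// Sources that, per the list above, use a "{source}-{search_term}" cache key.
const KEYED_SOURCES: readonly string[] = ["snowflake", "ons", "defra", "agrimetrics"];

// Returns the custom key, or undefined for sources (such as Databricks) that
// rely on the default URL-based cache key.
function cacheKeyFor(source: string, searchTerm: string): string | undefined {
  return KEYED_SOURCES.includes(source) ? `${source}-${searchTerm}` : undefined;
}
```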
- Searches through Snowflake's data marketplace via their search API
- Returns listings with titles, descriptions, and provider information
- Direct links to data sources
- POST requests with JSON payload for advanced search capabilities
- Searches through Databricks' data marketplace
- Returns dataset listings with detailed information
- Client-side filtering by relevance to search terms
- Supports title, description, and subtitle matching
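Client-side filtering of this kind might look like the sketch below. The matching rule (every search word must appear, case-insensitively, somewhere in the title, description, or subtitle) is an assumption, and `filterByTerms` is a hypothetical name.

```typescript
// The three fields the filter looks at.
interface Candidate {
  title: string;
  description: string;
  subtitle: string;
}

// Keep listings in which every search word occurs in at least one of the three fields.
function filterByTerms<T extends Candidate>(listings: T[], search: string): T[] {
  const words = search
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0);
  if (words.length === 0) return listings; // blank search: nothing to filter on
  return listings.filter((listing) => {
    const haystack = `${listing.title} ${listing.description} ${listing.subtitle}`.toLowerCase();
    return words.every((word) => haystack.includes(word));
  });
}
```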
- Searches through ONS Statistical Research Service metadata
- Returns dataset information with keywords and abstracts
- Links to original data sources
- Supports advanced search parameters
- Searches through environmental data from Defra
- Returns dataset information with tags and descriptions
- Links to the Defra data platform
- Handles HTML responses with embedded JSON data
- Searches through agricultural and environmental datasets
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Tags and categories
  - Creator information
  - Access URLs and download links
  - Last modified dates
- Supports advanced search parameters
- Direct integration with the Agrimetrics API
- Robust error handling with graceful degradation
- Searches through Datarade's data marketplace
- Returns product information including:
  - Product title and description
  - Provider/author information
  - Direct links to product pages
  - Product body text with detailed information
- HTML response parsing with robust error handling
- Graceful handling of malformed responses
- Input validation and URL encoding
- Direct integration with Datarade's search interface
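The input validation and URL encoding steps could be sketched as below, using the Datarade search URL and `keywords` parameter documented later in this README. The `buildDataradeUrl` name and the choice to reject blank input with `null` are assumptions.

```typescript
// Builds the Datarade search URL, or returns null when there is nothing to search for.
function buildDataradeUrl(search: string): URL | null {
  const keywords = search.trim();
  if (keywords.length === 0) return null;
  const url = new URL("https://datarade.ai/search/products");
  url.searchParams.set("keywords", keywords); // searchParams takes care of the encoding
  return url;
}
```

Going through `URL.searchParams` rather than string concatenation means characters such as `&` or spaces in the query cannot break the request URL.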
- Searches through AWS Data Exchange listings
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Categories and badges
  - Creator information
  - Pricing details
  - Direct links to AWS Marketplace listings
- Supports advanced filtering for Data Exchange products
- Handles zlib-compressed responses
- Robust error handling with graceful degradation
- Direct integration with AWS Marketplace Discovery API
- Custom cache configuration for improved performance
- Searches through the UK government's open data portal
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Organization information
  - License details
  - Resource URLs and formats
  - Last modified dates
- Direct integration with CKAN API
- Supports advanced search parameters
- Robust error handling with graceful degradation
- Custom cache configuration for improved performance
- Searches through OpenDataSoft's data platform
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Publisher information
  - Themes and keywords
  - Last modified dates
  - Direct links to dataset pages
- Direct integration with OpenDataSoft Explore API v2.1
- Supports advanced search parameters
- Intelligent caching with 2-week duration for successful responses
- Robust error handling with graceful degradation
- Custom cache configuration for improved performance
- UTF-8 encoding support for international queries
- Parallel execution with other data sources
The NHS Business Services Authority (NHSBSA) Open Data Portal integration provides access to healthcare-related datasets. Features include:
- Direct integration with NHSBSA's CKAN-based API
- Returns comprehensive dataset information including:
  - Title and description
  - Publisher details
  - License information
  - Direct links to dataset resources
  - Last modified dates
- Supports full-text search across all NHSBSA datasets
- Intelligent caching with 2-week duration for successful responses
- Robust error handling and UTF-8 encoding support
- Parallel execution with other data sources
The Open Data Northern Ireland integration provides access to datasets from the Northern Ireland public sector. Features include:
- Direct integration with Open Data NI's CKAN-based API
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Organization information
  - License details
  - Resource URLs and formats
  - Last modified dates
- Supports full-text search across all Open Data NI datasets
- Intelligent caching with 2-week duration for successful responses
- Robust error handling and UTF-8 encoding support
- Parallel execution with other data sources
The SSEN (Scottish and Southern Electricity Networks) Distribution integration provides access to electricity network data. Features include:
- Direct integration with SSEN's CKAN-based API
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Organization information
  - Author details
  - Tags and categories
  - Resource URLs and formats
  - Last modified dates
- Supports full-text search across all SSEN datasets
- Intelligent caching with 2-week duration for successful responses
- Robust error handling and UTF-8 encoding support
- Parallel execution with other data sources
- Base URL: `https://ckan-prod.sse.datopian.com/api/action/package_search`
- Query Parameters:
  - `q`: The search query
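Because the CKAN-based sources share the same `package_search` endpoint shape, one URL builder can plausibly serve all of them. The sketch below assumes only the `q` parameter listed above; `ckanSearchUrl` is a hypothetical name.

```typescript
// Builds a CKAN package_search URL for any CKAN-based source.
function ckanSearchUrl(baseUrl: string, query: string): string {
  const url = new URL(baseUrl);
  url.searchParams.set("q", query);
  return url.toString();
}

// Example for the SSEN endpoint documented above.
const ssenUrl = ckanSearchUrl(
  "https://ckan-prod.sse.datopian.com/api/action/package_search",
  "substation",
);
```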
The London Datastore integration provides access to datasets from the Greater London Authority. Features include:
- Direct integration with London Datastore's CKAN-based API
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Organization information
  - License details
  - Resource URLs and formats
  - Last modified dates
- Direct integration with CKAN API
- Supports advanced search parameters
- Robust error handling with graceful degradation
- Custom cache configuration for improved performance
- Base URL: `https://data.london.gov.uk/api/action/package_search`
- Query Parameters:
  - `q`: The search query
The York Open Data integration provides access to datasets from the City of York Council. Features include:
- Direct integration with York Open Data's CKAN-based API
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Organization information
  - Author and maintainer details
  - Groups and categories
  - Resource URLs and formats
  - Last modified dates
- Supports full-text search across all York Open Data datasets
- Intelligent caching with 2-week duration for successful responses
- Robust error handling and UTF-8 encoding support
- Parallel execution with other data sources
- Base URL: `https://data.yorkopendata.org/api/action/package_search`
- Query Parameters:
  - `q`: The search query
The Health Data Gateway integration provides access to health-related datasets from various UK health organizations. Features include:
- Direct integration with Health Data Gateway's search API
- Returns comprehensive dataset information including:
  - Dataset title and description
  - Publisher information
  - Dataset type and subtype
  - Keywords and categories
  - Contact information
  - Population size data
- Supports full-text search across all Health Data Gateway datasets
- Intelligent caching with 2-week duration for successful responses
- Robust error handling and UTF-8 encoding support
- Parallel execution with other data sources
- Base URL: `https://api.healthdatagateway.org/api/v1/search/datasets`
- Query Parameters:
  - `view_type`: mini
  - `perPage`: 25
  - `page`: 1
  - `sort`: score:desc
- POST request with JSON payload: `{ "query": "search_term" }`
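Put together, the Health Data Gateway request could be assembled as in this sketch. The URL, query parameters, and payload come from the list above; the `Content-Type` header, the `PostRequest` type, and the `buildHealthDataGatewayRequest` name are assumptions.

```typescript
// Plain description of a POST request, independent of any fetch implementation.
interface PostRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildHealthDataGatewayRequest(searchTerm: string): PostRequest {
  const url = new URL("https://api.healthdatagateway.org/api/v1/search/datasets");
  url.searchParams.set("view_type", "mini");
  url.searchParams.set("perPage", "25");
  url.searchParams.set("page", "1");
  url.searchParams.set("sort", "score:desc");
  return {
    url: url.toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed; not stated in this README
    body: JSON.stringify({ query: searchTerm }),
  };
}
```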
All data sources are normalized to a common format:
interface ListingResult {
  id: string;
  title: string;
  description: string;
  subtitle: string;
  provider: {
    title: string;
    description: string;
  };
  url: string;
  source: string;
  updated: string;
}
The integration with AWS Marketplace uses their Discovery API with the following details:
- Base URL: `https://aws.amazon.com/marketplace/api/awsmpdiscovery`
- Method: POST
- Headers:
  - `Accept: application/json`
  - `Accept-Encoding: deflate, gzip`
  - `Content-Type: application/x-amz-json-1.1`
  - `X-Amz-Target: AWSMPDiscoveryService.SearchListings`
- Request Body:
  - `SearchText`: The search query
  - `MaxResults`: 20 (configurable)
  - `Filters`: Configured for DATA_EXCHANGE products
  - `Sort`: By relevance, descending order
- Response Format:
interface AWSMarketplaceListingSummary {
  Id: string;
  DisplayAttributes: {
    Title: string;
    LongDescription: string;
  };
  Categories: Array<{
    DisplayName: string;
  }>;
  ProductAttributes: {
    Creator: {
      DisplayName: string;
    };
  };
  OfferSummary: {
    PricingSummary: string;
  };
}
- Response Handling:
  - Handles zlib-compressed responses
  - Decompresses using Web Streams API
  - Maps to common ListingResult format
  - Includes pricing information
  - Preserves category information
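The request side of the AWS Marketplace integration could be assembled roughly as follows. The URL, method, and headers are taken from the details above. The `Filters` and `Sort` fields are deliberately left out because this README names them without giving their JSON shape, and `buildAwsSearchRequest` is a hypothetical name.

```typescript
const AWS_DISCOVERY_URL = "https://aws.amazon.com/marketplace/api/awsmpdiscovery";

// Returns a plain description of the POST request; pass its parts to fetch() in a Worker.
function buildAwsSearchRequest(searchText: string, maxResults = 20) {
  return {
    url: AWS_DISCOVERY_URL,
    method: "POST" as const,
    headers: {
      "Accept": "application/json",
      "Accept-Encoding": "deflate, gzip",
      "Content-Type": "application/x-amz-json-1.1",
      "X-Amz-Target": "AWSMPDiscoveryService.SearchListings",
    },
    // Filters and Sort omitted: their exact structure is not documented here.
    body: JSON.stringify({ SearchText: searchText, MaxResults: maxResults }),
  };
}
```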
The integration with Datarade uses their search interface with the following details:
- Base URL: `https://datarade.ai/search/products`
- Query Parameters:
  - `keywords`: The search query
- Response Format:
interface DataradeProduct {
  title: string; // Product title from h3.product-card__title
  subtitle: string; // Author/provider from span.product-card__subtitle__author
  link: string; // Full URL to product page
  body: string; // Detailed description from div.product-card__body
}
The integration with Agrimetrics uses their catalog API with the following parameters:
- Base URL: `https://app.agrimetrics.co.uk/backend/catalog/api/catalog/data-sets`
- Query Parameters:
  - `exchange=agrimetrics`: Specifies the data exchange
  - `tagRelationship=narrower`: Defines tag relationship type
  - `extendedText={search_term}`: The search query
  - `onlyFeatured=false`: Include all results
  - `onlyOwned=false`: Include all public datasets
  - `showHidden=false`: Exclude hidden datasets
  - `showEditable=false`: Exclude editable datasets
  - `identities=PUBLIC`: Show only public datasets
  - `offset=0`: Starting point for results
  - `limit=13`: Maximum number of results
  - `sort=relevance`: Sort by relevance
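For illustration, the Agrimetrics query string could be built from those parameters as in this sketch. The parameter names and default values are the ones listed above; the `buildAgrimetricsUrl` name and its optional `limit` and `offset` arguments are assumptions.

```typescript
function buildAgrimetricsUrl(searchTerm: string, limit = 13, offset = 0): string {
  const url = new URL("https://app.agrimetrics.co.uk/backend/catalog/api/catalog/data-sets");
  const params: Record<string, string> = {
    exchange: "agrimetrics",
    tagRelationship: "narrower",
    extendedText: searchTerm,
    onlyFeatured: "false",
    onlyOwned: "false",
    showHidden: "false",
    showEditable: "false",
    identities: "PUBLIC",
    offset: String(offset),
    limit: String(limit),
    sort: "relevance",
  };
  // searchParams.set encodes each value, so multi-word search terms are safe.
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}
```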
The integration with data.gov.uk uses their CKAN API with the following details:
- Base URL: `https://ckan.publishing.service.gov.uk/api/action/package_search`
- Method: GET
- Query Parameters:
  - `q`: The search query
- Response Format:
interface DataGovUKDataset {
  id: string;
  title: string;
  notes: string;
  metadata_modified: string;
  organization: {
    title: string;
    description: string;
  };
  license_title: string;
  resources: Array<{
    url: string;
    format: string;
  }>;
}
- Response Handling:
  - Maps to common ListingResult format
  - Extracts organization information
  - Preserves license information
  - Includes resource URLs
  - Maintains update timestamps
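The mapping step might look like the sketch below. The two interfaces repeat the shapes given in this README. Which field feeds `subtitle` and `url` is an assumption (the license title and the first resource URL are used here), and `toListingResult` is a hypothetical name.

```typescript
interface DataGovUKDataset {
  id: string;
  title: string;
  notes: string;
  metadata_modified: string;
  organization: { title: string; description: string };
  license_title: string;
  resources: Array<{ url: string; format: string }>;
}

interface ListingResult {
  id: string;
  title: string;
  description: string;
  subtitle: string;
  provider: { title: string; description: string };
  url: string;
  source: string;
  updated: string;
}

function toListingResult(dataset: DataGovUKDataset): ListingResult {
  return {
    id: dataset.id,
    title: dataset.title,
    description: dataset.notes,
    subtitle: dataset.license_title, // assumption: license shown as the subtitle
    provider: {
      title: dataset.organization.title,
      description: dataset.organization.description,
    },
    url: dataset.resources[0]?.url ?? "", // assumption: link to the first resource, if any
    source: "data.gov.uk",
    updated: dataset.metadata_modified,
  };
}
```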
- Basic Search: `https://proto-data-market-aggregator.chrisns.workers.dev/?search=soil`
- Multi-word Search: `https://proto-data-market-aggregator.chrisns.workers.dev/?search=soil%20quality`
The response is an HTML page containing interleaved results from all sources. Each result includes:
- Title
- Description
- Source information
- Provider details
- Last updated date (formatted consistently as DD/MM/YYYY HH:mm)
- Direct link to the data
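The consistent `DD/MM/YYYY HH:mm` date format could be produced by a helper like this one. It is a sketch that assumes timestamps are rendered in UTC and that unparseable dates become an empty string; `formatUpdated` is a hypothetical name.

```typescript
// Formats an ISO-8601 timestamp as DD/MM/YYYY HH:mm (UTC).
function formatUpdated(isoDate: string): string {
  const date = new Date(isoDate);
  if (Number.isNaN(date.getTime())) return ""; // unparseable input
  const pad = (n: number): string => String(n).padStart(2, "0");
  const day = pad(date.getUTCDate());
  const month = pad(date.getUTCMonth() + 1); // getUTCMonth() is zero-based
  const year = date.getUTCFullYear();
  const hours = pad(date.getUTCHours());
  const minutes = pad(date.getUTCMinutes());
  return `${day}/${month}/${year} ${hours}:${minutes}`;
}

// formatUpdated("2024-03-05T14:07:00Z") returns "05/03/2024 14:07"
```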
- Node.js and npm
- Wrangler CLI for Cloudflare Workers
- Understanding of Cloudflare's caching mechanisms
- TypeScript knowledge
- Clone the repository
- Install dependencies: `npm install`
Run the test suite: `npm test`
The test suite includes:
- Unit tests for each data source integration
- Integration tests for the combined search functionality
- Error handling tests
- URL construction tests
- Response formatting tests
- Cache configuration tests
Deploy to Cloudflare Workers: `npm run deploy`
When developing new features or modifying existing ones:
- Maintain the parallel request architecture
- Utilize appropriate cache settings for new API integrations
- Consider cache implications when modifying API calls
- Test performance with and without cached responses
- Be mindful of aggressive caching when implementing new endpoints
- Consider cache bypass strategies for time-sensitive data
- Use appropriate cache keys for new data sources
- Ensure consistent date formatting across all sources
- Handle API timeouts gracefully
The service implements graceful degradation:
- If any individual data source fails, other sources continue to function
- Parallel request architecture ensures a single slow API doesn't block other results
- Invalid search parameters are handled gracefully
- Network errors are caught and logged
- Empty results are handled appropriately
- Timeouts and connection issues are handled gracefully
- Caching provides resilience against temporary API failures
- All errors are logged for monitoring
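One way to keep a slow or failing source from blocking the page is to race each request against a timer and fall back to an empty result, as sketched below. The `withTimeout` name and the fallback-value approach are assumptions, not a description of the project's code; note that the underlying request is not aborted, only ignored.

```typescript
// Resolves with the work's result, or with `fallback` if the work rejects
// or takes longer than `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  const guarded = work.catch((err: unknown) => {
    console.error("data source failed", err); // log, then degrade gracefully
    return fallback;
  });
  return Promise.race([guarded, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A source call would then be wrapped as, for example, `withTimeout(searchSource(term), 5000, [])`, where `searchSource` and the 5-second limit are placeholders.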
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.