This document describes the MaStR data source, the processing pipeline, and data management in the MaStR Visualizer project.
Source: Marktstammdatenregister (MaStR)
Authority: German Federal Network Agency (Bundesnetzagentur)
Coverage: Complete German energy market infrastructure data
Update Frequency: Periodic bulk exports from the official source
License: German open data policies - free for public use
The MaStR database contains comprehensive information about:
- Wind Energy: Onshore and offshore wind turbines
- Solar Energy: Rooftop and ground-mounted solar installations
- Energy Storage: Battery and other storage systems
- Biomass: Biogas and biomass power plants
- Hydro: Hydroelectric power plants
- Combustion: Fossil fuel power plants
- Nuclear: Nuclear power plants
- Primary Format: XML downloads from official source
- Processed Format: Structured PostgreSQL tables with spatial data
- Spatial Reference: WGS84 (EPSG:4326)
- Coordinate System: Geographic coordinates (latitude/longitude)
The mastr_lite library is a customized version of the open-mastr Python package, specifically adapted for this visualization project.
- Streamlined Processing: Optimized for visualization use cases
- Spatial Data Focus: Enhanced spatial data handling and indexing
- Performance Optimization: Bulk operations and efficient data loading
- Custom Schema: Tailored database schema for mapping and analytics
Location: backend/mastr_lite/utils/download_mastr.py
Purpose: Downloads XML data from official MaStR source
Features:
- Progress Tracking: Real-time download progress
- Error Handling: Robust error handling and retry logic
- File Management: Automatic file organization and cleanup
- Checksum Verification: Data integrity verification
Usage:
from mastr_lite.utils.download_mastr import MaStrDownloader
# Download latest data
downloader = MaStrDownloader(output_dir="/path/to/downloads")
zip_file = downloader.download_latest()
Location: backend/mastr_lite/main.py
Purpose: Processes XML data into structured format
Features:
- XML Parsing: Efficient XML parsing with lxml
- Data Validation: Comprehensive data validation
- Cleansing: Data quality improvement and standardization
- Bulk Loading: High-performance database loading
Processing Pipeline:
- Extract: Parse XML and extract relevant data
- Transform: Cleanse, validate, and transform data
- Load: Bulk insert into PostgreSQL with spatial data
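In outline, the Extract and Transform steps can be sketched as a streaming parse. The sketch below is a minimal illustration using the standard-library parser and made-up element content, not the real MaStR XML schema (the project itself uses lxml for performance):

```python
import io
import xml.etree.ElementTree as ET

# Illustrative sample, not real MaStR data
SAMPLE_XML = b"""<EinheitenWind>
  <EinheitWind>
    <EinheitMastrNummer>SEE900000000001</EinheitMastrNummer>
    <Bruttoleistung>3000</Bruttoleistung>
    <Laengengrad>7.12</Laengengrad>
    <Breitengrad>53.41</Breitengrad>
  </EinheitWind>
</EinheitenWind>"""

def extract_units(xml_bytes):
    """Stream-parse unit elements, yielding one dict per unit.

    iterparse keeps memory flat even for multi-GB exports because
    each element is cleared as soon as it has been consumed.
    """
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "EinheitWind":
            yield {child.tag: child.text for child in elem}
            elem.clear()  # release parsed children

def transform(record):
    """Cast numeric fields and build a WKT point (longitude first)."""
    record["Bruttoleistung"] = float(record["Bruttoleistung"])
    record["geom"] = f"POINT({record['Laengengrad']} {record['Breitengrad']})"
    return record

rows = [transform(r) for r in extract_units(SAMPLE_XML)]
# rows is now ready for a bulk INSERT / COPY into PostgreSQL
```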
Usage:
from mastr_lite import MaStrProcessor
from mastr_lite.utils import DBConfig
# Configure database
db_config = DBConfig(
    DB_HOST="localhost",
    DB_PORT=5432,
    DB_NAME="mastr_db",
    DB_USER="postgres",
    DB_PASSWORD="password",
    DB_SCHEMA="public"
)
# Process data
processor = MaStrProcessor(db_config=db_config)
processor.process_zip(
    zip_file_path="/path/to/mastr_data.zip",
    bulk_cleansing=True,
    data=['wind', 'solar']  # Process specific data types
)
Location: backend/mastr_lite/utils/db.py
Purpose: Manages database connections and operations
Features:
- Connection Management: Efficient database connection pooling
- PostGIS Setup: Automatic PostGIS extension and spatial setup
- Index Creation: Automatic spatial and performance index creation
- Bulk Operations: Optimized bulk insert and update operations
Usage:
from mastr_lite.utils.db import DBHelper
# Initialize database helper
db_helper = DBHelper(db_config=db_config)
# Enable PostGIS
postgis_enabled = db_helper.enable_postgis()
# Create spatial indexes
if postgis_enabled:
    db_helper.create_geometry_indexes(srid=4326)
from mastr_lite import DBConfig, MaStrDownloader, MaStrProcessor, DBHelper
# Configure database connection
db_config = DBConfig(
    DB_HOST="db",
    DB_PORT=5432,
    DB_NAME="mastr_db",
    DB_USER="postgres",
    DB_PASSWORD="1234",
    DB_SCHEMA="public"
)
# Download latest MaStR data
downloader = MaStrDownloader(output_dir="./downloads")
zip_file = downloader.download_latest()
# Process the downloaded data
processor = MaStrProcessor(db_config=db_config)
processor.process_zip(
    zip_file_path=str(zip_file),
    bulk_cleansing=True,
    data=['wind', 'solar', 'storage']  # Specify data types to process
)
# Set up spatial database features
db_helper = DBHelper(db_config=db_config)
postgis_enabled = db_helper.enable_postgis()
if postgis_enabled:
    db_helper.create_geometry_indexes(srid=4326)
else:
    # Assumes a configured module-level logger, e.g. logger = logging.getLogger(__name__)
    logger.warning("Spatial features disabled (PostGIS not available).")
Each energy unit type has its own extended table:
- wind_extended - Wind energy installations
- solar_extended - Solar energy installations
- storage_extended - Energy storage systems
- biomass_extended - Biomass power plants
- hydro_extended - Hydroelectric power plants
- combustion_extended - Combustion power plants
- nuclear_extended - Nuclear power plants
- EinheitMastrNummer - Unique identifier
- NameStromerzeugungseinheit - Unit name
- Bruttoleistung - Capacity in kW
- Bundesland - Federal state
- EinheitBetriebsstatus - Operational status
- geom - Spatial geometry (Point)
Wind Energy:
- Hersteller - Manufacturer
- WindAnLandOderAufSee - Location type (onshore/offshore)
- Nabenhoehe - Hub height
- Rotordurchmesser - Rotor diameter
Solar Energy:
- ArtDerSolaranlage - Solar installation type
- Lage - Location type
- Modulflaeche - Module area
- Wechselrichteranzahl - Inverter count
Storage:
- Batterietechnologie - Battery technology
- Einsatzort - Deployment location
- Nennleistung - Rated power
- Nennenergie - Rated energy
All tables include spatial geometry columns:
- SRID: 4326 (WGS84)
- Geometry Type: Point
- Coordinate Order: Longitude, Latitude
- Index: GIST spatial index for performance
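Because WKT (and PostGIS functions such as ST_GeomFromText) write longitude (x) before latitude (y), a small helper that makes the axis order explicit avoids the classic swapped-coordinates bug. A minimal sketch; the function name is illustrative, not part of mastr_lite:

```python
def to_wkt_point(latitude: float, longitude: float) -> str:
    """Build an EWKT point in EPSG:4326.

    Note the order on output: longitude (x) first, latitude (y) second.
    """
    if not (-90.0 <= latitude <= 90.0):
        raise ValueError(f"latitude out of range: {latitude}")
    if not (-180.0 <= longitude <= 180.0):
        raise ValueError(f"longitude out of range: {longitude}")
    return f"SRID=4326;POINT({longitude} {latitude})"

# Berlin: lat 52.52, lon 13.405 -> x=13.405, y=52.52
```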
-- Automatic spatial index creation
CREATE INDEX idx_wind_extended_geom ON wind_extended USING GIST (geom);
-- Common filter columns
CREATE INDEX idx_wind_extended_bundesland ON wind_extended ("Bundesland");
CREATE INDEX idx_wind_extended_status ON wind_extended ("EinheitBetriebsstatus");
CREATE INDEX idx_wind_extended_hersteller ON wind_extended ("Hersteller");
- Coordinate Validation: Ensure valid latitude/longitude ranges
- Capacity Validation: Non-negative capacity values
- Status Validation: Valid operational status codes
- Identifier Uniqueness: Unique MaStR numbers
- Spatial Validity: Valid geometry objects
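A row-level validator applying these rules might look like the following sketch. The lat/lon keys are illustrative, and the two status values shown are only examples from the much larger MaStR code list:

```python
def validate_unit(row, seen_ids,
                  valid_statuses=frozenset({"In Betrieb", "Endgültig stillgelegt"})):
    """Return a list of rule violations for one record (empty = valid)."""
    errors = []
    lat, lon = row.get("lat"), row.get("lon")
    # Coordinate validation: both present and inside WGS84 ranges
    if lat is None or lon is None or not (-90 <= lat <= 90 and -180 <= lon <= 180):
        errors.append("invalid coordinates")
    # Capacity validation: non-negative values only
    if row.get("Bruttoleistung", 0) < 0:
        errors.append("negative capacity")
    # Status validation: must be a known operational status code
    if row.get("EinheitBetriebsstatus") not in valid_statuses:
        errors.append("unknown status")
    # Identifier uniqueness: MaStR numbers must not repeat
    mastr_nr = row.get("EinheitMastrNummer")
    if mastr_nr in seen_ids:
        errors.append("duplicate MaStR number")
    seen_ids.add(mastr_nr)
    return errors
```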
- Null Value Handling: Replace nulls with appropriate defaults
- Duplicate Removal: Remove duplicate entries
- Format Standardization: Standardize text formats
- Range Validation: Validate numeric ranges
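These steps can be combined into a single cleansing pass; a sketch, where the default-value map is an assumption for illustration:

```python
def cleanse(records, defaults=None):
    """Fill nulls with defaults, drop duplicate MaStR numbers
    (first occurrence wins), and normalize whitespace in text fields."""
    defaults = defaults or {"Bundesland": "Unbekannt"}  # illustrative default
    seen, cleaned = set(), []
    for rec in records:
        key = rec.get("EinheitMastrNummer")
        if key in seen:
            continue  # duplicate removal
        seen.add(key)
        # Null value handling: substitute configured defaults
        rec = {k: (defaults.get(k) if v is None else v) for k, v in rec.items()}
        # Format standardization: trim and collapse internal whitespace
        rec = {k: " ".join(v.split()) if isinstance(v, str) else v
               for k, v in rec.items()}
        cleaned.append(rec)
    return cleaned
```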
- Row Count Verification: Compare with expected counts
- Spatial Coverage: Verify geographic coverage
- Data Completeness: Check for missing critical fields
- Consistency Checks: Cross-table consistency validation
# Process data in batches for memory efficiency
batch_size = 10000
for batch in data_batches:
    db_helper.bulk_insert(table_name, batch)
- Streaming Processing: Process data in chunks
- Temporary Storage: Use temporary files for large datasets
- Resource Cleanup: Proper cleanup of temporary resources
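A generic chunking helper keeps only one batch in memory regardless of input size, so it composes with any streaming source; a sketch (the bulk_insert call in the usage comment refers to the DBHelper shown earlier):

```python
from itertools import islice

def batched(iterable, batch_size=10_000):
    """Yield lists of at most batch_size items.

    Works on any iterator, including a streaming XML parser, and
    never materializes more than one batch at a time.
    """
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

# Usage sketch with the bulk loader:
# for batch in batched(unit_stream, 10_000):
#     db_helper.bulk_insert("wind_extended", batch)
```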
-- Efficient spatial queries using tile envelope
SELECT * FROM wind_extended
WHERE geom && ST_Transform(ST_TileEnvelope(10, 546, 350), 4326);
- GIST Indexes: For spatial queries
- B-tree Indexes: For categorical filtering
- Composite Indexes: For multi-column queries
def update_mastr_data():
    """Automated MaStR data update process."""
    # 1. Download latest data
    zip_file = MaStrDownloader().download_latest()
    # 2. Process new data
    processor = MaStrProcessor(db_config)
    processor.process_zip(zip_file_path=str(zip_file))
    # 3. Update indexes
    db_helper = DBHelper(db_config)
    db_helper.create_geometry_indexes(srid=4326)
    # 4. Update statistics
    db_helper.update_table_statistics()
- Recommended: Monthly updates
- Minimum: Quarterly updates
- Real-time: Not supported (batch processing only)
- Versioning: Keep previous versions for comparison
- Archiving: Archive old data to separate storage
- Cleanup: Remove obsolete temporary files
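The cleanup step can be as simple as age-based deletion of downloaded archives; a sketch, where the 90-day retention window and the `*.zip` pattern are assumptions:

```python
import time
from pathlib import Path

def remove_stale_downloads(download_dir, max_age_days=90):
    """Delete downloaded MaStR archives older than max_age_days.

    Returns the list of removed paths so the caller can log them.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(download_dir).glob("*.zip"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```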
# Database backup
pg_dump -U postgres mastr_db > backup_$(date +%Y%m%d).sql
# Data file backup
tar -czf data_backup_$(date +%Y%m%d).tar.gz /path/to/data/files
# Check internet connection
ping www.marktstammdatenregister.de
# Verify download directory permissions
ls -la /path/to/downloads
# Check available disk space
df -h
# Check XML file integrity
file /path/to/mastr_data.zip
# Verify database connection
psql -U postgres -d mastr_db -c "SELECT version();"
# Check processing logs
tail -f /path/to/processing.log
# Monitor database performance (requires the pg_stat_statements extension;
# the column is named mean_time before PostgreSQL 13)
SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC;
# Check index usage
SELECT indexrelname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname LIKE '%extended%';
- Coordinate Issues: Check for null or invalid coordinates
- Capacity Issues: Verify capacity values are reasonable
- Status Issues: Ensure operational status is valid
- Duplicate Entries: Check for duplicate MaStR numbers
- Spatial Issues: Verify geometry validity
- Format Issues: Standardize text formats
import pandas as pd
# Export to CSV (assumes an open SQLAlchemy or DB-API connection)
query = "SELECT * FROM wind_extended LIMIT 1000"
df = pd.read_sql(query, connection)
df.to_csv('wind_data.csv', index=False)
# Export spatial data as GeoJSON
query = """
SELECT jsonb_build_object(
'type', 'FeatureCollection',
'features', jsonb_agg(features.feature)
) FROM (
SELECT jsonb_build_object(
'type', 'Feature',
'geometry', ST_AsGeoJSON(geom)::jsonb,
'properties', to_jsonb(t) - 'geom'
) AS feature
FROM wind_extended t
) features;
"""- Vector Tiles:
/api/tiles/{unit_type}/{z}/{x}/{y} - Analytics:
/api/stats/advanced/{unit_type} - Metadata:
/api/metadata/{unit_type}
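Clients can derive request URLs from these patterns; a minimal sketch, where the base URL and the unit-type whitelist are assumptions based on the extended tables listed above:

```python
def tile_url(base: str, unit_type: str, z: int, x: int, y: int) -> str:
    """Build a vector-tile URL following the endpoint pattern above."""
    # Assumed set of unit types, mirroring the *_extended tables
    allowed = {"wind", "solar", "storage", "biomass",
               "hydro", "combustion", "nuclear"}
    if unit_type not in allowed:
        raise ValueError(f"unknown unit type: {unit_type}")
    return f"{base}/api/tiles/{unit_type}/{z}/{x}/{y}"
```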
- Real-time Queries: For interactive applications
- Batch Processing: For data analysis
- Spatial Analysis: For GIS applications
This data processing guide provides the information needed to understand, manage, and maintain MaStR data in the visualization system.