I sometimes get questions about how to get started with data or analytics engineering. There are a lot of great resources out there, but I wanted to create a WordPress ecosystem starter project because many people in my circles use WordPress.
This starter project demonstrates how to build a complete data pipeline using the WordPress.org API Meltano extractor, tap-wordpress-org.
The WordPress.org ecosystem contains over 60,000 plugins and 10,000 themes, generating substantial metadata around ratings, installations, and more. This data is definitely worth poking around in if you’re into WordPress.
Architecture
The stack is straightforward:
- Meltano orchestrates extraction from WordPress.org’s API
- DuckDB stores the data locally for analysis
- Jupyter notebooks handle exploration and visualization
No cloud services, no complex infrastructure. The entire thing runs on a laptop and processes the full plugin dataset in minutes.
Meltano eliminates the usual API client boilerplate: rate limiting, pagination, error handling, and response transformation. DuckDB is an in-process analytical database, so it handles aggregation queries well without a server to manage.
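To make that boilerplate concrete, here is a minimal sketch of the hand-rolled pagination and retry loop that a Singer tap handles for you. The `fetch_page` callable, the `plugins` and `pages` response keys, and the retry settings are assumptions for illustration, not the actual tap-wordpress-org implementation.

```python
import time
from typing import Callable, Iterator


def fetch_all_plugins(
    fetch_page: Callable[[int], dict],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
) -> Iterator[dict]:
    """Yield every plugin record across all pages of a paginated API.

    fetch_page(page) is a hypothetical callable returning a dict shaped like
    {"plugins": [...], "pages": <total page count>}.
    """
    page, total_pages = 1, 1
    while page <= total_pages:
        for attempt in range(1, max_retries + 1):
            try:
                payload = fetch_page(page)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise
                # Exponential backoff: double the wait after each failure.
                time.sleep(backoff_seconds * 2 ** (attempt - 1))
        total_pages = payload["pages"]
        yield from payload["plugins"]
        page += 1


# Fake two-page API so the sketch runs without network access.
FAKE_PAGES = {
    1: {"plugins": [{"slug": "elementor"}, {"slug": "wordpress-seo"}], "pages": 2},
    2: {"plugins": [{"slug": "contact-form-7"}], "pages": 2},
}
slugs = [p["slug"] for p in fetch_all_plugins(FAKE_PAGES.__getitem__)]
```

Even this stripped-down version has to track page counts, retries, and backoff, and it still ignores rate limits and response transformation.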
The pattern scales beyond WordPress. The same Meltano project structure works with extractors for GitHub, npm, or any other API that has a Singer tap. You can swap the extractor and keep everything else.
Implementation
The tap-wordpress-org extractor pulls plugin metadata, ratings, and installation counts through WordPress.org’s public API endpoints. DuckDB’s columnar storage makes aggregations fast even with 50k+ plugins.
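As an illustration of the kind of aggregation this enables, the sketch below groups plugins by install count and averages their ratings. The `rating` and `active_installs` column names are assumptions inferred from the quickstart output; the column listing there only confirms `slug`, `name`, `short_description`, `description`, and `version`. The demo uses Python's built-in sqlite3 as a stand-in so it runs without extra installs. A connection from `duckdb.connect(...)` exposes the same `execute(...).fetchall()` calls.

```python
import sqlite3

# Column names `rating` and `active_installs` are assumed, not confirmed.
AVG_RATING_BY_INSTALLS = """
    SELECT active_installs, COUNT(*) AS plugin_count, AVG(rating) AS avg_rating
    FROM plugins
    WHERE active_installs >= ?
    GROUP BY active_installs
    ORDER BY active_installs DESC
"""


def rating_by_install_tier(conn, min_installs=0):
    """Return (active_installs, plugin_count, avg_rating) rows, largest tier first."""
    return conn.execute(AVG_RATING_BY_INSTALLS, [min_installs]).fetchall()


# Stand-in database seeded with the five plugins from the quickstart output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plugins (slug TEXT, rating REAL, active_installs INTEGER)")
conn.executemany(
    "INSERT INTO plugins VALUES (?, ?, ?)",
    [
        ("elementor", 90.0, 10_000_000),
        ("wordpress-seo", 96.0, 10_000_000),
        ("contact-form-7", 80.0, 10_000_000),
        ("classic-editor", 98.0, 9_000_000),
        ("woocommerce", 90.0, 7_000_000),
    ],
)
tiers = rating_by_install_tier(conn)
for installs, count, avg_rating in tiers:
    print(f"{installs:>10,} installs: {count} plugins, avg rating {avg_rating:.1f}")
```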
The extractor supports incremental syncing using the last_updated field, so you can run daily updates without reprocessing the entire dataset. Configuration is minimal – just specify which streams you want (plugins, themes, stats) and point it at a target.
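The idea behind incremental syncing can be sketched in a few lines: remember the newest `last_updated` value seen so far (the bookmark), and on the next run keep only records newer than it. This is a simplified illustration rather than the tap's actual code. The `select_new_records` helper and the ISO-8601 timestamp format are assumptions.

```python
from datetime import datetime
from typing import Optional


def select_new_records(records, bookmark: Optional[str]):
    """Return (new_records, new_bookmark) for one incremental run.

    `bookmark` is the newest last_updated value from the previous run,
    or None on the first run, which yields a full sync.
    """
    cutoff = datetime.fromisoformat(bookmark) if bookmark else None
    new_records = [
        r for r in records
        if cutoff is None or datetime.fromisoformat(r["last_updated"]) > cutoff
    ]
    if not new_records:
        # Nothing changed, so the bookmark stays where it was.
        return [], bookmark
    newest = max(datetime.fromisoformat(r["last_updated"]) for r in new_records)
    return new_records, newest.isoformat()


# Hypothetical records; the timestamp format is assumed to be ISO-8601.
records = [
    {"slug": "plugin-a", "last_updated": "2025-07-20T10:00:00"},
    {"slug": "plugin-b", "last_updated": "2025-07-22T08:30:00"},
]
first_run, bookmark = select_new_records(records, None)       # full sync
records.append({"slug": "plugin-c", "last_updated": "2025-07-23T12:00:00"})
second_run, bookmark = select_new_records(records, bookmark)  # only plugin-c
```

In a Meltano project this bookmark is kept as pipeline state between runs, so you do not manage it by hand.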
Setup takes one command: make quickstart. The sample data gives you working charts in just a few minutes.
Getting Started
Full source code is available at https://github.com/mahangu/meltano-wordpress-org-data-starter-project with step-by-step setup instructions. Get started with:
git clone https://github.com/mahangu/meltano-wordpress-org-data-starter-project
cd meltano-wordpress-org-data-starter-project
brew install uv   # via Homebrew on macOS
uv sync
make quickstart
% make quickstart
Installing Meltano plugins...
source ../venv/bin/activate && meltano install
2025-07-23T12:00:53.070489Z [info ] Installing 2 plugins
2025-07-23T12:00:53.073957Z [info ] Installing extractor 'tap-wordpress-org'
2025-07-23T12:00:53.081207Z [info ] Installing loader 'target-duckdb'
2025-07-23T12:00:53.092487Z [info ] Installed loader 'target-duckdb'
2025-07-23T12:00:54.102996Z [info ] Installed extractor 'tap-wordpress-org'
2025-07-23T12:00:54.103190Z [info ] Installed 2/2 plugins
✅ Plugins installed!
🔄 Creating sample data from WordPress.org API...
source ../venv/bin/activate && python create_sample_data.py
🔄 Creating sample WordPress.org data...
🧹 Clearing existing plugin data...
📥 Fetching plugin data from WordPress.org API...
📥 Fetching page 1/6 from WordPress.org API...
✅ Got 100 plugins from page 1
📥 Fetching page 2/6 from WordPress.org API...
✅ Got 100 plugins from page 2
📥 Fetching page 3/6 from WordPress.org API...
✅ Got 100 plugins from page 3
📥 Fetching page 4/6 from WordPress.org API...
✅ Got 100 plugins from page 4
📥 Fetching page 5/6 from WordPress.org API...
✅ Got 100 plugins from page 5
📥 Fetching page 6/6 from WordPress.org API...
✅ Got 100 plugins from page 6
📦 Found 600 plugins to insert...
✅ Successfully inserted 600 plugins into the database!
📊 Sample plugin data:
- Elementor Website Builder – More Than Just a Page Builder: Rating 90.0, 7073 reviews, 10000000 active installs
- Yoast SEO – Advanced SEO with real-time guidance and built-in AI: Rating 96.0, 27771 reviews, 10000000 active installs
- Contact Form 7: Rating 80.0, 2134 reviews, 10000000 active installs
- Classic Editor: Rating 98.0, 1207 reviews, 9000000 active installs
- WooCommerce: Rating 90.0, 4560 reviews, 7000000 active installs
✅ Sample data creation complete!
📊 Checking database contents...
source ../venv/bin/activate && python check_data.py
✅ Connected to WordPress.org data database!
📊 Available tables (1):
- plugins: 600 records
🔍 Sample data from plugins:
Columns: ['slug', 'name', 'short_description', 'description', 'version']
Row 1: ('elementor', 'Elementor Website Builder – More Than Just a Page Builder', 'The Elementor Webs...
Row 2: ('wordpress-seo', 'Yoast SEO – Advanced SEO with real-time guidance and built-in AI', 'Improve...
Row 3: ('contact-form-7', 'Contact Form 7', 'Just another contact form plugin. Simple but flexible.', '<p>C...
✅ Database exploration complete!
📓 Starting Jupyter notebook with analysis notebook...
Next Steps
The Jupyter notebook launched at the end of the quickstart has some ideas on where you can take this project:
- Extract more data: Use the extraction cell above or run make extract-all in the terminal
- Create custom visualizations: Use matplotlib, seaborn, or plotly to create your own charts
- Export results: Save interesting findings to CSV or other formats
- Build dashboards: Create interactive dashboards using tools like Streamlit
- Set up automation: Use Meltano schedules to keep your data fresh
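For the export idea in particular, a small helper like the following could write query results to CSV using only the standard library. The `write_csv` function, the column names, and the sample rows are hypothetical. In the notebook, the rows would come from a DuckDB query rather than a literal list.

```python
import csv
import io


def write_csv(rows, columns, out):
    """Write a header row followed by data rows to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


# Hypothetical findings, using values from the quickstart sample output.
rows = [("Classic Editor", 98.0, 9_000_000), ("WooCommerce", 90.0, 7_000_000)]
buffer = io.StringIO()
write_csv(rows, ["name", "rating", "active_installs"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("findings.csv", "w", newline="")` in place of the in-memory buffer.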
Questions / Comments
If you have any questions or comments please post below and I will do my best to respond!
