Data/Analytics Engineering Starter Project: Extracting Data from WordPress.org with Meltano


I sometimes get questions about how to get started with data or analytics engineering. There are a lot of great resources out there, but I wanted to create a WordPress ecosystem starter project because many people in my circles use WordPress.

This starter project demonstrates how to build a complete data pipeline using tap-wordpress-org, a Meltano extractor for the WordPress.org API.

The WordPress.org ecosystem contains over 60,000 plugins and 10,000 themes, which generate substantial metadata around ratings, installations, and more. This data is well worth poking around in if you’re into WordPress.

Architecture

The stack is straightforward:

  • Meltano orchestrates extraction from WordPress.org’s API
  • DuckDB stores the data locally for analysis
  • Jupyter notebooks handle exploration and visualization

No cloud services, no complex infrastructure. The entire thing runs on a laptop and processes the full plugin dataset in minutes.

Meltano eliminates the usual API client boilerplate – rate limiting, pagination, error handling, and response transformation. DuckDB is built for analytical queries, so aggregations over the whole dataset run quickly on a laptop.
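
To give a sense of the boilerplate a hand-rolled client would need, here is a minimal sketch of page-by-page fetching with retries. It is not code from tap-wordpress-org or Meltano; fetch_all, fake_fetch_page, and the (records, total_pages) return shape are hypothetical names chosen for illustration, and the fake fetcher stands in for a real HTTP call.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical fetcher signature: takes a 1-based page number and returns
# (records, total_pages). A real one would call the API over HTTP.
FetchPage = Callable[[int], Tuple[List[dict], int]]


def fetch_all(fetch_page: FetchPage, max_retries: int = 3, backoff: float = 1.0) -> Iterator[dict]:
    """Yield every record across all pages, retrying pages that fail."""
    page, total_pages = 1, 1
    while page <= total_pages:
        for attempt in range(1, max_retries + 1):
            try:
                records, total_pages = fetch_page(page)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last attempt
                time.sleep(backoff * 2 ** (attempt - 1))  # exponential backoff
        yield from records
        page += 1


def fake_fetch_page(page: int) -> Tuple[List[dict], int]:
    """Stand-in for an HTTP call: 3 pages of 2 made-up plugins each."""
    return [{"slug": f"plugin-{page}-{i}"} for i in range(2)], 3


plugins = list(fetch_all(fake_fetch_page))
print(len(plugins))  # 6
```

Even this toy version has to track page counts, retries, and backoff, which is the kind of code an extractor takes off your hands.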

The pattern scales beyond WordPress. The same Meltano project structure works with extractors for GitHub, npm, or any other API that has a Singer tap. You can swap the extractor and keep everything else.

Implementation

The tap-wordpress-org extractor pulls plugin metadata, ratings, and installation counts through WordPress.org’s public API endpoints. DuckDB’s columnar storage makes aggregations fast even with 50k+ plugins.
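
As a rough illustration of the kind of aggregation this enables, the sketch below groups plugins by install count and averages their ratings. The five rows are taken from the sample output further down; the table layout and the column names rating and active_installs are assumptions for this example and may not match the schema the extractor actually produces. If the duckdb package is not installed, the sketch falls back to sqlite3 purely so it still runs.

```python
try:
    import duckdb  # pip install duckdb
    con = duckdb.connect(":memory:")
except ImportError:
    import sqlite3  # fallback so the sketch runs without DuckDB
    con = sqlite3.connect(":memory:")

# Assumed, simplified schema; the real plugins table has more columns.
con.execute("CREATE TABLE plugins (slug VARCHAR, rating DOUBLE, active_installs BIGINT)")
rows = [
    ("elementor", 90.0, 10_000_000),
    ("wordpress-seo", 96.0, 10_000_000),
    ("contact-form-7", 80.0, 10_000_000),
    ("classic-editor", 98.0, 9_000_000),
    ("woocommerce", 90.0, 7_000_000),
]
con.executemany("INSERT INTO plugins VALUES (?, ?, ?)", rows)

# Count plugins and average their rating per install tier.
query = """
    SELECT active_installs, COUNT(*) AS plugin_count, AVG(rating) AS avg_rating
    FROM plugins
    GROUP BY active_installs
    ORDER BY active_installs DESC
"""
for installs, plugin_count, avg_rating in con.execute(query).fetchall():
    print(f"{installs:>10,} installs: {plugin_count} plugins, avg rating {avg_rating:.1f}")
```

Against the real database you would connect to the project's DuckDB file instead of an in-memory one and adjust the column names to whatever check_data.py reports.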

The extractor supports incremental syncing using the last_updated field, so you can run daily updates without reprocessing the entire dataset. Configuration is minimal – just specify which streams you want (plugins, themes, stats) and point it at a target.
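
The idea behind incremental syncing can be sketched in a few lines: remember the newest last_updated value you have seen (the bookmark) and, on the next run, keep only records newer than it. This is an illustration of the general technique, not the extractor's actual implementation; incremental_sync and the sample records are made up for the example.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def incremental_sync(records: List[dict], bookmark: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Return records updated after `bookmark`, plus the new bookmark."""
    cutoff = datetime.fromisoformat(bookmark) if bookmark else datetime.min
    fresh = [r for r in records if datetime.fromisoformat(r["last_updated"]) > cutoff]
    if fresh:
        # Advance the bookmark to the newest timestamp seen in this run.
        newest = max(fresh, key=lambda r: datetime.fromisoformat(r["last_updated"]))
        bookmark = newest["last_updated"]
    return fresh, bookmark


# Made-up records standing in for API responses.
api_records = [
    {"slug": "plugin-a", "last_updated": "2025-07-20T08:00:00"},
    {"slug": "plugin-b", "last_updated": "2025-07-22T09:30:00"},
    {"slug": "plugin-c", "last_updated": "2025-07-23T12:00:00"},
]

first_run, state = incremental_sync(api_records, None)    # full load
second_run, state = incremental_sync(api_records, state)  # nothing changed
api_records[0]["last_updated"] = "2025-07-24T10:00:00"    # plugin-a gets an update
third_run, state = incremental_sync(api_records, state)   # only plugin-a

print(len(first_run), len(second_run), [r["slug"] for r in third_run])
# 3 0 ['plugin-a']
```

In a Meltano project the bookmark is kept in the pipeline's state between runs, so you do not manage it by hand.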

Setup takes one command: make quickstart. The sample data gives you working charts in just a few minutes.

Getting Started

Full source code is available at https://github.com/mahangu/meltano-wordpress-org-data-starter-project with step-by-step setup instructions. Get started with:

git clone https://github.com/mahangu/meltano-wordpress-org-data-starter-project

brew install uv (via Homebrew on Mac)

uv sync

make quickstart

% make quickstart
Installing Meltano plugins...
source ../venv/bin/activate && meltano install
2025-07-23T12:00:53.070489Z [info     ] Installing 2 plugins          
2025-07-23T12:00:53.073957Z [info     ] Installing extractor 'tap-wordpress-org'
2025-07-23T12:00:53.081207Z [info     ] Installing loader 'target-duckdb'
2025-07-23T12:00:53.092487Z [info     ] Installed loader 'target-duckdb'
2025-07-23T12:00:54.102996Z [info     ] Installed extractor 'tap-wordpress-org'
2025-07-23T12:00:54.103190Z [info     ] Installed 2/2 plugins         
✅ Plugins installed!
🔄 Creating sample data from WordPress.org API...
source ../venv/bin/activate && python create_sample_data.py
🔄 Creating sample WordPress.org data...
🧹 Clearing existing plugin data...
📥 Fetching plugin data from WordPress.org API...
📥 Fetching page 1/6 from WordPress.org API...
  ✅ Got 100 plugins from page 1
📥 Fetching page 2/6 from WordPress.org API...
  ✅ Got 100 plugins from page 2
📥 Fetching page 3/6 from WordPress.org API...
  ✅ Got 100 plugins from page 3
📥 Fetching page 4/6 from WordPress.org API...
  ✅ Got 100 plugins from page 4
📥 Fetching page 5/6 from WordPress.org API...
  ✅ Got 100 plugins from page 5
📥 Fetching page 6/6 from WordPress.org API...
  ✅ Got 100 plugins from page 6
📦 Found 600 plugins to insert...
✅ Successfully inserted 600 plugins into the database!

📊 Sample plugin data:
  - Elementor Website Builder – More Than Just a Page Builder: Rating 90.0, 7073 reviews, 10000000 active installs
  - Yoast SEO – Advanced SEO with real-time guidance and built-in AI: Rating 96.0, 27771 reviews, 10000000 active installs
  - Contact Form 7: Rating 80.0, 2134 reviews, 10000000 active installs
  - Classic Editor: Rating 98.0, 1207 reviews, 9000000 active installs
  - WooCommerce: Rating 90.0, 4560 reviews, 7000000 active installs
✅ Sample data creation complete!
📊 Checking database contents...
source ../venv/bin/activate && python check_data.py
✅ Connected to WordPress.org data database!

📊 Available tables (1):
  - plugins: 600 records

🔍 Sample data from plugins:
   Columns: ['slug', 'name', 'short_description', 'description', 'version']
   Row 1: ('elementor', 'Elementor Website Builder – More Than Just a Page Builder', 'The Elementor Webs...
   Row 2: ('wordpress-seo', 'Yoast SEO – Advanced SEO with real-time guidance and built-in AI', 'Improve...
   Row 3: ('contact-form-7', 'Contact Form 7', 'Just another contact form plugin. Simple but flexible.', '<p>C...

✅ Database exploration complete!
📓 Starting Jupyter notebook with analysis notebook...

Next Steps

The Jupyter notebook that is launched here has some ideas on where you can take this project:

  1. Extract more data: Use the extraction cell above or run make extract-all in the terminal
  2. Create custom visualizations: Use matplotlib, seaborn, or plotly to create your own charts
  3. Export results: Save interesting findings to CSV or other formats
  4. Build dashboards: Create interactive dashboards using tools like Streamlit
  5. Set up automation: Use Meltano schedules to keep your data fresh

Questions / Comments

If you have any questions or comments please post below and I will do my best to respond!

