As part of an internal AI tooling exercise at Automattic, I recently co-developed a Meltano extractor with Claude Code, Anthropic’s agentic AI development assistant. You can find tap-wordpress-org on GitHub and on the Meltano Hub.
The Challenge
WordPress.org hosts over 60,000 plugins and 10,000 themes, along with valuable statistics about WordPress usage, PHP versions, and MySQL deployments across millions of websites. While this data is publicly available through various API endpoints, there wasn’t a standardised way to extract it for data pipelines and analytics workflows. I wanted to develop a solution that could:
- Extract data from multiple WordPress.org API endpoints
- Handle incremental updates for frequently changing plugin data
- Transform and normalize the data for analytics
- Integrate seamlessly with modern data stacks
Enter Claude Code and Meltano
Claude Code proved to be a great companion for this project. I have worked on Meltano extractors before, but with Claude Code the process was not only faster but also more enjoyable. It needed very little input from me to get going.
The Development Process
I essentially just pointed Claude Code at https://codex.wordpress.org/WordPress.org_API and said something like:
“let’s make a Meltano Extractor for these APIs using the new Meltano SDK”
It asked some follow-up questions and got going. 🚀 It created, tested, and committed much of the code on its own, and also helped set up and troubleshoot the CI/CD pipeline.
What impressed me most was Claude’s ability to:
- Generate the complete project structure with proper Meltano SDK patterns
- Implement all 8 different streams (plugins, themes, events, patterns, and various stats)
- Handle edge cases like HTML entity decoding and missing fields
- Add features like configurable request delays and incremental syncing
- Fix issues in real-time based on actual API responses
Key Features Implemented
The final tap-wordpress-org extractor includes:
- Eight Data Streams:
  - Plugins (with incremental sync support)
  - Themes
  - WordPress Events
  - Block Patterns
  - WordPress Version Statistics
  - PHP Version Statistics
  - MySQL Version Statistics
  - Locale Statistics
- Smart Data Handling:
  - Automatic HTML entity decoding (e.g., &amp; → &)
  - Graceful handling of missing or null fields
  - Configurable request delays to respect API rate limits
- Production-Ready Features:
  - Incremental replication for plugins based on last_updated timestamps
  - Full Singer protocol compliance
  - Comprehensive error handling
  - Type-safe schema definitions
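To make the entity decoding and missing-field handling listed above more concrete, here is a minimal sketch. It is not the tap's actual code: `clean_record` and `EXPECTED_FIELDS` are hypothetical names made up for illustration, and the only real API it relies on is `html.unescape` from Python's standard library.

```python
import html

# Hypothetical field list for illustration; the real tap defines its own schemas.
EXPECTED_FIELDS = ("name", "slug", "short_description", "requires_php")


def clean_record(raw: dict) -> dict:
    """Decode HTML entities in string values and default missing fields to None."""
    cleaned = {}
    for field in EXPECTED_FIELDS:
        value = raw.get(field)  # a missing key becomes None instead of raising KeyError
        if isinstance(value, str):
            value = html.unescape(value)  # e.g. "&amp;" becomes "&"
        cleaned[field] = value
    return cleaned


record = clean_record({"name": "Forms &amp; Surveys", "slug": "forms-surveys"})
print(record)
```

Filling absent keys with None keeps every record the same shape, which tends to make downstream loading simpler than dealing with records that have varying keys.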
Installing and Running the Extractor
Getting started with tap-wordpress-org is straightforward. Here’s how to install Meltano and use the extractor:
Prerequisites
# Check that Python 3.8 or higher is installed
python3 --version
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install Meltano and the Tap
# Install Meltano
pip install meltano
# Install the tap directly from GitHub
pip install git+https://github.com/Automattic/tap-wordpress-org.git
Configuration
Create a config.json file to configure the extractor (see more about configuration in the Meltano docs):
{
"stream_selection": ["plugins", "themes", "wordpress_stats"],
"request_delay": 0.3,
"start_date": "2025-01-01T00:00:00Z"
}
Running the Extractor
To extract data and save it as JSONL (JSON Lines format):
# Create an output dir
mkdir output
# Run the tap and save output to a file
python -m tap_wordpress_org.tap --config config.json > output/wordpress_data.jsonl
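As a quick sanity check on the output file, you could count how many records each stream produced. The sketch below assumes the file contains standard Singer messages, one JSON object per line; `count_records` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json
from collections import Counter


def count_records(lines) -> Counter:
    """Count Singer RECORD messages per stream from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return counts


# Small in-memory sample; for a real run, pass open("output/wordpress_data.jsonl") instead.
sample = [
    '{"type": "SCHEMA", "stream": "plugins", "schema": {}, "key_properties": ["slug"]}',
    '{"type": "RECORD", "stream": "plugins", "record": {"slug": "hello-dolly"}}',
    '{"type": "RECORD", "stream": "themes", "record": {"slug": "twentytwentyfive"}}',
    '{"type": "STATE", "value": {}}',
]
print(count_records(sample))
```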
Sample Data Output
The extractor produces clean, structured data ready for analysis. Here are some examples:
Plugin Data
{
"type": "RECORD",
"stream": "plugins",
"record": {
"name": "Hello Dolly",
"slug": "hello-dolly",
"author": "<a href=\"https://profiles.wordpress.org/matt/\">Matt Mullenweg</a>",
"author_profile": "https://profiles.wordpress.org/matt/",
"requires": "4.6",
"tested": "6.8.1",
"requires_php": false,
"rating": 60,
"num_ratings": 297,
"active_installs": 700000,
"downloaded": 0,
"last_updated": "2025-05-07 4:50pm GMT",
"added": "2008-07-06",
"homepage": "http://wordpress.org/plugins/hello-dolly/",
"short_description": "This is not just a plugin, it symbolizes the hope and enthusiasm...",
"download_link": "https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip",
"tags": {}
},
"time_extracted": "2025-07-11T00:00:00.000000+00:00"
}
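Note that last_updated in the sample above is a human-readable string rather than an ISO 8601 timestamp. If you need a proper datetime for analysis, something like the following sketch should work, assuming every value follows the same "YYYY-MM-DD h:mmpm GMT" pattern as the sample. `parse_last_updated` is a hypothetical helper, not part of the tap, and `%p` depends on an English-style locale.

```python
from datetime import datetime, timezone


def parse_last_updated(value: str) -> datetime:
    """Parse a string like '2025-05-07 4:50pm GMT' into a UTC-aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%d %I:%M%p GMT")
    return parsed.replace(tzinfo=timezone.utc)  # GMT is treated as UTC here


print(parse_last_updated("2025-05-07 4:50pm GMT").isoformat())
```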
Theme Data
{
"type": "RECORD",
"stream": "themes",
"record": {
"name": "Twenty Twenty-Five",
"slug": "twentytwentyfive",
"version": "1.2",
"preview_url": "https://wp-themes.com/twentytwentyfive/",
"screenshot_url": "//ts.w.org/wp-content/themes/twentytwentyfive/screenshot.png?ver=1.2",
"rating": 78,
"num_ratings": 9,
"homepage": "https://wordpress.org/themes/twentytwentyfive/",
"requires": "6.7",
"requires_php": "7.2"
},
"time_extracted": "2025-07-11T00:00:00.000000+00:00"
}
WordPress Statistics
{
"type": "RECORD",
"stream": "wordpress_stats",
"record": {
"version": "6.8",
"count": 7500000,
"percent": 45.5
},
"time_extracted": "2025-07-10T03:10:28.063321+00:00"
}
Incremental Sync in Action
The extractor also supports incremental syncing for plugins. After an initial full sync, subsequent runs only fetch plugins updated since the last run:
# First run - gets all plugins updated after start_date
python -m tap_wordpress_org.tap --config config.json > run1.jsonl
# Second run - uses state from previous run to get only new updates
python -m tap_wordpress_org.tap --config config.json --state state.json > run2.jsonl
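The second command expects a state.json file. When running the tap standalone like this, one way to produce it is to take the value of the last STATE message the first run emitted. The sketch below assumes standard Singer STATE messages; `last_state` is a hypothetical helper, and the bookmark layout in the sample is illustrative rather than copied from real tap output.

```python
import json


def last_state(lines):
    """Return the value of the last Singer STATE message, or None if there is none."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") == "STATE":
            state = message.get("value")  # later STATE messages supersede earlier ones
    return state


# In-memory sample; the bookmark contents are made up for illustration.
sample = [
    '{"type": "RECORD", "stream": "plugins", "record": {"slug": "hello-dolly"}}',
    '{"type": "STATE", "value": {"bookmarks": {"plugins": {"replication_key_value": "2025-05-01"}}}}',
    '{"type": "STATE", "value": {"bookmarks": {"plugins": {"replication_key_value": "2025-05-07"}}}}',
]
state = last_state(sample)
print(json.dumps(state))

# For a real run, read run1.jsonl and write the result to state.json:
#   with open("run1.jsonl") as f:
#       state = last_state(f)
#   with open("state.json", "w") as f:
#       json.dump(state, f)
```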
Lessons Learned
Working with Claude Code on this project taught me that:
- Agentic AI-assisted development can be powerful: Claude Code understood the requirements and generated good-quality code that would have taken me a few hours to write manually, even with the Meltano SDK.
- Iterative development is key: Rather than trying to get everything perfect upfront, Claude Code and I worked iteratively, testing against real API endpoints and refining the implementation. For example, we discovered that the WordPress.org API doesn’t support field filtering for themes, which we only found through actual testing. Because Claude Code can run tests and make changes itself, it adapted the code immediately.
- Documentation matters: Claude Code created comprehensive todos and documentation (Markdown files) for itself as it went along. I believe this helped it maintain clear context, which in turn probably reduced hallucinations and errors.
Looking Forward
The tap-wordpress-org extractor is now available on GitHub and is also listed on the Meltano Hub.
Whether you’re analyzing WordPress ecosystem trends, monitoring plugin security updates, or building competitive intelligence tools, tap-wordpress-org provides a foundation for extracting WordPress.org data in a robust, scalable way.
Get Started Today
Ready to analyze WordPress.org data? Get started with:
# Install the tap
pip install git+https://github.com/Automattic/tap-wordpress-org.git
# Create a config file
echo '{"stream_selection": ["plugins", "themes"], "request_delay": 0.3}' > config.json
# Run the extractor
python -m tap_wordpress_org.tap --config config.json > wordpress_data.jsonl
