Description
With Drupal 8, we’ve often talked about “getting off the island” in terms of benefiting from much fine PHP work done outside of the Drupal community. We haven’t talked as much about going in the opposite direction - making our own fine work available for use outside of Drupal.
I have been working on a proof-of-concept for a general-purpose ETL library for PHP. This is not a direct port of the Drupal migration system, but is based on the same broad architecture (e.g., the source/process/destination concepts still exist, as extractors/transformers/loaders). Broadly, the architecture looks like this:
- A Task accepts configuration defining a migration process, and implements operations - most notably migrate. The following steps describe the migrate operation.
- The task constructs the configured Extractor, which obtains data from a source such as a SQL query, a CSV file, an XML/JSON API, etc.
- Iterating over the extractor returns one DataRecord (a collection of named DataProperty instances) at a time, each containing source data. For each source record, the task creates an empty DataRecord representing the destination data.
- The task configuration defines a transform pipeline keyed by destination property names. For each of these properties, a sequence of one or more Transformer classes, each with its own configuration, is invoked to determine the destination property value. Usually the first transformer is configured to accept one or more source property names, and its result is fed to subsequent transformers, with the final result assigned to the named property in the destination DataRecord.
- The destination DataRecord is passed to the configured Loader to be loaded into the destination store - a SQL database, a CSV file, etc.
- If an optional KeyMap is configured within the task, it is used to store the mapping from the source record's unique key to the destination record's unique key. This enables keyed relationships to be maintained even if keys change when migrating, as well as enabling rollback.
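The flow described in the list above can be sketched in a few lines. This is only an illustrative sketch, written in Python for brevity rather than PHP, and it is not the proposed library's API: plain dicts stand in for DataRecord and KeyMap, and every name here (ListExtractor, ListLoader, get, join, upper, the Task constructor arguments, rollback) is a hypothetical stand-in chosen for the example.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Record = Dict[str, Any]  # stand-in for DataRecord: property name -> value
Transformer = Callable[[Any, Record], Any]  # (previous value, source record) -> value


class ListExtractor:
    """Hypothetical extractor: yields source records from an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[Record]:
        return iter(self.rows)


class ListLoader:
    """Hypothetical loader: stores records in a dict and assigns new keys."""

    def __init__(self) -> None:
        self.store: Dict[int, Record] = {}
        self._next_key = 1

    def load(self, record: Record) -> int:
        key = self._next_key
        self._next_key += 1
        self.store[key] = record
        return key

    def delete(self, key: int) -> None:
        self.store.pop(key, None)


def get(*names: str) -> Transformer:
    """First-stage transformer: read one or more source properties."""
    def run(_value: Any, source: Record) -> Any:
        values = [source[name] for name in names]
        return values[0] if len(values) == 1 else values
    return run


def join(separator: str) -> Transformer:
    """Join a list of values produced by the previous transformer."""
    return lambda value, _source: separator.join(str(v) for v in value)


def upper() -> Transformer:
    """Upper-case the value produced by the previous transformer."""
    return lambda value, _source: str(value).upper()


class Task:
    """Hypothetical task: wires extractor, transform pipeline, loader, key map."""

    def __init__(self, extractor: ListExtractor,
                 pipeline: Dict[str, List[Transformer]],
                 loader: ListLoader, source_key: str,
                 key_map: Optional[Dict[Any, int]] = None) -> None:
        self.extractor = extractor
        self.pipeline = pipeline
        self.loader = loader
        self.source_key = source_key
        self.key_map = key_map

    def migrate(self) -> None:
        for source in self.extractor:
            destination: Record = {}  # empty destination record
            for prop, transformers in self.pipeline.items():
                value = None
                for transform in transformers:  # feed each result forward
                    value = transform(value, source)
                destination[prop] = value
            dest_key = self.loader.load(destination)
            if self.key_map is not None:  # remember source key -> destination key
                self.key_map[source[self.source_key]] = dest_key

    def rollback(self) -> None:
        if self.key_map is None:
            return  # without a key map we cannot tell what was migrated
        for dest_key in self.key_map.values():
            self.loader.delete(dest_key)
        self.key_map.clear()


rows = [
    {"id": 101, "first": "Ada", "last": "Lovelace"},
    {"id": 102, "first": "Alan", "last": "Turing"},
]
loader = ListLoader()
key_map: Dict[Any, int] = {}
task = Task(
    ListExtractor(rows),
    {"name": [get("first", "last"), join(" ")], "code": [get("last"), upper()]},
    loader,
    source_key="id",
    key_map=key_map,
)
task.migrate()
```

After migrate, the key map records that source keys 101 and 102 became destination keys 1 and 2, which is what lets the hypothetical rollback remove exactly the migrated records and lets later tasks resolve references by source key.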
Beyond the technical considerations, consider the size of the Drupal migration ecosystem (including contrib modules). Now add integrations with other frameworks and CMSs. If successful (let's be optimistic!), this could become a substantial open-source community of its own. How do we build diversity and scalability into an open-source community from the ground up?
In the first half of this session I will present the basic proposed architecture and demonstrate some applications. In the second half, I would like an open discussion with/among attendees about building an open-source community from scratch.
Mike Ryan
Migration specialist @ Virtuoso Performance
Mike has been doing data/content migrations in Drupal since the D6 days, being the primary developer of the D6/D7 contributed migrate module and a major contributor to the D8 migration system.