Blogs

EMAIL: info@example.com

Agaric Collective: Drupal migrations reference: List of configuration options for source plugins

In a previous article we explained the syntax used to write Drupal migration. We also provided references of subfields and content entities’ properties including those provided by the Commerce module. This time we are going to list the configuration options of many migrate source plugins. For example, when importing from a JSON file you need to specify which data fetcher and parser to use. In the case of CSV migrations, the source plugin configuration changes depending on the presence of a headers row. Finding out which options are available might require some Drupal development knowledge. To make the process easier, in today’s article we are presenting a reference of available configuration options for migrate source plugins provided by Drupal core and some contributed modules.

List of configuration options for source plugins

For each migrate source plugin we will present: the module that provides it, the class that defines it, the class that the plugin extends, and any inherited options from the class hierarchy. For each plugin configuration option we will list its name, type, a description, and a note if it is optional.

SourcePluginBase (abstract class)

Module: Migrate (Drupal Core)
Class: DrupalmigratePluginmigratesourceSourcePluginBase
Extends: DrupalCorePluginPluginBase

This abstract class is extended by most migrate source plugins. This means that the provided configuration keys apply to any source plugin extending it.

List of configuration keys:

  1. skip_count: An optional boolean value. If set, do not attempt to count the source. This includes status and import operations.
  2. cache_counts: An optional boolean value. If set, cache the source count. This saves time calculating the number of available records in the source when a count operation is requested.
  3. cache_key: An optional string value. Uniquely named cache key used for cache_counts.
  4. track_changes: An optional boolean value. If set, the Migrate API will keep a hash of the source rows to determine whether the incoming data has changed. If the hash is different the record is re-imported.
  5. high_water_property: An optional array value. If set, only content with a higher value will be imported. The value is usually a timestamp or serial ID indicating what was the last imported record. This key is configured as an associate array. The name key indicates the column in the source that will be used for the comparison. The alias key is optional and if set it serves as a table alias for the column name.
  6. source_module: An optional string value. Identifies the system providing the data the source plugin will read. If not set, the Migrate API tries to read the value from the source plugin annotation. The source plugin itself determines how the value is used. For example, Migrate Drupal’s source plugins expect source_module to be the name of a module that must be installed and enabled in the source database.

The high_water_property and track_changes are mutually exclusive. They are both designed to conditionally import new or updated records from the source. Hence, only one can be configured per migration definition file.

SqlBase (abstract class)

Module: Migrate (Drupal Core)
Class: DrupalmigratePluginmigratesourceSqlBase
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This abstract class is extended by migrate source plugins whose data may be fetched via a database connection. This means that the provided configuration keys apply to any source plugin extending it.

In addition to the keys provided in the parent class chain, this abstract class provides the following configuration keys:

  1. key: An optional string value. The database key name. Defaults to ‘migrate’.
  2. target: An optional string value. The database target name. Defaults to ‘default’.
  3. database_state_key: An optional string value. Name of the state key which contains an array with database connection information. The Migrate API will consult the States API using the provided key. The returned value should be an associative array with at least two keys: key and target to determine which database connection to use. A third key database can also be included containing database connection information as seen in the snippet below.
  4. batch_size: An optional integer value. Number of records to fetch from the database during each batch. If omitted, all records are fetched in a single query.
  5. ignore_map: An optional boolean value. Source data is joined to the map table by default to improve performance. If set to TRUE, the map table will not be joined. Using expressions in the query may result in column aliases in the JOIN clause which would be invalid SQL. If you run into this, set ignore_map to TRUE.

To explain how these configuration keys are used, consider the following database connections:

'drupal-8-or-9-database-name', 'username' => 'drupal-8-or-9-database-username', 'password' => 'drupal-8-or-9-database-password', 'host' => 'drupal-8-or-9-database-server', 'port' => '3306', 'namespace' => 'Drupal\Core\Database\Driver\mysql', 'driver' => 'mysql', ]; $databases['migrate']['default'] = [ 'database' => 'drupal-6-or-7-database-name', 'username' => 'drupal-6-or-7-database-username', 'password' => 'drupal-6-or-7-database-password', 'host' => 'drupal-6-or-7-database-server', 'port' => '3306', 'namespace' => 'Drupal\Core\Database\Driver\mysql', 'driver' => 'mysql', ];

This snippet can be added to settings.php or settings.local.php. The $databases array is a nested array of at least three levels. The first level defines the database keys: default and migrate in our example. The second level defines the database targets: default in both cases. The third level is an array with connection details for each key/target combination. This documentation page contains more information about database configuration.

Based on the specified configuration values, this is how the Migrate API determines which database connection to use:

  • If the source plugin configuration contains database_state_key, its value is taken as the name of a States API key that specifies an array with the database configuration.
  • Otherwise, if the source plugin configuration contains key, the database configuration with that name is used.
  • Otherwise, load a fallback state key from the States API. The value that it tries to read is the global state key: migrate.fallback_state_key.
  • Otherwise, the database connection named migrate is used by default.
  • If all of the above steps fail, a RequirementsException is thrown.

Note that all values configuration keys are optional. If none is set, the plugin will default to use the connection specified under $databases[‘migrate’][‘default’]. At least, set the key configuration even if the value is migrate. This would make it explicit which connection is being used.

DrupalSqlBase (abstract class)

Module: Migrate Drupal (Drupal Core)
Class: Drupalmigrate_drupalPluginmigratesourceDrupalSqlBase
Extends: DrupalmigratePluginmigratesourceSqlBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, source_module, key, target, database_state_key, batch_size, and ignore_map.

This abstract class provides general purpose helper methods that are commonly needed when writing source plugins that use a Drupal database as a source. For example, check if the given module exists and read Drupal configuration variables. Check the linked class documentation for more available methods.

In addition to the keys provided in the parent class chain, this abstract class provides the following configuration key:

  1. constants: An optional array value used to add module dependencies. The value is an associative array with two possible elements. The first uses the entity_type key to add a dependency on the module that provides the specified entity type. The second uses the module key to directly add a dependency on the specified module. In both cases, the DependencyTrait is used for setting the dependencies.

Warning: A plugin extending this abstract class might want to use this configuration key in the source definition to set module dependencies. If so, the expected keys might clash with other source constants used in the process pipeline. Arrays keys in PHP are case sensitive. Using uppercase in custom source constants might avoid this clash, but it is preferred to use a different name to avoid confusion.

This abstract class is extended by dozens of core classes that provide an upgrade path from Drupal 6 and 7. It is also used by the Commerce Migrate module to read product types, product display types, and shipping flat rates from a Commerce 1 database. The same module follows a similar approach to read data from an Ubercart database. The Paragraphs module also extends it to add and implement Configurable Plugin interface so it can import field collection types and paragraphs types from Drupal 7.

Config

Module: Migrate Drupal (Drupal Core). Plugin ID: d8_config
Class: Drupalmigrate_drupalPluginmigratesourced8Config
Extends: Drupalmigrate_drupalPluginmigratesourceDrupalSqlBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, source_module, key, target, database_state_key, batch_size, ignore_map, and constants.

This plugin allows reading configuration values from a Drupal 8 site by reading its config table.

In addition to the keys provided in the parent class chain, this plugin does not define extra configuration keys. And example configuration for this plugin would be:

source: plugin: d8_config key: migrate skip_count: true

In this case we are setting the key property from SqlBase to use the migrate default database connection. The skip_count from SourcePluginBase indicates that there is no need to count how many records exist in the source database before executing migration operations like importing them.

This plugin is presented to show that Drupal core already offers a way to migrate data from Drupal 8. Remember that there are dozens of other plugins extending DrupalSqlBase. It would be impractical to list them all here. See this API page for a list of all of them.

CSV

Module: Migrate Source CSV. Plugin ID: csv
Class: Drupalmigrate_source_csvPluginmigratesourceCSV
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This plugin allows reading data from a CSV file. We used this plugin in the CSV migration example of the 31 days of migration series.

In addition to the keys provided in the parent class chain, this plugin provides the following configuration keys:

  1. path: A string value. It contains the path to the CSV file. Starting with the 8.x-3.x branch, stream wrappers are supported. The original article contains more details on specifying the location of the CSV file.
  2. ids: An array of string values. The column names listed are used to uniquely identify each record.
  3. header_offset: An optional integer value. The index of record to be used as the CSV header and the thereby each record’s field name. It defaults to zero (0) because the index is zero-based. For CSV files with no header row the value should be set to null.
  4. fields: An optional associative array value. It contains a nested array of names and labels to use instead of a header row. If set, it will overwrite the column names obtained from header_offset.
  5. delimiter: An optional string value. It contains a one character column delimiter. It defaults to a comma (,). For example, if your file uses tabs as delimiter, you set this configuration to t.
  6. enclosure: An optional string value. It contains one character used to enclose the column values. Defaults to double quotation marks (“).
  7. escape: An optional string value. It contains one character used for character escaping in the column values. It defaults to a backslash ().

Important: The configuration options changed significantly between the 8.x-3.x and 8.x-2.x branches. Refer to this change record for a reference of how to configure the plugin for the 8.x-2.x.

For reference, below is the source plugin configuration used in the CSV migration example:

source: plugin: csv path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv ids: [photo_id] header_offset: null fields: - name: photo_id label: 'Photo ID' - name: photo_url label: 'Photo URL'

Spreadsheet

Module: Migrate Spreadsheet. Plugin ID: spreadsheet
Class: Drupalmigrate_spreadsheetPluginmigratesourceSpreadsheet
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This plugin allows reading data from Microsoft Excel and LibreOffice Calc files. It requires the PhpOffice/PhpSpreadsheet library and many PHP extensions including ext-zip. Check this page for a full list of dependencies. We used this plugin in the spreadsheet migration examples of the 31 days of migration series.

In addition to the keys provided in the parent class chain, this plugin provides the following configuration keys:

  1. file: A string value. It stores the path to the document to process. You can use a relative path from the Drupal root, an absolute path, or stream wrappers.
  2. worksheet: A string value. It contains the name of the one worksheet to process.
  3. header_row: An optional integer value. This number indicates which row contains the headers. Contrary to CSV migrations, the row number is not zero-based. So, set this value to 1 if headers are on the first row, 2 if they are on the second, and so on.
  4. origin: An optional string value. It defaults to A2. It indicates which non-header cell contains the first value you want to import. It assumes a grid layout and you only need to indicate the position of the top-left cell value.
  5. columns: An optional array value. It is the list of columns you want to make available for the migration. In case of files with a header row, use those header values in this list. Otherwise, use the default title for columns: A, B, C, etc. If this setting is missing, the plugin will return all columns. This is not ideal, especially for very large files containing more columns than needed for the migration.
  6. row_index_column: An optional string value. This is a special column that contains the row number for each record. This can be used as a unique identifier for the records in case your dataset does not provide a suitable value. Exposing this special column in the migration is up to you. If so, you can come up with any name as long as it does not conflict with header row names set in the columns configuration. Important: this is an autogenerated column, not any of the columns that comes with your dataset.
  7. keys: An optional associative array value. If not set, it defaults to the value of row_index_column. It contains the “columns” that uniquely identify each record. The keys are column names as defined in the data_rows key. The values is an array with a single member with key ‘type’ and value a column type such as ‘integer’. For files with a header row, you can use the values set in the columns configuration. Otherwise, use default column titles like A, B, C, etc. In both cases, you can use the row_index_column column if it was set.

Note that nowhere in the plugin configuration you specify the file type. The same setup applies for both Microsoft Excel and LibreOffice Calc files. The library will take care of detecting and validating the proper type.

For reference, below is the source plugin configuration used in the LibreOffice Calc migration example:

source: plugin: spreadsheet file: modules/custom/ud_migrations/ud_migrations_sheets_sources/sources/udm_book_paragraph.ods worksheet: 'UD Example Sheet' header_row: 1 origin: A2 columns: - book_id - book_title - 'Book author' row_index_column: 'Document Row Index' keys: book_id: type: string

SourcePluginExtension (abstract class)

Module: Migrate Plus
Class: Drupalmigrate_plusPluginmigratesourceSourcePluginExtension
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This abstract class provides extra configuration keys. It is extended by the URL plugin (explained later) and by source plugins provided by other modules like Feeds Migrate.

In addition to the keys provided in the parent class chain, this abstract class provides the following configuration keys:

  1. fields: An associative array value. Each element represents a field that will be made available to the migration. The following options can be set:
  2. name: A string value. This is how the field is going to be referenced in the migration. The name itself can be arbitrary. If it contains spaces, you need to put double quotation marks (“) around it when referring to it in the migration.
  3. label: An optional string value. This is a description used when presenting details about the migration. For example, in the user interface provided by the Migrate Tools module. When defined, you do not use the label to refer to the field. Keep using the name.
  4. selector: A string value. This is another XPath-like string to find the field to import. The value must be relative to the location specified by the item_selector configuration. In the example, the fields are direct children of the records to migrate. Therefore, only the property name is specified (e.g., unique_id). If you had nested objects or arrays, you would use a slash (/) character to go deeper in the hierarchy. This will be demonstrated in the image and paragraph migrations.
  5. ids: An associative array value. It contains the “columns” that uniquely identify each record. The keys are column names as defined in the data_rows key. The values is an array with a single member with key ‘type’ and value a column type such as ‘integer’.

See the code snippet for the Url plugin in the next section for an example of how these configuration options are used.

Url (used for JSON, XML, SOAP, and Google Sheets migrations)

Module: Migrate Plus. Plugin ID: url
Class: Drupalmigrate_plusPluginmigratesourceUrl
Extends: Drupalmigrate_plusPluginmigratesourceSourcePluginExtension
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, source_module, fields, and ids.

This plugin allows reading data from URLs. Using data parser plugins it is possible to fetch data from JSON, XML, SOAP, and Google Sheets. Note that this source plugin uses other plugins provided by Migrate Plus that might require extra configuration keys in addition to the ones explicitly defined in the plugin class. Those will also be listed.

In addition to the keys provided in the parent class chain, this plugin provides the following configuration keys:

  1. urls: An array of string values. It contains the source URLs to retrieve. If the file data fetcher plugin is used, the location can be a relative path from the Drupal root, an absolute path, a fully-qualified URL, or a stream wrapper. See this article for details on location of the JSON file. If the HTTP data fetcher plugin is used, the value can use any protocol supported by curl. See this article for more details on importing remote JSON files.
  2. data_parser_plugin: A string value. It indicates which data parser plugin to use. Possible values provided by the Migrate Plus module are json, xml, simple_xml, and soap. If the Migrate Google Sheets module is installed, it is possible to set the value to google_sheets. Review the relevant articles in the 31 days of migration series to know more about how each of them are used.

The data parser plugins provide the following configuration keys:

  1. item_selector: A string value. It indicates where in the source file lies the array of records to be migrated. Its value is an XPath-like string used to traverse the file hierarchy. Note that a slash (/) is used to separate each level in the hierarchy.
  2. data_fetcher_plugin: A string value. It indicates which data parser plugin to use. Possible values provided by the Migrate Plus module are file and http.

The HTTP data fetcher plugins provide the following configuration keys:

  1. headers: An associative array value. The key/value pairs represent HTTP headers to be sent when the request is made.
  2. authentication: An associative array value. One of the elements in the array should be the plugin key which indicates the authentication plugin to use. Possible values provided by the Migrate Plus module are basic, digest, and oauth2. Other elements in the array will depend on the selected authentication plugin.

The basic and digest authentication plugins provide the following configuration keys:

  1. username: A string value.
  2. password: A string value.

The OAuth2 authentication plugin requires the sainsburys/guzzle-oauth2-plugin composer package to work. It provides the following configuration keys:

  1. base_uri: A string value.
  2. grant_type: A string value. Possible values are authorization_code, client_credentials, urn:ietf:params:oauth:grant-type:jwt-bearer, password, and refresh_token. Each of these might require extra configuration values.

The client credentials grant type requires the following configuration keys:

  1. token_url: A string value.
  2. client_id: A string value.
  3. client_secret: A string value.

For configuration keys required by other grant types, refer to the classes that implement them. Read this article on adding HTTP request headers and authentication parameters for example configurations.

There are many combinations possible to configure this plugin. In the 31 days of migration series there are many example configurations. For reference, below is the source plugin configuration used in the local JSON node migration example:

source: plugin: url data_fetcher_plugin: file data_parser_plugin: json urls: - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json item_selector: /data/udm_people fields: - name: src_unique_id label: 'Unique ID' selector: unique_id - name: src_name label: 'Name' selector: name - name: src_photo_file label: 'Photo ID' selector: photo_file - name: src_book_ref label: 'Book paragraph ID' selector: book_ref ids: src_unique_id: type: integer

EmbeddedDataSource

Module: Migrate (Drupal Core). Plugin ID: embedded_data
Class: DrupalmigratePluginmigratesourceEmbeddedDataSource
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This plugin allows the definition of data to be imported right inside the migration definition file. We used this plugin in many of the examples of the 31 days of migration series. It is also used in many core tests for the Migrate API itself.

In addition to the keys provided in the parent class chain, this abstract class provides the following configuration keys:

  1. data_rows: An array of all the records to be migrated. Each record might contain an arbitrary number of key-value pairs representing “columns” of data to be imported.
  2. ids: An associative array. It contains the “columns” that uniquely identify each record. The keys are column names as defined in the data_rows key. The values is an array with a single member with key ‘type’ and value a column type such as ‘integer’.

Many examples of 31 days of migration series use this plugin. You can get the example modules from this repository. For reference, below is the source plugin configuration used in the first migration example:

source: plugin: embedded_data data_rows: - unique_id: 1 creative_title: 'The versatility of Drupal fields' engaging_content: 'Fields are Drupal''s atomic data storage mechanism...' - unique_id: 2 creative_title: 'What is a view in Drupal? How do they work?' engaging_content: 'In Drupal, a view is a listing of information. It can a list of nodes, users, comments, taxonomy terms, files, etc...' ids: unique_id: type: integer

This plugin can also be used to create default content when the data is known in advance. We often present Drupal site building workshops. To save time, we use this plugin to create nodes which are later used when explaining how to create Views. Check this repository for an example of this. Note that it uses a different directory structure to store the migrations as explained in this blog post.

ContentEntity

Module: Migrate Drupal (Drupal Core). Plugin ID: content_entity
Class: Drupalmigrate_drupalPluginmigratesourceContentEntity
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This plugin returns content entities from a Drupal 8 or 9 installation. It uses the Entity API to get the data to migrate. If the source entity type has custom field storage fields or computed fields, this class will need to be extended and the new class will need to load/calculate the values for those fields.

In addition to the keys provided in the parent class chain, this plugin provides the following configuration key:

  1. entity_type: A string value. The entity type ID of the entities being migrated. This is calculated dynamically by the deriver so it is only needed if the deriver is not utilized, i.e., a custom source plugin.
  2. bundle: An optionals string value. If set and the entity type is bundleable, only return entities of this bundle.
  3. include_translations: An optional boolean value. If set, entity translations are included in the returned data. It defaults to TRUE.

For reference, this is how this plugin is configured to get all nodes of type article in their default language only:

source: plugin: content_entity:node bundle: article include_translations: false

Note: this plugin was brought into core in this issue copied from the Drupal 8 migration (source) module. The latter can be used if the source database does not use the default connection.

Table

Module: Migrate Plus. Plugin ID: table
Class: Drupalmigrate_plusPluginmigratesourceTable
Extends: DrupalmigratePluginmigratesourceSqlBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, source_module, key, target, database_state_key, batch_size, and ignore_map.

This plugin allows reading data from a single database table. It uses one of the database connections for the site as defined by the options. See this test for an example on how to use this plugin.

In addition to the keys provided in the parent class chain, this plugin provides the following configuration key:

  1. table_name: An string value. The table to read values from.
  2. id_fields: An associative array value. IDMap compatible array of fields that uniquely identify each record.
  3. fields: An array value. The elements are fields (“columns”) present on the source table.

EmptySource (migrate module)

Module: Migrate (Drupal Core). Plugin ID: empty
Class: DrupalmigratePluginmigratesourceEmptySource
Extends: DrupalmigratePluginmigratesourceSourcePluginBase
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

This plugin returns an empty row by default. It can be used as a placeholder to defer setting the source plugin to a deriver. An example of this can be seen in the migrations for Drupal 6 and Drupal 7 entity reference translations. In both cases, the source plugin will be determined by the EntityReferenceTranslationDeriver.

In addition to the keys provided in the parent class chain, this plugin does not define extra configuration keys. If the plugin is used with source constants, a single row containing the constant values will be returned. For example:

source: plugin: empty constants: entity_type: node field_name: body

The plugin will return a single row containing ‘entity_type’ and ‘field_name’ elements, with values of ‘node’ and ‘body’, respectively. This is not very useful. For the most part, the plugin is used to defer the definition to a deriver as mentioned before.

EmptySource (migrate_drupal module)

Module: Migrate Drupal (Drupal Core). Plugin ID: md_empty
Class: Drupalmigrate_drupalPluginmigratesourceEmptySource
Extends: DrupalmigratePluginmigratesourceEmptySource
Inherited configuration options: skip_count, cache_counts, cache_key, track_changes, high_water_property, and source_module.

By default, this plugin returns an empty row with Drupal specific config dependencies. If the plugin is used with source constants, a single row containing the constant values will be returned. These can be seen in the user_picture_field.yml and d6_upload_field.yml migrations.

In addition to the keys provided in the parent class chain, this abstract class provides the following configuration keys:

  1. constants: An optional array value used to add module dependencies. The value is an associative array with one possible element. An entity_type key is used to add a dependency on the module that provides the specified entity type.

Available configuration for other migrate source plugins

In Drupal core itself there are more than 100 migrate source plugins, most of which come from the Migrate Drupal module. And many more are made available by contributed modules. It would be impractical to document them all here. To get a list by yourself, load the plugin.manager.migrate.source service and call its getFieldStorageDefinitions() method. This will return all migrate source plugins provided by the modules that are currently enabled on the site. This Drush command would get the list:

# List of migrate source plugin definitions. $ drush php:eval "print_r(Drupal::service('plugin.manager.migrate.source')->getDefinitions());" # List of migrate source plugin ids. $ drush php:eval "print_r(array_keys(Drupal::service('plugin.manager.migrate.source')->getDefinitions()));"

To find out which configuration options are available for any source plugin consider the following:

  • Find the class that defines the plugin and find out which configuration values are read. Some plugins even include the list in the docblock of the class. Search for a pattern similar to $this->configuration[‘option_name’] or $configuration[‘option_name’]. The plugins can be found in the Drupalmodule_namePluginmigratesource namespace. The class itself would be in a file under the /src/Plugin/migrate/source/ directory of the module.
  • Look up in the class hierarchy. The source plugin can use any configuration set directly in its definition class and any parent class. There might be multiple layers of inheritance.
  • Check if the plugin offers some extension mechanism that would allow it to use other types of plugins. For example, the data fetcher, data parses, and authentication plugins provided by Migrate Plus. The Url migrate source plugin does this.

What did you learn in today’s article? Did you know that migrate source plugins can inherit configuration keys from their class hierarchy? Were you aware that there are so many source plugins? Other than the ones listed here, which source plugins have you used? Please share your answers in the comments. Also, we would be grateful if you shared this article with your friends and colleagues.

Read more and discuss at agaric.coop.