Agaric Collective: Migrating CSV files into Drupal

Today we will learn how to migrate content from a Comma-Separated Values (CSV) file into Drupal. We are going to use the latest version of the Migrate Source CSV module, which depends on the third-party league/csv library. We will show how to configure the source plugin to read files with or without a header row. We will also talk about a new feature that allows you to use stream wrappers to set the file location. Let’s get started.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD CSV source migration, whose machine name is ud_migrations_csv_source. It comes with three migrations: udm_csv_source_paragraph, udm_csv_source_image, and udm_csv_source_node.

You can get the Migrate Source CSV module using composer: composer require drupal/migrate_source_csv. This will also download its dependency: the league/csv library. The example assumes you are using the 8.x-3.x branch of the module, which requires composer to be installed. If your Drupal site is not composer-based, you can use the 8.x-2.x branch. Continue reading to learn the difference between the two branches.

Understanding the example set up

This migration will reuse the same configuration from the introduction to paragraph migrations example. Refer to that article for details on the configuration: the destinations will be the same content type, paragraph type, and fields. The source will be changed in today’s example, as we use it to explain CSV migration. The end result will again be nodes containing an image and a paragraph with information about someone’s favorite book. The major difference is that we are going to read from CSV files.

Note that you can literally swap migration sources without changing any other part of the migration. This is a powerful feature of ETL frameworks like Drupal’s Migrate API. Although a direct swap is possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some machine names had to be changed to avoid conflicts with other examples in the demo repository.

Migrating CSV files with a header row

In any migration project, understanding the source is very important. For CSV migrations, the primary thing to consider is whether or not the file contains a row of headers. Other things to consider are what characters to use as delimiter, enclosure, and escape character. For now, let’s consider the following CSV file whose first row serves as column headers:

unique_id,name,photo_file,book_ref
1,Michele Metts,P01,B10
2,Benjamin Melançon,P02,B20
3,Stefan Freudenberg,P03,B30

This file will be used in the node migration. The four columns are used as follows:

  • unique_id is the unique identifier for each record in this CSV file.
  • name is the name of a person. This will be used as the node title.
  • photo_file is the unique identifier of an image that was created in a separate migration.
  • book_ref is the unique identifier of a book paragraph that was created in a separate migration.

The following snippet shows the configuration of the CSV source plugin for the node migration:

source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_people.csv
  ids: [unique_id]

The name of the plugin is csv. Then you define the path pointing to the file itself. In this case, the path is relative to the Drupal root. Finally, you specify an ids array of column names that uniquely identify each record. As already stated, the unique_id column serves that purpose. Note that there is no need to specify all the column names from the CSV file. The plugin will automatically make them available. That is the simplest configuration of the CSV source plugin.

The following snippet shows part of the process, destination, and dependencies configuration of the node migration:

process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_csv_source_image
    source: photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_csv_source_image
    - udm_csv_source_paragraph
  optional: []

Note that the source for setting the image reference is photo_file. In the process pipeline you can directly use any column name that exists in the CSV file. The configuration of the migration lookup plugin and the dependencies point to two CSV migrations that come with this example. One migrates images and the other migrates paragraphs.

Migrating CSV files without a header row

Now let’s consider two examples of CSV files that do not have a header row. The following snippets show the example CSV file and source plugin configuration for the paragraph migration:

B10,The definite guide to Drupal 7,Benjamin Melançon et al.
B20,Understanding Drupal Views,Carlos Dinarte
B30,Understanding Drupal Migrations,Mauricio Dinarte
source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_book_paragraph.csv
  ids: [book_id]
  header_offset: null
  fields:
    - name: book_id
    - name: book_title
    - name: 'Book author'

When you do not have a header row, you need to specify two more configuration options. header_offset has to be set to null. fields has to be set to an array where each element represents a column in the CSV file. You include a name for each column following the order in which they appear in the file. The name itself can be arbitrary. If it contains spaces, you need to put quotes (') around it. After that, you set the ids configuration to one or more columns using the names you defined.

In the process section you refer to source columns as usual, writing their names and adding quotes if they contain spaces. The following snippet shows how the process section is configured for the paragraph migration:

process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: 'Book author'

The final example will show a slight variation of the previous configuration. The following two snippets show the example CSV file and source plugin configuration for the image migration:

P01,https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg
P02,https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg
P03,https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg
source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv
  ids: [photo_id]
  header_offset: null
  fields:
    - name: photo_id
      label: 'Photo ID'
    - name: photo_url
      label: 'Photo URL'

For each column defined in the fields configuration, you can optionally set a label. This is a description used when presenting details about the migration, for example, in the user interface provided by the Migrate Tools module. Even when a label is defined, you do not use it to refer to source columns; you keep using the column name. You can see this in the value of the ids configuration.

The following snippet shows part of the process configuration of the image migration:

process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: photo_url

CSV file location

When setting the path configuration you have three options to indicate the CSV file location:

  • Use a relative path from the Drupal root. The path should not start with a slash (/). This is the approach used in this demo. For example, modules/custom/my_module/csv_files/example.csv.
  • Use an absolute path pointing to the CSV location in the file system. The path should start with a slash (/). For example, /var/www/drupal/modules/custom/my_module/csv_files/example.csv.
  • Use a stream wrapper. This feature was introduced in the 8.x-3.x branch of the module. Previous versions cannot make use of them.

Being able to use stream wrappers gives you many options for setting the location to the CSV file. For instance:

  • Files located in the public, private, and temporary file systems managed by Drupal. This leverages functionality already available in Drupal core. For example: public://csv_files/example.csv.
  • Files located in profiles, modules, and themes. You can use the System stream wrapper module or apply this core patch to get this functionality. For example, module://my_module/csv_files/example.csv.
  • Files located in remote servers including RSS feeds. You can use the Remote stream wrapper module to get this functionality. For example, https://understanddrupal.com/csv-files/example.csv.
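
For example, a minimal sketch of a source definition that reads from Drupal’s public file system (the public://csv_files/example.csv path is hypothetical):

source:
  plugin: csv
  path: 'public://csv_files/example.csv'
  ids: [unique_id]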

CSV source plugin configuration

The configuration options for the CSV source plugin are very well documented in the source code. They are included here for quick reference:

  • path is required. It contains the path to the CSV file. Starting with the 8.x-3.x branch, stream wrappers are supported.
  • ids is required. It contains an array of column names that uniquely identify each record.
  • header_offset is optional. It is the index of the record to be used as the CSV header and, thereby, the source of each record’s field names. It defaults to zero (0) because the index is zero-based. For CSV files with no header row the value should be set to null.
  • fields is optional. It contains a nested array of names and labels to use instead of a header row. If set, it will overwrite the column names obtained from header_offset.
  • delimiter is optional. It contains the one-character column delimiter. It defaults to a comma (,). For example, if your file uses tabs as delimiter, you set this configuration to \t.
  • enclosure is optional. It contains the one character used to enclose the column values. It defaults to double quotation marks (").
  • escape is optional. It contains the one character used for character escaping in the column values. It defaults to a backslash (\).
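
Putting the optional settings together, the following is a hedged sketch for a hypothetical tab-separated file (the path and the unique_id column are made up for illustration):

source:
  plugin: csv
  path: modules/custom/my_module/csv_files/example.tsv
  ids: [unique_id]
  delimiter: "\t"
  enclosure: '"'
  escape: '\'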

Important: The configuration options changed significantly between the 8.x-3.x and 8.x-2.x branches. Refer to this change record for a reference on how to configure the plugin for the 8.x-2.x branch.

And that is how you can use CSV files as the source of your migrations. Because this is such a common need, there was an effort to move the CSV source plugin into Drupal core. That effort is currently on hold, and it is unclear whether it will materialize during Drupal 8’s lifecycle. The maintainers of the Migrate API are focusing their efforts on other priorities at the moment. You can read this issue to learn about the motivation and context for offering this functionality in Drupal core.

Note: The Migrate Spreadsheet module can also be used to migrate data from CSV files. It also supports Microsoft Office Excel and LibreOffice Calc (OpenDocument) files. The module leverages the PhpOffice/PhpSpreadsheet library.

What did you learn in today’s blog post? Have you migrated from CSV files before? Did you know that it is now possible to read files using stream wrappers? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors: Drupalize.me by Osio Labs has online tutorials about migrations, among other topics, and Agaric provides migration trainings, among other services. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Read more and discuss at agaric.coop.

Spinning Code: SC DUG August 2019

After a couple months off, SC DUG met this month with a presentation on super cheap Drupal hosting.

Chris Zietlow from Mindgrub, Will Jackson from Kanopi Studios, and I all gave short talks on very cheap ways to host Drupal 8.

Chris opened by talking about using AWS Micro servers. Will shared a solution using a Raspberry Pi for a fully wireless server. I closed the discussion with a review of using Drupal Tome on Netlify.

We all worked from a loose set of rules to help keep us honest and prevent overlapping:

Rules for Cheap D8 Hosting Challenge

The goal is to figure out the cheapest D8 hosting that would actually function for a project, even if it is deeply irresponsible to actually use.

Rules

  1. It has to actually work for D8 (so modern PHP version, working database, etc.).
  2. You do not actually have to spend the money, but you do need to know all the steps required to make it work.
  3. It needs to honor the TOS for any networks and services you use (no illegal network taps – legal hidden taps are fair game).
  4. You have to share your idea with the other players so we don’t have two people propose the same solution (first-come-first-serve on ideas).

Reporting

Be prepared to talk for about 5 minutes on how your solution would work. Your talk needs to include:

  1. Estimated Monthly cost for the first year.
  2. Steps required to make it work.
  3. Known weaknesses.

If you have a super cheap hosting solution for Drupal 8 we’d love to hear about it.

Agaric Collective: Introduction to paragraphs migrations in Drupal

Today we will present an introduction to paragraphs migrations in Drupal. The example consists of migrating paragraphs of one type, then connecting the migrated paragraphs to nodes. A separate image migration is included to demonstrate how they are different. At the end, we will talk about behavior that deletes paragraphs when the host entity is deleted. Let’s get started.

Example mapping for paragraph reference field.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD paragraphs migration introduction, whose machine name is ud_migrations_paragraph_intro. It comes with three migrations: ud_migrations_paragraph_intro_paragraph, ud_migrations_paragraph_intro_image, and ud_migrations_paragraph_intro_node. One content type, one paragraph type, and four fields will be created when the module is installed.

Note: Configuration placed in a module’s config/install directory will be copied to Drupal’s active configuration. And if those files have a dependencies/enforced/module key, the configuration will be removed when the listed modules are uninstalled. That is how the content type, the paragraph type, and the fields are automatically created and deleted.
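
As an illustrative sketch (simplified, not the exact file contents), the config/install file for the content type could declare the enforced dependency like this:

# config/install/node.type.ud_paragraphs.yml (simplified)
langcode: en
status: true
dependencies:
  enforced:
    module:
      - ud_migrations_paragraph_intro
name: UD Paragraphs
type: ud_paragraphs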

You can get the Paragraphs module using composer: composer require drupal/paragraphs. This will also download its dependency: the Entity Reference Revisions module. If your Drupal site is not composer-based, you can get the code for both modules manually.

Understanding the example set up

The example code creates one paragraph type named UD book paragraph (ud_book_paragraph). It has two “Text (plain)” fields: Title (field_ud_book_paragraph_title) and Author (field_ud_book_paragraph_author). A new UD Paragraphs (ud_paragraphs) content type is also created. It has two fields: Image (field_ud_image) and Favorite book (field_ud_favorite_book), containing references to images and book paragraphs imported in separate migrations. The words in parentheses represent the machine names of the different elements.

The paragraph migration

Migrating into a paragraph type is very similar to migrating into a content type. You specify the source, process the fields making any required transformation, and set the destination entity and bundle. The following code snippet shows the source, process, and destination sections:

source:
  plugin: embedded_data
  data_rows:
    - book_id: 'B10'
      book_title: 'The definite guide to Drupal 7'
      book_author: 'Benjamin Melançon et al.'
    - book_id: 'B20'
      book_title: 'Understanding Drupal Views'
      book_author: 'Carlos Dinarte'
    - book_id: 'B30'
      book_title: 'Understanding Drupal Migrations'
      book_author: 'Mauricio Dinarte'
  ids:
    book_id:
      type: string
process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph

The most important part of a paragraph migration is setting the destination plugin to entity_reference_revisions:paragraph. This plugin is actually provided by the Entity Reference Revisions module. It is very important to note that paragraph entities are revisioned. This means that when you want to create a reference to them, you need to provide two IDs: target_id and target_revision_id. Regular entity reference fields like files, images, and taxonomy terms only require the target_id. This will be further explained with the node migration.

The other configuration that you can optionally set in the destination section is default_bundle. The value will be the machine name of the paragraph type you are migrating into. You can do this when all the paragraphs for a particular migration definition file will be of the same type. If that is not the case, you can leave out the default_bundle configuration and add a mapping for the type entity property in the process section.
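
For instance, if the source provided the paragraph type per row, a minimal sketch of that mapping could be (the source_paragraph_type column is hypothetical):

process:
  type: source_paragraph_type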

You can execute the paragraph migration with this command: drush migrate:import ud_migrations_paragraph_intro_paragraph. After running the migration, there is not much you can do to verify that it worked. Contrary to other entities, there is no user interface available out of the box that lists all paragraphs in the system. One way to verify that the migration worked is to manually create a View that shows paragraphs. Another way is to query the database directly. You can inspect the tables that store the paragraph fields’ data. In this example, the tables would be:

  • paragraph__field_ud_book_paragraph_author for the current author.
  • paragraph__field_ud_book_paragraph_title for the current title.
  • paragraph_r__8c3a9563ac for all the author revisions.
  • paragraph_r__3fa7e9863a for all the title revisions.

Each of those tables contains information about the bundle (paragraph type), the entity id, the revision id, and the migrated field value. Table names are derived from the machine names of the fields. If they are too long, the field name will be hashed to produce a shorter table name. Having to query the database is not ideal. Unfortunately, the options available to check if a paragraph migration worked are limited at the moment.
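
As a reference, one such query could be issued through Drush. This is a hypothetical example; the column name follows Drupal’s [field_name]_value convention for plain text fields:

$ drush sql:query "SELECT entity_id, revision_id, field_ud_book_paragraph_title_value FROM paragraph__field_ud_book_paragraph_title"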

The node migration

The node migration will serve as the host for both referenced entities: images and paragraphs. The image migration is very similar to the one explained in a previous article. This time, the focus will be the paragraph migration. Both of them are set as dependencies of the node migration, so they need to be executed in advance. The following snippet shows how the source, destination, and dependencies are set:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: 'Michele Metts'
      photo_file: 'P01'
      book_ref: 'B10'
    - unique_id: 2
      name: 'Benjamin Melançon'
      photo_file: 'P02'
      book_ref: 'B20'
    - unique_id: 3
      name: 'Stefan Freudenberg'
      photo_file: 'P03'
      book_ref: 'B30'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - ud_migrations_paragraph_intro_image
    - ud_migrations_paragraph_intro_paragraph
  optional: []

Note that photo_file and book_ref both contain the unique identifier of records in the image and paragraph migrations, respectively. These can be used with the migration_lookup plugin to map the reference fields in the nodes to be migrated. ud_paragraphs is the machine name of the target content type.

The mapping of the image reference field follows the same pattern as the one explained in the article on migration dependencies. Using the migration_lookup plugin, you indicate which migration should be searched for the images. You also specify which source column contains the unique identifiers that match those in the image migration. This operation will return a single value: the file ID (fid) of the image. This value can be assigned to the target_id subfield of field_ud_image to establish the relationship. The following code snippet shows how to do it:

field_ud_image/target_id:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_image
  source: photo_file

Paragraph field mappings

Before diving into the paragraph field mapping, let’s think about what needs to be done. Paragraphs are revisioned entities. To make a reference to them, you need two IDs: their entity id and their entity revision id. These two values need to be assigned to two subfields of the paragraph reference field: target_id and target_revision_id respectively. You have to come up with a process pipeline that complies with this requirement. There are many ways to do it, and the specifics will depend on your field configuration. In this example, the paragraph reference field allows an unlimited number of paragraphs to be associated, but only of one type: ud_book_paragraph. Another thing to note is that even though the field allows you to add as many paragraphs as you want, the example migrates exactly one paragraph.

With those considerations in mind, the mapping of the paragraph field will be a two-step process. First, use the migration_lookup plugin to get a reference to the paragraph. Second, use the fetched values to set the paragraph reference subfields. The following code snippet shows how to do it:

pseudo_mbe_book_paragraph:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_paragraph
  source: book_ref
field_ud_favorite_book:
  plugin: sub_process
  source:
    - '@pseudo_mbe_book_paragraph'
  process:
    target_id: '0'
    target_revision_id: '1'

The first step is a normal migration_lookup procedure. The important difference is that instead of getting a single value, like with images, the paragraph lookup operation will return an array of two values. The format is like [3, 7] where the 3 represents the entity id and the 7 represents the entity revision id of the paragraph. Note that the array keys are not named. To access those values, you would use the index of the elements starting with zero (0). This will be important later. The returned array is stored in the pseudo_mbe_book_paragraph pseudofield.

The second step is to set the target_id and target_revision_id subfields. In this example, field_ud_favorite_book is the machine name of the paragraph reference field. Remember that it is configured to accept an arbitrary number of paragraphs, and each will require passing an array of two elements. This means you need to process an array of arrays. To do that, you use the sub_process plugin to iterate over an array of paragraph references. In this example, the structure to iterate over would be like this:

[
  [3, 7]
]

Let’s dissect how to do the mapping of the paragraph reference field. The source configuration of the sub_process plugin contains an array of paragraph references. In the example, that array has a single element: the '@pseudo_mbe_book_paragraph' pseudofield. The quotes (') and at sign (@) are required to reuse an element that appears earlier in the process pipeline. Then, in the process configuration, you set the subfields for the paragraph reference field. It is worth noting that at this point you are iterating over a list of paragraph references, even if that list contains only one element. If you had more than one paragraph to migrate, whatever you defined in process would apply to all of them.

The process configuration is an array of subfield mappings. The left side of the assignment is the name of the subfield you want to set. The right side of the assignment is an array index of the paragraph reference being processed. Remember that this array does not have named keys, so you use numerical indexes to refer to its elements. The example sets the target_id subfield to the element in the 0 index and the target_revision_id subfield to the element in the 1 index. Using the example data, this would be target_id: 3 and target_revision_id: 7. The quotes around the numerical indexes are important. If they are not used, the migration will not find the indexes and the paragraphs will not be associated. The end result of this operation will be something like this:

'field_ud_favorite_book' => array (1) [
  array (2) [
    'target_id' => string (1) "3"
    'target_revision_id' => string (1) "7"
  ]
]

There are three ways to run the migrations: manually, executing dependencies, and using tags. The following code snippet shows the three options:

# 1) Manually.
$ drush migrate:import ud_migrations_paragraph_intro_image
$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node

# 2) Executing dependencies.
$ drush migrate:import ud_migrations_paragraph_intro_node --execute-dependencies

# 3) Using tags.
$ drush migrate:import --tag='UD Paragraphs Intro'

And that is one way to map paragraph reference fields. In the end, all you have to do is set the target_id and target_revision_id subfields. The process pipeline that gets you to that point can vary depending on how your paragraphs are configured. The following is a non-exhaustive list of things to consider when migrating paragraphs:

  • How many paragraph types can be referenced?
  • How many paragraph instances are being migrated? Is this a multivalue field?
  • Do paragraphs have translations?
  • Do paragraphs have revisions?

Do migrated paragraphs disappear upon node rollback?

Paragraphs migrations are affected by a particular behavior of revisioned entities. If the host entity is deleted, and the paragraphs do not have translations, the paragraphs themselves get deleted. That means that deleting a node will cause the referenced paragraphs’ data to be removed. How does this affect your migration workflow? If the migration of the host entity is rolled back, the paragraphs will be removed, but the Migrate API will not know about it. In this example, if you run a migrate status command after rolling back the node migration, you will see that the paragraph migration indicates that there are no pending elements to process. The file migration for the images will report the same, but in that case, the images will remain on the system.

In any migration project, it is common to perform rollback operations to test new field mappings or fix errors. Thus, chances are very high that you will stumble upon this behavior. Thanks to Damien McKenna for helping me understand this behavior and for tracking it to the rollback() method of the EntityReferenceRevisions destination plugin. So, what do you do to recover the deleted paragraphs? You have to roll back both migrations: node and paragraph. And then, you have to import the two again. The following snippet shows how to do it:

# 1) Rollback both migrations.
$ drush migrate:rollback ud_migrations_paragraph_intro_node
$ drush migrate:rollback ud_migrations_paragraph_intro_paragraph

# 2) Import both migrations again.
$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node

What did you learn in today’s blog post? Have you migrated paragraphs before? If so, what challenges have you found? Did you know paragraph reference fields require two subfields to be set? Did you know that deleting the host entity also deletes referenced paragraphs? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Read more and discuss at agaric.coop.

Community Working Group posts: Announcing Drupal Event Code of Conduct Training

The Drupal Community Working Group is happy to announce that we’ve teamed up with Otter Tech to offer live, monthly, online Code of Conduct enforcement training for Drupal Event organizers and volunteers through the end of 2019. 

The training is designed to provide “first responder” skills to Drupal community members who take reports of potential Code of Conduct issues at Drupal events, including meetups, camps, conventions, and other gatherings. The workshops will be attended by Code of Conduct enforcement teams from other open source events, which will allow cross-pollination of knowledge with the Drupal community.

Each monthly online workshop is the same; community members only have to attend one workshop of their choice to complete the training. We strongly encourage all Drupal event organizers to consider sponsoring one or two persons’ attendance at this workshop.

The monthly online workshops will be presented by Sage Sharp, Otter Tech’s CEO and a diversity and inclusion leader in the open source community. From the official description of the workshop, it will include:

  • Practice taking a report of a potential Code of Conduct violation (an incident report)
  • Practice following up with the reported person
  • Instructor modeling on how to take a report and follow up on a report
  • One practice scenario for a report given at an event
  • One practice scenario for a report given in an online community
  • Discussion on bias, microaggressions, personal conflicts, and false reporting
  • Frameworks for evaluating a response to a report
  • 40 minutes total of Q&A time

In addition, we have received a Drupal Community Cultivation Grant to help defray the cost of the workshop for those who need assistance. The standard cost of the workshop is $350, but Otter Tech has worked with us to allow us to provide the workshop for $300. To register for the workshop, first let us know that you’re interested by completing this sign-up form – everyone who completes the form will receive a coupon code for $50 off the regular price of the workshop.

For those that require additional assistance, we have a limited number of $100 subsidies available, bringing the workshop price down to $200. Subsidies will be provided based on reported need as well as our goal to make this training opportunity available to all corners of our community. To apply for the subsidy, complete the relevant section on the sign-up form. The deadline for applying for the subsidy is end-of-business on Friday, September 6, 2019 – those selected for the subsidy will be notified after this date (in time for the September 9, 2019 workshop).

The workshops will be held on:

  • September 9 (Monday) at 3 pm to 7 pm U.S. Pacific Time / 8 am to 12 pm Australia Eastern Time
  • October 23 (Wednesday) at 5 am to 9 am U.S. Pacific Time / 2 pm to 6 pm Central European Time
  • November 14 (Thursday) at 6 pm to 10 pm U.S. Pacific Time / 1 pm to 5 pm Australia Eastern Time
  • December 4 (Wednesday) at 9 am to 1 pm U.S. Pacific Time / 6 pm to 10 pm Central European Time

Those that successfully complete the training will be (at their discretion) listed on Drupal.org (in the Drupal Community Workgroup section) as a means to prove that they have completed the training. We feel that moving forward, the Drupal community now has the opportunity to have professionally trained Code of Conduct contacts at the vast majority of our events, once again, leading the way in the open source community.

We are fully aware that the fact that the workshops will be presented in English limits who will be able to attend. We are more than interested in finding additional professional Code of Conduct workshops in other languages. Please contact us if you can assist.

Eelke Blok: Fighting content spam on Drupal with Akismet

Once, the Drupal community had Mollom, and everything was good. It was a web service that would let you use an API to scan comments and other user-submitted content and it would let your site know whether it thought it was spam, or not, so it could safely publish the content. Or not. It was created by our very own Dries Buytaert and obviously had a Drupal module. It was the service of choice for Drupal sites struggling with comment spam. Unfortunately, Mollom no longer exists. But there is an alternative, from the WordPress world: Akismet.

Agaric Collective: Migrating addresses into Drupal

Today we will learn how to migrate addresses into Drupal. We are going to use the field provided by the Address module, which depends on the third-party library commerceguys/addressing. When migrating addresses you need to be careful with the data that Drupal expects. The address components can change per country. The way to store those components also varies per country. These and other important considerations will be explained. Let’s get started.

Example address field process mapping.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD address, whose machine name is ud_migrations_address. The migration to execute is udm_address. Notice that this migration writes to a content type called UD Address with one field: field_ud_address. The content type and field will be created when the module is installed. They will also be removed when the module is uninstalled. The demo module itself depends on the following modules: address and migrate.

Note: Configuration placed in a module’s config/install directory will be copied to Drupal’s active configuration. And if those files have a dependencies/enforced/module key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created and deleted.

The recommended way to install the Address module is using composer: composer require drupal/address. This will grab the Drupal module and the commerceguys/addressing library that it depends on. If your Drupal site is not composer-based, an alternative is to use the Ludwig module. Read this article if you want to learn more about this option. In the example, it is assumed that the module and its dependency were obtained via composer. Also, keep an eye on the Composer Support in Core Initiative as they make progress.

Source and destination sections

The example will migrate three addresses from the following countries: Nicaragua, Germany, and the United States of America (USA). This makes it possible to show how different countries expect different address data. As usual, for any migration you need to understand the source. The following code snippet shows how the source and destination sections are configured:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: 'Michele'
      last_name: 'Metts'
      company: 'Agaric LLC'
      city: 'Boston'
      state: 'MA'
      zip: '02111'
      country: 'US'
    - unique_id: 2
      first_name: 'Stefan'
      last_name: 'Freudenberg'
      company: 'Agaric GmbH'
      city: 'Hamburg'
      state: ''
      zip: '21073'
      country: 'DE'
    - unique_id: 3
      first_name: 'Benjamin'
      last_name: 'Melançon'
      company: 'Agaric SA'
      city: 'Managua'
      state: 'Managua'
      zip: ''
      country: 'NI'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_address

Note that not every address component is set for all addresses. For example, the Nicaraguan address does not contain a ZIP code. And the German address does not contain a state. Also, the Nicaraguan state is fully spelled out: Managua. By contrast, the USA state is a two-letter abbreviation: MA for Massachusetts. One more thing that might not be apparent is that the USA ZIP code belongs to the state of Massachusetts. All of this is important because the module does validation of addresses. The destination is the custom ud_address content type created by the module.

Available subfields

The Address field has 13 subfields available. They can be found in the schema() method of the AddressItem class. Fields are not required to have a one-to-one mapping between their schema and the form widgets used for entering content. This is particularly true for addresses because input elements, labels, and validations change dynamically based on the selected country. The following is a reference list of all subfields for addresses:

  1. langcode for language code.
  2. country_code for country.
  3. administrative_area for administrative area (e.g., state or province).
  4. locality for locality (e.g. city).
  5. dependent_locality for dependent locality (e.g. neighbourhood).
  6. postal_code for postal or ZIP code.
  7. sorting_code for sorting code.
  8. address_line1 for address line 1.
  9. address_line2 for address line 2.
  10. organization for company.
  11. given_name for first name.
  12. additional_name for middle name.
  13. family_name for last name.

Properly describing an address is not trivial. For example, there are discussions to add a third address line component. Check this issue if you need this functionality or would like to participate in the discussion.

Address subfield mappings

In the example, only 9 out of the 13 subfields will be mapped. The following code snippet shows how to do the processing of the address field:

field_ud_address/given_name: first_name
field_ud_address/family_name: last_name
field_ud_address/organization: company
field_ud_address/address_line1:
  plugin: default_value
  default_value: 'It is a secret ;)'
field_ud_address/address_line2:
  plugin: default_value
  default_value: 'Do not tell anyone :)'
field_ud_address/locality: city
field_ud_address/administrative_area: state
field_ud_address/postal_code: zip
field_ud_address/country_code: country

The mapping is relatively simple. You specify a value for each subfield. The tricky part is knowing the name of the subfield and the value to store in it. The format for an address component can change between countries. The easiest way to see which components are expected for each country is to create a node for a content type that has an address field. With this example, you can go to /node/add/ud_address and try it yourself. For simplicity’s sake, let’s consider only 3 countries:

  • For the USA, city, state, and ZIP code are all required. And for state, you have a specific list from which you need to select.
  • For Germany, the company is moved above first and last name. The ZIP code label changes to Postal code and it is required. The city is also required. It is not possible to set a state.
  • For Nicaragua, the Postal code is optional. The State label changes to Department. It is required and offers a predefined list to choose from. The city is also required.

Pay very close attention. The available subfields will depend on the country. Also, the form labels change per country or language settings. They do not necessarily match the subfield names. Moreover, the values that you see on the screen might not match what is stored in the database. For example, a Nicaraguan address will store the full department name, like Managua. On the other hand, a USA address will only store a two-letter code for the state, like MA for Massachusetts.
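
If your source data spells out USA state names, one option is to normalize them with the static_map process plugin from Drupal core. This is a minimal sketch, assuming a hypothetical source column state containing full names; with bypass set to true, values that are not in the map are passed through unchanged:

field_ud_address/administrative_area:
  plugin: static_map
  source: state
  map:
    Massachusetts: MA
    Minnesota: MN
  bypass: true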

Something else that is not apparent, even from the user interface, is data validation. For example, let’s consider that you have a USA address and select Massachusetts as the state. Entering the ZIP code 55111 will produce the following error: Zip code field is not in the right format. At first glance, the format is correct: a five-digit code. The real problem is that the Address module is validating whether that ZIP code is valid for the selected state. It is not valid for Massachusetts. 55111 is a ZIP code for the state of Minnesota, which makes the validation fail. Unfortunately, the error message does not indicate that. Nine-digit ZIP codes are accepted as long as they belong to the state that is selected.

Finding expected values

Values for the same subfield can vary per country. How can you find out which value to use? There are a few ways, but they all require varying levels of technical knowledge or access to resources:

  • You can inspect the source code of the address field widget. When the country and state components are rendered as select input fields (dropdowns), you can have a look at the value attribute for the option that you want to select. This will contain the two-letter code for countries, the two-letter abbreviations for USA states, and the fully spelled string for Nicaraguan departments.
  • You can use the Devel module. Create a node containing an address. Then use the devel tab of the node to inspect how the values are stored. It is not recommended to have the devel module in a production site. In fact, do not deploy the code even if the module is not enabled. This approach should only be used in a local development environment. Make sure no module or configuration is committed to the repo nor deployed.
  • You can inspect the database. Look for the records in a table named node__field_[field_machine_name], if migrating nodes. First create some example nodes via the user interface and then query the table. You will see how Drupal stores the values in the database.

If you know a better way, please share it in the comments.

The commerceguys addressing library

With version 8 came many changes in the way Drupal is developed. Now there is an intentional effort to integrate with the greater PHP ecosystem. This involves using already existing libraries and frameworks, like Symfony, but also making code written for Drupal available as external libraries that can be used by other projects. commerceguys/addressing is one example of a library that was made available this way. The Address module, in turn, makes use of it.

Explaining how the library works or where it fetches its data is beyond the scope of this article. Refer to the library documentation for more details on the topic. We are only going to point out some things that are relevant for the migration. For example, the ZIP code validation happens in the validatePostalCode() method of the AddressFormatConstraintValidator class. There is no need to know this for a migration project, but the key thing to remember is that the migration can be affected by third-party libraries outside of Drupal core or contributed modules. Another example is the value for the state subfield. The Address module expects a subdivision as listed in one of the files in the resources/subdivision directory.

Does the validation really affect the migration? We have already mentioned that the Migrate API bypasses Form API validations. And that is true for address fields as well. You can migrate a USA address with the state Florida and ZIP code 55111. Both are invalid because you need to use the two-letter state code FL and a valid ZIP code within the state. Notwithstanding, the migration will not fail in this case. In fact, if you visit the migrated node you will see that Drupal happily shows the address with the data that you entered. The problem arrives when you need to use the address. If you try to edit the node you will see that the state is not preselected. And if you try to save the node after selecting Florida you will get the validation error for the ZIP code.

These validation issues might be hard to track because no error will be thrown by the migration. The recommendation is to migrate a sample combination of countries and address components. Then, manually check whether editing a node shows the migrated data for all the subfields. Also check that the address passes Form API validations upon saving. This manual testing can save you a lot of time and money down the road. After all, if you have an ecommerce site, you do not want to be shipping your products to wrong or invalid addresses. 😉

Technical note: The commerceguys/addressing library actually follows ISO standards. Particularly, ISO 3166 for country and state codes. It also uses CLDR and Google’s address data. The dataset is stored as part of the library’s code in JSON format.

Migrating countries and zone fields

The Address module offers two more field types: Country and Zone. Both have only one subfield, value, which is selected by default. For country, you store the two-letter country code. For zone, you store a serialized version of a Zone object.
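
As a hedged sketch, the process mapping for a country field could look like this (the field_ud_country field name is hypothetical):

process:
  field_ud_country/value: country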

What did you learn in today’s blog post? Have you migrated addresses before? Did you know the full list of subcomponents available? Did you know that data expectations change per country? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Read more and discuss at agaric.coop.

Web Wash: Use Taxonomy Terms as Webform Options in Drupal 8

Webform has a pretty robust system for managing lists of options. When you create a select box, you can define its options just for that element, or use a predefined list. If you go to Structure, Webforms, Configurations and click on Options, you can see all the predefined options and you can create your own.

If you want to learn how to create your own predefined options, check out our tutorial: How to Use Webform Predefined Options in Drupal 8.

One thing to be aware of is that all of these options are stored as config files, which makes perfect sense; it is configuration.

But what if you want editors to manage the options?

Depending on how you deploy Drupal sites, if you change an option only on the production site, the change will be overridden the next time you deploy, because you import all new configuration changes.

To work around this, you could look at using Webform Config Ignore.

Another way of managing options is by using the Taxonomy system. An editor would simply manage all the terms from the Taxonomy page and nothing will be stored in config files.

In this tutorial, you’ll learn how to create a select element which uses a taxonomy vocabulary instead of the standard options.