Software localization

Rails Validation: Pitfalls in Validating Uniqueness with Active Record

Rails provides an easy way to validate records for uniqueness, but relying solely on Rails validation could be quite troublesome. Here's a solution.

Software localization blog category featured image | Phrase

The tagging of translation keys was added as a feature in the very early stages of Phrase's product development. The feature was continuously improved over time and its performance – very reliable. Until a customer contacted us one day due to a validation error.

The Problem

The tagging feature had been used rather extensively by this customer to automatically tag all uploaded keys with a GitHub pull request, leading to a validation error with an existing tag. So we investigated the issue and discovered that the error was caused by a tag name which was not unique to the project. So how could that happen? We had used Active Record uniqueness validation and only Rails without skipping the validation.

Code State

The last state was that the Tag should have a name field and belong to a project as well as have an Active Record validation, so that a tag was unique to the project_id and name:

State of Codebase

class Tag < ActiveRecord::Base

  has_and_belongs_to_many :translation_keys

  ...

  belongs_to :project

  belongs_to :account

  ...

  validates_uniqueness_of :name, scope: :project_id

  ...

end

Our code to create new tags looked something like this:

class Keys::TagService

  def self.create_or_find_tag_by_name_for_project(project, tag_name, user, ...)

    ...

    tag = TagRepository.tags_for_project(project).find_or_initialize_by(name: tag_name)

    if tag.new_record? && tag.save

      ...

    end

    tag

  end

end

State within Database

Checking our database revealed that there were 40k records that were not unique.

SELECT COUNT(*) FROM

  (SELECT COUNT(*)

   FROM tags

   GROUP BY project_id, name

   HAVING COUNT(*) > 1) AS g

Furthermore, we saw that all of the duplicated records were system generated tags by our upload feature. Uploads are processed asynchronously by concurrent workers. By using our client's push command an upload for each locale is triggered. On any upload the user can allocate a tag, the keys should also be tagged. This combination dramatically increases the probability of two processes trying to create a tag at the same time.

What happened?

If uniqueness validation is enabled, Rails will look for existing records before performing Tag.create, Tag.save, Tag.update ... operations. If a record was found the validation fails and the transaction will be rolled back, if not the record will be saved. Example of how Rails validation fails:

pry(main)> Tag.create(name: "test", project_id: 1)

   (0.2ms)  BEGIN

  ...

  Tag Exists (0.2ms)  SELECT  1 AS one FROM `tags` WHERE (`tags`.`name` = BINARY 'test' AND `tags`.`project_id` = 1) LIMIT 1

   (0.1ms)  ROLLBACK

And here's an example of a successful Rails validation:

pry(main)> Tag.create(name: "test", project_id: 1)

   (0.2ms)  BEGIN

  ### Some other validations ###

  ..

  Tag Exists (0.3ms)  SELECT  1 AS one FROM `tags` WHERE (`tags`.`name` = BINARY 'test' AND `tags`.`project_id` = 1) LIMIT 1

  ..

  ### some before_save and before_create hooks ###

  ..

  SQL (0.2ms)  INSERT INTO `tags` (`name`, `project_id`, `slug`, `created_at`, `updated_at`) VALUES ('test', 1, 'test', '2017-02-13 12:56:07', '2017-02-13 12:56:07')

   (14.7ms)  COMMIT

Through this example you can see that there is no lock on the table between the SELECT and the INSERT statement. In this time gap another concurrent process can also create a record with the same name and the same project_id without any failing validation. As you can see hooks are also triggered between the validation and INSERT statement. The increase of the number and the complexity of hooks, will also lead to an increased probability of the creation of non unique records.

Solution and Learnings

The solution in preventing the creation non unique records would therefore be setting up a unique database index. There was already an index on project_id and name but the index was not unique. To make this index unique multiple without downtime, these steps are needed:

Make sure no new duplicated records can be created
Cleanup Database
Remove old index
Create a new unique index

Preventing the creation of non unique Records between Cleanup and index migration

Adding a unique index to a table which contains non unique entries will raise an exception. Therefore, we will need to cleanup the database before adding the index. If new duplicated entries are introduced between the cleanup and the adding of the index, the migration will fail. So we first have to make sure that no new duplicated records can be created. You can solve this by using temporary tables, but we chose another approach. As MySQL ignores NULL values for unique indexes, we add a new column that we will set to project_id and the name concatenated to the new records. The default value of this column should be NULL. On this field we are able to add a unique index.

class AddTmpFieldToTags < ActiveRecord::Migration

  def change

    add_column :tags, :tmp_field, :string, null: true, default: nil

    add_index :tags, [:tmp_field], unique: true

  end

end

When we are not creating any tag record skipping hooks, we can add a before_save hook that sets the tmp_field to the concatenation of project_id and name. This index will prevent the creation of non unique tag entries. Now the application will raise an exception every time while trying to create a duplicated entry, in order to avoid this we should change our code for creating tags to handle this issue. Rails will raise a RecordNotUnique exception in this case so we can catch this and select and return the existing tag. Here you should make sure to reload, so as to prevent strange query caching issues.

...

  def self.create_or_find_tag_by_name_for_project(project, tag_name, user)

    tag = TagRepository.tags_for_project(project).find_or_initialize_by(name: tag_name)

    begin

      if tag.new_record? && tag.save

        ...

      end

    rescue ActiveRecord::RecordNotUnique

      tag = TagRepository.tags_for_project(project).reload.find_by(name: tag_name)

    end

    tag

  end

...

Cleanup and Index change

After the changes are deployed, we can cleanup the database. Here you can go for migration or a script. We decided to do this by a migration. During the cleanup it is important to set the correct associations, because we do not want to lose the association between tags and the translation keys. After the cleanup is complete the index can be changed to be unique.

 class AddUniqueIndexOnProjectIdAndNameToTags < ActiveRecord::Migration def change remove_index :tags, [:project_id, :name] cleanup add_index :tags, [:project_id, :name], unique: true end private def cleanup dub_tags = ActiveRecord::Base.connection.exec_query("SELECT name, project_id, COUNT(*) FROM tags GROUP BY name, project_id HAVING COUNT(*) > 1").to_hash

    dub_tags.each do |tag|

      unify_tag(tag["name"], tag["project_id"])

    end

  end

  def unify_tag(name, project_id)

    tags = Tag.where(project_id: project_id, name: name).order(:id)

    return if tags.count < 2

    Tag.transaction do

      keys_to_tag = []

      tag_to_keep = tags.to_a.first

      tags.each_with_index do |tag, index|

        keys_to_tag = keys_to_tag.concat(tag.translation_keys.to_a)

        tag.destroy if tag.id != tag_to_keep.id

      end

      tag_to_keep.translation_keys = keys_to_tag.uniq { |key| key.id }

      tag_to_keep.save

    end

  end

end

After this migration has run. The temporary field and the hook to fill it can be removed.