Pitfalls In The Validation For Uniqueness Using Rails ActiveRecord

ActiveRecord Validation in Rails provide an easy way to validate records for uniqueness. With the help of a real world example we show why relying solely on Rails validation could become a problem and demonstrate a way to cleanup your database without downtime.

The Problem

The tagging of translation keys was added as a product feature in the very early stages of the PhraseApp development. Although the feature was constantly improved over time and the performance was very reliable, one day a customer contacted us due to a validation error. The tagging feature had been used rather extensively by this customer to automatically tag all uploaded keys with a GitHub Pull Request, leading to a validation error with an existing tag. So we investigated the issue and discovered that the error was caused by a tag name which was not unique to the project. So how could that happened? We had used ActiveRecord uniqueness validation and only used Rails without skipping the validation.

Code State

The last state was that the Tag should have a name field and belong to a project as well as have an ActiveRecord validation, so that a tag was unique to the project_id and name:

State of Codebase

Our code to create new tags looked something like this:

State within Database

Checking our database revealed that there were 40k records that were not unique.

Furthermore, we saw that all of the duplicated records were system generated tags by our upload feature. Uploads are processed asynchronous by concurrent workers. By using our client’s push command an upload for each locale is triggered. On any upload the user can allocate a tag, the keys should also be tagged. This combination dramatically increases the probability of two processes trying to create a tag at the same time.

What happend?

If uniqueness validation is enabled, Rails will look for existing records before performing Tag.create, Tag.save, Tag.update ... operations. If a record was found the validation fails and the transaction will be rolled back, if not the record will be saved.

Example Validation fail:

Example success

Through this example you can see that there is no lock on the table between the SELECT and the INSERT statement. In this time gap another concurrent process can also create a record with the same name and the same project_id without any failing validation. As you can see hooks are also triggered between the validation and INSERT statement. The increase of the number and the complexity of hooks, will also lead to an increased probability of the creation of non unique records.

Solution and Learnings

The solution in preventing the creation non unique records would therefore be setting up a unique database index. There was already an index on project_id and name but the index was not unique.

To make this index unique multiple without downtime, these steps are needed:

  1. Make sure no new duplicated records can be created
  2. Cleanup Database
  3. Remove old index
  4. Create a new unique index

Preventing the creation of non unique Records between Cleanup and index migration

Adding a unique index to a table which contains non unique entries will raise an exception. Therefore, we will need to cleanup the database before adding the index. If new duplicated entries are introduced between the cleanup and the adding of the index, the migration will fail. So we first have to make sure that no new duplicated records can be created. You can solve this by using temporary tables, but we chose another approach. As MySQL ignores NULL values for unique indexes, we add a new column that we will set to project_id and the name concatenated to the new records. The default value of this column should be NULL. On this field we are able to add a unique index.

When we are not creating any tag record skipping hooks, we can add a before_save hook that sets the tmp_field to the concatenation of project_id and name. This index will prevent the creation of non unique tag entries.

Now the application will raise an exception every time while trying to create a duplicated entry, in order to avoid this we should change our code for creating tags to handle this issue. Rails will raise a RecordNotUnique exception in this case so we can catch this and select and return the existing tag. Here you should make sure to reload, so as to prevent strange query caching issues.

Cleanup and Index change

After the changes are deployed, we can cleanup the database. Here you can go for migration or a script. We decided to do this by a migration. During the cleanup it is important to set the correct associations, because we do not want to lose the association between tags and the translation keys. After the cleanup is complete the index can be changed to be unique.

After this migration has run. The temporary field and the hook to fill it can be removed.

Lessons Learned

  • When to use a unique index with the Rails uniqueness validation
  • How to conduct a cleanup and add a unique index with zero downtime on MySQL.


Also published on Medium.

 

Comments