ETL is the most common approach to database migration. The name stands for the three main phases of the procedure: extract, transform, and load. In the extraction phase, schemas, constraints, and data are read from the source database. In the transformation phase, the extracted objects are converted into the format the target expects. In the loading phase, the schemas, data, and constraints are written into the target database.
The ETL process begins with extracting database objects from the source. This is considered the most important part of an ETL migration, because every later phase depends on the extraction being complete and accurate. XML and JSON are the most suitable intermediate formats for schemas, while CSV (comma-separated values) is the most suitable format for temporary storage of data. Before moving on to the next migration phase, the extracted objects should pass validation rules, especially when the database is complex.
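As a minimal sketch of the extraction phase, the Python snippet below reads one table from a source database, writes its schema as JSON and its rows as CSV, and applies a single validation rule (the exported row count must match the source). SQLite stands in for the source system here, and the function name `extract_table` and the sample table are assumptions made for illustration only.

```python
import csv
import io
import json
import sqlite3


def extract_table(conn, table):
    """Return (schema_json, csv_text) for one table of a SQLite source."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    schema = {
        "table": table,
        "columns": [
            {
                "name": col[1],
                "type": col[2],
                "not_null": bool(col[3]),
                "primary_key": bool(col[5]),
            }
            for col in columns
        ],
    }

    # Dump the data to CSV, with a header row of column names.
    rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([col[1] for col in columns])
    writer.writerows(rows)
    csv_text = buffer.getvalue()

    # Validation rule: re-read the CSV and compare against the source count.
    exported = sum(1 for _ in csv.reader(io.StringIO(csv_text, newline=""))) - 1
    expected = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if exported != expected:
        raise ValueError(f"{table}: exported {exported} rows, expected {expected}")

    return json.dumps(schema, indent=2), csv_text


# Hypothetical sample data for the demonstration.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
source.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

schema_json, csv_text = extract_table(source, "customers")
```

A real migration would add further rules at this point, such as checksums per column or checks on constraint definitions, before handing the files to the transformation phase.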
The transformation stage is guided by a series of rules. Database objects must be prepared before they can be loaded into the target database. The most significant functions of the transformation stage are:
- Mapping types without direct equivalents
- Cleaning data to ensure compliance with the rules of the destination
- Converting character sets that are present in the source system but absent in the target
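The first and last of these functions can be sketched in a few lines of Python. The mapping table below is an illustrative assumption, loosely modeled on moving MySQL-style type names to PostgreSQL-style ones; it is not a complete or authoritative mapping, and the helper names `map_type` and `fit_charset` are hypothetical.

```python
import re

# Illustrative mapping for source types with no direct equivalent in the
# target. A real migration needs a complete, reviewed table per DBMS pair.
TYPE_MAP = {
    "TINYINT": "SMALLINT",
    "MEDIUMTEXT": "TEXT",
    "DATETIME": "TIMESTAMP",
    "DOUBLE": "DOUBLE PRECISION",
}


def map_type(source_type):
    """Map a source column type to a target type, keeping unmapped types."""
    match = re.fullmatch(r"\s*([A-Za-z ]+?)\s*(\(.*\))?\s*", source_type)
    if match is None:
        raise ValueError(f"unrecognized type: {source_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base in TYPE_MAP:
        # Arguments such as a display width are dropped for mapped types
        # (a simplification made for this sketch).
        return TYPE_MAP[base]
    return base + args


def fit_charset(text, target_encoding="latin-1"):
    """Replace characters the target character set cannot represent."""
    return text.encode(target_encoding, errors="replace").decode(target_encoding)


print(map_type("tinyint(1)"))
print(map_type("varchar(255)"))
print(fit_charset("Zoë ✓"))
```

Replacing unencodable characters with a placeholder is only one possible policy; rejecting the row or transliterating the text may be more appropriate, depending on the destination rules.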
In addition, several transformation issues commonly arise when migrating databases between different systems:
- Some DBMSs treat empty values as NULL, while others distinguish between the two.
- In some DBMSs the Boolean type is a synonym for bit (0, 1), while in others it is defined as enum('t','f'), so values require appropriate conversion.
- The columns of a child table that take part in a foreign key are normally covered by regular indexes, which is a core requirement. Some DBMSs may also require a unique index for the same purpose.
- Different DBMSs impose different limits on the number of columns, the number of indexes, the length of primary keys, and so on.
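The value conversions in the list above can be sketched as small Python helpers. Everything here is an assumption made for illustration: the helper names, the set of accepted Boolean tokens, and the column limit passed to `check_column_limit` are hypothetical and would have to come from the documentation of the actual target DBMS.

```python
TRUE_TOKENS = {"t", "true", "1", "y", "yes"}
FALSE_TOKENS = {"f", "false", "0", "n", "no"}


def normalize_empty(value, empty_as_null=True):
    """Turn empty strings into None when the target treats them as NULL."""
    if value == "" and empty_as_null:
        return None
    return value


def to_bit(value):
    """Convert a Boolean stored as enum('t','f'), text, or bool to bit 0/1."""
    if value is None:
        return None
    if isinstance(value, bool):
        return int(value)
    token = str(value).strip().lower()
    if token in TRUE_TOKENS:
        return 1
    if token in FALSE_TOKENS:
        return 0
    raise ValueError(f"not a Boolean value: {value!r}")


def check_column_limit(column_names, max_columns):
    """Fail early if a table exceeds the target's column limit."""
    if len(column_names) > max_columns:
        raise ValueError(
            f"{len(column_names)} columns exceed the target limit of {max_columns}"
        )


# Hypothetical source row: an empty string and an enum-style Boolean.
row = {"nickname": "", "active": "t"}
converted = {
    "nickname": normalize_empty(row["nickname"]),
    "active": to_bit(row["active"]),
}
check_column_limit(list(converted), max_columns=1000)  # limit is an assumed value
print(converted)
```

Raising an error on unrecognized Boolean tokens, rather than guessing, keeps bad source data from being silently loaded into the target.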
The schemas, constraints, and data are then imported into the destination database. Data is usually refreshed on a schedule, whether daily, weekly, or monthly, and the requirements often call for overwriting or updating existing data. It is important to disable constraints and triggers while these existing tables are updated, and to re-enable them afterwards. This preserves data integrity and improves the performance of the ETL process. For complex systems, a history and audit trail of the changes made to the system is also required.
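A minimal sketch of such a load step is shown below, using SQLite as a stand-in for the target. It switches foreign-key enforcement off for the bulk update, overwrites existing rows, records the load in an audit table, and validates the foreign keys before committing. The table names, the `load_rows` function, and the audit layout are assumptions made for this example; other DBMSs use different statements for disabling constraints and triggers.

```python
import sqlite3
from datetime import datetime, timezone


def load_rows(conn, rows):
    """Overwrite or insert order rows with foreign keys disabled during load."""
    conn.commit()  # PRAGMA foreign_keys has no effect inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT OR REPLACE INTO orders (id, customer_id, total) "
                "VALUES (?, ?, ?)",
                rows,
            )
            # Audit trail: one record per load.
            conn.execute(
                "INSERT INTO load_audit (table_name, row_count, loaded_at) "
                "VALUES (?, ?, ?)",
                ("orders", len(rows), datetime.now(timezone.utc).isoformat()),
            )
            # Validate integrity before the transaction is committed.
            violations = conn.execute("PRAGMA foreign_key_check").fetchall()
            if violations:
                raise ValueError(f"foreign key violations: {violations}")
    finally:
        conn.execute("PRAGMA foreign_keys = ON")


# Hypothetical target schema and existing data.
target = sqlite3.connect(":memory:")
target.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    CREATE TABLE load_audit (table_name TEXT, row_count INTEGER, loaded_at TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 5.0);
    """
)

# Order 10 is overwritten, order 11 is new.
load_rows(target, [(10, 1, 7.5), (11, 1, 2.0)])
```

Because the integrity check runs inside the same transaction as the load, a batch that references a missing customer is rolled back as a whole instead of leaving the target half-updated.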
Migrating a large data warehouse can be tedious, complicated, and time-consuming, and the same can be true even when only selected simple data is migrated. The most common problems during migration are data loss and data corruption, which are largely caused by human error.
Minimizing these errors is therefore very important, and a well-defined automation routine can reduce them. Intelligent Converters, a software company specializing in database conversion, provides ETL-based migration solutions between different systems: SQLite, Firebird, Microsoft Access, Oracle, MySQL, SQL Server, FoxPro, and SaaS platforms.