JGuiGen Replication System - Version 15 - 6/29/2006

Written by G. Hale Pringle Ed. D.

For the purposes of this document, Data Replication is defined as synchronizing selected tables on two or more database servers. The JGuiGen Replication system gets new and changed rows from one database and sends local new and changed rows to another server. At the end of the process the two tables are the same with the exception of a few system fields that should be different.  The objective is to take a snapshot of each system and update the other system to match this snapshot.

Beyond Simple Replication

When you have remote sites that must stand alone (satellite offices, road warriors, sites with intermittent Internet connections, etc.) the usual solution is to replicate the data between sites.  This may be as crude as retyping data or as complex as Oracle’s Replication System which requires experienced Data Base Administrators at all of the sites.

 

JGuiGen has a built-in replication system that is much better than retyping data, but is not as “strong” as the systems developed by major database venders.

 

Having remote sites and evolving software means that you must take care of five major concerns.

 

1.       Initial Distribution of software and data – We will be using Sun’s JAWS, but there are other options (such as the software distribution system developed by the Eclipse project.)  With JAWS, a user attaches to the Internet and automatically gets a copy of the software.  After the first time they will run with the existing copy unless they find a newer copy when they link to the Internet.  In that case the new copy is automatically installed.

2.       Updating the software – JAWS does this very elegantly.

3.       Updating the schema of the remote databases to include changes required by “new” versions of the software.  This is not part of the JAWS system.

4.       Running one-time procedures required by updated software and modified data schemas.  Again this is critical, but is not covered by JAWS or other systems that focus on distributing “versions” of software.

5.       Verifying the accuracy of replicated data.

 

The JGuiGen Replication system addresses all five of these concerns.

INDEX

I. Background
II. Type of replication
III. Objectives
IV. Overview - dummy walk through in English
V. Criteria for Evaluating Replication Systems - Notes about how JGuiGen stands up.
.... Architectural Requirements
....Functional Requirements
....Administrative Criteria
....Resource Management
....Change Management
....Recovery
....Vendor Health
VI. Minimal JGuiGen Replication
VII. System Details
....Assumptions
....Security
....JGReplicate Table
....Over-Booking the JGReplicate Table
VIII. Rules - Requirements that must be met.
IX. Walk Through - following the process step-by-step
XX. Avoiding the IsRemoved Field Requirement
XXI. Moving a site to a new primary server
XXII. Special Notes about the JGReplicate Table - fields used by various types of records
XXIII. Reports that will be available in the JGuiGen Replication system.
XXIV. General Notes
XXV. Discussion of the "PROBLEM WINDOW" during replication. XXVI. Planned JUnit Tests
XXVII. Code Steps - coding that is underway.
Back to Top

I. BACKGROUND

I have designed and worked on homegrown data replication systems since the mid 1990s. This document is my latest and gr8t3%t (nope - just couldn't say it) thinking and it builds on the existing work. There are some differences but this is basically what I spent those years developing and running. The systems I developed connected several dozen sites in 22 different states.

Desktop applications (not web-based) that work with databases fall into three rough categories.

  1. Applications that work solely on one PC using a local database.
  2. Applications that work using some type of client/server system where one or more desktop machines share one database. (This category escalates up into the massive clustered systems.)
  3. Applications that are hybrids of category 1 and 2. These applications have at least one user who works with a stand-alone system and needs to share at least some data with the users on another system. The answer is always some type of replication service, even if this means someone typing the data into the other system. (Oh yes! This happens. More than we want to think.)

The advantage of the hybrid is that users can work off-line. They do not need to be connected to the Internet or to some central database server. The major disadvantage is latency. Some of the data is out-of-date and there is some effort required to move correct data to and from the remote sites when they do come on-line. Regardless. there are times when the hybrid is the only viable solution.

JGuiGen can create applications that work in any of the three modes described above. It handles stand-alone systems and large multi-user client server systems. With the addition of the replication system is can synchronize these two groups. (Note: It has always been able to connect a user to a remote database via the Internet. This replication system just allows that user to work when they are not connected. They can then connect and bring everything up-to-date.)

Back to Top

II. JGuiGen REPLICATION - WHAT TYPE OF REPLICATION IS IT?

There are several different types of replication. Some are download only. Some require a full dump of one site's information and others involve real-time or near real-time synchronization. Using definitions from Marie Barretta's book "Data Replication" JGuiGen uses a modified master/slave asynchronous multiple central site replication scheme. The master/slave designation indicates that data from one site is considered to "own" a record. They are the only ones that are allowed to make changes to that record. (Note: the ownership can be passed around so that someone else can edit the record, but only one site owns a given record at any given time.) The "modified" in this definition come from the fact that JGuiGen allows the non-master users to fill in empty text data in fields that someone else owns. The asynchronous part indicates that the data is not updated in real-time or near real-time. Users can run independently for long periods of time (they probably shouldn't, but they could) and then synchronize their data with the primary site. The "multiple central sites" refers to a system where some users can replicate with a central site and that central site synchronizes with another central site that is "higher" up the chain. All data is eventually replicated, it just may take a couple of days.

Note:  At present JGuiGen Replication is designed to work at the table level.  A while table is owned by one master site.  Since JGuiGen already includes a data dictionary table that contains information about individual columns within each table, it could be expanded to do column level master/slave replication without requiring a major overhaul.  The previous systems I designed had column level controls, but it doesn’t seem necessary in this first pass.

There are two main reasons for including Data Replication with JGuiGen.

  1. JGuiGen ships with the standalone database HSQL. In order to use JGuiGen with one of the other backend databases (e.g. mySQL, PostgreSQL, Oracle), the data_dict, userlogdata, and jguigen_i18n tables need to be created and the data from the HSQL database needs to be moved into them. This replication system can be used to do this.
  2. The other major reason for data replication is to support business applications that want to deploy stand-alone applications via JAWS, or the Eclipse program update system, or something like these and then have users able to run the application in stand-alone mode. The users usually need to be able to synchronize their data with a home/primary site. This replication system also can be used to do this.

Back to Top

III. OBJECTIVES

When I started this project in the mid 1990's I had a few objectives. As the project went on I discovered a number of other things that needed to be covered. Here is a fairly complete list.

  1. Accurate - All rows synchronized, all columns correct
  2. Secure - Resistant to attacks from outside and inside the system.
  3. Easy to use and easy to automate. It should handle (as well as possible) users that don't care about the whole thing. (I view apathy as an "attack" on the integrity of the data. We need to know when a site is not sending and receiving the current data that other sites are using to run our business.)
  4. Reliable - it should just work
  5. Reliable - It should be tolerant of interrupted replication sessions
  6. Reliable - It should be amenable to warm servers to minimize the impact of single point of failure problems.
  7. If there is a problem, the replication system should not make it worse.
  8. Fast - minimal redundant transfer of data (Note: Since whole rows are sent, there is transmission of data that wasn't changed, but this is a tradeoff with accuracy and complexity. Tracking changes at the individual row/column level is very complex or requires logging which is hard to add to legacy system.)
  9. Flexible - system should allow for clusters of users when necessary.
  10. Low impact on users - minimal down time. (This system does not require that users be off the system during replication.)
  11. Low impact on users - There should be no impact on local users when the Internet is unavailable.
  12. Low requirements on existing applications and tables. (The only major one is requiring that rows never be actually deleted. This can be waived if a stored procedure updates a table called JG_Killed_Rows when a row is deleted in a table that will be replicated.)
  13. Time zone and server clock differences tolerant
  14. Scalable - Through a system of secondary primary sites build a system that can get very large. (Obviously this causes a longer lag in the time required for everyone to get new/changed data.)
  15. Manageable - Management should be aware of what is going on.
  16. Manageable - On special note is Management-by-Exception. The system should actively tell administrators when there are problems.
  17. Manageable - Management should be able to override data ownership when necessary.
  18. Manageable - Management should be able to suspend replication of selected tables when necessary.
  19. Can be used to propagate table schema changes across sites.
  20. Can be used to fire one-time-methods or classes that do work needed to populate tables that have had their schema modified. For example if a legacy system had a column called "NAME" which contained first and last names and a schema change created a new "FIRST_NAME" column and a "LAST_NAME" column, you might want to run a method that tried to split the existing data in the NAME column and placed it into the FIRST_NAME and LAST_NAME columns.
  21. Should be able to restore sites that come on-line using a backup that is days or weeks old
  22. Should be able to populate a new site completely with current data
  23. Should be able to work with applications distributed with JAWS.
  24. Should be able to act as an general purpose table replication system.
  25. Cross-platform capable - should be able to handle situations where different sites are using different back-end databases (this is a tough one)
  26. Real-world aware - Features like the ability of any site to enter empty data should be encouraged but optional.
  27. NOT DONE - system should be transaction safe. It is mostly safe but does not guarantee all or nothing transaction delivery. This would be very difficult to add to the system as it now exists.
  28. NOT DONE YET- the system should allow sites to get "selected" data so that they just get what they need.
  29. NOT DONE YET- the system should have a "compare" feature which allows one site to be compared with another to verify tha accuracy of a site's data.
  30. NOT DONE YET - the system should access rules for transforming data from disparate system and creating enhanced (summarized) data as part of the replication process.
  31. NOT DONE YET- systems might have the ability to reliably update summary data or totals or balances as part of the replication.
  32. NOT DONE YET – column level control of data ownership.

Back to Top

IV. OVERVIEW

How the system works is repeated several times below (from different points 0f view), but I would like to start with an English conversation that shows the concept behind the system. Note that in this conversation Sites 3 and 17 doe all of the talking. Primary is very dumb, it just sends rows, updates records and inserts records when Sites 3 and 17 ask it to.

Back to Top

V. EVALUATING REPLICATION SYSTEM and JGuiGen Replication
One section of Ms. Buretta's book listed "General Requirements for Creating and Evaluation Replication Systems." I am going to go down this list and talk about the JGuiGen replication system. I will mark her statements as "Replication Criteria: ....." This gives you a good starting point to make your own evaluation about the suitability of JGuiGen Replication for your own uses.

NOTE: As of 6/27/2006, this document is being written as if this system is operational. It is under active development, but is not fully functional. The code that it is based on exists and in most cases the development consists of recreating the functionality in java. I will update this document as development proceeds.

NOTE 2 This section is written in light of setting up a full-blown, secure, multi sever system. For test purposes and non-critical situations, JGuiGen Replication is much simpler. See the note at the end of the Evaluation section.

ARCHITECTURAL REQUIREMENTS

  1. Replication Criteria: The system should be architected as middleware whose purpose to provide data replication services. This allows the system to be plugged and unplugged with minimal impact on the rest of the system. - JGuiGen Replication meets this criterion. The replication system does not require JGuiGen or its applications to run. It does require an XML file (or LDAP system) with system settings. A data_dict table with some information about the tables to be replicated and a JGReplicate table to store replication information. There is an options JG_Killed_Rows table if your system actually deletes rows.
  2. Replication Criteria: The system should support location/distribution transparency. A database or process should not need to know where objects are. - This is not truly applicable to a database only replication system like JGuiGen Replication. Each site needs to know how to connect to its own database and how to connect to one remote site. JGuiGen Replication does handle time zone differences between primary and target sites.
  3. Replication Criteria: The system should support a high degree of database and process transparency. - JGuiGen meets this criterion for data replication. It will move data from and to any of the databases that JGuiGen supports and can add other database types as long as they have JDBC support.

JGuiGen applications already have almost all of these items in place from the beginning.

NOTE JGuiGen Replication is currently under development and the levels described here are all in place but have not been fully implemented. The system can be run in a much looser fashion when simplicity is more important than data security.

Replication Criteria: Question about system security. How robust are the security implementations? - JGuiGen is very secure against internal attacks and relies on SSL to secure transmission of data. It would be easy to add a process called Tumbling (a form or port knocking) if more security was needed.

Back to Top

FUNCTIONAL REQUIREMENTS

Back to Top

ADMINISTRATIVE CRITERIA

Back to Top

RESOURCE MANAGEMENT

Back to Top

CHANGE MANAGEMENT

CHANGE MANAGEMENT OF DATABASE SYSTEMS

Back to Top

RECOVERY

Back to Top

VENDOR HEALTH

There were several questions about the financial health of the vendor. This is Open Source software. A user has the full source code and can modify it or take over its development at any time. The "vendor" health issues are not particularly relevant.

Given that this is a small open source project developed by one person, it would seem that JGuiGen Replication does a fairly decent job of meeting this standard set of evaluation criteria.

****END OF EVALUTAION SECTION

Back to Top

VI. MINIMAL JGuiGen REPLICATION

As mentioned in a note above, JGuiGen Replication does not require all of these bells and whistles. For a test drive or a non-critical small application all you need is the items in the following list.

  1. The tables to be replicated need to exist with the appropriate schema at the target site. (They obviously must exist as the primary site, but I assume you have something that already exists that you want to replicate.)
  2. The seven variables in each table to be replicated need to exist.
  3. The JGReplicate table needs to exist at the primary site and the target site.
  4. Minimal information needs to be installed in an "M" record on the primary site. (Site number =1, Enter a name, Process = "Y", Next-available-site = 2, Next-available-site- lock = "N") JGuiGen supplies a GUI for entering this data.
  5. Table records ("T" records) in JGReplicate on the primary site. These need a table_name, Process = "Y". Again, JGuiGen supplies a GUI for entering this data.
  6. A primary site with a port open to the Internet or an intranet.
  7. Settings in the JGuiGenIni.xml file to attach to a local database and a remote database.
  8. The JGuiGen application with it's menu option - Maintenance --> Replicate

This is how we have users who want to use a larger database system than HSQL populate their new database from the HSQL records that ship with JGuiGen.

Back to Top

VII. SYSTEM DETAILS

Replication involves databases running at multiple sites. For each database system using JGuiGen replication there is one master primary site. There can be secondary primary sites. For example a remote office could call into the main office, but all sites local to the remote office would get their replications from the remote office. All sites connect to a primary and update it. Secondary primaries connect to the master primary. The remote sites also get new and updated rows from their primary site. Each site has a unique site number across the whole system.

Every table that will be replicated by JGuiGen must have seven variables in it These are used by this replication system.

Back to Top

Assumptions

Back to Top

Security

Back to Top

JGReplicate Table

There is a special table at each site called JGReplicate. It contains one row for each table to be updated and log entries that track replications attempted and completed.
Note: In the following rows the [] notations indicate which record types use the field.

The schema of the JGReplicate table is

NOTE:  When a site tries to replicate and their Previous-local-high-date-time does NOT match the Previous-local-high-date-time at the primary server an error record is written and an email is sent to the System administrators.  This usually means that two different locations are trying to use the same site number.  The only exception is when the One-time-mismatch-of-times-allowed flag is set to “Y”. 

Back to Top

Over-Booking the JGReplicate Table

This table is "over-booked". It has at least nine different things that it takes care of.

There are two main values from the previous replication kept in the "Table" row for each table in the JGReplicate table

  1. The first major value is the date/time of the last replication. When a table is being replicated, one of the first steps is to ask the primary site for all rows at the primary site that have been inserted or changed since this date. This date/time will be exactly the same for each table even though the system processes tables one at a time. We basically want a snapshot at the time we are calling MARK. "Send me everything new up on tine the MARK time."
    Note: To originally fill a site with data this date/time is set to a date/time before the system existed. This will cause all of the rows from the primary site to be sent.
    Note: The JGuiGen replication system is not transaction safe. It is possible with this system if two sites are replicating at the same time to get part of one site's new data, but not all of it. For example if site 2 asks for all updates on a table and gets lots of rows to process, site 3 could then process the same table and another table before site 2 finishes. This would result in site 2 not having site 3's data for the first table and having site 3's data for the second table. The next replication takes care of this.
  2. The second important value is the date/time of the most recently updated record in the local table as of the last replication. Any rows with a lastChangedDate date/time stamp higher than this date/time will have been inserted or changed locally since the last replication.

Back to Top

VIII. RULES-