JGuiGen Replication System - Version 15 - 6/29/2006

Written by G. Hale Pringle Ed. D.

For the purposes of this document, Data Replication is defined as synchronizing selected tables on two or more database servers. The JGuiGen Replication system gets new and changed rows from one database and sends local new and changed rows to another server. At the end of the process the two tables are the same with the exception of a few system fields that should be different.  The objective is to take a snapshot of each system and update the other system to match this snapshot.

Beyond Simple Replication

When you have remote sites that must stand alone (satellite offices, road warriors, sites with intermittent Internet connections, etc.) the usual solution is to replicate the data between sites.  This may be as crude as retyping data or as complex as Oracle’s Replication System which requires experienced Data Base Administrators at all of the sites.

 

JGuiGen has a built-in replication system that is much better than retyping data, but is not as “strong” as the systems developed by major database venders.

 

Having remote sites and evolving software means that you must take care of five major concerns.

 

1.       Initial Distribution of software and data – We will be using Sun’s JAWS, but there are other options (such as the software distribution system developed by the Eclipse project.)  With JAWS, a user attaches to the Internet and automatically gets a copy of the software.  After the first time they will run with the existing copy unless they find a newer copy when they link to the Internet.  In that case the new copy is automatically installed.

2.       Updating the software – JAWS does this very elegantly.

3.       Updating the schema of the remote databases to include changes required by “new” versions of the software.  This is not part of the JAWS system.

4.       Running one-time procedures required by updated software and modified data schemas.  Again this is critical, but is not covered by JAWS or other systems that focus on distributing “versions” of software.

5.       Verifying the accuracy of replicated data.

 

The JGuiGen Replication system addresses all five of these concerns.

INDEX

I. Background
II. Type of replication
III. Objectives
IV. Overview - dummy walk through in English
V. Criteria for Evaluating Replication Systems - Notes about how JGuiGen stands up.
.... Architectural Requirements
....Functional Requirements
....Administrative Criteria
....Resource Management
....Change Management
....Recovery
....Vendor Health
VI. Minimal JGuiGen Replication
VII. System Details
....Assumptions
....Security
....JGReplicate Table
....Over-Booking the JGReplicate Table
VIII. Rules - Requirements that must be met.
IX. Walk Through - following the process step-by-step
XX. Avoiding the IsRemoved Field Requirement
XXI. Moving a site to a new primary server
XXII. Special Notes about the JGReplicate Table - fields used by various types of records
XXIII. Reports that will be available in the JGuiGen Replication system.
XXIV. General Notes
XXV. Discussion of the "PROBLEM WINDOW" during replication. XXVI. Planned JUnit Tests
XXVII. Code Steps - coding that is underway.
Back to Top

I. BACKGROUND

I have designed and worked on homegrown data replication systems since the mid 1990s. This document is my latest and gr8t3%t (nope - just couldn't say it) thinking and it builds on the existing work. There are some differences but this is basically what I spent those years developing and running. The systems I developed connected several dozen sites in 22 different states.

Desktop applications (not web-based) that work with databases fall into three rough categories.

  1. Applications that work solely on one PC using a local database.
  2. Applications that work using some type of client/server system where one or more desktop machines share one database. (This category escalates up into the massive clustered systems.)
  3. Applications that are hybrids of category 1 and 2. These applications have at least one user who works with a stand-alone system and needs to share at least some data with the users on another system. The answer is always some type of replication service, even if this means someone typing the data into the other system. (Oh yes! This happens. More than we want to think.)

The advantage of the hybrid is that users can work off-line. They do not need to be connected to the Internet or to some central database server. The major disadvantage is latency. Some of the data is out-of-date and there is some effort required to move correct data to and from the remote sites when they do come on-line. Regardless. there are times when the hybrid is the only viable solution.

JGuiGen can create applications that work in any of the three modes described above. It handles stand-alone systems and large multi-user client server systems. With the addition of the replication system is can synchronize these two groups. (Note: It has always been able to connect a user to a remote database via the Internet. This replication system just allows that user to work when they are not connected. They can then connect and bring everything up-to-date.)

Back to Top

II. JGuiGen REPLICATION - WHAT TYPE OF REPLICATION IS IT?

There are several different types of replication. Some are download only. Some require a full dump of one site's information and others involve real-time or near real-time synchronization. Using definitions from Marie Barretta's book "Data Replication" JGuiGen uses a modified master/slave asynchronous multiple central site replication scheme. The master/slave designation indicates that data from one site is considered to "own" a record. They are the only ones that are allowed to make changes to that record. (Note: the ownership can be passed around so that someone else can edit the record, but only one site owns a given record at any given time.) The "modified" in this definition come from the fact that JGuiGen allows the non-master users to fill in empty text data in fields that someone else owns. The asynchronous part indicates that the data is not updated in real-time or near real-time. Users can run independently for long periods of time (they probably shouldn't, but they could) and then synchronize their data with the primary site. The "multiple central sites" refers to a system where some users can replicate with a central site and that central site synchronizes with another central site that is "higher" up the chain. All data is eventually replicated, it just may take a couple of days.

Note:  At present JGuiGen Replication is designed to work at the table level.  A while table is owned by one master site.  Since JGuiGen already includes a data dictionary table that contains information about individual columns within each table, it could be expanded to do column level master/slave replication without requiring a major overhaul.  The previous systems I designed had column level controls, but it doesn’t seem necessary in this first pass.

There are two main reasons for including Data Replication with JGuiGen.

  1. JGuiGen ships with the standalone database HSQL. In order to use JGuiGen with one of the other backend databases (e.g. mySQL, PostgreSQL, Oracle), the data_dict, userlogdata, and jguigen_i18n tables need to be created and the data from the HSQL database needs to be moved into them. This replication system can be used to do this.
  2. The other major reason for data replication is to support business applications that want to deploy stand-alone applications via JAWS, or the Eclipse program update system, or something like these and then have users able to run the application in stand-alone mode. The users usually need to be able to synchronize their data with a home/primary site. This replication system also can be used to do this.

Back to Top

III. OBJECTIVES

When I started this project in the mid 1990's I had a few objectives. As the project went on I discovered a number of other things that needed to be covered. Here is a fairly complete list.

  1. Accurate - All rows synchronized, all columns correct
  2. Secure - Resistant to attacks from outside and inside the system.
  3. Easy to use and easy to automate. It should handle (as well as possible) users that don't care about the whole thing. (I view apathy as an "attack" on the integrity of the data. We need to know when a site is not sending and receiving the current data that other sites are using to run our business.)
  4. Reliable - it should just work
  5. Reliable - It should be tolerant of interrupted replication sessions
  6. Reliable - It should be amenable to warm servers to minimize the impact of single point of failure problems.
  7. If there is a problem, the replication system should not make it worse.
  8. Fast - minimal redundant transfer of data (Note: Since whole rows are sent, there is transmission of data that wasn't changed, but this is a tradeoff with accuracy and complexity. Tracking changes at the individual row/column level is very complex or requires logging which is hard to add to legacy system.)
  9. Flexible - system should allow for clusters of users when necessary.
  10. Low impact on users - minimal down time. (This system does not require that users be off the system during replication.)
  11. Low impact on users - There should be no impact on local users when the Internet is unavailable.
  12. Low requirements on existing applications and tables. (The only major one is requiring that rows never be actually deleted. This can be waived if a stored procedure updates a table called JG_Killed_Rows when a row is deleted in a table that will be replicated.)
  13. Time zone and server clock differences tolerant
  14. Scalable - Through a system of secondary primary sites build a system that can get very large. (Obviously this causes a longer lag in the time required for everyone to get new/changed data.)
  15. Manageable - Management should be aware of what is going on.
  16. Manageable - On special note is Management-by-Exception. The system should actively tell administrators when there are problems.
  17. Manageable - Management should be able to override data ownership when necessary.
  18. Manageable - Management should be able to suspend replication of selected tables when necessary.
  19. Can be used to propagate table schema changes across sites.
  20. Can be used to fire one-time-methods or classes that do work needed to populate tables that have had their schema modified. For example if a legacy system had a column called "NAME" which contained first and last names and a schema change created a new "FIRST_NAME" column and a "LAST_NAME" column, you might want to run a method that tried to split the existing data in the NAME column and placed it into the FIRST_NAME and LAST_NAME columns.
  21. Should be able to restore sites that come on-line using a backup that is days or weeks old
  22. Should be able to populate a new site completely with current data
  23. Should be able to work with applications distributed with JAWS.
  24. Should be able to act as an general purpose table replication system.
  25. Cross-platform capable - should be able to handle situations where different sites are using different back-end databases (this is a tough one)
  26. Real-world aware - Features like the ability of any site to enter empty data should be encouraged but optional.
  27. NOT DONE - system should be transaction safe. It is mostly safe but does not guarantee all or nothing transaction delivery. This would be very difficult to add to the system as it now exists.
  28. NOT DONE YET- the system should allow sites to get "selected" data so that they just get what they need.
  29. NOT DONE YET- the system should have a "compare" feature which allows one site to be compared with another to verify tha accuracy of a site's data.
  30. NOT DONE YET - the system should access rules for transforming data from disparate system and creating enhanced (summarized) data as part of the replication process.
  31. NOT DONE YET- systems might have the ability to reliably update summary data or totals or balances as part of the replication.
  32. NOT DONE YET – column level control of data ownership.

Back to Top

IV. OVERVIEW

How the system works is repeated several times below (from different points 0f view), but I would like to start with an English conversation that shows the concept behind the system. Note that in this conversation Sites 3 and 17 doe all of the talking. Primary is very dumb, it just sends rows, updates records and inserts records when Sites 3 and 17 ask it to.

Back to Top

V. EVALUATING REPLICATION SYSTEM and JGuiGen Replication
One section of Ms. Buretta's book listed "General Requirements for Creating and Evaluation Replication Systems." I am going to go down this list and talk about the JGuiGen replication system. I will mark her statements as "Replication Criteria: ....." This gives you a good starting point to make your own evaluation about the suitability of JGuiGen Replication for your own uses.

NOTE: As of 6/27/2006, this document is being written as if this system is operational. It is under active development, but is not fully functional. The code that it is based on exists and in most cases the development consists of recreating the functionality in java. I will update this document as development proceeds.

NOTE 2 This section is written in light of setting up a full-blown, secure, multi sever system. For test purposes and non-critical situations, JGuiGen Replication is much simpler. See the note at the end of the Evaluation section.

ARCHITECTURAL REQUIREMENTS

  1. Replication Criteria: The system should be architected as middleware whose purpose to provide data replication services. This allows the system to be plugged and unplugged with minimal impact on the rest of the system. - JGuiGen Replication meets this criterion. The replication system does not require JGuiGen or its applications to run. It does require an XML file (or LDAP system) with system settings. A data_dict table with some information about the tables to be replicated and a JGReplicate table to store replication information. There is an options JG_Killed_Rows table if your system actually deletes rows.
  2. Replication Criteria: The system should support location/distribution transparency. A database or process should not need to know where objects are. - This is not truly applicable to a database only replication system like JGuiGen Replication. Each site needs to know how to connect to its own database and how to connect to one remote site. JGuiGen Replication does handle time zone differences between primary and target sites.
  3. Replication Criteria: The system should support a high degree of database and process transparency. - JGuiGen meets this criterion for data replication. It will move data from and to any of the databases that JGuiGen supports and can add other database types as long as they have JDBC support.

JGuiGen applications already have almost all of these items in place from the beginning.

NOTE JGuiGen Replication is currently under development and the levels described here are all in place but have not been fully implemented. The system can be run in a much looser fashion when simplicity is more important than data security.

Replication Criteria: Question about system security. How robust are the security implementations? - JGuiGen is very secure against internal attacks and relies on SSL to secure transmission of data. It would be easy to add a process called Tumbling (a form or port knocking) if more security was needed.

Back to Top

FUNCTIONAL REQUIREMENTS

Back to Top

ADMINISTRATIVE CRITERIA

Back to Top

RESOURCE MANAGEMENT

Back to Top

CHANGE MANAGEMENT

CHANGE MANAGEMENT OF DATABASE SYSTEMS

Back to Top

RECOVERY

Back to Top

VENDOR HEALTH

There were several questions about the financial health of the vendor. This is Open Source software. A user has the full source code and can modify it or take over its development at any time. The "vendor" health issues are not particularly relevant.

Given that this is a small open source project developed by one person, it would seem that JGuiGen Replication does a fairly decent job of meeting this standard set of evaluation criteria.

****END OF EVALUTAION SECTION

Back to Top

VI. MINIMAL JGuiGen REPLICATION

As mentioned in a note above, JGuiGen Replication does not require all of these bells and whistles. For a test drive or a non-critical small application all you need is the items in the following list.

  1. The tables to be replicated need to exist with the appropriate schema at the target site. (They obviously must exist as the primary site, but I assume you have something that already exists that you want to replicate.)
  2. The seven variables in each table to be replicated need to exist.
  3. The JGReplicate table needs to exist at the primary site and the target site.
  4. Minimal information needs to be installed in an "M" record on the primary site. (Site number =1, Enter a name, Process = "Y", Next-available-site = 2, Next-available-site- lock = "N") JGuiGen supplies a GUI for entering this data.
  5. Table records ("T" records) in JGReplicate on the primary site. These need a table_name, Process = "Y". Again, JGuiGen supplies a GUI for entering this data.
  6. A primary site with a port open to the Internet or an intranet.
  7. Settings in the JGuiGenIni.xml file to attach to a local database and a remote database.
  8. The JGuiGen application with it's menu option - Maintenance --> Replicate

This is how we have users who want to use a larger database system than HSQL populate their new database from the HSQL records that ship with JGuiGen.

Back to Top

VII. SYSTEM DETAILS

Replication involves databases running at multiple sites. For each database system using JGuiGen replication there is one master primary site. There can be secondary primary sites. For example a remote office could call into the main office, but all sites local to the remote office would get their replications from the remote office. All sites connect to a primary and update it. Secondary primaries connect to the master primary. The remote sites also get new and updated rows from their primary site. Each site has a unique site number across the whole system.

Every table that will be replicated by JGuiGen must have seven variables in it These are used by this replication system.

Back to Top

Assumptions

Back to Top

Security

Back to Top

JGReplicate Table

There is a special table at each site called JGReplicate. It contains one row for each table to be updated and log entries that track replications attempted and completed.
Note: In the following rows the [] notations indicate which record types use the field.

The schema of the JGReplicate table is

NOTE:  When a site tries to replicate and their Previous-local-high-date-time does NOT match the Previous-local-high-date-time at the primary server an error record is written and an email is sent to the System administrators.  This usually means that two different locations are trying to use the same site number.  The only exception is when the One-time-mismatch-of-times-allowed flag is set to “Y”. 

Back to Top

Over-Booking the JGReplicate Table

This table is "over-booked". It has at least nine different things that it takes care of.

There are two main values from the previous replication kept in the "Table" row for each table in the JGReplicate table

  1. The first major value is the date/time of the last replication. When a table is being replicated, one of the first steps is to ask the primary site for all rows at the primary site that have been inserted or changed since this date. This date/time will be exactly the same for each table even though the system processes tables one at a time. We basically want a snapshot at the time we are calling MARK. "Send me everything new up on tine the MARK time."
    Note: To originally fill a site with data this date/time is set to a date/time before the system existed. This will cause all of the rows from the primary site to be sent.
    Note: The JGuiGen replication system is not transaction safe. It is possible with this system if two sites are replicating at the same time to get part of one site's new data, but not all of it. For example if site 2 asks for all updates on a table and gets lots of rows to process, site 3 could then process the same table and another table before site 2 finishes. This would result in site 2 not having site 3's data for the first table and having site 3's data for the second table. The next replication takes care of this.
  2. The second important value is the date/time of the most recently updated record in the local table as of the last replication. Any rows with a lastChangedDate date/time stamp higher than this date/time will have been inserted or changed locally since the last replication.

Back to Top

VIII. RULES-

NOTE: Need a way to display a message to a local user to indicate that their site has gone too long without replication. This can be added to JGuiGen itself, but a small process is needed that checks every x minutes and pops-up a message or checks when a machine is booted up.

Note: Check into DCE Security requirements more

Back to Top

IX. WALK THROUGH - rows are inserted/changed

Row is added at the primary site.

Row is changed on the primary site.

Row is added at a remote site

Row is changed at a remote site.

Note: As long as the row_id is automatically set and the lastChangedDate column is automatically updated, JGuiGen can replicate tables that were created for applications other than JGuiGen. See the section of JG_Killed_Rows.

Row is deleted anywhere - this is the same as a change since the "isRemoved" flag is a column in the table and it is changed. lastChangedDate field is updated to current date/time. See the section of JG_Killed_Rows.

A site attaches to the base database.

  1. The system checks with the local JGReplicate table to get the local site number
  2. The system checks in the JGuiGenIni.xml file (or LDAP system) for the email address of the system administrator.
  3. The system checks in the JGuiGenIni.xml file for the addresses of the local database and the primary site that it will be connecting to. In more secure environments, it gets this information from encrypted files on the local system or a local LDAP system.
    Note: JGuiGen can obtain the database user name and database password from encrypted files. If it does not find an account name it should use to connect to the databases in the JGuiGenIni.xml file, it looks for them in encrypted files. If it can't find them it creates an error record ("E" record) and emails the system administrator, puts up an error message and quits.
  4. It connects to both databases.
  5. It sends any error records ("E" records) that exist locally and have not been sent to the primary (Process = "Y" and Done = "N") Change Done to "Y".
  6. It asks the primary site for it's site number, name, and database type. It stores this away. It will use this to update JG_Owner and JG_Unique_id number on the primary site if it finds that they are empty.
  7. If it does not have a site number it requests one from the primary site and then creates a Site record (“S” record) on the primary site. It is presumed that the "new" site got its table schema from JAWS or something else current. It requests all of the "update schema" records from the primary system and stores them. It does not process them. (We might include a way to "check" if the change already exists.)
  8. It updates its personal Site record on the primary site ("S" record), bumps the replication started count, Enter a date/time that replication started and bumps the count of how many times it started a replication and didn't finish.
  9. It updates the M record at the local site. The updates indicate that a replication started. They will be updated when the replication finishes.
  10. If remote management-by-exception is turned on, it checks the primary site for all sites that have error-1-email-sent = "N" and has "S" records that indicate they have not replicated during the last X days. It gets value of X from the primary site "M" record. If any offending sites are found, tt creates an error record, sends an email to system administrators and marks the error-1-email-sent column to "Y" so the email won't get sent again.
  11. If remote management-by-exception is turned on, it checks the primary site for all sites that have error-2-email-sent = "N" and have "S" records that indicate they have more than Y consecutive replication failures. They get the value of Y from the primary server. After the error record is created and the email is, the error-2-email-sent column is set to "Y" in the S record.
  12. Each replication site sends an email when they start to replicate and their own count of unsuccessful replications exceeds a number supplied by the primary site..
  13. It finds the highest "update" record it has in the local JGReplicate file and asks the primary site for any updates that are higher than this. ("U" records)
  14. It tries to apply these updates. If it fails it emails the administrator, creates an error record ("E" record), shows an error message and quits.
  15. When it finishes, it updates the remote "S" record and the local "M" with the update number it just completed. It emails if it can't do the update.
  16. It asks the primary site for a list of tables to be replicated and marks these tables in the local JGReplicate table. It adds new rows to the local JGReplicate table if the primary site sends a table name that is currently not in the local JGReplicate table. These rows are designated with a "T" in the flag column. These new rows are set with very old dates so that the first replication will populate the table with all existing records.
  17. The next thing that happens when JGuiGen starts to process is that the schema of each table to be replicated is compared with the schema at the primary server. The schema for the primary site and the remote site must match. The columns do not need to be in exactly the same order, but each column on the primary site must exist on the remote site and vice versa. The column type and size must be the same for each column. (Note: There can be minor discrepancies here. For example some systems have a Boolean data type and other use an integer for the same purpose. JGuiGen tries to takes these differences into account.) If a schema does not match the system creates an error record ("E" record), sends an email to the system administrator, and displays a message describing the difference. The notes field in the JGReplicate field is updated with an error message and the replication is aborted.
  18. Once the schema check passes, the local Done field is set to "Y" in the Update record ("U" record).
  19. The system now checks if any of the Table ("T" records) rows in JGReplicate are marked as DONE ="Y". If so, the system did NOT complete the last replication (Done is set to "N" after all tables have been processed). The system then starts with the first table that was not done last time. This avoids doing several tables over and over and then always crashing on a particular table. If the system always started with the first table, some tables (those updated before the bad table) would contain newer and newer data and the data in the others would never change.

NOTE:  Should we try to finish the previous replication first?  Probably not since the later tables will now have “new data” and if there are dependencies we have ordered the tables so that the earlier tables should be done first. Even processing the one table first could cause a problem.

  1. If the first "not done" table replicates, then either the table has been repaired or the last failure was due to a lost connection. All of the rows are marked as NOT DONE and the replication starts from the table listed in the first "T" record. (This avoids having the last tables being several days more current than the first tables.)
  2. When the replication system starts working on a table it updates the count in the rep_count_started field for that table. ("T" records)
  3. The system creates an error record ("E" record) and sends an email to the database administrators that the replication stared in an unclean environment. (Any of the tables are marked with a Done = "Y".) The Notes column for that table in the JGReplicate table would be updated with an error message.
  4. It loops through the rows in the JGReplicate table with a flag of "T". For each table listed there it uses the following overall process:
  5. Processing an individual table uses the following steps.

1.      For each table the system gets the previous-primary-high-date/time using the time zone on the primary machine and the previous-high-local-date/rime (using the local time zone) on the local machine.

2.      The system is connected to the primary server. It establishes the current PRIMARY-MARK time by updating the "S" row in the JGReplicate table and then querying for the lastChangedDate column.

3.      It then queries for all rows at the primary site for one table where the date/time stamp is higher than the previous-primary-high-date-time (previous PRIMARY-MARK) and less than or equal to the current PRIMARY-MARK.

4.      The current PRIMARY-MARK is the current-primary-high-date/time and it is updated in the local JGReplicate table. It will become the previous-primary-high-date/time when the replication is finished. The next time the site asks for data on this table it will want everything newer than this date/time and less than or equal to a new PRIMARY-MARK time it will establish then. Note: This time is from the primary site and may be several time zones away from the local site time.

5.      The system updates a local record and gets the lastChangedDate for that update. This becomes the current MARK time. This time is used to update the current-local-high-date-time in the local JGReplicate table.

6.      Once the system has the date/times and a record set of updated rows from the primary server it loops through the rows. If it finds an empty JG_Owner or JG_Unique_id column it uses the site number from the primary site, calculates what these should be and updates the primary site. Next it checks the local database for a row with the JG_Unique_id equal to the JG_Unique_id value in the row sent from the primary site.

1.      If the row does NOT exist locally it is a new row.

        • add the row.
        • set the lastChangedDate and the JG_DateWhenSent to the local date/time. Set the JG_SentBy column to the site number for the primary server.
        • Later when we ask for changed local records ignore record that came from the primary server and the two dates are still the same.

2.      If the row exists and someone else owns the row then it is a changed row (JG_Unique_id exists on local machine).

        • Update the site row with all the new values unless local site is not empty and the base's version is empty. (If the primary site indicated that secondary sites were not allowed to fill empty columns then all of the owner's values are used.)
        • If the local site has non-empty values and they are allowed  to overwrite blank value- set the lastChangedDate to the current date/time but do not set the JG_DateWhenSent to the same time.  If these conditions were not met set the lastChangedDate to the current date/time and set the JG_DateWhenSent column to the same time. In either case set the JG_SentBy column to the primary Site's site number. Note: The reason for having different dates in the lastChangedDate field and the JG_DateWhenSent fields is that shortly we will update the primary server with all local rows that have dates between the PREVIOUS-MARK and the CURRENT-MARK. The only rows we don't send are those that came from the primary site and the two dates are the same. Since we want the row to be picked up by this query we make sure the two dates are different.

3.      If the row exists and the local site owns the row, then someone else changed the row.

        • Update local data where local site column is empty and theirs is not (unless this has been disallowed).
        • Change lastChangedDate and JG_DateWhenSent to current date/time. Change the JG_SentBy column to the primary Site's site number.
        • If there is data in the downloaded row that conflicts with data in the local row (it was blank and someone edited, but someone locally also edited it with different info), set the lastChangedDate to the current date/time and the JG_DateWhenSent to some other time. See the note in the previous paragraph to see why we do this and why it results in our row being sent to the primary server with our data in it.

At this point we have all of the changes the Primary Site knows about.

7.      The local site selects all local rows that have a JG_Unique_id = 0 (New Rows) OR where the lastChangedDate is greater than the previous-high-local-date/time and less than or equal to the MARK value determined earlier. (Again, note that there will be local rows that were just added or changed from the primary server that have date/time stamps that are included in this query. We will get rid of them by comparing where they came from and if the lastChangedDate equals the JG_DateWhenSent.)

8.      The local site sends inserts of all of the rows with a JG_Unique_id = 0 (New Rows). It fills in the JG_Unique_id, the JG_Owner, the lastChangedDate, the JG_DateWhenSent and the JG_SentBy columns.

9.      The local site sends updates for all of the rows with a lastChangedDate greater the previous-local-high-date-time and the MARK time. It fills in the JG_Unique_id, the JG_Owner, the lastChangedDate, the JG_DateWhenSent and the JG_SentBy columns.

Loop through rows (Be sure to include rows where IsRemoved Flag is set with the value that shows the row should be treated as if it was deleted.) Query the JG_Killed_Rows tables for deleted rows in the right time frame for this table if this variation is being used.

0.      If the JG_Unique_id is zero - this is a NEW row

        • Update the JG_Unique_id to the local site# * 10,000,000,000 + row_id*10 in the local table.
        • If the JG_Owner field is empty, update it to the local site number.
        • Insert the row on main site - the lastChangedDate on the primary machine is set using the JDBC function {fn CURDATE()}. This will set the lastChangedDate to the current date/time on the primary database machine. Use the same function to update the JG_DateWhenSent column. Fill JG_SentBy and JG_Owner with the local site's site number.
        • The lastChangedDate locally is left with its current value.

1.      The other rows are rows the local site changed.

1.      *** If local site owns the row,

          • The row should already contain any changes made by some other site.
          • Send an update of all non-empty columns and set the lastChangedDate and the JG_DateWhenSent on the primary server using the JDBC CURDATE() function. This will set the date/time stamp to the current date/time on the primary server. If the primary indicates the no one but the owner can enter data then send the empty text fields as well. Update the JG_SentBy column to this site's site number.
          • Leave the lastChangedDate on local machine alone.

2.      *** If local site does not own the row.
Request a copy from the primary - We need this row to be sure about which fields are empty on the primary server so that we can update them.

          • Update local row with their data if the two diff. (Except where ours is not blank and the corresponding column from the primary site is empty)
          • Build an update where ours is not empty and theirs is
          • Update the primary site and set the lastChangedDate and JG_DateWhenSent to the current date/time using the CURDATE() function on the primary site. Update the JG_SentBy column to the local site's site number.
          • Leave the lastChangedDate column alone at the local site unless values were changed and set it to the current local date/time.

10.  The local machine uses the same MARK time for each of the tables being processed and saves that as the previous-high-local-date/time. This will the highest change we just applied from the primary server. During the next replication anything higher than this will have been changed locally since the replication that was just finished.

The local system saves the current-primary-high-date/time as the previous-primary-high-date/time. The next time the site replicates and asks the primary site for new/changed records it wants records that are higher than this value. NOTE: See the section called "Problem Window."

The DONE field is marked and the rep-count-finished is bumped by one. The log rows are updated in the local JGReplicate table and the JGReplicate table at the primary site.

  1. The system loops through another table.
  2. When all are done - mark all of the DONE columns as "N" (not done). The local system is "clean" and ready for another update. The steps used to complete a table were listed above.
  3. Updates its personal Site record on the primary site ("S" record), bumps the replication finished count, Enter a date/time that replication ended and sets the count of how many times it started a replication and didn't finish to zero. Appends a line to the Notes text field that shows the date/time of this replication. The error-1-email-sent through error-4-email-sent columns are set to "N". The site has successfully completed and these columns won't be used until it is either late replicating or tries and fails several times. Then the columns need to be "N" so the management-by-exception routine will create error rows and send emails.

To change the owner of a row.

Back to Top

XX. AVOIDING ISREMOVED FIELD REQUIREMENT

There are notes on How JGuiGen Replication handles situations where existing application actually delete rows from a table that is part of the replication system.

  1. A new table must be added to the local database. The table is called JG_Killed_Rows and it has the following structure:
  2. A stored procedure must be installed that is activated by a trigger when the row is actually deleted. This stored procedure must insert a new row into the JG_Killed_Rows table.
  3. When a replication occurs, the JGuiGen replication software will query this table for rows with a date_time_killed that match the time period being replicated. If it finds a row, it will attempt to delete the row with that JG_Unique_id on the primary system.
  4. The primary system will need the same type of stored procedure and a JG_Killed_Rows table if it is not the master primary site. It will need to propagate the delete up the chain and down to other replicating sites.

XXI. MOVING A SITE TO A NEW PRIMARY SERVER

Here are the steps needed to do this.

SPECIAL NOTE ABOUT SECONDARY PRIMARY SERVERS

I would think that it would be prudent to stop users from replicating to a secondary server while it updates from its own primary server. It shouldn't really be a problem, but I feel better if source for other site's data runs in a fairly pure envrionment while it gets its down data.

These servers will need to set the Process field in the Master record ("M" record to "N") and wait for replicating sites to finish before they start to replicate. They cannot have active users or remote sites replicating when they themselves connect to a remote server and replicate from it.

Back to Top

XXII. SPECIAL NOTES ABOUT THE JGREPLICATE TABLE

This list.needs to be fleshed out from notes that are scattered through this document.

Notes on getting a new site number

If a site needs a site number

  1. It sends an update to the primary to change the Next-site-number-lock field to "Y" in the master site record ("M" record) and the next-site-number-lock field = "N".
  2. If the update succeeds it queries for the next-site-number.
  3. If the primary site is the master primary site it sends an update to bump the Next-site-number-1 and set the next-site-number-lock to "N"
  4. If the primary site is a secondary primary it checks it's five possible site number for one that is not zero. It makes the same update except it sets the Next-site-number it used to zero. (The secondary site will get replacement numbers when it replicates from the master primary server.)
  5. If the original update fails, it waits 10 seconds and tries again. After three tries it aborts, creates an Error record ("E" record)c and sends an email to the administrators.
  6. If the next-site-number fields are all zero, it aborts, creates an error record ("E" record), and sends an email to the administrators

Notes about the Master Record ("M" records) in JGReplicate

There is one master record at each site. This contains overall information about the current site.

Notes in Site Records ("S" records) in JGReplicate

Note: There will be no Site records at target/remote sites. These only exist on primary sites and contain information about the sites that are replicating from this primary site.

Notes about the Table Record ("T" records) in JGReplicate

Notes about the Update Records ("U" records) in JGReplicate

Notes about the Error Records ("E" records) in JGReplicate

Notes about the Log Records ("L" records) in JGReplicate

Notes about the JUnit Records ("J" records) in JGReplicate

Back to Top

XXIV. REPORTS

I was going to list the reports we will be creating in the JGuiGen application's report generator, but I realized that there will basically be one report for each of the sections just listed. The fields will be those listed as relevant for each type of record (M, S, L, E, etc.)

Back to Top

XXIV. GENERAL NOTES

  1. Email to System administrators tell about errors. Note there should always be error records created locally and on the primary where an email is sent.
  2. Any site can take over as the primary site. Mark the "M" record as "Y" in the "primary_site" field. You will need to update the JGuiGenIni.xml files to show where the connection is and update the SSL certificates. and then have remote PC's attach to the new primary site. See the longer description about this process available elsewhere.
  3. Any site can redo scrambled data by setting the previous-primary-high-date/time values in the JGReplicate table back
  4. If a site crashes and restores from a backup that is several days/weeks old, the previous-primary-high-date/time will be from before their crash and they will get all of their own records back as well as new records from other sites.   (Note:  There is a field in the site record that must be set to allow them to do this.  The field allows a one time  replication where the date at the primary site does not match the date at the target site.)
  5. If there are dependencies and triggers in the databases (payment records that require the invoice being paid exists). then the order of the tables in the JGReplicate table should be set very carefully. We are not doing transaction level processing so it is possible for a remote site that lost its link during replication to have part of another site's data but not all of it. An email is sent to the system administrator if a failure occurs due to this issue.
  6. The owner of a row can be passed around. A salesperson might enter a proposed bid and transfer ownership to a higher level authority. This person could then make changes that the salesperson could not override. Until the salesperson finishes with the bid the higher authority could see the proposed bid and fill in empty spaces, but they couldn't change what the salesman had entered until the salesperson was finished with it and transferred ownership.
  7. Special note on JGuiGen-Specific tables. The three tables marked as "JGuiGen-specific" are treated a little differently. Since we plan to let sites use our Sourceforge table to update their primary sites table schema these tables are special. Local sites can add columns and these will be ignored. The remote sites MUST have all of the columns the primary site has. Table replication will ignore data sent for columns that do not exist on the primary server.

Back to Top

XXV. DISCUSSION OF THE "PROBLEM WINDOW" DURING REPLCICATION

We are letting users access the database being replicated while the replication process is going on. This is done to minimize the impact on users. (Think of road warriors who can't use their laptop while the replication is going on and it is 3:00 AM. Replication won't happen and we all know it.) This causes some problems. The most glaring is that changes made by the users can have approximately the same date/time stamps and records being updated by the replication service. I call the period between the time that the replication service started actually changing rows and when it stops changes rows in a replicated table the "problem window." Here is how JGuiGen Replication handles this.

There are two major concerns with letting users use the system during replication.

Back to Top

XXVI. PLANNED JUNIT TESTS

Setup

Setup

Tests


Setup

Tests

Tests

Back to Top

XXVII. CODE STEPS [square brackets represent specific info on updated HSQL to MySQL]

Steps deferred