JGuiGen
Replication System - Version 15 - 6/29/2006
Written by G. Hale Pringle Ed. D.
For the purposes of this document, Data Replication is defined as
synchronizing selected tables on two or more database servers. The JGuiGen
Replication system gets new and changed rows from one database and sends
local new and changed rows to another server. At the end of the process
the two tables are the same with the exception of a few system fields that
should be different. The objective is
to take a snapshot of each system and update the other system to match this
snapshot.
Beyond Simple Replication
When you have remote sites that must stand alone (satellite
offices, road warriors, sites with intermittent Internet connections, etc.) the
usual solution is to replicate the data between sites. This may be as crude as retyping data or as
complex as Oracle’s Replication System which requires experienced Data Base
Administrators at all of the sites.
JGuiGen has a built-in replication system that is much better than
retyping data, but is not as “strong” as the systems developed by major
database venders.
Having remote sites and evolving software means that you must take
care of five major concerns.
1.
Initial Distribution of software and data – We will be using
Sun’s JAWS, but there are other options (such as the software distribution
system developed by the Eclipse project.)
With JAWS, a user attaches to the Internet and automatically gets a copy
of the software. After the first time
they will run with the existing copy unless they find a newer copy when they
link to the Internet. In that case the
new copy is automatically installed.
2.
Updating the software – JAWS does this very elegantly.
3.
Updating the schema of the remote databases to include changes
required by “new” versions of the software.
This is not part of the JAWS system.
4.
Running one-time procedures required by updated software and
modified data schemas. Again this is
critical, but is not covered by JAWS or other systems that focus on
distributing “versions” of software.
5.
Verifying the accuracy of replicated data.
The
JGuiGen Replication system addresses all five of these concerns.
INDEX
I.
Background
II.
Type of replication
III.
Objectives
IV.
Overview - dummy walk through in English
V.
Criteria for Evaluating Replication Systems - Notes about how JGuiGen stands
up.
....
Architectural Requirements
....Functional
Requirements
....Administrative
Criteria
....Resource
Management
....Change
Management
....Recovery
....Vendor
Health
VI.
Minimal JGuiGen Replication
VII.
System Details
....Assumptions
....Security
....JGReplicate
Table
....Over-Booking
the JGReplicate Table
VIII.
Rules - Requirements that must be met.
IX.
Walk Through - following the process step-by-step
XX.
Avoiding the IsRemoved Field Requirement
XXI.
Moving a site to a new primary server
XXII.
Special Notes about the JGReplicate Table - fields used by various types of
records
XXIII.
Reports that will be available in the JGuiGen Replication system.
XXIV.
General Notes
XXV.
Discussion of the "PROBLEM WINDOW" during replication. XXVI.
Planned JUnit Tests
XXVII.
Code Steps - coding that is underway.
Back
to Top
I. BACKGROUND
I have designed and worked on homegrown data replication systems since the
mid 1990s. This document is my latest and gr8t3%t (nope - just couldn't say it)
thinking and it builds on the existing work. There are some differences but
this is basically what I spent those years developing and running. The systems
I developed connected several dozen sites in 22 different states.
Desktop applications (not web-based) that work with databases fall into
three rough categories.
- Applications that work solely
on one PC using a local database.
- Applications that work using
some type of client/server system where one or more desktop machines share
one database. (This category escalates up into the massive clustered
systems.)
- Applications that are hybrids
of category 1 and 2. These applications have at least one user who works
with a stand-alone system and needs to share at least some data with the
users on another system. The answer is always some type of replication
service, even if this means someone typing the data into the other system.
(Oh yes! This happens. More than we want to think.)
The advantage of the hybrid is that users can work off-line. They do not
need to be connected to the Internet or to some central database server. The
major disadvantage is latency. Some of the data is out-of-date and there is
some effort required to move correct data to and from the remote sites when
they do come on-line. Regardless. there are times when the hybrid is the only
viable solution.
JGuiGen can create applications that work in any of the three modes
described above. It handles stand-alone systems and large multi-user client
server systems. With the addition of the replication system is can synchronize
these two groups. (Note: It has always been able to connect a user to a remote
database via the Internet. This replication system just allows that user to
work when they are not connected. They can then connect and bring everything
up-to-date.)
Back
to Top
II. JGuiGen REPLICATION - WHAT TYPE OF REPLICATION IS IT?
There are several different types of replication. Some are download only.
Some require a full dump of one site's information and others involve real-time
or near real-time synchronization. Using definitions from Marie Barretta's book
"Data Replication" JGuiGen uses a modified master/slave asynchronous
multiple central site replication scheme. The master/slave designation
indicates that data from one site is considered to "own" a record. They
are the only ones that are allowed to make changes to that record. (Note: the
ownership can be passed around so that someone else can edit the record, but
only one site owns a given record at any given time.) The "modified"
in this definition come from the fact that JGuiGen allows the non-master users
to fill in empty text data in fields that someone else owns. The asynchronous
part indicates that the data is not updated in real-time or near real-time.
Users can run independently for long periods of time (they probably shouldn't,
but they could) and then synchronize their data with the primary site. The
"multiple central sites" refers to a system where some users can
replicate with a central site and that central site synchronizes with another
central site that is "higher" up the chain. All data is eventually
replicated, it just may take a couple of days.
Note: At present JGuiGen Replication
is designed to work at the table level.
A while table is owned by one master site. Since JGuiGen already includes a data dictionary table that
contains information about individual columns within each table, it could be
expanded to do column level master/slave replication without requiring a major
overhaul. The previous systems I
designed had column level controls, but it doesn’t seem necessary in this first
pass.
There are two main reasons for including Data Replication with JGuiGen.
- JGuiGen ships with the
standalone database HSQL. In order to use JGuiGen with one of the other
backend databases (e.g. mySQL, PostgreSQL, Oracle), the data_dict,
userlogdata, and jguigen_i18n tables need to be created and the data from
the HSQL database needs to be moved into them. This replication system can
be used to do this.
- The other major reason for
data replication is to support business applications that want to deploy
stand-alone applications via JAWS, or the Eclipse program update system,
or something like these and then have users able to run the application in
stand-alone mode. The users usually need to be able to synchronize their
data with a home/primary site. This replication system also can be used to
do this.
Back
to Top
III. OBJECTIVES
When I started this project in the mid 1990's I had a few objectives. As the
project went on I discovered a number of other things that needed to be
covered. Here is a fairly complete list.
- Accurate - All rows
synchronized, all columns correct
- Secure - Resistant to
attacks from outside and inside the system.
- Easy to use and easy to
automate. It should handle (as well as possible) users that don't care
about the whole thing. (I view apathy as an "attack" on the
integrity of the data. We need to know when a site is not sending and
receiving the current data that other sites are using to run our
business.)
- Reliable - it should just
work
- Reliable - It should be
tolerant of interrupted replication sessions
- Reliable - It should be
amenable to warm servers to minimize the impact of single point of failure
problems.
- If there is a problem, the
replication system should not make it worse.
- Fast - minimal redundant
transfer of data (Note: Since whole rows are sent, there is transmission
of data that wasn't changed, but this is a tradeoff with accuracy and
complexity. Tracking changes at the individual row/column level is very
complex or requires logging which is hard to add to legacy system.)
- Flexible - system should
allow for clusters of users when necessary.
- Low impact on users -
minimal down time. (This system does not require that users be off the
system during replication.)
- Low impact on users - There
should be no impact on local users when the Internet is unavailable.
- Low requirements on existing
applications and tables. (The only major one is requiring that rows never
be actually deleted. This can be waived if a stored procedure updates a
table called JG_Killed_Rows when a row is deleted in a table that will be
replicated.)
- Time zone and server clock
differences tolerant
- Scalable - Through a system
of secondary primary sites build a system that can get very large.
(Obviously this causes a longer lag in the time required for everyone to
get new/changed data.)
- Manageable - Management
should be aware of what is going on.
- Manageable - On special note
is Management-by-Exception. The system should actively tell administrators
when there are problems.
- Manageable - Management
should be able to override data ownership when necessary.
- Manageable - Management
should be able to suspend replication of selected tables when necessary.
- Can be used to propagate
table schema changes across sites.
- Can be used to fire
one-time-methods or classes that do work needed to populate tables that
have had their schema modified. For example if a legacy system had a
column called "NAME" which contained first and last names and a
schema change created a new "FIRST_NAME" column and a
"LAST_NAME" column, you might want to run a method that tried to
split the existing data in the NAME column and placed it into the
FIRST_NAME and LAST_NAME columns.
- Should be able to restore
sites that come on-line using a backup that is days or weeks old
- Should be able to populate a
new site completely with current data
- Should be able to work with
applications distributed with JAWS.
- Should be able to act as an
general purpose table replication system.
- Cross-platform capable -
should be able to handle situations where different sites are using
different back-end databases (this is a tough one)
- Real-world aware - Features
like the ability of any site to enter empty data should be encouraged but
optional.
- NOT DONE - system should be
transaction safe. It is mostly safe but does not guarantee all or nothing
transaction delivery. This would be very difficult to add to the system as
it now exists.
- NOT DONE YET- the system
should allow sites to get "selected" data so that they just get
what they need.
- NOT DONE YET- the system
should have a "compare" feature which allows one site to be
compared with another to verify tha accuracy of a site's data.
- NOT DONE YET - the system
should access rules for transforming data from disparate system and
creating enhanced (summarized) data as part of the replication process.
- NOT DONE YET- systems might
have the ability to reliably update summary data or totals or balances as
part of the replication.
- NOT DONE YET – column level
control of data ownership.
Back
to Top
IV. OVERVIEW
How the system works is repeated several times below (from different points
0f view), but I would like to start with an English conversation that shows the
concept behind the system. Note that in this conversation Sites 3 and 17 doe
all of the talking. Primary is very dumb, it just sends rows, updates records
and inserts records when Sites 3 and 17 ask it to.
- Buzz Click - someone just
attached to the Primary site's database system.
- Secondary site - Hello
Primary. I'm new. Here is a query to see if you are a primary and if you
are active and can give me a new site number.
- Primary - sends data.
- Secondary Site - I see that
you are a Primary and have a site number you can give me. Here is an
update to add a Log record to show that I've talked to you, but I don't
have a site number yet.
- Primary - processes insert
- Secondary Site - Here is a
query for the next available site number.
- Primary - sends site 17 as a
response.
- Site 17 - Hi! I'm in. Here
is an update for my log entry to say that Site 17 asked for a new site
number.
- Primary - handles update
- Site 17 - Here is a query
for all previous Schema updates. I shouldn't need them but I'll put them
in my JGReplicate table.
- Primary Site - sends U
records.
- Buzz Click - someone else
has attached to the Primary site's database system.
- Site 3 - Hello Primary. Here
is a query to see if you are a primary and active today.
- Primary site - responds to
request to Site 3.
- Site 3 - Hello there - I am
sending you an Error record that indicates that my last update did not
complete. I'll also send an email to system administrators.
- Primary site - handles
insert.
- Site 3 - Hi there again. I
see that you are a primary site and I have your type of database (you are
SQL-SERVER), along with your site number and name. I'll update any of your
JG_Unique_id's or JG_Owner fields that I find empty. I also see that you
have remote management-by-exception reporting turned off so I won't do
those functions. Here is an update to update my Site record ("S"
record) to indicate I started a new replication and to bump my
"count-of-unsuccessful-replications.
- Primary site - handles
update
- Site 3 – Here is a query for
the time I used when I last finished a replication. If your version and mine don’t match I
will need authorization from you before I can continue. (I know that this stops cloned machines
with my site number from replicating and screwing things up.)
- Primary site – returns data
- Site 3 – Got it! It matches what I have. Do you have any schema updates for me
("U" records) that have an update number greater than 22?
- Primary Site - Sends data if
it exists.
- Site 3 - Okay I put that
change in my JGReplicate table, and made the change to my local system.
Here is a query for the list of tables that you are actively replicating
today.
- Primary site - responds to
request to Site 3.
- Site 17 - Hello Primary this
is Site 17. Here is an insert to update your JGReplicate table to create a
new Site record ("S" record) that describes me. It indicates
that I came and started this conversation.
- Primary - handles insert
- Site 17 - Here is an update
to change the Log record I started when didn't have a site number to
reflect that site number 17 was the number I was given.
- Primary - handles update
- Site 17 - Let me start over.
Are you an active primary site, what is your site number, name and
database type and should I do remote management-by-exception reports?
- Primary site - responds to
request to Site 17.
******* I'm going to quit reporting Site 17's conversations here.
- Site 3 - Well here I am
again. I see you added a new table to the list. I've added that to my
JGReplicate table. Can we compare the schema in my tables against yours?
Here is a query for the structure of the first table.
- Primary site - Sends the
schema for the data_dict table Site 3 asked for.
- Site 3 - Okay that matched.
Here is a query for the schema of the JGuiGen_I18N table.
*********several back and forth discussions about the schema of other
tables.
- Site 3 - Hello again
Primary. It is now Tuesday at 10:17 PM. I last finished a replication with
you on Sunday at 11:04 PM my time. (I tried on Monday but we got
disconnected.) The highest date/time I got from you when we started our
last replication was Sunday at 9:15 PM your time. I'm going to update a
record on your site and determine the current time where you are. Here is
an update and a query.
- Primary -updates the
"S" record in the JGReplicate table sends "Tuesday at 5:30
pm."
- Site 3 - Okay I got it. I'm
going to call this time my PRIMARY-MARK time. I will use it for all the
tables I request data on. Here is a query for all rows in the data_dict
table that were updated since Sunday at 9:15 PM your time (my PRIMARY-MARK
time the last time I replicated) and less than or equal to my current
PRIMARY-MARK time. Don't send me any rows that I sent you where the
date/time I sent them (JG_DateWhenSent) matches the lastChangedDate on the
row. These are rows that I sent you and I know about those changes.
- Primary site - sends the
requested rows from the data_dict table.
- Site 3 - Well I've processed
those. I'll ask for things newer than Tuesday at 5:30 PM the next time we
talk. I updated the lastChangedDate and the JG_DateWhenSent to my current
time and changed the JG_SentBy column to your site number. When we talked
Monday the most recent data I had was from Friday at 4:27 PM my time
(nobody worked over the weekend.) Just in case, I'm going to update your
tables with all of the changes on my system since that time on Friday.
(I'm not going to include the stuff you just sent me even though the time
stamp is newer than the rest of what I am sending. I'll know that it was
you because the lastChangedDate will be the same as the JG_DateWhenSent
and the JG_SentBy column will have your site number. If the two dates are
different someone has changed this record - probably in the last minute or
two - and I need to send you that change.) I'm going to use your date/time
to mark lastChangedDate and JG_DateWhenSent columns in the rows I'm
updating on your system. I've just checked my system and the highest
date/time I am going to send is a time I will call MARK. By the time I
finish there may be rows that are more current than this value, but I'll
send them to you next time.
- Site 3 - Here are some Table
Inserts for my new rows since my last update and my local MARK time.
- Primary site - handles the
inserts.
- Site 3 - Here are some Table
updates for my changed rows since my last update and my local MARK time.
- Primary site - handles the
updates
- Site 3 - I'm updating our
logs and storing the local MARK time as the time to start with during out
next replication. When you're ready here is a query for all of the new
data from the JGuiGen_I18N table. (Note that I'm using the smae local MARK
time in this query.)
- Primary site - sends the
requested data.
**********similar back and forth discussions about other tables.
- Site 3 - Okay I'm done. I
updated our logs to indicate that I had finished. In particular I changed
the column that contains the number of replications I have started that
didn't finish from 2 back to zero (remember I had an error when I last
replicated.) You don't talk much do you? Oh well, talk to you again soon!
- click - Site 3 disconnects.
- Note: During this
conversation with Site 3, Site 17 could have been carrying on similar
conversations.
Back
to Top
V. EVALUATING REPLICATION SYSTEM and JGuiGen Replication
One section of Ms. Buretta's book listed "General Requirements for
Creating and Evaluation Replication Systems." I am going to go down this
list and talk about the JGuiGen replication system. I will mark her statements
as "Replication Criteria: ....." This gives you a good starting point
to make your own evaluation about the suitability of JGuiGen Replication for
your own uses.
NOTE: As of 6/27/2006, this document is being written as if this
system is operational. It is under active development, but is not fully
functional. The code that it is based on exists and in most cases the
development consists of recreating the functionality in java. I will update
this document as development proceeds.
NOTE 2 This section is written in light of setting up a full-blown,
secure, multi sever system. For test purposes and non-critical situations,
JGuiGen Replication is much simpler. See the note at the end of the Evaluation
section.
ARCHITECTURAL REQUIREMENTS
- Replication Criteria:
The system should be architected as middleware whose purpose to provide
data replication services. This allows the system to be plugged and
unplugged with minimal impact on the rest of the system. - JGuiGen Replication
meets this criterion. The replication system does not require JGuiGen or
its applications to run. It does require an XML file (or LDAP system) with
system settings. A data_dict table with some information about the tables
to be replicated and a JGReplicate table to store replication information.
There is an options JG_Killed_Rows table if your system actually deletes
rows.
- Replication Criteria:
The system should support location/distribution transparency. A database
or process should not need to know where objects are. - This is not truly
applicable to a database only replication system like JGuiGen Replication.
Each site needs to know how to connect to its own database and how to
connect to one remote site. JGuiGen Replication does handle time zone
differences between primary and target sites.
- Replication Criteria:
The system should support a high degree of database and process
transparency. - JGuiGen meets this criterion for data replication. It will
move data from and to any of the databases that JGuiGen supports and can
add other database types as long as they have JDBC support.
- Replication
Criteria: Question about database replication systems. How
transparent is the system to the resource managers involved in the
replication process? - This is talking about the databases and the
existing applications. JGuiGen Replication is very transparent. With the
addition of four fields to each table to be replicated, two or three
tables for the replication process and at most two stored procedures that
are triggered on updating a row and deleting a row any database system
can work with JGuiGen Replication.
- Replication
Criteria: Question about database replication systems. Do resource
managers need to be altered to participate in the replication
process? If so how hard is it to
do and undo? - Here is the whole list of changes required by the JGuiGen
Replication system.
- Four columns need to
be added to each table: 1) an int JG_Owner column, 2) a long
JG_Unique_id column, 3) an int JG_SentBy column and 4) a date
JG_DateWhenSent column.
- There needs to be an
integer row_id column with a unique value in each table to be
replicated.
- There needs to be a
lastChangedDate column in each table to be replicated.
- If the current
software does not already update the lastChangedDate column when a row
is inserted or modified, a stored procedure will need to be installed
that updates this column.
- There either needs to
be an isRemoved column to mark tables as removed or a JG_Killed_Rows
table will need to be added and a stored procedure installed that fills
the JG_Killed_Rows table when a row is deleted. There is a section below
that describes this.
- An index should be
created on the JG_Unique_id column.
- Two tables need to be
added to the database - JGReplicate and Data_dict.
- JGuiGen needs to be
run once to populate the Data_dict table and someone needs to indicate
which columns handle the row_id, lastChangedDate and isRemoved
functions. Right now the
JG_SentBy, JG_DateWhenSent, JG_Unique_id and JG_Owner columns
must be named exactly. This could easily be changed.
- The system hosting
the database should subscribe to a time service to keep its date/time
accurate.
- A small
program/daemon will need to be running at all times that connects to the
remote sites via SSL on a port other than the standard database port.
- this port will need
to be opened through any firewalls.
- Each site will need a
JGuiGenIni.xml file or an LDAP system that contains the JDBC Driver name
and location for connecting to the remote site.
- Each site will need
to install the JGuiGen Replicate software.
- An optional task
involves running a small application at each site once that creates
encrypted files containing the information needed to connect to the
local and/or remote database.
- SSL certificates for
remote sites will need to be generated.
JGuiGen applications already have almost all of these items
in place from the beginning.
NOTE JGuiGen Replication is currently under
development and the levels described here are all in place but have not been
fully implemented. The system can be run in a much looser fashion when
simplicity is more important than data security.
- Replication
Criteria: The system should be industry standard compliant - for
example it should use the Open Source Foundation - Distributed Computing
Environment (DCE) standards and handles XA and non-XA data.. - JGuiGen
Replication meets many of these standards. It uses the industry standard
of JDBC to connect to disparate database systems. It uses industry
standard SSL connections to insure that data transmission is secure. It
can store passwords and user names in local LDAP systems. It uses
industry standard encryption algorithms such as MD5 hashing,
CBCBlockCipher, Thriple DES, PassWordBasedENtryption (PBEWithMD5AndDES).
JGuiGen Replication does not work with the XA transaction manager
protocol at all.
Replication Criteria: Question about system security.
How robust are the security implementations? - JGuiGen is very secure against
internal attacks and relies on SSL to secure transmission of data. It would be
easy to add a process called Tumbling (a form or port knocking) if more
security was needed.
- Replication
Criteria: Question about system security. Who can use the
functionality of the Replications system? - JGuiGen replications
requires that a site be set up to establish an SSL link using SSL Data
Tunneling software. Anyone who has this setup can get to the database.
In its most secure form JGuiGen Replication uses encrypted files to
store the user name and password needed to attach to the database.
- Replication
Criteria: Question about system security. What Ids and passwords are
stored? - JGuiGen Replication uses three id's and passwords. A local
user name and password, the local database username and password and the
remote username and password. These can all be stored in encrypted files
at the local site or in a local LDAP system. The next step for our
system is to store the remote database username and password in a way
that causes the local site to request it using an SSL Link. This would
make it easier to shift from one primary site to another that could be
located anywhere in the world.
- Replication
Criteria: Question about system security. Where are they stored and
are the encrypted? - On the local file system in encrypted files or in a
local LDAP system.
- Replication
Criteria: Question about system security. What usernames and
passwords are used on remote resource managers (databases) to alter
data? - As noted JGuiGen Replication requires a database username and
password. This would often be a special user name dedicated to the
replication process. It would be set up so that its access to tables in
the database was just as tight as possible. For examples tables that are
only downloaded to remote sites could be marked "read-only" to
this special user.
- Replication
Criteria: Question about system security. How is network security
integrated into the replication system? - As mentioned, JGuiGen
Replication uses a special daemon/process that sets up an SSL link with
a remote site on an arbitrary port. We have the code (called the Tumbler
on Sourceforge) that requires the remote site send and encrypted
password to even open this port.
- Replication
Criteria: The system should be fail-safe (i.e. fail in a manner that
protects other components in the environment) - JGuiGen Replication
should meet this criterion quite well. It is simply a database link to a
database.
- Replication
Criteria: There should be no inherent bottlenecks in the solution. -
Since JGuiGen Replication can be scaled to as many clusters as are needed
this should not be a problem. There are no calls to other sites required
at any time except when a full replication is being done.
- Replication
Criteria: I would add that the system should come with a set of
reliability tests that demonstrate most or all of its facilities. - These
are under development as part of the JGuiGen Replication coding. There
are JUnit test and a system for passing information relevant to these
tests back and forth. A critical feature involves creating a "compare"
facility that compares the values in a table with its counterpart on
another system. This facility needs to ignore the row_id and
lastChangedDate which are unique to the local system and the JG_SentBy
and JG_DateWhenSent columns that are different on different systems.
Back to Top
FUNCTIONAL REQUIREMENTS
- Replication
Criteria: The replication system should be scalable. - This system
is.
- Replication
Criteria: Question about scalability. What resource managers
(databases) can be used as primary sources? - Any of the over six that
JGuiGen supports. Others that have JDBC drivers could be added in very
short order.
- Replication
Criteria: Question about scalability. What resource managers
(databases) can be used as targets? - Any of the over six that JGuiGen
supports. Others that have JDBC drivers could be added in very short
order.
- Replication
Criteria: Question about scalability. Given the available resources,
are there limits on the amount of data that can be transmitted through
the system? - This is an individual issue that I can't answer. Given
that JGuiGen does not handle blobs at this point, this should not be a
problem.
- Replication
Criteria: The system should use or interface with data transformation
and mapping. JGuiGen Replication handles minimal transformation. It will
convert Boolean fields to integers given that some databases use Booleans
and others do not. It does not support (at this time) full scale data transformation
(such as a situation where one site stores Male and Female as 1 and 0 and
another stores the same information as "M" and "F".)
- Replication
Criteria: Question about transformations. Does the system alter data
types between systems? - As described above - yes in limited situations.
- Replication
Criteria: Question about transformation. Does the system insure that
all target sites represent the data identically? - This is the 1/0
versus M/F question. The answer is no except for Boolean transformation.
- Replication
Criteria: Question about transformations. Does the system enhance
the data? - This means can it creates summaries or other major changes
as data is being delivered? - No
NOTE: It should be noted that JGuiGen Replication does not have the
facility to do more that replicate tables. If you need a system that has
failsafe ability to update totals or make other cumulative changes at
the primary site, JGuiGen Replication is not appropriate. For example,
if you need a system that will send new financial transactions and then
update a "balance" field on the primary site, JGuiGen
Replication is not appropriate. The system will insert the new rows and
in the case of failure it would finish inserting the rows. There is no
way to guarantee (without modifying the existing code) that the balance
field would not be updated multiple times or not updated at all if all
the records were inserted and then a failure happened.
- Replication
Criteria: There should be a scheduling mechanism. JGuiGen Replication
depends upon native systems to fire off the replication executable at a
regular time. Road warriors can fire off their systems at any time.
- Replication
Criteria: There should be a friendly system that describes the data
or objects being transmitted. JGuiGen Replication lists the tables it is
working on in a command window while it works.
- Replication
Criteria: There should be a friendly system to subscribe to the
data/objects that are available for replication. - JGuiGen Replication is
an all or nothing system at present. Once you attach to a primary site it
replicates the tables that have been set up for replication.
- Replication
Criteria: The initialization process for targets should be simple and
straight forward.- As noted the hardest part of getting started (in
secure systems) is setting up the SSL certificates on the remote sites.
After that the remote site starts with an empty JGReplicate table and
attaches to a primary system. The table is populated and replication
begins.
- Replication
Criteria: The replication system should be well behaved and provide
the quality of services specified by the business environment - Since
this is basically a database connection with updates and inserts it is
well behaved. The issue on whether or not the "quality of service"
is acceptable depends upon the group wishing to use JGuiGen Replication.
Back to Top
ADMINISTRATIVE CRITERIA
- Replication
Criteria: Configuration - The configuration process should be simple
and straightforward. - I believe that this criterion is met. For full
security there is one daemon/process that must be installed on primary
servers and then the changes listed earlier. For target sites the
software can be delivered via JAWS. Existing databases will need to have
two columns added and two or three tables added. The SSL certificates
will need to be created and a small executable run that creates the
encrypted files for the target system.
- Replication
Criteria: Configuration - It should support dynamic reconfiguration -
It does.
- Replication
Criteria: Configuration - Some sort of configuration history is a
plus - JGuiGen Replication logs virtually all of its activities. This
provides a fairly full history.
- Replication Criteria:
Monitoring - There should be a GUI monitoring tool for the entire
replication system - This does not exist yet. It could easily be added
when someone needs it.
- Replication
Criteria: Monitoring - The monitoring system should integrate with
the existing monitoring system for the firm - JGuiGen currently documents
errors in the JGReplicate table and sends emails to a system
administrator when errors occur. This could easily be modified to handle
other reporting requirements.
- Replication
Criteria: Performance Monitoring - JGuiGen Replication has a GUI for
the JGReplicate table. This includes several standard reports that can be
run at any time to monitor the activity in the system. Other ways of
reporting this data could be added very easily.
- Replication
Criteria: Performance Monitoring - The system should efficiently sue
existing resource. JGuiGen Replication requires internet band width and
database server machine cycles. Thats about it.
- Replication
Criteria: Error Management - There should be an error log - This is
handled in the JGReplicate table as described in this document.
- Replication
Criteria: Control of the replication process should be controllable
from either the target sites or the central sites. - yes.
- Replication
Criteria: The system should handle emergency shutdowns - yes
- Replication
Criteria: The system should automatically restart after emergency
shutdown. - yes
Back to Top
RESOURCE MANAGEMENT
- Replication
Criteria: Resource Management - the queuing system should efficiently
handle resources. - This is currently done by manually scheduling when
remote sites will replicate.
- Replication
Criteria: Resource Management - CPU - the subscription service should
be well behaved. - As noted, after a target site is configured it
automatically joins the system when it first connects.
- Replication
Criteria: Resource Management - Network - the system should be well
behaved on the network.- As noted this system uses standard SSL for
connection and security. The amount of band-width or extra bandwidth
needed would depend upon the amount of data being moved. In most cases
the replication will occur during off-hours and normal bandwidth that is
under-used in those time periods will be sufficient.
- Replication
Criteria: Resource Management - Access management should be done with
standard protocols - SSL and JDBC are extremely standard.
Back to Top
CHANGE
MANAGEMENT
- Replication
Criteria: Change Management - It should be easy to upgrade the
system. - JGuiGen Replication is being built with systems that change
table structures at remote sites and run special methods that modify
existing data after a data base schema changes has been made. Target
sites can automatically update they software via JAWS, which will
automatically install updated replication software when the site starts
the replication process. Since they have to be connected to the Internet
before they can replicate this imposes no additional burden.
- Replication
Criteria: Change Management - Migration should be straight -forward
and should permit release level coexistence. - Yes in most cases.
- Replication
Criteria: Change Management - Must all sites migrate to the new
system at the same time? - Since the migration is performed via the
Internet during the replication process, this will usually happen
automatically.
- Replication
Criteria: Change Management - can multiple levels exist in the
system at the same time? This will happen rarely given the automatic
nature of the upgrade system. JGuiGen Replication is not designed to
handle some sites that have tables with different database schemas than
others. (Except in certain cases where target sites have fields that are
never replicated to the primary sites.)
CHANGE MANAGEMENT OF DATABASE SYSTEMS
- Replication
Criteria: Change Management - There should be tools for altering
database objects - Yes
- Replication
Criteria: Change Management - What tasks must be done when the DDL
(Data Definition Language) changes? - As described later, a
"U" record must be inserted in the JGReplicate table to tell
target sites how to modify their tables. Software can be distributed at
the same time that uses the update number to trigger special methods
that fill new columns with appropriate data or move data from one column
to another.
- Replication
Criteria: Change Management - Are tools provided to assist with this
change process at the primary sties? as the target sites? - JGuiGen
provides as GUI for accessing the JGReplicate table. The system for
automatically inserting the update records ("U" records) is
under development.
- Replication
Criteria: Change Management - Is there a repository that indicates
which targets have received the changes? - Yes. The JGReplicate table
contains this information and there can easily be a report to show the
sites that have received updates.
Back to Top
RECOVERY
- Replication
Criteria: Recovery - There should be automatic recovery procedures to
recover from specified types of failure - Most will be handled when the
primary database server is rebooted, a warm back server is activated or
the primary server is regenerated from backups. A primary could easily be
repopulated from another primary or even a site that is normally a target
site.
- Replication
Criteria: Recovery - The system should fail safely - yes
- Replication
Criteria: Recovery - There should be a data reconciliation facility
to com[are one site with another - under development
Back to Top
VENDOR HEALTH
There were several questions about the financial
health of the vendor. This is Open Source software. A user has the full source
code and can modify it or take over its development at any time. The
"vendor" health issues are not particularly relevant.
Given that this is a small open source project developed by
one person, it would seem that JGuiGen Replication does a fairly decent job of
meeting this standard set of evaluation criteria.
****END OF EVALUTAION SECTION
Back
to Top
VI. MINIMAL JGuiGen REPLICATION
As mentioned in a note above, JGuiGen Replication does not require all of
these bells and whistles. For a test drive or a non-critical small application
all you need is the items in the following list.
- The tables to be replicated
need to exist with the appropriate schema at the target site. (They
obviously must exist as the primary site, but I assume you have something
that already exists that you want to replicate.)
- The seven variables in each
table to be replicated need to exist.
- The JGReplicate table needs
to exist at the primary site and the target site.
- Minimal information needs to
be installed in an "M" record on the primary site. (Site number
=1, Enter a name, Process = "Y", Next-available-site = 2,
Next-available-site- lock = "N") JGuiGen supplies a GUI for
entering this data.
- Table records ("T"
records) in JGReplicate on the primary site. These need a table_name,
Process = "Y". Again, JGuiGen supplies a GUI for entering this
data.
- A primary site with a port
open to the Internet or an intranet.
- Settings in the
JGuiGenIni.xml file to attach to a local database and a remote database.
- The JGuiGen application with
it's menu option - Maintenance --> Replicate
This is how we have users who want to use a larger database
system than HSQL populate their new database from the HSQL records that ship
with JGuiGen.
Back
to Top
VII.
SYSTEM DETAILS
Replication involves databases running at multiple sites. For each database
system using JGuiGen replication there is one master primary site. There can be
secondary primary sites. For example a remote office could call into the main
office, but all sites local to the remote office would get their replications
from the remote office. All sites connect to a primary and update it. Secondary
primaries connect to the master primary. The remote sites also get new and
updated rows from their primary site. Each site has a unique site number across
the whole system.
Every table that will be replicated by JGuiGen must have seven variables in
it These are used by this replication system.
- row_id - integer - a unique
id number for the row (regardless of what it is called) This is almost
always the primary key for the table.
- lastChangedDate - timestamp
of when a row was last updated.
- isRemoved - A Flag column
used to indicate that a row should be considered as "deleted"
- JG_Owner int - this contains
the site number that OWNS a row. Owner wins if there are differences. This
is the "master" in the master/slave relationship that is part of
this type of replication.
- JG_Unique_id -long - this is
the unique row number for this row across all sites. It consists of the
local site-number times 10,000,000,000 plus the local site's row_id times
10. For example if site 5 has a row with a row_id of 57, it will generate
a JG_Unique_id of 500,000,000,570 ((5 * 10,000,000,000) + 57*10). Note:
the reason the row_id is multiplied by 10 is that this leaves us
room to temporarily insert a row that ends with a 5 and sorts between
existing rows. This will be used
when comparing a row from another site with a row from the current site.
- JG_SentBy - int - the site
number that sent the record.
- JG_DateWhenSent - timestamp
- a date/time that matches lastChangedDate when a record arrives at a
site. (See the note about the "Problem Windows" below.)
- Note: Right now all of the
fields that start with "JG_" MUST be named exactly. JGuiGen
queries the data_dict table to determine the actual column names used for
the other functions.
Back
to Top
Assumptions
- Every site has a unique site
id. There are NO duplicates. New sites get their site number from a
primary site. (There is a built-in test for violations of this
assumption.)
- There is one master primary
site. It hands out the site numbers.
- If there are secondary -
primary sites, they each can get five new site numbers they can give away.
Once they have given them away they can't help a new site join the system
until they get the new set of site numbers to give away from the master
primary site. The secondary master replenishes any numbers it has given
away when it next replicates with the Master Primary site. (A stored
procedure could be created to get new site ids to give away once current
set has been doled out.)
- Each table being replicated
has a row_id column that contains a unique value within the table. This
row_id is unique at the site. Any row_id may be duplicated at different
sites.
- No site has a local row_id
that is greater than 10,000,000,000.
- Each table has a variable
lastChangedDate (regardless of what it is called) that is updated with the
current date/time when a row is added or changed.
- There will be no rows in any
of the tables being replicated that have a column that is constantly being
updated by some batch process. Since we allow users and other processes to
use the system during replication, it is possible that the row with this
kind of column would miss the replication cycle much or all of the time.
- All tables to be replicated
have a variable isRemoved (regardless of what it is called) that marks a
row to be considered as deleted. Rows cannot be really deleted. This requirement
can be waived with a stored procedure and a JG_Killed_Rows table. There is
a section below that describes this process.
- Each table to be replicated
has the seven variables (regardless of their names) listed above. The
replication system will fill in the JG_Owner, JG_SentBy, JG_Unique_id and
the JG_DateWhenSent columns along with the lastChangedDate column when a
row is replicated. The isRemoved and lastChangedDate columns need to be
updated by applications or with triggers in the SQL database during other
changes.
- The JGReplicate table must
be available for the JGuiGen Replication system to use.
- The JGuiGenIni.xml file must
be available. It must have the email address of the system administrator,
connection information about the local and the primary site this site will
connect to. It should also have an English site name for each site. (Site
17 is nice, but "Jim Smith's Laptop" is easier to work with.)
This could also be stored in a local LDAP system.
- The JGuiGen Data Dictionary
table must be available and must contain data on the tables to be
replicated. The basic data is done when you mark the table to be processed
in the JGuiGen Create system. Someone needs to indicate which columns in
the table are the primary key field, the lastChangedDate columns and the
isRemoved column.
- There needs to be a JDBC
driver for the database.
- The database at the primary
site must support the JDBC function CURDATE() which returns the current
date and time on the database server doing the update.
- Primary servers should
subscribe to a time server. Their date/time stamps are critical.
Back
to Top
Security
- We will be using SSL
tunneling software (already available) between local sites and the primary
to insure that the data transfer is secure.
- We could also institute the
use for the Sourceforge Tumbler. This is code that requires a remote site
to send an encrypted password in order to unlock a port.
- When you connect to a database
you need a user name and password. JGuiGen can obtain these from encrypted
files (or an LDAP system) that are unique to each site.
- We could do a - "here
is my password can I work?" from the remote to the primary. For
example the "Tumbler" project on Sourceforge is java code that
requires a secret password before it will open a port. (Similar to port
knocking.)
- We could require that some
sites come from a specific IP address.
- I would suggest having the
replication software use a special account to access the local database.
This account should be the only one with read/write rights to the
JGReplicate table. This helps to insure that no one accidentally (or
deliberately) meddles with the JGReplicate table during replications.
(When I find a way to issue a JDBC Table-lock, I will implement that as
well.)
Back
to Top
JGReplicate Table
There is a special table at each site called JGReplicate. It contains one
row for each table to be updated and log entries that track replications
attempted and completed.
Note: In the following rows the [] notations indicate which record types use
the field.
The schema of the JGReplicate table is
- replicate_id - int - an id
number of the row.
- table_name {T]- char(35) -
the table name of the table to be replicated.
- database_type [MS] -
char[20] - Oracle, MySQL, etc.
- JGuiGen_specific_table [T]-
char(1) - Y/N - Y for userlogdata, JGuiGen_I18N and data_dict. JGReplicate
is not replicated.
- Type-of-replication [T]-
(Used for "T" rows) Empty or "F" for full replication,
"D" for download only (primary to target) or "U" for
upload only (target to primary).
- process [ETU]- char(1) -
"Y" for yes and "N" if the replication of this table
should be turned off.
- done [ETU]- char(1) - Y for
table was complete and N or blank if not.
- Allow-blank-field-filling
[T]- char(1) - defaults to "Y". Set to "N" if sites
that do not own a row should not be allowed to insert data into empty
fields.
- Current-local-high-date-time
[T]- timestamp - This is an arbitrary MARK time established when the
current site started to send inserts and updates to the primary site for
any table. It sends everything that was inserted between the previous MARK
and the current MARK (At the end it gets moved to previous-
local-high-date-time.)
- Previous-local-high-date-time
[T]- timestamp - Used in the Table record. This was an arbitrary MARK time
when the site starts to send inserts and updates on any table. The site
sends everything new up to this time during the previous update. The MARK
time is used since the system has to send either inserts and have them
processed and then updates and have them processed or vice versa. We don't
want to use a different time for inserts than we use for updates. And we
really don want to use different times for different tables.
NOTE: When a site
tries to replicate and their Previous-local-high-date-time does NOT match the
Previous-local-high-date-time at the primary server an error record is written
and an email is sent to the System administrators. This usually means that two different locations are trying to use
the same site number. The only
exception is when the One-time-mismatch-of-times-allowed flag is set to
“Y”.
- Current-primary-high-date-time
[T]- timestamp - The system updates the "S" record on the
primary server and then queries for the lastChangedDate for that row. This
becomes the PRIMARY-MARK. We will query all tables for inserts and changes
made between the last PRIMARY-MARK and the current PRIMARY-MARK.
- Previous-primary-high-date-time
[T]- timestamp - The PRIMARY-MARK time determined during the last
replication. We will ask for all changes between this date/time and the
new PRIMARY-MARK.
- One-time-mismatch-of-times-allowed
– char(1) Usually “N”. When this
is “Y” the remote site is allowed to replicate even though their times do
not match the times stored on the primary server. Requiring the times to match stops two
sites from using the same site number.
Allowing a one-time relaxation of this rules allows a site to
recreate its database from a backup and then bring itself up-to-date.
- rep-count-started [SMT]-
int - Number of replications started.
- rep-count-finished [SMT]-
int - Number of replications completed.
- out_of_date_after_minutes -
[SM]int - How many minutes (days) can pass before a site is considered
out-of-date. This number is used
by local software to notify local users that they need to replicate.
- out_of_date_after_minutes_error
[SM]- int - How many minutes (days) can pass before a site is considered
out-of-date and an error is logged and an email is sent to the system
administrators. This is used when
the local users did not respond to their system prompts that they were
out-of-date. Each replicating site
sends an email about other sites who have a error-1-email-sent" =
"N" and haven't replicated in this many minutes (days). (The
sender ignores the site if the error-1-email-sent field is already set to
"Y". It sets the error-1-email-sent field to "Y" once
this message has been sent.)
- Flag [all]- Char(1) - This
column indicates which "type" of row you are looking at.
- M row - There is one
row marked "M" for master. This row contains the site number
for this site and other key information.
- S row - Each target
site that attaches to a primary site creates a Site row. This row will
contain counts and dates that show a sites replication status.
- T for Table entries
(using the columns described above)
- L for Log entries
using the next set of columns.
- U rows -Update rows
are marked with a "U". These are used to pass table schema
changes to target sites.
- E rows - these
indicate Error conditions. Error messages are left in these rows.
- J rows - (under
development) special rows used by JUnit tests to pass information between
replicating sites.
- Site_number [all]- int -
site number doing the replication (locally this will always be the local
site's number. At the primary site this will indicate which site
started/finished a table replication.)
- Site_name [all]- Char(100)
- English name for the site.
- Date_rep_started [LSM]-
timestamp - when a table replication was started
- Date_rep_ended
[LSM]-timestamp - when the table replication ended. (If this is empty the
replication aborted somehow.)
- record_created_date
[MSEUL]- timestamp - used by all facilities to indicate when a record was
originally placed in the system. (e.g. when a site record was first
created.)
- Notes [EMLS]- Text - a
field used by the JUnit tests or by administrators to help with debugging
problems. A log of all replications
is kept in the field in the “S” record.
- JUnit-Work [J]- Char(60) -
JUnit test can use this column to pass information between replicating
sites.
- Primary-site [M]- Char(1) -
"M" for the master primary, "S" for secondary
primaries, blank for others.
- Number_of_rows_inserted_by_primary
[L]- int - used in "L" records.
- Number_of_rows_inserted_by_target
[L]- int - used in "L" records.
- Number_of_rows_updated_by_primary
[L]- int - used in "L" records.
- Number_of_rows_updated_by_target
[L]- int - used in "L" records.
- Next_Available_Site_number-1
[M]- int. - next site number to dole out
- Next_Available_Site_number-2
[M]- int. - only used by secondary masters.
- Next_Available_Site_number-3
[M]- int. - only used by secondary masters.
- Next_Available_Site_number-4
[M]- int. - only used by secondary masters.
- Next_Available_Site_number-5
[M]- int. - only used by secondary masters.
- next_available_Site_number_locked
[M]- char(1) - Y/N used to make sure that two sites don't get the same
site number.
- Send_email_on_completion
[MES]- char(1) - Y/N - should each site send the administrator an email
when they successfully complete a replication. (Used for debugging and
early in deployment.) This can be turned on for just one
"problem" site.
- process_mgmt_by_exception [M]-
char(1) ="R" = remotes do it (default), "L" = local do
it (requires a local process to be fired off) - else do not do them at all
- email-bad_rep_days [M]- int
- each replicating site sends an email about other sites whose last
completed replication is more than this number of days old. (The sender
ignores this site if the "error-2-email-sent" field is already
set to "Y". It sets the "error-2-email-sent" field to
"Y" once this message is sent.)
- email-bad-rep-count[m] =
int - each replication site sends an email when they start to replicate
and their count of unsuccessful replications exceeds this.
- update_number [US]- int - a
sequential number that identifies "U" records. Each schema
update should be one higher than the last. On the "S" record,
this field contains the highest update the site completed.
- update_decripton [U]-
char(50) - short description of schema change
- update_spec [U]- char(150)
- this is the schema change that should be made ("ADD" var_name,
var_type, default value, length,, decimals, nullable).
- error-1-email-sent [S]-
char(1) - "N" if last replication date is too old message has
not been sent. "Y" if it has. (These are set to "N"
when a site completes a replication.)
- error-2-email-sent [S]-
char(1) - "N" if the too-many-consecutive-unsuccessful-replications
email message has not been sent. "Y" if it has. (These are set
to "N" when a site completes a replication.)
- error-3-email-sent [S]-
char(1) - extra
- error-4-email-sent [S]-
char(1) - extra
Back
to Top
Over-Booking
the JGReplicate Table
This table is "over-booked". It has at least nine different things
that it takes care of.
- It holds the site number,
name, and type of database in the "M" record
- It holds the list of tables
being updated and the critical times needed to replicate in the
"T" records. On the primary sites these rows indicate if a Table
is currently being replicated. It also indicates what kind of replication
should be done - full replication, download only, or upload only.
- On the primary site it holds
a record for each site that replicates. The "S" record has the
site number and name, the number of times replication started, the number
of times it completed successfully, the date and time for the last
replication (The Note: this column holds a list of all replications in the
“S” record)
- It holds a list of all table
schema update that need to be applied in the "U" records
- On the primary site it hold
the next-available-site-number in the "M" record
- It contains a table by table
log of replication dates and times in the "L" records
- It is used by the
JUnit-tests to pass info between sites for the test ("M" record)
- There can be optional
"J" records used by JUnit test to pass information.
- There are "E"
records that contain error messages.
There are two main values from the previous replication kept
in the "Table" row for each table in the JGReplicate table
- The first major value is
the date/time of the last replication. When a table is being replicated,
one of the first steps is to ask the primary site for all rows at the
primary site that have been inserted or changed since this date. This
date/time will be exactly the same for each table even though the system
processes tables one at a time. We basically want a snapshot at the time
we are calling MARK. "Send me everything new up on tine the MARK
time."
Note: To originally fill a site with data this date/time is set to a
date/time before the system existed. This will cause all of the rows from
the primary site to be sent.
Note: The JGuiGen replication system is not transaction safe. It is
possible with this system if two sites are replicating at the same time to
get part of one site's new data, but not all of it. For example if site 2
asks for all updates on a table and gets lots of rows to process, site 3
could then process the same table and another table before site 2
finishes. This would result in site 2 not having site 3's data for the
first table and having site 3's data for the second table. The next
replication takes care of this.
- The second important value
is the date/time of the most recently updated record in the local table as
of the last replication. Any rows with a lastChangedDate date/time stamp
higher than this date/time will have been inserted or changed locally
since the last replication.
Back
to Top
VIII. RULES-
- Each site gets a site number
- Primary site is usually 1, others are 2 through ?. Sites -99 is special.
- Each row to be replicated
has an Owner (This is the site number of the site that originated the
record.)
- Owner's data always wins if
two sites are different
- Who the Owner is can be
changed from one site to another. If the primary site changes who owns a
row this change is honored by all other sites. This allows the local
administrators to "take control" of a record if a site goes down
or a time critical change must be made and the other site is not available
for any reason.
- Note: The system must use
"Removed rows flags" They cannot really DELETE a row. If a
remote site DELETES a row, it won't exist when they try to tell the base
site about the changes they made and the row won't get deleted at the
primary site. There is a way to waive this restriction. It is described
below and involves a JG_Killed_Rows table and a stored procedure fired off
by a trigger when a row is deleted.
- Any site can fill in blank
text fields (real data wins over empty data - if the owner doesn't like
the data they must put something like "missing" in the field and
their version will overwrite the data they don't like - Owner wins.)
- The ability of sites to fill
in empty text fields in rows they do not own can be turned off in the
JGReplicate table. The local GUI application is responsible for not
allowing users to edit the field, but the replication system will not pass
these changes to the primary site. It will also change the columns back to
blank since in this case the Owner ALWAYS wins, even when the owner's
field is blank.
- Fields in GUI should be (at
the most) "Display Only" if they have data and another site owns
the row. Empty fields can be editable.
- Primary ID fields and
lastChangedDate fields are not replicated.
- Both sites can have active
users during their updates.
- Replications are initiated
by the remote sites - this is a pull technology as opposed to a push
technology. (I suppose we could push as well, but I'm not going there
now.)
- The system described here
takes care of time differences between the primary site and remote sites.
- Sites with site number -99
will get downloads from the base site but they won't attempt to send data
back. This is good for standby servers and building a new primary server
after a disaster.
NOTE: Need a way to display a message to a local user to indicate
that their site has gone too long without replication. This can be added to
JGuiGen itself, but a small process is needed that checks every x minutes and
pops-up a message or checks when a machine is booted up.
Note: Check into DCE Security requirements more
Back
to Top
IX. WALK THROUGH - rows are inserted/changed
Row is added at the primary site.
- lastChangedDate is set to
current date/time.
- JG_Owner is set to 0. If it
is left as zero the next replication will fix it.
- JG_Unique_id is to 0. The
next site to replicate from the primary will update the value to site# *
10,000,000,000 + row_id*10 for the new row.
- JG_SentBy is set to 0.
- JG_DateWhenSent is set to
null.
Row is changed on the primary site.
- lastChangedDate is updated
to current date/time.
- JG_Owner is already set to
the local site number unless no one has replicated since this row was
created.
- JG_Unique_id was set zero
and they updated the first time row was created. It will still be zero if
it has never been replicated.
- JG_SentBy is left alone.
- JG_DateWhenSent is left
alone.
Row is added at a remote site
- lastChangedDate is set to
current date/time.
- JG_Owner is set to 0. The
replication system will update this with the site number.
- JG_Unique_id is set to 0.
The replication system will change this to the site number * 10,000,000,000
+ the row_id*10 for this row.
- JG_SentBy is set to 0.
- JG_DateWhenSent is set to
null.
Row is changed at a remote site.
- lastChangedDate is set to
current date/time.
- JG_Owner has already been
set (unless this row has never been replicated).
- JG_Unique_id is zero if this
row hasn't ever been sent to the primary site or has an id number that was
inserted by the Replication system when the row was sent to the primary
site.
- JG_SentBy is left alone.
- JG_DateWhenSent is left
alone.
Note: As long as the row_id is automatically set and the lastChangedDate
column is automatically updated, JGuiGen can replicate tables that were created
for applications other than JGuiGen. See the section of JG_Killed_Rows.
Row is deleted anywhere - this is the same as a change since the
"isRemoved" flag is a column in the table and it is changed.
lastChangedDate field is updated to current date/time. See the section of
JG_Killed_Rows.
A site attaches to the base database.
- The system checks with the
local JGReplicate table to get the local site number
- The system checks in the
JGuiGenIni.xml file (or LDAP system) for the email address of the system
administrator.
- The system checks in the
JGuiGenIni.xml file for the addresses of the local database and the
primary site that it will be connecting to. In more secure environments,
it gets this information from encrypted files on the local system or a
local LDAP system.
Note: JGuiGen can obtain the database user name and database password from
encrypted files. If it does not find an account name it should use to
connect to the databases in the JGuiGenIni.xml file, it looks for them in
encrypted files. If it can't find them it creates an error record
("E" record) and emails the system administrator, puts up an
error message and quits.
- It connects to both
databases.
- It sends any error records
("E" records) that exist locally and have not been sent to the
primary (Process = "Y" and Done = "N") Change Done to
"Y".
- It asks the primary site
for it's site number, name, and database type. It stores this away. It
will use this to update JG_Owner and JG_Unique_id number on the primary
site if it finds that they are empty.
- If it does not have a site
number it requests one from the primary site and then creates a Site
record (“S” record) on the primary site. It is presumed that the
"new" site got its table schema from JAWS or something else
current. It requests all of the "update schema" records from the
primary system and stores them. It does not process them. (We might
include a way to "check" if the change already exists.)
- It updates its personal
Site record on the primary site ("S" record), bumps the
replication started count, Enter a date/time that replication started and
bumps the count of how many times it started a replication and didn't
finish.
- It updates the M record at
the local site. The updates indicate that a replication started. They will
be updated when the replication finishes.
- If remote
management-by-exception is turned on, it checks the primary site for all
sites that have error-1-email-sent = "N" and has "S"
records that indicate they have not replicated during the last X days. It
gets value of X from the primary site "M" record. If any
offending sites are found, tt creates an error record, sends an email to
system administrators and marks the error-1-email-sent column to
"Y" so the email won't get sent again.
- If remote
management-by-exception is turned on, it checks the primary site for all
sites that have error-2-email-sent = "N" and have "S"
records that indicate they have more than Y consecutive replication
failures. They get the value of Y from the primary server. After the error
record is created and the email is, the error-2-email-sent column is set
to "Y" in the S record.
- Each replication site sends
an email when they start to replicate and their own count of unsuccessful
replications exceeds a number supplied by the primary site..
- It finds the highest
"update" record it has in the local JGReplicate file and asks
the primary site for any updates that are higher than this. ("U"
records)
- It tries to apply these
updates. If it fails it emails the administrator, creates an error record
("E" record), shows an error message and quits.
- When it finishes, it
updates the remote "S" record and the local "M" with
the update number it just completed. It emails if it can't do the update.
- It asks the primary site
for a list of tables to be replicated and marks these tables in the local
JGReplicate table. It adds new rows to the local JGReplicate table if the
primary site sends a table name that is currently not in the local
JGReplicate table. These rows are designated with a "T" in the
flag column. These new rows are set with very old dates so that the first
replication will populate the table with all existing records.
- The next thing that happens
when JGuiGen starts to process is that the schema of each table to be
replicated is compared with the schema at the primary server. The schema
for the primary site and the remote site must match. The columns do not
need to be in exactly the same order, but each column on the primary site
must exist on the remote site and vice versa. The column type and size
must be the same for each column. (Note: There can be minor discrepancies
here. For example some systems have a Boolean data type and other use an
integer for the same purpose. JGuiGen tries to takes these differences
into account.) If a schema does not match the system creates an error
record ("E" record), sends an email to the system administrator,
and displays a message describing the difference. The notes field in the
JGReplicate field is updated with an error message and the replication is
aborted.
- Once the schema check
passes, the local Done field is set to "Y" in the Update record
("U" record).
- The system now checks if
any of the Table ("T" records) rows in JGReplicate are marked as
DONE ="Y". If so, the system did NOT complete the last
replication (Done is set to "N" after all tables have been
processed). The system then starts with the first table that was not done
last time. This avoids doing several tables over and over and then always
crashing on a particular table. If the system always started with the
first table, some tables (those updated before the bad table) would
contain newer and newer data and the data in the others would never
change.
NOTE: Should we try
to finish the previous replication first?
Probably not since the later tables will now have “new data” and if
there are dependencies we have ordered the tables so that the earlier tables
should be done first. Even processing the one table first could cause a
problem.
- If the first "not
done" table replicates, then either the table has been repaired or
the last failure was due to a lost connection. All of the rows are marked
as NOT DONE and the replication starts from the table listed in the first
"T" record. (This avoids having the last tables being several
days more current than the first tables.)
- When the replication system
starts working on a table it updates the count in the rep_count_started
field for that table. ("T" records)
- The system creates an error
record ("E" record) and sends an email to the database
administrators that the replication stared in an unclean environment. (Any
of the tables are marked with a Done = "Y".) The Notes column
for that table in the JGReplicate table would be updated with an error
message.
- It loops through the rows
in the JGReplicate table with a flag of "T". For each table
listed there it uses the following overall process:
- A "Log"
row is added to the JGReplicate table with a Flag of "L" and
the table_name of the table being replicated, the date/time the
replication was started and the site number. The row_id of this new row
is saved so that it can be updated when the system finishes the table.
- A log row is added
to the JGReplicate table on the primary machine with the same information
that was stored in the local database.
- It processes the
data in the row using procedures documented below.
- After the data in a
table is processed
- The row for this
table ("T" record locally) in the JGReplicate table is marked
as DONE = "Y".
- The current MARK
date/time that was determined during processing is updated into the
previous-local-high-date-time field. (See the Note on what I call the
"Problem Window". This window starts when this site starts
updating the primary server and ends when it finishes these update. The
primary server has rows that this site already knows everything about.
On the other hand other users or other replication sites may have
inserted or updated rows during this window of time. The replication
system must determine which rows that have a lastChangedDate in this
window should be ignored and which should be sent.)
- During the update
process the replication software established a date/time on the primary
server that I call the PRIMARY-MARK (this uses the Primary server's time
zone) and we query for any record that was inserted or updated since the
previous PRIMARY-MARK and less than or equal to the current
PRIMARY-MARK. This is written to the previous-primary-high-date-time
field. This is the value we will use to get rows from the primary site
when we next replicate. (Note: See the "Problem Window"
section.)
- The
rep_count_finished is bumped by one in the "Table" row of
JGReplicate when a table is finished
- The date_rep_ended
in the "Log" record in the local JGReplicate table is updated.
- The date_rep_ended
in the "Log" record in the primary JGReplicate table is
updated.
- The
Number_of_rows_inserted_by_primary is updated the Log record.
- The
Number_of_rows_inserted_by_target is updated the Log record.
- The
Number_of_rows_updated_by_primary is updated the Log record.
- The
Number_of_rows_updated_by_target is updated the Log record.
- At the end of the
process all table ("T" in the flag column) rows are marked as
DONE ="N".
- Processing an individual
table uses the following steps.
1. For
each table the system gets the previous-primary-high-date/time using the time
zone on the primary machine and the previous-high-local-date/rime (using the
local time zone) on the local machine.
2. The
system is connected to the primary server. It establishes the current
PRIMARY-MARK time by updating the "S" row in the JGReplicate table
and then querying for the lastChangedDate column.
3. It
then queries for all rows at the primary site for one table where the date/time
stamp is higher than the previous-primary-high-date-time (previous
PRIMARY-MARK) and less than or equal to the current PRIMARY-MARK.
4. The
current PRIMARY-MARK is the current-primary-high-date/time and it is updated in
the local JGReplicate table. It will become the previous-primary-high-date/time
when the replication is finished. The next time the site asks for data on this
table it will want everything newer than this date/time and less than or equal
to a new PRIMARY-MARK time it will establish then. Note: This time is from the
primary site and may be several time zones away from the local site time.
5. The
system updates a local record and gets the lastChangedDate for that update.
This becomes the current MARK time. This time is used to update the
current-local-high-date-time in the local JGReplicate table.
6. Once
the system has the date/times and a record set of updated rows from the primary
server it loops through the rows. If it finds an empty JG_Owner or JG_Unique_id
column it uses the site number from the primary site, calculates what these
should be and updates the primary site. Next it checks the local database for a
row with the JG_Unique_id equal to the JG_Unique_id value in the row sent from
the primary site.
1. If
the row does NOT exist locally it is a new row.
- add the row.
- set the
lastChangedDate and the JG_DateWhenSent to the local date/time. Set the
JG_SentBy column to the site number for the primary server.
- Later when we ask
for changed local records ignore record that came from the primary
server and the two dates are still the same.
2. If
the row exists and someone else owns the row then it is a changed row
(JG_Unique_id exists on local machine).
- Update the site
row with all the new values unless local site is not empty and the
base's version is empty. (If the primary site indicated that secondary
sites were not allowed to fill empty columns then all of the owner's
values are used.)
- If the local site
has non-empty values and they are allowed to overwrite blank value- set the lastChangedDate to the
current date/time but do not set the JG_DateWhenSent to the same
time. If these conditions were
not met set the lastChangedDate to the current date/time and set the
JG_DateWhenSent column to the same time. In either case set the JG_SentBy
column to the primary Site's site number. Note: The reason for having
different dates in the lastChangedDate field and the JG_DateWhenSent
fields is that shortly we will update the primary server with all local
rows that have dates between the PREVIOUS-MARK and the CURRENT-MARK.
The only rows we don't send are those that came from the primary site
and the two dates are the same. Since we want the row to be picked up
by this query we make sure the two dates are different.
3. If
the row exists and the local site owns the row, then someone else changed the
row.
- Update local data
where local site column is empty and theirs is not (unless this has
been disallowed).
- Change
lastChangedDate and JG_DateWhenSent to current date/time. Change the
JG_SentBy column to the primary Site's site number.
- If there is data
in the downloaded row that conflicts with data in the local row (it was
blank and someone edited, but someone locally also edited it with
different info), set the lastChangedDate to the current date/time and
the JG_DateWhenSent to some other time. See the note in the previous
paragraph to see why we do this and why it results in our row being
sent to the primary server with our data in it.
At this point we have all of the changes the
Primary Site knows about.
7. The
local site selects all local rows that have a JG_Unique_id = 0 (New Rows) OR
where the lastChangedDate is greater than the previous-high-local-date/time and
less than or equal to the MARK value determined earlier. (Again, note that
there will be local rows that were just added or changed from the primary
server that have date/time stamps that are included in this query. We will get
rid of them by comparing where they came from and if the lastChangedDate equals
the JG_DateWhenSent.)
8. The
local site sends inserts of all of the rows with a JG_Unique_id = 0 (New Rows).
It fills in the JG_Unique_id, the JG_Owner, the lastChangedDate, the
JG_DateWhenSent and the JG_SentBy columns.
9. The
local site sends updates for all of the rows with a lastChangedDate greater the
previous-local-high-date-time and the MARK time. It fills in the JG_Unique_id,
the JG_Owner, the lastChangedDate, the JG_DateWhenSent and the JG_SentBy
columns.
Loop through rows (Be sure to include rows where
IsRemoved Flag is set with the value that shows the row should be treated as if
it was deleted.) Query the JG_Killed_Rows tables for deleted rows in the right
time frame for this table if this variation is being used.
0. If
the JG_Unique_id is zero - this is a NEW row
- Update the
JG_Unique_id to the local site# * 10,000,000,000 + row_id*10 in the
local table.
- If the JG_Owner
field is empty, update it to the local site number.
- Insert the row on
main site - the lastChangedDate on the primary machine is set using the
JDBC function {fn CURDATE()}. This will set the lastChangedDate to the
current date/time on the primary database machine. Use the same
function to update the JG_DateWhenSent column. Fill JG_SentBy and
JG_Owner with the local site's site number.
- The
lastChangedDate locally is left with its current value.
1. The
other rows are rows the local site changed.
1. ***
If local site owns the row,
- The row should
already contain any changes made by some other site.
- Send an update of
all non-empty columns and set the lastChangedDate and the JG_DateWhenSent
on the primary server using the JDBC CURDATE() function. This will set
the date/time stamp to the current date/time on the primary server. If
the primary indicates the no one but the owner can enter data then
send the empty text fields as well. Update the JG_SentBy column to
this site's site number.
- Leave the
lastChangedDate on local machine alone.
2. ***
If local site does not own the row.
Request a copy from the primary - We need this row to be sure about which
fields are empty on the primary server so that we can update them.
- Update local row
with their data if the two diff. (Except where ours is not blank and
the corresponding column from the primary site is empty)
- Build an update
where ours is not empty and theirs is
- Update the
primary site and set the lastChangedDate and JG_DateWhenSent to the
current date/time using the CURDATE() function on the primary site.
Update the JG_SentBy column to the local site's site number.
- Leave the
lastChangedDate column alone at the local site unless values were
changed and set it to the current local date/time.
10. The local machine
uses the same MARK time for each of the tables being processed and saves that
as the previous-high-local-date/time. This will the highest change we just
applied from the primary server. During the next replication anything higher
than this will have been changed locally since the replication that was just
finished.
The local system saves the
current-primary-high-date/time as the previous-primary-high-date/time. The next
time the site replicates and asks the primary site for new/changed records it
wants records that are higher than this value. NOTE: See the section called
"Problem Window."
The DONE field is marked and the
rep-count-finished is bumped by one. The log rows are updated in the local
JGReplicate table and the JGReplicate table at the primary site.
- The system loops through
another table.
- When all are done - mark
all of the DONE columns as "N" (not done). The local system is
"clean" and ready for another update. The steps used to complete
a table were listed above.
- Updates its personal Site
record on the primary site ("S" record), bumps the replication
finished count, Enter a date/time that replication ended and sets the
count of how many times it started a replication and didn't finish to
zero. Appends a line to the Notes text field that shows the date/time of
this replication. The error-1-email-sent through error-4-email-sent
columns are set to "N". The site has successfully completed and
these columns won't be used until it is either late replicating or tries
and fails several times. Then the columns need to be "N" so the
management-by-exception routine will create error rows and send emails.
To change the owner of a row.
- Local sites can change the
owner from their site to another site. This works.
- Local sites cannot change
the owner if their site is not the owner
- primary site can change the
owner and that always wins.
Back
to Top
XX. AVOIDING ISREMOVED FIELD REQUIREMENT
There are notes on How JGuiGen Replication handles situations where existing
application actually delete rows from a table that is part of the replication
system.
- A new table must be added
to the local database. The table is called JG_Killed_Rows and it has the
following structure:
- kill_id - int -
primary key for the table
- table_name -
char(35) - name of the table where a row was deleted.
- row_id - int- the id
of the row that was deleted.
- Site_id – int – the
site number that originated the deletion.
- User_name – char(30)
– (if this is being used by other applications) – user who deleted the
row.
- JG_Unique_ID - the
replication id of the row that was deleted
- date_time_killed -
timestamp - when the row was deleted
- lastChangeddate -
timestamp - date/time this row was added to JG_Killed_Rows
- A stored procedure must be
installed that is activated by a trigger when the row is actually deleted.
This stored procedure must insert a new row into the JG_Killed_Rows table.
- When a replication occurs,
the JGuiGen replication software will query this table for rows with a
date_time_killed that match the time period being replicated. If it finds
a row, it will attempt to delete the row with that JG_Unique_id on the
primary system.
- The primary system will
need the same type of stored procedure and a JG_Killed_Rows table if it is
not the master primary site. It will need to propagate the delete up the
chain and down to other replicating sites.
XXI. MOVING A SITE TO A NEW PRIMARY SERVER
Here are the steps needed to do this.
- Change JGuiGenIni.xml file
or LDAP references to point to the new server.
- Install new SSL
certificates.
- Change database username and
database passwords (In encrypted files or LDAP)
- Adjust the Table
("T" records) in JGReplicate. The
previous-primary-high-date-time may need to be adjusted to reflect a new
time zone.
SPECIAL NOTE ABOUT SECONDARY PRIMARY SERVERS
I would think that it would be prudent to stop users from replicating to a
secondary server while it updates from its own primary server. It shouldn't
really be a problem, but I feel better if source for other site's data runs in
a fairly pure envrionment while it gets its down data.
These servers will need to set the Process field in the Master record ("M"
record to "N") and wait for replicating sites to finish before they
start to replicate. They cannot have active users or remote sites replicating
when they themselves connect to a remote server and replicate from it.
Back
to Top
XXII.
SPECIAL NOTES ABOUT THE JGREPLICATE TABLE
This list.needs to be fleshed out from notes that are scattered through this
document.
Notes on getting a new site number
If a site needs a site number
- It sends an update to the
primary to change the Next-site-number-lock field to "Y" in the
master site record ("M" record) and the next-site-number-lock
field = "N".
- If the update succeeds it
queries for the next-site-number.
- If the primary site is the
master primary site it sends an update to bump the Next-site-number-1 and
set the next-site-number-lock to "N"
- If the primary site is a
secondary primary it checks it's five possible site number for one that is
not zero. It makes the same update except it sets the Next-site-number it
used to zero. (The secondary site will get replacement numbers when it
replicates from the master primary server.)
- If the original update
fails, it waits 10 seconds and tries again. After three tries it aborts,
creates an Error record ("E" record)c and sends an email to the
administrators.
- If the next-site-number
fields are all zero, it aborts, creates an error record ("E"
record), and sends an email to the administrators
Notes about the Master Record ("M" records) in JGReplicate
There is one master record at each site. This contains overall information
about the current site.
- Contains the site English
Name
- Contains the site's official
site number
- A variable indicates if the
site is a master primary, secondary master or a target site
- Replications started
- Replications finished
- Send_email_on_completion
- Process_mgmt_by_exception remotely, locally or not at all
- email_days - each
replicating site sends an email about other sites who have a "Process
= Y" and haven't replicated in this many days.
- email-bad_rep_days - each
replicating site sends an email about other sites who have uncompleted
replications that are more than this number of days old.
- email-bad-rep-count each
replication site sends an email when they start to replicate and their
count of unsuccessful replications exceeds this.
- Next-available-site-number
1, 2, 3 4, 5
- Next-available-site-lock
- ....
Notes in Site Records ("S" records) in JGReplicate
Note: There will be no Site records at target/remote sites. These only exist
on primary sites and contain information about the sites that are replicating
from this primary site.
- When a site attaches to a
primary site, they update a field with the date/time they stared and bump
a column that counts how many times they started.
- They also bump a field that
counts how many times they started but didn't finish
- When a site finishes a
table schema update, they update a column that contains the highest update
they have finished.
- When the site finishes it
updates a field with the date/time finished. This also bumps a column that
counts how many times they finished.
- When they finish they set
the counter for how many times they tried and did not succeed back to
zero.
- The site record contains a
site number, an English name, and type of database for the site.
- There is a field that
contains the date/time the site record was created.
- Update_number - the highest
update this site has completed.
Notes about the Table Record ("T" records) in JGReplicate
- Contains the name of a
Table to be replicated
- Indicates the type of
replication (Full, download to target only, or upload from target only.)
- A flag to indicate if
processing of this table is currently activated
- Count to times a
replication of this table started
- Count of times a
replication of this table ended
- ....
Notes about the Update Records ("U" records) in JGReplicate
- Contains a unique update
number that gets sequentially larger
- Contains a short
description of the update.
- Contains the specification
for the update
- Contains the date/time the
update became available
- at the local site there are
fields that indicate an update should be processed and one that indicates
that the update has been processed.
- ...
Notes about the Error Records ("E" records) in JGReplicate
- Contain the site number that
reported the error
- Contains the site where the
error occurred
- Contains the text of the
error message.
- Contains the date/time the
error record was inserted
- Contains the date/time the
error occurred if possible
- Indicates if an email was
sent
- At target sites there is a
flag that indicates the record should be sent to the primary during the
next replication and one that indicates it was done.
- ....
Notes about the Log Records ("L" records) in JGReplicate
- Contain the site number that
created Log
- Contains the table name
being logged
- Contains the start date/time
- Contains the ending
date/time
- Contains a
Number_of_rows_inserted_by_primary
- Contains a
Number_of_rows_inserted_by_target
- Contains a
Number_of_rows_updated_by_primary
- Contains a
Number_of_rows_updated_by_target
- Indicates if there were any
issues
- ....
Notes about the JUnit Records ("J" records) in JGReplicate
- Contain the site number that
created record
- Contains the site number
that should use the record
- Indicates if the record
should be processed.
- Indicates if the record has
been processed.
- Contains the start date/time
- Contains information that a
JUnit test uses to see if updates that should have been replicated
actually have been.
- ....
Back
to Top
XXIV. REPORTS
I was going to list the reports we will be creating in the JGuiGen application's
report generator, but I realized that there will basically be one report for
each of the sections just listed. The fields will be those listed as relevant
for each type of record (M, S, L, E, etc.)
Back
to Top
XXIV.
GENERAL NOTES
- Email to System
administrators tell about errors. Note there should always be error
records created locally and on the primary where an email is sent.
- when a replication
starts after a previous replication failed to finish
- A difference in the
schema was found.
- When sites haven't
replicated
- Lost connections
- Too long without a
replication
- Too many tries that
did not complete
- When sites are
having problems finishing their replications
- Attempt to update a
table schema failed
- An update failed
(due to dependency issue or other wise)
- Message -
"Tried to find the real column names in table X for the functions
"lastChangedDate, IsRemoved, and Row_ID" Could not find.
- Programmatic error
when trying to process the method triggered by an update.
- Other errors - to be
determined
- Any site can take over as
the primary site. Mark the "M" record as "Y" in the
"primary_site" field. You will need to update the JGuiGenIni.xml
files to show where the connection is and update the SSL certificates. and
then have remote PC's attach to the new primary site. See the longer
description about this process available elsewhere.
- Any site can redo scrambled
data by setting the previous-primary-high-date/time values in the
JGReplicate table back
- If a site crashes and
restores from a backup that is several days/weeks old, the
previous-primary-high-date/time will be from before their crash and they
will get all of their own records back as well as new records from other
sites. (Note: There is a field in the site record
that must be set to allow them to do this. The field allows a one time
replication where the date at the primary site does not match the
date at the target site.)
- If there are dependencies
and triggers in the databases (payment records that require the invoice
being paid exists). then the order of the tables in the JGReplicate table
should be set very carefully. We are not doing transaction level
processing so it is possible for a remote site that lost its link during
replication to have part of another site's data but not all of it. An
email is sent to the system administrator if a failure occurs due to this
issue.
- The owner of a row can be
passed around. A salesperson might enter a proposed bid and transfer
ownership to a higher level authority. This person could then make changes
that the salesperson could not override. Until the salesperson finishes
with the bid the higher authority could see the proposed bid and fill in
empty spaces, but they couldn't change what the salesman had entered until
the salesperson was finished with it and transferred ownership.
- Special note on
JGuiGen-Specific tables. The three tables marked as
"JGuiGen-specific" are treated a little differently. Since we
plan to let sites use our Sourceforge table to update their primary sites
table schema these tables are special. Local sites can add columns and
these will be ignored. The remote sites MUST have all of the columns the
primary site has. Table replication will ignore data sent for columns that
do not exist on the primary server.
Back
to Top
XXV. DISCUSSION OF THE "PROBLEM WINDOW" DURING REPLCICATION
We are letting users access the database being replicated while the
replication process is going on. This is done to minimize the impact on users.
(Think of road warriors who can't use their laptop while the replication is
going on and it is 3:00 AM. Replication won't happen and we all know it.) This
causes some problems. The most glaring is that changes made by the users can
have approximately the same date/time stamps and records being updated by the replication
service. I call the period between the time that the replication service
started actually changing rows and when it stops changes rows in a replicated
table the "problem window." Here is how JGuiGen Replication handles
this.
- Every row that is changed
has its lastChangedDate set to the local date/time. This is done
automatically or by the JGuiGen Replication software.
- When the JGuiGen
Replication software changes a row with data from a remote site (a primary
or a target site) it changes the lastChangedDate. It set another column
called JG_DateWhenSent to the same date/time. It also sets a column called
JG_SentBy to the remote site number.
- All rows updated by the
replication system will have two date fields that are equal. Rows updated
by a local user will not change the JG_DateWhenSent column so the two will
be different. If there is a value in the JG_DateWhenSent column it will
now be older than the lastChangedDate column.
- This allows us to
distinguish between records changed by the replication process and those
changed by other users.
- Assume we get 10 rows from
a remote server. They will all have current date/time stamps and therefore
are much more recent than the replication that completed yesterday.
- Also assume that a user
happened to jump on one of these new records and changed it. The
lastChangedDate will be changed, but the JG_DateWhenSent will not be
changed.
- The replication system
updates a row in the local JGReplicate table and then queries to determine
the date/time stamp of that change. It calls this the MARK time and starts
to send inserts and changes for all records with lastChangedDates between
the time of the replication yesterday and the MARK time. This MARK time is
the same for all tables even though it takes time to work through the
various tables. The system specifically excludes records where the
lastChangedDate is equal to the JG_DateWhenSent and the JG_SentBy is the
remote site it is talking to. All of the local changes (including the one
record just changed) will be sent and the 9 other records we just got will
not be sent.
- If our industrious local
user changes another of the 9 records, but it is after the MARK time this
change will not be replicated until next time, even if rows are still
being sent to the remote server.
There are two major concerns with letting users use
the system during replication.
- The first involves
situations where a local transaction takes place that involves several
tables and it starts before the MARK time and ends after it. There could
be very small discrepancies once the SQL system executes a commit. Rows in
one table could have a slightly different time stamp from the rows in
another table. If the MARK time occurred at just the right moment the
replication process would send some of the changes and not the rest. If
this is a problem on your system, then users (or other processes) will
have to be suspended while the replication is going on.
- The second problem occurs
when a column is being constantly updated by some automatic process. By
the time the replication process tries to update a row the date on the row
being updated all the time is already higher than MARK. When we do it
again and establish a new MARK, the offending row is updated to a
date/time higher again. This row conceivably would never be replicated.
Your site needs to determine if any of these situations exist and
determine how you wnat to handle it.
- There is one more concern
that needs to be mentioned here. Your system might have conflicting
dependencies that JGuiGen Replication cannot handle. Here is the
situation. One transaction inserts data into Table 1 and then data into
Table 2. The data going into Table 2 is dependent upon the data in Table 2
and dependency checks will cause the transaction to fail if the data is
not correct. Normally you would just insure that Table 1 is earlier in the
update cycle than Table 2. The problem arises if there is another type of
transaction that does something to Table 2 first and then makes changes to
Table 1 that are dependent on the changes to Table 2. We have no way to
resolve this.
- Note the procedures
described in this document establish a local MARK time and then use this
time for all tables so that all table are queries for updates for example
the same time periods. While this can cause problems with transactions
that fall over a boundary, it reduces problems cause by getting updates
from one table that are 10 or 15 minutes later than those in another table
(Assuming a very slow link and update process.
- There is a similar
PRIMARY-MARK time established for the remote server.
- If you are wondering where
the "MARK" came from, I think of it as looking at my watch and
saying the time boundary starts when I say mark..... MARK.
Back to Top
XXVI. PLANNED JUNIT TESTS
Setup
- Primary site - add a record
- Primary site - changes a record
Setup
- Site 2 - add a record
- Site 2 - changes a record it owns
- Site 2 - changes a record it does not own
- Site 2 = Create an error record with Process = "Y" and
Done = "N"
- Site 2 Replicate
Tests
- Check that added row has correct JG_Owner and JG_Unique_id fields
- Check that the changed row has same JG_Owner and JG_Unique_id it
had before
- Check JGReplicate for correct values in data_dict table row
- Check JGReplicate for correct Log entry
- Check JGReplicate on primary site for correct Log Entry
- Make sure the dates used on the primary are its date/time not site
2s.
- Check primary for added record. - Check lastChangedDate field,
- Check that the row added on the primary site now has a JG_Owner,
JG_Unique_id, JG_SentBy and JG_DateWhenSent.
- Check the the lastChangedBy and JG_DateWhenSent are equal.
- Check that the row changed on the primary site now has a JG_Owner,
JG_Unique_id, JG_SentBy and JG_DateWhenSent.
- Check the the lastChangedBy and JG_DateWhenSent are equal.
- Check that site 2 has the row added by primary site and that the
values are the same
- Check that site 2 has the row changed by primary site and that the
values are the same
- Check that the Error record was added to the primary's JGReplicate
table. JG_Unique_id and JG_Owner.
Setup
- Site 3 - adds a record
- Site 3 - changes a record they own
- Site 3 - changes a record they don't own
- Site 3 - replicates
Tests
- Check that added row has correct JG_Owner and JG_Unique_id fields
- Check that the changed row has same JG_Owner and JG_Unique_id it
had before
- Check that the row added by site 2 has arrived and has the same
JG_Unique_id and JG_Owner it had on Site 2.
- Check that lastChangedDate and JG_DateWhenSent are equal and
different than when they were sent.
- Check that the row changed by site 2 has arrived and has the same
JG_Unique_id and JG_Owner it had on Site 2
- Check that lastChangedDate and JG_DateWhenSent are equal and different
than when they were sent.
- Check JGReplicate for correct values in data_dict table row
- Check JGReplicate for correct Log entry
- Check JGReplicate on primary site for correct Log Entry
- Check that site 3 has the row added by primary site and the values
are the same
- Check that site 3 has the row changed by primary site and the
values are the same
-
- Setup
- Site 2 replicates
Tests
- See that the error record created at the beginning has Done =
"Y" now
- see if Site 2 got the records it sent again and if it ignored them.
- Check that nothing sent by the primary during the last replication
is being sent back.
Back to Top
XXVII. CODE STEPS
[square brackets represent specific info on updated HSQL to MySQL]
- Create JGReplicate Table
- Create a JGuiGen Application to edit it
- Get email address for administrator and site
name from JGuiGenIni.xml (If site number is empty - use machine name or
IP address)
- Get connection information from JGuiGenIni.xml
- Connect to local database (E record and msg
about failure) [MySQL site set to -99]
- Connect to remote database (E record and msg
about failure) [HSQL site set to -100]
- Get site number from local. Create an
"M" record and populate it if necessary
- If there was no local site number - get it from
the primary. Get all update schema rows ("U" rows) from the
remote and add them to the local JGReplicate table. Do not process them.
A new copy of the program should match the current tables schema at the
primary site.
- Start a log record locally - save row_id
- Start a log record on the primary - save the
row_id
- Update the "S" record on the primary
- bump count started
- Get is primary, site number, site name etc.
from primary [HSQL is set ]
- Msg and quit if error
- Find the local highest update number
- Request any higher at the primary site
- Show a message about what the schema change
should be if any exist and quit. We will do changes manually for now.
- Mark all local "T" records with
"N" in the process column
- Ask the primary which tables ("T"
records) should be marked with a Process="Y"
- Mark them. Add any new "T" records
and set timestamps for 1/1/1980
- Check local "T" rows for DONE =
"Y" - Show MSG - try first unmarked table If it succeeds, mark
all "T" records as NOT DONE and start from the beginning.
- Get real names for the lastChangedDate,
isRemoved and row_id columns from data_dict Msg and quit if missing.
- Loop through tables [Data_dict, JGuiGen_I18N,
userlogdata]
- Get previous-primary-high-date-time from
"T" record
- get MARK time by updating a local record and
getting the lastChandedDate
- set the current-date-time-at-start in the
"T" record
- update the "S" record for this site
in the primary Server's JGReplicate table and determine the lastChangedDate
for this update. This become the PRIMARY-MARK time.
- query primary for all records with a date
higher than the previous-primary-high-date/time and less than or equal
to the PRIMARY-MARK time.
- Set the current-primary-high-date-time in the
"T" record to the PRIMARY-MARK time
- Process rows
- Update "T" logs
- Loop
- BIG DEFER - processing the local records and
sending them back
- update "S" record and quit
Steps deferred
- Email about Connect to local database failure
- Email about Connect to remote database failure
- Email about error in Get is primary, site
number, site name etc. from primary
- Management by exception late replication
reporting
- 1. Get all active sites that have replicated
- 2. get all done in last x days
- 3. Send email about those in list 1 but not in
list 2
- Management by exception problems with
replication reporting
- 1. Get all who have not completed in the last Y
days
- 2. Send email about those in the list
- Processing schema changes
- Send email about starting with "T"
records that have DONE = "Y"
- Send email that getting real variable names
from data_dict failed.