JGuiGen
Replication System - Version 15 - 6/29/2006
Written by G. Hale Pringle Ed. D.
For the purposes of this document, Data Replication is defined as
synchronizing selected tables on two or more database servers. The JGuiGen
Replication system gets new and changed rows from one database and sends
local new and changed rows to another server. At the end of the process
the two tables are the same with the exception of a few system fields that
should be different. The objective is
to take a snapshot of each system and update the other system to match this
snapshot.
Beyond Simple Replication
When you have remote sites that must stand alone (satellite
offices, road warriors, sites with intermittent Internet connections, etc.) the
usual solution is to replicate the data between sites. This may be as crude as retyping data or as
complex as Oracle’s Replication System which requires experienced Data Base
Administrators at all of the sites.
JGuiGen has a built-in replication system that is much better than
retyping data, but is not as “strong” as the systems developed by major
database venders.
Having remote sites and evolving software means that you must take
care of five major concerns.
1.
Initial Distribution of software and data – We will be using
Sun’s JAWS, but there are other options (such as the software distribution
system developed by the Eclipse project.)
With JAWS, a user attaches to the Internet and automatically gets a copy
of the software. After the first time
they will run with the existing copy unless they find a newer copy when they
link to the Internet. In that case the
new copy is automatically installed.
2.
Updating the software – JAWS does this very elegantly.
3.
Updating the schema of the remote databases to include changes
required by “new” versions of the software.
This is not part of the JAWS system.
4.
Running one-time procedures required by updated software and
modified data schemas. Again this is
critical, but is not covered by JAWS or other systems that focus on
distributing “versions” of software.
5.
Verifying the accuracy of replicated data.
The
JGuiGen Replication system addresses all five of these concerns.
INDEX
I.
Background
II.
Type of replication
III.
Objectives
IV.
Overview - dummy walk through in English
V.
Criteria for Evaluating Replication Systems - Notes about how JGuiGen stands
up.
....
Architectural Requirements
....Functional
Requirements
....Administrative
Criteria
....Resource
Management
....Change
Management
....Recovery
....Vendor
Health
VI.
Minimal JGuiGen Replication
VII.
System Details
....Assumptions
....Security
....JGReplicate
Table
....Over-Booking
the JGReplicate Table
VIII.
Rules - Requirements that must be met.
IX.
Walk Through - following the process step-by-step
XX.
Avoiding the IsRemoved Field Requirement
XXI.
Moving a site to a new primary server
XXII.
Special Notes about the JGReplicate Table - fields used by various types of
records
XXIII.
Reports that will be available in the JGuiGen Replication system.
XXIV.
General Notes
XXV.
Discussion of the "PROBLEM WINDOW" during replication. XXVI.
Planned JUnit Tests
XXVII.
Code Steps - coding that is underway.
Back
to Top
I. BACKGROUND
I have designed and worked on homegrown data replication systems since the
mid 1990s. This document is my latest and gr8t3%t (nope - just couldn't say it)
thinking and it builds on the existing work. There are some differences but
this is basically what I spent those years developing and running. The systems
I developed connected several dozen sites in 22 different states.
Desktop applications (not web-based) that work with databases fall into
three rough categories.
- Applications that work solely
on one PC using a local database.
- Applications that work using
some type of client/server system where one or more desktop machines share
one database. (This category escalates up into the massive clustered
systems.)
- Applications that are hybrids
of category 1 and 2. These applications have at least one user who works
with a stand-alone system and needs to share at least some data with the
users on another system. The answer is always some type of replication
service, even if this means someone typing the data into the other system.
(Oh yes! This happens. More than we want to think.)
The advantage of the hybrid is that users can work off-line. They do not
need to be connected to the Internet or to some central database server. The
major disadvantage is latency. Some of the data is out-of-date and there is
some effort required to move correct data to and from the remote sites when
they do come on-line. Regardless. there are times when the hybrid is the only
viable solution.
JGuiGen can create applications that work in any of the three modes
described above. It handles stand-alone systems and large multi-user client
server systems. With the addition of the replication system is can synchronize
these two groups. (Note: It has always been able to connect a user to a remote
database via the Internet. This replication system just allows that user to
work when they are not connected. They can then connect and bring everything
up-to-date.)
Back
to Top
II. JGuiGen REPLICATION - WHAT TYPE OF REPLICATION IS IT?
There are several different types of replication. Some are download only.
Some require a full dump of one site's information and others involve real-time
or near real-time synchronization. Using definitions from Marie Barretta's book
"Data Replication" JGuiGen uses a modified master/slave asynchronous
multiple central site replication scheme. The master/slave designation
indicates that data from one site is considered to "own" a record. They
are the only ones that are allowed to make changes to that record. (Note: the
ownership can be passed around so that someone else can edit the record, but
only one site owns a given record at any given time.) The "modified"
in this definition come from the fact that JGuiGen allows the non-master users
to fill in empty text data in fields that someone else owns. The asynchronous
part indicates that the data is not updated in real-time or near real-time.
Users can run independently for long periods of time (they probably shouldn't,
but they could) and then synchronize their data with the primary site. The
"multiple central sites" refers to a system where some users can
replicate with a central site and that central site synchronizes with another
central site that is "higher" up the chain. All data is eventually
replicated, it just may take a couple of days.
Note: At present JGuiGen Replication
is designed to work at the table level.
A while table is owned by one master site. Since JGuiGen already includes a data dictionary table that
contains information about individual columns within each table, it could be
expanded to do column level master/slave replication without requiring a major
overhaul. The previous systems I
designed had column level controls, but it doesn’t seem necessary in this first
pass.
There are two main reasons for including Data Replication with JGuiGen.
- JGuiGen ships with the
standalone database HSQL. In order to use JGuiGen with one of the other
backend databases (e.g. mySQL, PostgreSQL, Oracle), the data_dict,
userlogdata, and jguigen_i18n tables need to be created and the data from
the HSQL database needs to be moved into them. This replication system can
be used to do this.
- The other major reason for
data replication is to support business applications that want to deploy
stand-alone applications via JAWS, or the Eclipse program update system,
or something like these and then have users able to run the application in
stand-alone mode. The users usually need to be able to synchronize their
data with a home/primary site. This replication system also can be used to
do this.
Back
to Top
III. OBJECTIVES
When I started this project in the mid 1990's I had a few objectives. As the
project went on I discovered a number of other things that needed to be
covered. Here is a fairly complete list.
- Accurate - All rows
synchronized, all columns correct
- Secure - Resistant to
attacks from outside and inside the system.
- Easy to use and easy to
automate. It should handle (as well as possible) users that don't care
about the whole thing. (I view apathy as an "attack" on the
integrity of the data. We need to know when a site is not sending and
receiving the current data that other sites are using to run our
business.)
- Reliable - it should just
work
- Reliable - It should be
tolerant of interrupted replication sessions
- Reliable - It should be
amenable to warm servers to minimize the impact of single point of failure
problems.
- If there is a problem, the
replication system should not make it worse.
- Fast - minimal redundant
transfer of data (Note: Since whole rows are sent, there is transmission
of data that wasn't changed, but this is a tradeoff with accuracy and
complexity. Tracking changes at the individual row/column level is very
complex or requires logging which is hard to add to legacy system.)
- Flexible - system should
allow for clusters of users when necessary.
- Low impact on users -
minimal down time. (This system does not require that users be off the
system during replication.)
- Low impact on users - There
should be no impact on local users when the Internet is unavailable.
- Low requirements on existing
applications and tables. (The only major one is requiring that rows never
be actually deleted. This can be waived if a stored procedure updates a
table called JG_Killed_Rows when a row is deleted in a table that will be
replicated.)
- Time zone and server clock
differences tolerant
- Scalable - Through a system
of secondary primary sites build a system that can get very large.
(Obviously this causes a longer lag in the time required for everyone to
get new/changed data.)
- Manageable - Management
should be aware of what is going on.
- Manageable - On special note
is Management-by-Exception. The system should actively tell administrators
when there are problems.
- Manageable - Management
should be able to override data ownership when necessary.
- Manageable - Management
should be able to suspend replication of selected tables when necessary.
- Can be used to propagate
table schema changes across sites.
- Can be used to fire
one-time-methods or classes that do work needed to populate tables that
have had their schema modified. For example if a legacy system had a
column called "NAME" which contained first and last names and a
schema change created a new "FIRST_NAME" column and a
"LAST_NAME" column, you might want to run a method that tried to
split the existing data in the NAME column and placed it into the
FIRST_NAME and LAST_NAME columns.
- Should be able to restore
sites that come on-line using a backup that is days or weeks old
- Should be able to populate a
new site completely with current data
- Should be able to work with
applications distributed with JAWS.
- Should be able to act as an
general purpose table replication system.
- Cross-platform capable -
should be able to handle situations where different sites are using
different back-end databases (this is a tough one)
- Real-world aware - Features
like the ability of any site to enter empty data should be encouraged but
optional.
- NOT DONE - system should be
transaction safe. It is mostly safe but does not guarantee all or nothing
transaction delivery. This would be very difficult to add to the system as
it now exists.
- NOT DONE YET- the system
should allow sites to get "selected" data so that they just get
what they need.
- NOT DONE YET- the system
should have a "compare" feature which allows one site to be
compared with another to verify tha accuracy of a site's data.
- NOT DONE YET - the system
should access rules for transforming data from disparate system and
creating enhanced (summarized) data as part of the replication process.
- NOT DONE YET- systems might
have the ability to reliably update summary data or totals or balances as
part of the replication.
- NOT DONE YET – column level
control of data ownership.
Back
to Top
IV. OVERVIEW
How the system works is repeated several times below (from different points
0f view), but I would like to start with an English conversation that shows the
concept behind the system. Note that in this conversation Sites 3 and 17 doe
all of the talking. Primary is very dumb, it just sends rows, updates records
and inserts records when Sites 3 and 17 ask it to.
- Buzz Click - someone just
attached to the Primary site's database system.
- Secondary site - Hello
Primary. I'm new. Here is a query to see if you are a primary and if you
are active and can give me a new site number.
- Primary - sends data.
- Secondary Site - I see that
you are a Primary and have a site number you can give me. Here is an
update to add a Log record to show that I've talked to you, but I don't
have a site number yet.
- Primary - processes insert
- Secondary Site - Here is a
query for the next available site number.
- Primary - sends site 17 as a
response.
- Site 17 - Hi! I'm in. Here
is an update for my log entry to say that Site 17 asked for a new site
number.
- Primary - handles update
- Site 17 - Here is a query
for all previous Schema updates. I shouldn't need them but I'll put them
in my JGReplicate table.
- Primary Site - sends U
records.
- Buzz Click - someone else
has attached to the Primary site's database system.
- Site 3 - Hello Primary. Here
is a query to see if you are a primary and active today.
- Primary site - responds to
request to Site 3.
- Site 3 - Hello there - I am
sending you an Error record that indicates that my last update did not
complete. I'll also send an email to system administrators.
- Primary site - handles
insert.
- Site 3 - Hi there again. I
see that you are a primary site and I have your type of database (you are
SQL-SERVER), along with your site number and name. I'll update any of your
JG_Unique_id's or JG_Owner fields that I find empty. I also see that you
have remote management-by-exception reporting turned off so I won't do
those functions. Here is an update to update my Site record ("S"
record) to indicate I started a new replication and to bump my
"count-of-unsuccessful-replications.
- Primary site - handles
update
- Site 3 – Here is a query for
the time I used when I last finished a replication. If your version and mine don’t match I
will need authorization from you before I can continue. (I know that this stops cloned machines
with my site number from replicating and screwing things up.)
- Primary site – returns data
- Site 3 – Got it! It matches what I have. Do you have any schema updates for me
("U" records) that have an update number greater than 22?
- Primary Site - Sends data if
it exists.
- Site 3 - Okay I put that
change in my JGReplicate table, and made the change to my local system.
Here is a query for the list of tables that you are actively replicating
today.
- Primary site - responds to
request to Site 3.
- Site 17 - Hello Primary this
is Site 17. Here is an insert to update your JGReplicate table to create a
new Site record ("S" record) that describes me. It indicates
that I came and started this conversation.
- Primary - handles insert
- Site 17 - Here is an update
to change the Log record I started when didn't have a site number to
reflect that site number 17 was the number I was given.
- Primary - handles update
- Site 17 - Let me start over.
Are you an active primary site, what is your site number, name and
database type and should I do remote management-by-exception reports?
- Primary site - responds to
request to Site 17.
******* I'm going to quit reporting Site 17's conversations here.
- Site 3 - Well here I am
again. I see you added a new table to the list. I've added that to my
JGReplicate table. Can we compare the schema in my tables against yours?
Here is a query for the structure of the first table.
- Primary site - Sends the
schema for the data_dict table Site 3 asked for.
- Site 3 - Okay that matched.
Here is a query for the schema of the JGuiGen_I18N table.
*********several back and forth discussions about the schema of other
tables.
- Site 3 - Hello again
Primary. It is now Tuesday at 10:17 PM. I last finished a replication with
you on Sunday at 11:04 PM my time. (I tried on Monday but we got
disconnected.) The highest date/time I got from you when we started our
last replication was Sunday at 9:15 PM your time. I'm going to update a
record on your site and determine the current time where you are. Here is
an update and a query.
- Primary -updates the
"S" record in the JGReplicate table sends "Tuesday at 5:30
pm."
- Site 3 - Okay I got it. I'm
going to call this time my PRIMARY-MARK time. I will use it for all the
tables I request data on. Here is a query for all rows in the data_dict
table that were updated since Sunday at 9:15 PM your time (my PRIMARY-MARK
time the last time I replicated) and less than or equal to my current
PRIMARY-MARK time. Don't send me any rows that I sent you where the
date/time I sent them (JG_DateWhenSent) matches the lastChangedDate on the
row. These are rows that I sent you and I know about those changes.
- Primary site - sends the
requested rows from the data_dict table.
- Site 3 - Well I've processed
those. I'll ask for things newer than Tuesday at 5:30 PM the next time we
talk. I updated the lastChangedDate and the JG_DateWhenSent to my current
time and changed the JG_SentBy column to your site number. When we talked
Monday the most recent data I had was from Friday at 4:27 PM my time
(nobody worked over the weekend.) Just in case, I'm going to update your
tables with all of the changes on my system since that time on Friday.
(I'm not going to include the stuff you just sent me even though the time
stamp is newer than the rest of what I am sending. I'll know that it was
you because the lastChangedDate will be the same as the JG_DateWhenSent
and the JG_SentBy column will have your site number. If the two dates are
different someone has changed this record - probably in the last minute or
two - and I need to send you that change.) I'm going to use your date/time
to mark lastChangedDate and JG_DateWhenSent columns in the rows I'm
updating on your system. I've just checked my system and the highest
date/time I am going to send is a time I will call MARK. By the time I
finish there may be rows that are more current than this value, but I'll
send them to you next time.
- Site 3 - Here are some Table
Inserts for my new rows since my last update and my local MARK time.
- Primary site - handles the
inserts.
- Site 3 - Here are some Table
updates for my changed rows since my last update and my local MARK time.
- Primary site - handles the
updates
- Site 3 - I'm updating our
logs and storing the local MARK time as the time to start with during out
next replication. When you're ready here is a query for all of the new
data from the JGuiGen_I18N table. (Note that I'm using the smae local MARK
time in this query.)
- Primary site - sends the
requested data.
**********similar back and forth discussions about other tables.
- Site 3 - Okay I'm done. I
updated our logs to indicate that I had finished. In particular I changed
the column that contains the number of replications I have started that
didn't finish from 2 back to zero (remember I had an error when I last
replicated.) You don't talk much do you? Oh well, talk to you again soon!
- click - Site 3 disconnects.
- Note: During this
conversation with Site 3, Site 17 could have been carrying on similar
conversations.
Back
to Top
V. EVALUATING REPLICATION SYSTEM and JGuiGen Replication
One section of Ms. Buretta's book listed "General Requirements for
Creating and Evaluation Replication Systems." I am going to go down this
list and talk about the JGuiGen replication system. I will mark her statements
as "Replication Criteria: ....." This gives you a good starting point
to make your own evaluation about the suitability of JGuiGen Replication for
your own uses.
NOTE: As of 6/27/2006, this document is being written as if this
system is operational. It is under active development, but is not fully
functional. The code that it is based on exists and in most cases the
development consists of recreating the functionality in java. I will update
this document as development proceeds.
NOTE 2 This section is written in light of setting up a full-blown,
secure, multi sever system. For test purposes and non-critical situations,
JGuiGen Replication is much simpler. See the note at the end of the Evaluation
section.
ARCHITECTURAL REQUIREMENTS
- Replication Criteria:
The system should be architected as middleware whose purpose to provide
data replication services. This allows the system to be plugged and
unplugged with minimal impact on the rest of the system. - JGuiGen Replication
meets this criterion. The replication system does not require JGuiGen or
its applications to run. It does require an XML file (or LDAP system) with
system settings. A data_dict table with some information about the tables
to be replicated and a JGReplicate table to store replication information.
There is an options JG_Killed_Rows table if your system actually deletes
rows.
- Replication Criteria:
The system should support location/distribution transparency. A database
or process should not need to know where objects are. - This is not truly
applicable to a database only replication system like JGuiGen Replication.
Each site needs to know how to connect to its own database and how to
connect to one remote site. JGuiGen Replication does handle time zone
differences between primary and target sites.
- Replication Criteria:
The system should support a high degree of database and process
transparency. - JGuiGen meets this criterion for data replication. It will
move data from and to any of the databases that JGuiGen supports and can
add other database types as long as they have JDBC support.
- Replication
Criteria: Question about database replication systems. How
transparent is the system to the resource managers involved in the
replication process? - This is talking about the databases and the
existing applications. JGuiGen Replication is very transparent. With the
addition of four fields to each table to be replicated, two or three
tables for the replication process and at most two stored procedures that
are triggered on updating a row and deleting a row any database system
can work with JGuiGen Replication.
- Replication
Criteria: Question about database replication systems. Do resource
managers need to be altered to participate in the replication
process? If so how hard is it to
do and undo? - Here is the whole list of changes required by the JGuiGen
Replication system.
- Four columns need to
be added to each table: 1) an int JG_Owner column, 2) a long
JG_Unique_id column, 3) an int JG_SentBy column and 4) a date
JG_DateWhenSent column.
- There needs to be an
integer row_id column with a unique value in each table to be
replicated.
- There needs to be a
lastChangedDate column in each table to be replicated.
- If the current
software does not already update the lastChangedDate column when a row
is inserted or modified, a stored procedure will need to be installed
that updates this column.
- There either needs to
be an isRemoved column to mark tables as removed or a JG_Killed_Rows
table will need to be added and a stored procedure installed that fills
the JG_Killed_Rows table when a row is deleted. There is a section below
that describes this.
- An index should be
created on the JG_Unique_id column.
- Two tables need to be
added to the database - JGReplicate and Data_dict.
- JGuiGen needs to be
run once to populate the Data_dict table and someone needs to indicate
which columns handle the row_id, lastChangedDate and isRemoved
functions. Right now the
JG_SentBy, JG_DateWhenSent, JG_Unique_id and JG_Owner columns
must be named exactly. This could easily be changed.
- The system hosting
the database should subscribe to a time service to keep its date/time
accurate.
- A small
program/daemon will need to be running at all times that connects to the
remote sites via SSL on a port other than the standard database port.
- this port will need
to be opened through any firewalls.
- Each site will need a
JGuiGenIni.xml file or an LDAP system that contains the JDBC Driver name
and location for connecting to the remote site.
- Each site will need
to install the JGuiGen Replicate software.
- An optional task
involves running a small application at each site once that creates
encrypted files containing the information needed to connect to the
local and/or remote database.
- SSL certificates for
remote sites will need to be generated.
JGuiGen applications already have almost all of these items
in place from the beginning.
NOTE JGuiGen Replication is currently under
development and the levels described here are all in place but have not been
fully implemented. The system can be run in a much looser fashion when
simplicity is more important than data security.
- Replication
Criteria: The system should be industry standard compliant - for
example it should use the Open Source Foundation - Distributed Computing
Environment (DCE) standards and handles XA and non-XA data.. - JGuiGen
Replication meets many of these standards. It uses the industry standard
of JDBC to connect to disparate database systems. It uses industry
standard SSL connections to insure that data transmission is secure. It
can store passwords and user names in local LDAP systems. It uses
industry standard encryption algorithms such as MD5 hashing,
CBCBlockCipher, Thriple DES, PassWordBasedENtryption (PBEWithMD5AndDES).
JGuiGen Replication does not work with the XA transaction manager
protocol at all.
Replication Criteria: Question about system security.
How robust are the security implementations? - JGuiGen is very secure against
internal attacks and relies on SSL to secure transmission of data. It would be
easy to add a process called Tumbling (a form or port knocking) if more
security was needed.
- Replication
Criteria: Question about system security. Who can use the
functionality of the Replications system? - JGuiGen replications
requires that a site be set up to establish an SSL link using SSL Data
Tunneling software. Anyone who has this setup can get to the database.
In its most secure form JGuiGen Replication uses encrypted files to
store the user name and password needed to attach to the database.
- Replication
Criteria: Question about system security. What Ids and passwords are
stored? - JGuiGen Replication uses three id's and passwords. A local
user name and password, the local database username and password and the
remote username and password. These can all be stored in encrypted files
at the local site or in a local LDAP system. The next step for our
system is to store the remote database username and password in a way
that causes the local site to request it using an SSL Link. This would
make it easier to shift from one primary site to another that could be
located anywhere in the world.
- Replication
Criteria: Question about system security. Where are they stored and
are the encrypted? - On the local file system in encrypted files or in a
local LDAP system.
- Replication
Criteria: Question about system security. What usernames and
passwords are used on remote resource managers (databases) to alter
data? - As noted JGuiGen Replication requires a database username and
password. This would often be a special user name dedicated to the
replication process. It would be set up so that its access to tables in
the database was just as tight as possible. For examples tables that are
only downloaded to remote sites could be marked "read-only" to
this special user.
- Replication
Criteria: Question about system security. How is network security
integrated into the replication system? - As mentioned, JGuiGen
Replication uses a special daemon/process that sets up an SSL link with
a remote site on an arbitrary port. We have the code (called the Tumbler
on Sourceforge) that requires the remote site send and encrypted
password to even open this port.
- Replication
Criteria: The system should be fail-safe (i.e. fail in a manner that
protects other components in the environment) - JGuiGen Replication
should meet this criterion quite well. It is simply a database link to a
database.
- Replication
Criteria: There should be no inherent bottlenecks in the solution. -
Since JGuiGen Replication can be scaled to as many clusters as are needed
this should not be a problem. There are no calls to other sites required
at any time except when a full replication is being done.
- Replication
Criteria: I would add that the system should come with a set of
reliability tests that demonstrate most or all of its facilities. - These
are under development as part of the JGuiGen Replication coding. There
are JUnit test and a system for passing information relevant to these
tests back and forth. A critical feature involves creating a "compare"
facility that compares the values in a table with its counterpart on
another system. This facility needs to ignore the row_id and
lastChangedDate which are unique to the local system and the JG_SentBy
and JG_DateWhenSent columns that are different on different systems.
Back to Top
FUNCTIONAL REQUIREMENTS
- Replication
Criteria: The replication system should be scalable. - This system
is.
- Replication
Criteria: Question about scalability. What resource managers
(databases) can be used as primary sources? - Any of the over six that
JGuiGen supports. Others that have JDBC drivers could be added in very
short order.
- Replication
Criteria: Question about scalability. What resource managers
(databases) can be used as targets? - Any of the over six that JGuiGen
supports. Others that have JDBC drivers could be added in very short
order.
- Replication
Criteria: Question about scalability. Given the available resources,
are there limits on the amount of data that can be transmitted through
the system? - This is an individual issue that I can't answer. Given
that JGuiGen does not handle blobs at this point, this should not be a
problem.
- Replication
Criteria: The system should use or interface with data transformation
and mapping. JGuiGen Replication handles minimal transformation. It will
convert Boolean fields to integers given that some databases use Booleans
and others do not. It does not support (at this time) full scale data transformation
(such as a situation where one site stores Male and Female as 1 and 0 and
another stores the same information as "M" and "F".)
- Replication
Criteria: Question about transformations. Does the system alter data
types between systems? - As described above - yes in limited situations.
- Replication
Criteria: Question about transformation. Does the system insure that
all target sites represent the data identically? - This is the 1/0
versus M/F question. The answer is no except for Boolean transformation.
- Replication
Criteria: Question about transformations. Does the system enhance
the data? - This means can it creates summaries or other major changes
as data is being delivered? - No
NOTE: It should be noted that JGuiGen Replication does not have the
facility to do more that replicate tables. If you need a system that has
failsafe ability to update totals or make other cumulative changes at
the primary site, JGuiGen Replication is not appropriate. For example,
if you need a system that will send new financial transactions and then
update a "balance" field on the primary site, JGuiGen
Replication is not appropriate. The system will insert the new rows and
in the case of failure it would finish inserting the rows. There is no
way to guarantee (without modifying the existing code) that the balance
field would not be updated multiple times or not updated at all if all
the records were inserted and then a failure happened.
- Replication
Criteria: There should be a scheduling mechanism. JGuiGen Replication
depends upon native systems to fire off the replication executable at a
regular time. Road warriors can fire off their systems at any time.
- Replication
Criteria: There should be a friendly system that describes the data
or objects being transmitted. JGuiGen Replication lists the tables it is
working on in a command window while it works.
- Replication
Criteria: There should be a friendly system to subscribe to the
data/objects that are available for replication. - JGuiGen Replication is
an all or nothing system at present. Once you attach to a primary site it
replicates the tables that have been set up for replication.
- Replication
Criteria: The initialization process for targets should be simple and
straight forward.- As noted the hardest part of getting started (in
secure systems) is setting up the SSL certificates on the remote sites.
After that the remote site starts with an empty JGReplicate table and
attaches to a primary system. The table is populated and replication
begins.
- Replication
Criteria: The replication system should be well behaved and provide
the quality of services specified by the business environment - Since
this is basically a database connection with updates and inserts it is
well behaved. The issue on whether or not the "quality of service"
is acceptable depends upon the group wishing to use JGuiGen Replication.
Back to Top
ADMINISTRATIVE CRITERIA
- Replication
Criteria: Configuration - The configuration process should be simple
and straightforward. - I believe that this criterion is met. For full
security there is one daemon/process that must be installed on primary
servers and then the changes listed earlier. For target sites the
software can be delivered via JAWS. Existing databases will need to have
two columns added and two or three tables added. The SSL certificates
will need to be created and a small executable run that creates the
encrypted files for the target system.
- Replication
Criteria: Configuration - It should support dynamic reconfiguration -
It does.
- Replication
Criteria: Configuration - Some sort of configuration history is a
plus - JGuiGen Replication logs virtually all of its activities. This
provides a fairly full history.
- Replication Criteria:
Monitoring - There should be a GUI monitoring tool for the entire
replication system - This does not exist yet. It could easily be added
when someone needs it.
- Replication
Criteria: Monitoring - The monitoring system should integrate with
the existing monitoring system for the firm - JGuiGen currently documents
errors in the JGReplicate table and sends emails to a system
administrator when errors occur. This could easily be modified to handle
other reporting requirements.
- Replication
Criteria: Performance Monitoring - JGuiGen Replication has a GUI for
the JGReplicate table. This includes several standard reports that can be
run at any time to monitor the activity in the system. Other ways of
reporting this data could be added very easily.
- Replication
Criteria: Performance Monitoring - The system should efficiently sue
existing resource. JGuiGen Replication requires internet band width and
database server machine cycles. Thats about it.
- Replication
Criteria: Error Management - There should be an error log - This is
handled in the JGReplicate table as described in this document.
- Replication
Criteria: Control of the replication process should be controllable
from either the target sites or the central sites. - yes.
- Replication
Criteria: The system should handle emergency shutdowns - yes
- Replication
Criteria: The system should automatically restart after emergency
shutdown. - yes
Back to Top
RESOURCE MANAGEMENT
- Replication
Criteria: Resource Management - the queuing system should efficiently
handle resources. - This is currently done by manually scheduling when
remote sites will replicate.
- Replication
Criteria: Resource Management - CPU - the subscription service should
be well behaved. - As noted, after a target site is configured it
automatically joins the system when it first connects.
- Replication
Criteria: Resource Management - Network - the system should be well
behaved on the network.- As noted this system uses standard SSL for
connection and security. The amount of band-width or extra bandwidth
needed would depend upon the amount of data being moved. In most cases
the replication will occur during off-hours and normal bandwidth that is
under-used in those time periods will be sufficient.
- Replication
Criteria: Resource Management - Access management should be done with
standard protocols - SSL and JDBC are extremely standard.
Back to Top
CHANGE
MANAGEMENT
- Replication
Criteria: Change Management - It should be easy to upgrade the
system. - JGuiGen Replication is being built with systems that change
table structures at remote sites and run special methods that modify
existing data after a data base schema changes has been made. Target
sites can automatically update they software via JAWS, which will
automatically install updated replication software when the site starts
the replication process. Since they have to be connected to the Internet
before they can replicate this imposes no additional burden.
- Replication
Criteria: Change Management - Migration should be straight -forward
and should permit release level coexistence. - Yes in most cases.
- Replication
Criteria: Change Management - Must all sites migrate to the new
system at the same time? - Since the migration is performed via the
Internet during the replication process, this will usually happen
automatically.
- Replication
Criteria: Change Management - can multiple levels exist in the
system at the same time? This will happen rarely given the automatic
nature of the upgrade system. JGuiGen Replication is not designed to
handle some sites that have tables with different database schemas than
others. (Except in certain cases where target sites have fields that are
never replicated to the primary sites.)
CHANGE MANAGEMENT OF DATABASE SYSTEMS
- Replication
Criteria: Change Management - There should be tools for altering
database objects - Yes
- Replication
Criteria: Change Management - What tasks must be done when the DDL
(Data Definition Language) changes? - As described later, a
"U" record must be inserted in the JGReplicate table to tell
target sites how to modify their tables. Software can be distributed at
the same time that uses the update number to trigger special methods
that fill new columns with appropriate data or move data from one column
to another.
- Replication
Criteria: Change Management - Are tools provided to assist with this
change process at the primary sties? as the target sites? - JGuiGen
provides as GUI for accessing the JGReplicate table. The system for
automatically inserting the update records ("U" records) is
under development.
- Replication
Criteria: Change Management - Is there a repository that indicates
which targets have received the changes? - Yes. The JGReplicate table
contains this information and there can easily be a report to show the
sites that have received updates.
Back to Top
RECOVERY
- Replication
Criteria: Recovery - There should be automatic recovery procedures to
recover from specified types of failure - Most will be handled when the
primary database server is rebooted, a warm back server is activated or
the primary server is regenerated from backups. A primary could easily be
repopulated from another primary or even a site that is normally a target
site.
- Replication
Criteria: Recovery - The system should fail safely - yes
- Replication
Criteria: Recovery - There should be a data reconciliation facility
to com[are one site with another - under development
Back to Top
VENDOR HEALTH
There were several questions about the financial
health of the vendor. This is Open Source software. A user has the full source
code and can modify it or take over its development at any time. The
"vendor" health issues are not particularly relevant.
Given that this is a small open source project developed by
one person, it would seem that JGuiGen Replication does a fairly decent job of
meeting this standard set of evaluation criteria.
****END OF EVALUTAION SECTION
Back
to Top
VI. MINIMAL JGuiGen REPLICATION
As mentioned in a note above, JGuiGen Replication does not require all of
these bells and whistles. For a test drive or a non-critical small application
all you need is the items in the following list.
- The tables to be replicated
need to exist with the appropriate schema at the target site. (They
obviously must exist as the primary site, but I assume you have something
that already exists that you want to replicate.)
- The seven variables in each
table to be replicated need to exist.
- The JGReplicate table needs
to exist at the primary site and the target site.
- Minimal information needs to
be installed in an "M" record on the primary site. (Site number
=1, Enter a name, Process = "Y", Next-available-site = 2,
Next-available-site- lock = "N") JGuiGen supplies a GUI for
entering this data.
- Table records ("T"
records) in JGReplicate on the primary site. These need a table_name,
Process = "Y". Again, JGuiGen supplies a GUI for entering this
data.
- A primary site with a port
open to the Internet or an intranet.
- Settings in the
JGuiGenIni.xml file to attach to a local database and a remote database.
- The JGuiGen application with
it's menu option - Maintenance --> Replicate
This is how we have users who want to use a larger database
system than HSQL populate their new database from the HSQL records that ship
with JGuiGen.
Back
to Top
VII.
SYSTEM DETAILS
Replication involves databases running at multiple sites. For each database
system using JGuiGen replication there is one master primary site. There can be
secondary primary sites. For example a remote office could call into the main
office, but all sites local to the remote office would get their replications
from the remote office. All sites connect to a primary and update it. Secondary
primaries connect to the master primary. The remote sites also get new and
updated rows from their primary site. Each site has a unique site number across
the whole system.
Every table that will be replicated by JGuiGen must have seven variables in
it These are used by this replication system.
- row_id - integer - a unique
id number for the row (regardless of what it is called) This is almost
always the primary key for the table.
- lastChangedDate - timestamp
of when a row was last updated.
- isRemoved - A Flag column
used to indicate that a row should be considered as "deleted"
- JG_Owner int - this contains
the site number that OWNS a row. Owner wins if there are differences. This
is the "master" in the master/slave relationship that is part of
this type of replication.
- JG_Unique_id -long - this is
the unique row number for this row across all sites. It consists of the
local site-number times 10,000,000,000 plus the local site's row_id times
10. For example if site 5 has a row with a row_id of 57, it will generate
a JG_Unique_id of 500,000,000,570 ((5 * 10,000,000,000) + 57*10). Note:
the reason the row_id is multiplied by 10 is that this leaves us
room to temporarily insert a row that ends with a 5 and sorts between
existing rows. This will be used
when comparing a row from another site with a row from the current site.
- JG_SentBy - int - the site
number that sent the record.
- JG_DateWhenSent - timestamp
- a date/time that matches lastChangedDate when a record arrives at a
site. (See the note about the "Problem Windows" below.)
- Note: Right now all of the
fields that start with "JG_" MUST be named exactly. JGuiGen
queries the data_dict table to determine the actual column names used for
the other functions.
Back
to Top
Assumptions
- Every site has a unique site
id. There are NO duplicates. New sites get their site number from a
primary site. (There is a built-in test for violations of this
assumption.)
- There is one master primary
site. It hands out the site numbers.
- If there are secondary -
primary sites, they each can get five new site numbers they can give away.
Once they have given them away they can't help a new site join the system
until they get the new set of site numbers to give away from the master
primary site. The secondary master replenishes any numbers it has given
away when it next replicates with the Master Primary site. (A stored
procedure could be created to get new site ids to give away once current
set has been doled out.)
- Each table being replicated
has a row_id column that contains a unique value within the table. This
row_id is unique at the site. Any row_id may be duplicated at different
sites.
- No site has a local row_id
that is greater than 10,000,000,000.
- Each table has a variable
lastChangedDate (regardless of what it is called) that is updated with the
current date/time when a row is added or changed.
- There will be no rows in any
of the tables being replicated that have a column that is constantly being
updated by some batch process. Since we allow users and other processes to
use the system during replication, it is possible that the row with this
kind of column would miss the replication cycle much or all of the time.
- All tables to be replicated
have a variable isRemoved (regardless of what it is called) that marks a
row to be considered as deleted. Rows cannot be really deleted. This requirement
can be waived with a stored procedure and a JG_Killed_Rows table. There is
a section below that describes this process.
- Each table to be replicated
has the seven variables (regardless of their names) listed above. The
replication system will fill in the JG_Owner, JG_SentBy, JG_Unique_id and
the JG_DateWhenSent columns along with the lastChangedDate column when a
row is replicated. The isRemoved and lastChangedDate columns need to be
updated by applications or with triggers in the SQL database during other
changes.
- The JGReplicate table must
be available for the JGuiGen Replication system to use.
- The JGuiGenIni.xml file must
be available. It must have the email address of the system administrator,
connection information about the local and the primary site this site will
connect to. It should also have an English site name for each site. (Site
17 is nice, but "Jim Smith's Laptop" is easier to work with.)
This could also be stored in a local LDAP system.
- The JGuiGen Data Dictionary
table must be available and must contain data on the tables to be
replicated. The basic data is done when you mark the table to be processed
in the JGuiGen Create system. Someone needs to indicate which columns in
the table are the primary key field, the lastChangedDate columns and the
isRemoved column.
- There needs to be a JDBC
driver for the database.
- The database at the primary
site must support the JDBC function CURDATE() which returns the current
date and time on the database server doing the update.
- Primary servers should
subscribe to a time server. Their date/time stamps are critical.
Back
to Top
Security
- We will be using SSL
tunneling software (already available) between local sites and the primary
to insure that the data transfer is secure.
- We could also institute the
use for the Sourceforge Tumbler. This is code that requires a remote site
to send an encrypted password in order to unlock a port.
- When you connect to a database
you need a user name and password. JGuiGen can obtain these from encrypted
files (or an LDAP system) that are unique to each site.
- We could do a - "here
is my password can I work?" from the remote to the primary. For
example the "Tumbler" project on Sourceforge is java code that
requires a secret password before it will open a port. (Similar to port
knocking.)
- We could require that some
sites come from a specific IP address.
- I would suggest having the
replication software use a special account to access the local database.
This account should be the only one with read/write rights to the
JGReplicate table. This helps to insure that no one accidentally (or
deliberately) meddles with the JGReplicate table during replications.
(When I find a way to issue a JDBC Table-lock, I will implement that as
well.)
Back
to Top
JGReplicate Table
There is a special table at each site called JGReplicate. It contains one
row for each table to be updated and log entries that track replications
attempted and completed.
Note: In the following rows the [] notations indicate which record types use
the field.
The schema of the JGReplicate table is
- replicate_id - int - an id
number of the row.
- table_name {T]- char(35) -
the table name of the table to be replicated.
- database_type [MS] -
char[20] - Oracle, MySQL, etc.
- JGuiGen_specific_table [T]-
char(1) - Y/N - Y for userlogdata, JGuiGen_I18N and data_dict. JGReplicate
is not replicated.
- Type-of-replication [T]-
(Used for "T" rows) Empty or "F" for full replication,
"D" for download only (primary to target) or "U" for
upload only (target to primary).
- process [ETU]- char(1) -
"Y" for yes and "N" if the replication of this table
should be turned off.
- done [ETU]- char(1) - Y for
table was complete and N or blank if not.
- Allow-blank-field-filling
[T]- char(1) - defaults to "Y". Set to "N" if sites
that do not own a row should not be allowed to insert data into empty
fields.
- Current-local-high-date-time
[T]- timestamp - This is an arbitrary MARK time established when the
current site started to send inserts and updates to the primary site for
any table. It sends everything that was inserted between the previous MARK
and the current MARK (At the end it gets moved to previous-
local-high-date-time.)
- Previous-local-high-date-time
[T]- timestamp - Used in the Table record. This was an arbitrary MARK time
when the site starts to send inserts and updates on any table. The site
sends everything new up to this time during the previous update. The MARK
time is used since the system has to send either inserts and have them
processed and then updates and have them processed or vice versa. We don't
want to use a different time for inserts than we use for updates. And we
really don want to use different times for different tables.
NOTE: When a site
tries to replicate and their Previous-local-high-date-time does NOT match the
Previous-local-high-date-time at the primary server an error record is written
and an email is sent to the System administrators. This usually means that two different locations are trying to use
the same site number. The only
exception is when the One-time-mismatch-of-times-allowed flag is set to
“Y”.
- Current-primary-high-date-time
[T]- timestamp - The system updates the "S" record on the
primary server and then queries for the lastChangedDate for that row. This
becomes the PRIMARY-MARK. We will query all tables for inserts and changes
made between the last PRIMARY-MARK and the current PRIMARY-MARK.
- Previous-primary-high-date-time
[T]- timestamp - The PRIMARY-MARK time determined during the last
replication. We will ask for all changes between this date/time and the
new PRIMARY-MARK.
- One-time-mismatch-of-times-allowed
– char(1) Usually “N”. When this
is “Y” the remote site is allowed to replicate even though their times do
not match the times stored on the primary server. Requiring the times to match stops two
sites from using the same site number.
Allowing a one-time relaxation of this rules allows a site to
recreate its database from a backup and then bring itself up-to-date.
- rep-count-started [SMT]-
int - Number of replications started.
- rep-count-finished [SMT]-
int - Number of replications completed.
- out_of_date_after_minutes -
[SM]int - How many minutes (days) can pass before a site is considered
out-of-date. This number is used
by local software to notify local users that they need to replicate.
- out_of_date_after_minutes_error
[SM]- int - How many minutes (days) can pass before a site is considered
out-of-date and an error is logged and an email is sent to the system
administrators. This is used when
the local users did not respond to their system prompts that they were
out-of-date. Each replicating site
sends an email about other sites who have a error-1-email-sent" =
"N" and haven't replicated in this many minutes (days). (The
sender ignores the site if the error-1-email-sent field is already set to
"Y". It sets the error-1-email-sent field to "Y" once
this message has been sent.)
- Flag [all]- Char(1) - This
column indicates which "type" of row you are looking at.
- M row - There is one
row marked "M" for master. This row contains the site number
for this site and other key information.
- S row - Each target
site that attaches to a primary site creates a Site row. This row will
contain counts and dates that show a sites replication status.
- T for Table entries
(using the columns described above)
- L for Log entries
using the next set of columns.
- U rows -Update rows
are marked with a "U". These are used to pass table schema
changes to target sites.
- E rows - these
indicate Error conditions. Error messages are left in these rows.
- J rows - (under
development) special rows used by JUnit tests to pass information between
replicating sites.
- Site_number [all]- int -
site number doing the replication (locally this will always be the local
site's number. At the primary site this will indicate which site
started/finished a table replication.)
- Site_name [all]- Char(100)
- English name for the site.
- Date_rep_started [LSM]-
timestamp - when a table replication was started
- Date_rep_ended
[LSM]-timestamp - when the table replication ended. (If this is empty the
replication aborted somehow.)
- record_created_date
[MSEUL]- timestamp - used by all facilities to indicate when a record was
originally placed in the system. (e.g. when a site record was first
created.)
- Notes [EMLS]- Text - a
field used by the JUnit tests or by administrators to help with debugging
problems. A log of all replications
is kept in the field in the “S” record.
- JUnit-Work [J]- Char(60) -
JUnit test can use this column to pass information between replicating
sites.
- Primary-site [M]- Char(1) -
"M" for the master primary, "S" for secondary
primaries, blank for others.
- Number_of_rows_inserted_by_primary
[L]- int - used in "L" records.
- Number_of_rows_inserted_by_target
[L]- int - used in "L" records.
- Number_of_rows_updated_by_primary
[L]- int - used in "L" records.
- Number_of_rows_updated_by_target
[L]- int - used in "L" records.
- Next_Available_Site_number-1
[M]- int. - next site number to dole out
- Next_Available_Site_number-2
[M]- int. - only used by secondary masters.
- Next_Available_Site_number-3
[M]- int. - only used by secondary masters.
- Next_Available_Site_number-4
[M]- int. - only used by secondary masters.
- Next_Available_Site_number-5
[M]- int. - only used by secondary masters.
- next_available_Site_number_locked
[M]- char(1) - Y/N used to make sure that two sites don't get the same
site number.
- Send_email_on_completion
[MES]- char(1) - Y/N - should each site send the administrator an email
when they successfully complete a replication. (Used for debugging and
early in deployment.) This can be turned on for just one
"problem" site.
- process_mgmt_by_exception [M]-
char(1) ="R" = remotes do it (default), "L" = local do
it (requires a local process to be fired off) - else do not do them at all
- email-bad_rep_days [M]- int
- each replicating site sends an email about other sites whose last
completed replication is more than this number of days old. (The sender
ignores this site if the "error-2-email-sent" field is already
set to "Y". It sets the "error-2-email-sent" field to
"Y" once this message is sent.)
- email-bad-rep-count[m] =
int - each replication site sends an email when they start to replicate
and their count of unsuccessful replications exceeds this.
- update_number [US]- int - a
sequential number that identifies "U" records. Each schema
update should be one higher than the last. On the "S" record,
this field contains the highest update the site completed.
- update_decripton [U]-
char(50) - short description of schema change
- update_spec [U]- char(150)
- this is the schema change that should be made ("ADD" var_name,
var_type, default value, length,, decimals, nullable).
- error-1-email-sent [S]-
char(1) - "N" if last replication date is too old message has
not been sent. "Y" if it has. (These are set to "N"
when a site completes a replication.)
- error-2-email-sent [S]-
char(1) - "N" if the too-many-consecutive-unsuccessful-replications
email message has not been sent. "Y" if it has. (These are set
to "N" when a site completes a replication.)
- error-3-email-sent [S]-
char(1) - extra
- error-4-email-sent [S]-
char(1) - extra
Back
to Top
Over-Booking
the JGReplicate Table
This table is "over-booked". It has at least nine different things
that it takes care of.
- It holds the site number,
name, and type of database in the "M" record
- It holds the list of tables
being updated and the critical times needed to replicate in the
"T" records. On the primary sites these rows indicate if a Table
is currently being replicated. It also indicates what kind of replication
should be done - full replication, download only, or upload only.
- On the primary site it holds
a record for each site that replicates. The "S" record has the
site number and name, the number of times replication started, the number
of times it completed successfully, the date and time for the last
replication (The Note: this column holds a list of all replications in the
“S” record)
- It holds a list of all table
schema update that need to be applied in the "U" records
- On the primary site it hold
the next-available-site-number in the "M" record
- It contains a table by table
log of replication dates and times in the "L" records
- It is used by the
JUnit-tests to pass info between sites for the test ("M" record)
- There can be optional
"J" records used by JUnit test to pass information.
- There are "E"
records that contain error messages.
There are two main values from the previous replication kept
in the "Table" row for each table in the JGReplicate table
- The first major value is
the date/time of the last replication. When a table is being replicated,
one of the first steps is to ask the primary site for all rows at the
primary site that have been inserted or changed since this date. This
date/time will be exactly the same for each table even though the system
processes tables one at a time. We basically want a snapshot at the time
we are calling MARK. "Send me everything new up on tine the MARK
time."
Note: To originally fill a site with data this date/time is set to a
date/time before the system existed. This will cause all of the rows from
the primary site to be sent.
Note: The JGuiGen replication system is not transaction safe. It is
possible with this system if two sites are replicating at the same time to
get part of one site's new data, but not all of it. For example if site 2
asks for all updates on a table and gets lots of rows to process, site 3
could then process the same table and another table before site 2
finishes. This would result in site 2 not having site 3's data for the
first table and having site 3's data for the second table. The next
replication takes care of this.
- The second important value
is the date/time of the most recently updated record in the local table as
of the last replication. Any rows with a lastChangedDate date/time stamp
higher than this date/time will have been inserted or changed locally
since the last replication.
Back
to Top
VIII. RULES-
- Each site gets a site number
- Primary site is usually 1, others are 2 through ?. Sites -99 is special.
- Each row to be replicated
has an Owner (This is the site number of the site that originated the
record.)
- Owner's data always wins if
two sites are different
- Who the Owner is can be
changed from one site to another. If the primary site changes who owns a
row this change is honored by all other sites. This allows the local
administrators to "take control" of a record if a site goes down
or a time critical change must be made and the other site is not available
for any reason.
- Note: The system must use
"Removed rows flags" They cannot really DELETE a row. If a
remote site DELETES a row, it won't exist when they try to tell the base
site about the changes they made and the row won't get deleted at the
primary site. There is a way to waive this restriction. It is described
below and involves a JG_Killed_Rows table and a stored procedure fired off
by a trigger when a row is deleted.
- Any site can fill in blank
text fields (real data wins over empty data - if the owner doesn't like
the data they must put something like "missing" in the field and
their version will overwrite the data they don't like - Owner wins.)
- The ability of sites to fill
in empty text fields in rows they do not own can be turned off in the
JGReplicat