Here is the narrative of my CS532 Distributed Database project.
Why did I do this? What motivated me? A year or two ago, a co-worker took CS530 and CS532. We were both involved with databases and interested in them. What I understood from him of JMS sounded intriguing. Also, I have a long-standing interest in programming, especially object-oriented programming. I have only flirted with object-oriented analysis and design, however. Most of the programming that I have done at work, while involving the web and databases, has been scripting (Perl, PHP, and ASP). So, even though the JMS Framework project was an obvious stretch for me, it is something that I was very motivated to try. (Motivation is a fickle thing.)
My plan was to run the code on my laptop. This meant, as far as I could tell, that I wanted to use Oracle and MS SQL. This is because Oracle and MS SQL were used in the sample. Another strategy was to make as few configuration changes to the sample as possible.
I initially invested some time investigating whether I could work from home with the servers in the lab. This didn't seem clear. There are directions for accessing an Oracle database using PuTTY. But it turned out that I didn't have sufficient rights to that Oracle database, on UNIX in downtown Minneapolis. The Oracle database that I was to use was on a Windows server on the Saint Paul campus.
I also saw the project as a justification to go out and buy a new, barebones PC for $400. (I got the computer from Nanosystems on University Avenue, on the recommendation of co-workers. I went to the store to pick up the PC. Going into that store was like going into an auto parts store. I found the experience to be profound.)
I put Linux on the new PC and then loaded the free server version of VMWare. Into VMWare, I put a copy of XP. Then, on XP, I put a developer version of MS SQL 2003. I spent a lot of frustrating time attempting to get Oracle running on Linux. I don't think that I was ever successful with that. As it turned out, Oracle doesn't run on the version of Linux that I was using, Ubuntu, which is a Debian-based distribution.
Also, with this home PC: When I first set it up, I wasn't able to get VMWare to run. Actually, at first it ran, and then it stopped running. I had upgraded my initial Ubuntu version. It turned out that VMWare ran only on the previous version of Ubuntu. This led to my having to do a total re-install of the Linux base OS on that PC.
One last thing about my efforts with getting this barebones, Linux with VMWare box running: The computer came with a 40 GB hard drive. Once I got it set up and put a couple of virtual machines on it, it ran out of space. Each virtual machine requires a minimum of 8 GB. So, I took the extra, 120 GB hard drive out of the family PC, and installed (third time) Linux, this time on the larger hard drive.
On my laptop, I also loaded MS SQL 2003 developer version and Oracle version 9.
One last scare as to the laptop: After rebuilding it, I started getting messages that SP2 was ready to install. At first, my thought was, "Everything is fine without it; I don't want to risk breaking what I have with it." But then I remembered that I wouldn't be able to get my laptop on the Saint Thomas network if it wasn't up-to-date. So, one evening, I closed my open programs and clicked on the install.
First problem: The install hung. The next morning, I went ahead and rebooted with my fingers crossed. There was a message about an incompatible DLL. But, thankfully, the OS came up.
Second problem: I wasn't able to get Oracle's Listener to work. This looked grim. Luckily, recycling the Listener eventually solved the problem.
I initially spent some time trying to make sense of the framework code. This was not very successful, though. Mostly it seemed to be a bewildering maze of batch and properties files. So, I was looking forward to the second class meeting when we were to learn about the framework. I was disappointed, however, when we went into the lab to work on the example program and the teacher said that I was the only one whom he couldn't get set up as a DBA for Oracle. I would have to follow along with someone else. Instead of coming away from the class with a working sample that day, I ended up not getting my sample to run until a week later, on the following Saturday.
There were a couple of days of emails back-and-forth trying to get my credentials set properly. The root cause of my ID not working as expected was traced to an answer I had given on a form back in September when I enrolled. Having indicated on the form that I didn't want my personal information published in a directory put me outside the norm. Then several more days passed until I could get to the lab.
Just like at my work, when trying to get a project going: If there is a problem, a very likely cause is a security issue. (This is Axiom #1, so to speak.)
I remember spending a late Friday night in Room 326, sitting at OSS326-14, getting the sample to work. But, eureka! It did run. My next hope was that I could save this successful DDBMS folder to my U: drive, go home, copy it to my laptop, change some settings in my hosts file to match the machine name in the lab, and presto! The sample would run on my laptop. No such luck, though.
So, at the next opportunity, which was the following week, I brought my laptop to campus to have Sahil assess why the example wasn't working. The root cause, he suspected, was that the version of MS SQL that I was using was incompatible with the framework. He suggested that I put MS SQL 2000 on. And it was then that I learned about the free student software that was available. So, as I remember it, I was there until 11:00 PM as we uninstalled and reinstalled MS SQL. I think that it was sometime later, with some more tweaking, that I was able to get the sample to run without issue on my laptop.
Also, in general, having Oracle on my laptop was not the greatest idea. Sahil and others in the lab found this humorous. Live and learn. I also note that I had to delete many things from my laptop to find room for Oracle.
Which leads me to Axiom #2: If the problem isn't security related, it probably has to do with version mismatches. (See the Linux reinstall previously mentioned.)
Next, I intended to take baby steps changing the configurations of the AP, DP, and Client to match what I had created in my plan. I remember getting far enough to change the names, but the results were not correct. I remember getting an error, "expected -1", but didn't understand what that meant.
The next real milestone of my project experience was the hard drive crash on my laptop. This took several days to resolve. I had to determine that it was the hard drive. I had to check with co-workers about options. I made another trip to the auto parts-like Nanosystems store, this time to buy a USB-to-hard drive adaptor and a new hard drive. Then I had to make an assessment as to whether the hard drive data could be saved. I decided that it could not be saved, since, even when plugging the failed hard drive into another computer using the USB-to-hard drive adaptor, the hard drive would not spin up. So: install the new hard drive, reload XP, reload SP2, Oracle, MS SQL 2000 (another trip back to campus to get the software. It is a good thing that I live close to campus.) But reload I did, and I copied down the original, working DDBMS from my U: drive and was back in business. I estimate that was another lost week, more or less.
Also, the laptop hard drive crash gave rise to Axiom #3: hardware failures happen.
In my attempt to understand the framework code, I spent a good deal of time with Eclipse. I wanted to find a way to see what was going on in the code, to get information for debugging. This was quite frustrating. One issue was getting all the right jar files and configuration files in the right places. Also, it was a challenge figuring out how to create a project in Eclipse from this set of existing files. I went through several iterations of creating packages and importing java files before coming upon a combination that worked.
With that done, I was able to get a DP or the AP to execute in Eclipse. But that raised the problem of getting multiple DPs to run. It turned out that I could run one DP or the AP in Eclipse and the other programs, including OpenJMS, in their own command windows. This worked. They did talk to each other.
The program that I most wanted to get to run in Eclipse, however, was the Client, and I couldn't figure out how to do that. I was able to feed one properties file or command line argument to a program in Eclipse, but only one. The Client needs two: it reads the properties file and it reads a command line argument (-commit or abort). This investigation also took a chunk of time. After this, I devoted myself to running the program in the command window and writing System.out.println statements in the code.
For all the effort that I put into the Linux/VMWare PC, I never did go back to it. I originally planned, once I got the code to run on my laptop, to use the virtual machines to test run it on different boxes. I have never gone back and done that.
So began a long process of debugging and change. Each cycle went roughly like this:
Assess what went wrong
Close all the command windows
Restart OpenJMS to clear the messages (no other way to clear messages)
Google or look up in my Java book how to do something, hoping for tutorials or code samples
Modify and apply to code
A decision I made fairly early on in the process was that I didn't want to add classes. I know that the suggestion was to extend existing classes. I didn't do this, however. One reason is that I thought it needlessly complicated to add to the existing list of classes. I would have to add to the source list, for example. Instead, my general strategy was to do one of the following:
add logic to existing clauses
add a method to an existing class, and model that method after another, pre-existing method in that class
add import statements to existing classes
From an object-oriented, Java perspective, these were probably the wrong things to do. But, like I said earlier, I come from a Perl, scripting background, and from that perspective, the approach I outlined makes (I think) perfect sense.
I wanted to make as few changes as possible to the sample code. So, I included my SQL statements in the existing SQL files. I just appended my GDD create and insert statements to the existing AP SQL script, and I added my DP SQL scripts to the existing DP-MN, DP-TX, and DP-OK scripts.
I went through the existing batch and properties files and added the creation of my sites to the existing ones so that when I ran the build and go statements, my information would be updated along with the existing sample information. This was a painstaking process.
My objective was to get a global query to work. In my opinion, given where I was starting from as far as Java knowledge and given that I was working alone, I find that satisfactory.
I used the existing structure of the scenario_1_query property in the Client.properties file. I changed that query to point to a table fragment for my data. My next modification was to change the GDD query in the AP.properties file to use my GDD table names. Once I was able to return the results of this query of the table fragment through the GDD to the client, the next step was to add logic to the ExeMgr class that would parse the scenario_1_query and extract the table name. I rely on several assumptions here. First, the scenario_1_query is simply in the form of SELECT x FROM y. There is no WHERE clause. My justification for this is two-fold:
If the objective of the project is to learn about distributed databases, spending too much time parsing SQL statements seems out-of-scope.
It seems realistic to present users with interfaces which provide them with limited choices. I know that at my work, there are many examples of web-based database interfaces that do just that.
Using Java's substring function, I extract the table name. I feed the table name to a query that asks if the table is fragmented or not. I assume that it is, and do nothing further with processing a non-fragmented table. In fact, I think that the way I have coded it, querying against a non-fragmented table is broken. This would need to be fixed.
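Here is a minimal sketch of the substring approach just described, assuming the query has exactly the form SELECT x FROM y, with single spaces around FROM and no WHERE clause. The class and method names (TableNameSketch, extractTableName) are hypothetical; they are not taken from the framework's ExeMgr class, and this is not the code I actually wrote there.

```java
/** Hypothetical helper for illustration; not part of the framework's ExeMgr class. */
final class TableNameSketch {

    /**
     * Extracts the table name from a query of the form "SELECT x FROM y".
     * Assumes single spaces around FROM and no WHERE clause.
     */
    static String extractTableName(String query) {
        String trimmed = query.trim();
        // Drop an optional trailing semicolon.
        if (trimmed.endsWith(";")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        String marker = " FROM ";
        // Case-insensitive search for the marker.
        int idx = -1;
        for (int i = 0; i + marker.length() <= trimmed.length(); i++) {
            if (trimmed.regionMatches(true, i, marker, 0, marker.length())) {
                idx = i;
                break;
            }
        }
        if (idx < 0) {
            throw new IllegalArgumentException("No FROM clause: " + query);
        }
        // Everything after FROM is taken to be the table name.
        String table = trimmed.substring(idx + marker.length()).trim();
        // Anything beyond a single word (such as a WHERE clause) is rejected.
        if (table.isEmpty() || table.contains(" ")) {
            throw new IllegalArgumentException("Expected a single table name: " + query);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(extractTableName("SELECT x FROM y"));
    }
}
```

Rejecting anything after the table name keeps the "no WHERE clause" assumption explicit instead of silently returning a wrong name.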
Next, I make another assumption, also based on the previously listed justifications. That is, if a table is fragmented, it has exactly two fragments, and the fragments are vertical fragments. Therefore, global table x will return SQL statements for fragments y and z.
These respective queries are then put into messages and the destinations are added to the messages. These messages go to the proper local databases and return to the DdbmsConnection class. Here, they are parsed. If the results contain information from columns of the target tables, the results of the query are dumped out of the message into a List that is inserted into one of two temporary tables. The second result is similarly parsed and inserted into a second temporary table. Both temporary tables are in the local database of the machine of the DdbmsConnection class. For the DdbmsConnection class to talk to the local database, I added an import statement for the LocalDB to the DdbmsConnection class.
Also, the way the sample is set up, the DdbmsConnection shuts down after processing one message. I needed it to process two messages, one from each remote DB. So, I added a counter. Each query result message increments the counter by one. When the counter reaches 2, after two messages have dumped their contents to the local DB, the DdbmsConnection shuts down.
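The counter idea can be sketched on its own, apart from JMS. The class below is a hypothetical stand-in (ResultCounterSketch and its method names are mine for illustration, not the framework's DdbmsConnection). It only shows the counting and the shutdown decision; it does not receive real messages or touch a database.

```java
/** Hypothetical stand-in for the counting logic; not the framework's DdbmsConnection. */
final class ResultCounterSketch {
    private final int expectedResults;
    private int received = 0;
    private boolean shutDown = false;

    ResultCounterSketch(int expectedResults) {
        this.expectedResults = expectedResults;
    }

    /** Called once per query-result message; returns true once shutdown is triggered. */
    synchronized boolean onResultMessage() {
        if (shutDown) {
            throw new IllegalStateException("Already shut down");
        }
        received++;
        if (received >= expectedResults) {
            // A real connection would release its messaging resources here.
            shutDown = true;
        }
        return shutDown;
    }

    synchronized boolean isShutDown() {
        return shutDown;
    }

    public static void main(String[] args) {
        ResultCounterSketch counter = new ResultCounterSketch(2);
        System.out.println(counter.onResultMessage());
        System.out.println(counter.onResultMessage());
    }
}
```

Passing the expected count to the constructor, rather than hard-coding 2, would make it easier to relax the two-fragment assumption later.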
I have a canned join statement that joins the query results returned by the scenario_1_query. I would need either a join statement for each returned result or a more sophisticated parser to make this truly functional.
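One way to get past a single canned statement would be to build the join text from the fragment metadata. What follows is only a sketch under assumptions that I did not implement: that both temporary tables repeat a shared key column (the usual arrangement for vertical fragments), and that the table and column names are trusted. All names here (FragmentJoinSketch, buildJoin, tmp_frag1, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical join builder; names and key-column assumption are illustrative only. */
final class FragmentJoinSketch {

    /**
     * Builds a join over two temporary tables holding vertical fragments.
     * Assumes both tables contain keyColumn and that identifiers are trusted
     * (no quoting or validation is done here).
     */
    static String buildJoin(String leftTable, List<String> leftCols,
                            String rightTable, List<String> rightCols,
                            String keyColumn) {
        List<String> select = new ArrayList<>();
        // Emit the shared key once, taken from the left fragment.
        select.add("a." + keyColumn);
        for (String col : leftCols) {
            if (!col.equals(keyColumn)) {
                select.add("a." + col);
            }
        }
        for (String col : rightCols) {
            if (!col.equals(keyColumn)) {
                select.add("b." + col);
            }
        }
        return "SELECT " + String.join(", ", select)
                + " FROM " + leftTable + " a JOIN " + rightTable + " b"
                + " ON a." + keyColumn + " = b." + keyColumn;
    }

    public static void main(String[] args) {
        List<String> left = new ArrayList<>(List.of("id", "name"));
        List<String> right = new ArrayList<>(List.of("id", "salary"));
        System.out.println(buildJoin("tmp_frag1", left, "tmp_frag2", right, "id"));
    }
}
```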
The results of the join are dumped to the Client command window with a simple System.out.println statement.
One last issue that I am aware of, but didn't get around to investigating and fixing: The process ends with the transaction rolling back from the Client instead of committing. This may or may not be related to one of the DPs complaining about losing its JDBC connection.