QT(1) USER COMMANDS QT(1) NAME dbappend - Database data file manager. SYNOPSIS dbappend file1 file2 DESCRIPTION DBAPPEND version 1.0 dbappend appends file2 to file1 in a manner that is robust against catastrophic system failures. It can be used to maintain autonomous database transaction log/journal files in a database disaster recovery program, or to maintain "flat file" data files. Both semaphore and fcntl(1) multi-user locking are supported. It is possible to use the program for network wide locking protocols. From the architectural specification: I) the objective of the physical architecture of the data file is to implement a robust data file structure that addresses the following issues: A) physical consistency of the data file is of paramount importance 1) immunity to catastrophic system failures during any and all data file operational procedures should be addressed 2) immunity to failed data file transactions should be addressed a) recovery from failed data file transactions should be automatic without requiring roll back/forward intervention 3) immunity to partially failed data file transactions should be addressed to avoid transaction atomicity issues a) likewise, recovery from partially failed data file transactions should be automatic without requiring roll back/forward intervention B) the data file should consist of only ASCII text: 1) this facilitates ancillary database operations using standard Unix text tools: a) for example, grep, sed, awk, etc., (see "Data Processing in Unix," R. S. Tare, McGraw-Hill, New York, New York, 1989, pp 31,) and b) editors, for example, vi, emacs, etc., which, c) could be used as an adjunct to the database program (for example, during disaster recovery procedures) in performing data file operations d) the Operator/Stream DBMS paradigm, (see "A 4GL Language," Unix Review, March, 1991, pp 24) could be used to build data base management systems 2) the compromise is that the data file will be larger, and suffer performance degradation when compared to binary data file schemas 3) the counter argument is that disk space is inexpensive, and the additional expenditure is justified by the flexibility afforded in ASCII text file schemas (see "Unix Relational Database Management," Rod Manis and Evan Schaffer and Robert Jorgensen, Prentice-Hall, Englewood Cliffs, New Jersey, 1988, pp 5,) and, 4) using contemporary record indexing methodologies (ie., binary trees, extensible hashing algorithms, etc.) will afford performances that are competitive with binary data file schemas C) the data file structure should be compatible with the usual Unix shell script database systems: 1) the data file structure should consist of (see "RDB: A Relational Database Management System," Walter V. Hobbs, RAND Corporation, 1993): a) EOL terminated sequential records, with b) tab delimited fields of ASCII text, and c) comment records signified by a '#' character in the first column of the record 2) this facilitates importing and exporting of data to and from the database system, (see "Unix Shell Programming," Lowell Jay Arthur, John Wiley & Sons, New York, New York, 1990, pp 175) D) the data file records are variable length, and no presumptions or limits can be imposed concerning: 1) the record length 2) the field length in a record 3) the compromise is that these two requirements will aggravate the complexity of the data file I/O caching, and index system key storage methodologies E) the data file should function as a journal/audit file: 1) the records in the data file should not be physically modified in place (or removed,) but modifications to the records appended to the end of the data file in a manner that will facilitate roll back/forward and recovery procedures without separate journal or audit files 2) the compromise is that this will necessitate routine maintenence (in a manner similar to maintaining the journal files in other database systems) of the data file to release unused space, and remove the originals of modified records 3) note that this scheme facilitates implementation of distributed databases with replication servers, since entire transactions may be appended to the database's distributed data files across a network F) the data file system is completely independent of the record/field index file system: 1) the data file must maintain all information as to the status of records which have been added, modified, or deleted in a fashion that any record indices can be reconstructed on demand G) the data file should provide a provision for detection of data file physical inconsistency and correction: 1) detection (and, possibly, correction, depending on severity) should be automatic, and should be initiated, if necessary, when the data file is opened for database operations H) the data file structure should have a provision to accommodate roll back/forward procedures for maintenence of database logical consistency II) the physical structure of the data file is as follows: A) the data file consists of sequential transactions: 1) the transactions consist of sequential records: a) data records that have been added to the data file b) "delete record operation" records, consisting of: i) "# DEL" (which must be a restricted key word in the data file,) and ii) a single unsigned integer which contains the data file offset (index) of the deleted record c) "undelete record operation" records, consisting of: i) "# UNDEL" (which must be a restricted key word in the data file,) and ii) a single unsigned integer which contains the offset (index) of the record to be undeleted d) comment records, which have no significance in the data file structure, and are signified by a '#' character in the first column of the record (but not the restricted key words listed above) i) comment records can contain information/data that does not pertain to the data file, but has significance to other functions in the database system, for example the data file record schema, transaction commit information, index system information, etc. 2) these records are appended only to the data file's physical EOF a) record updates are implemented by: i) first deleting the record, (by appending a "delete record" operation record to the data file's EOF) then, ii) adding the new updated record (leaving the original intact) III) operational specifics: A) no assumptions can be made about the capabilities of the record indexing system (or even if it exists, for that matter): 1) specifically, it may not be assumed that the record indexing system has index element locking capabilities (see "Concurrency Control and Recovery in Database Systems," P. A. Bernstein and V. Hadzilacos and N. Goodman, Addison-Wesley, Reading, Massachusetts, 1987, pp 66) a) the data file operational functionality must have provisions for addressing the "phantom record" problem, without assistance from the indexing system (see "Concurrency Control and Recovery in Database Systems," P. A. Bernstein and V. Hadzilacos and N. Goodman, Addison-Wesley, Reading, Massachusetts, 1987, pp 64) b) unfortunately, the compromise is that this will require write locking the data file's physical EOF i) although the write lock does not preclude data file read operations during the write operation, only one write operation can be performed at any one time 2) it is assumed that updating the indices will require exclusive write locks on the index files during the indices update procedure, which will preclude indexed read operations a) additionally, during the updating of the indices, it is assumed that a read lock (to preclude writing to the data file) will be required on the data file's physical EOF 3) it can not be assumed that the database system, both its data file functions, and its index file functions, are the only processes that have access to the data file a) however, it can be assumed that any other processes that access the data file will observe the cooperative locking scheme outlined below 4) since data file operations are independent of the index file operations, it is possible to update the data file, and then update the indices at a later time, when convenient (the indices, at any time, reference records in the data file that are logically consistent, complete transactions, since the records/operations are only appended to the data file, never altered "in place") B) no assumptions can be made about the data file functions residing under a client/server arrangement that serializes transactional access to the data file 1) specifically, the data file operational functionality must have provisions for addressing concurrency control in a multiuser environment C) it is assumed that the host system has resident fcntl(2) file control provisions D) a robust implementation for writing to the data file: 1) this procedure assumes: a) the index system is current, and has updated the indices to the data file's current physical EOF b) transaction" record(s) are resident as a contiguous, sequential "block" of records in a file that constitutes a complete data file transaction (ie., they are an "image" of what will be appended to the data file, with fwrite (1) operation(s), for example) i) this scheme will minimize the amount of time that a write lock on the data file is required c) all open files synchronize the file system caches and buffers with the stable storage media, perhaps using the fsync(1) function, prior to closing the file 2) four files will be involved in the transaction: a) the file to be appended to the data file, formatted as a contiguous, sequential "block" of records in a file that constitutes a complete data file transaction (ie., they are an "image" of what will be appended to the data file, with fwrite (1) operation(s), for example) b) the lock file, which will be created if it does not exist, and will have a write lock acquired for the duration of the transaction i) this file is a "place holder" only, and nothing is ever written to this file, except for the process pid that is modifying the data file-this is for compatability with other unix data systems ii) since this file is a "place holder" only, it can be placed in a temporary directory iii) this file is the first opened and locked for any data file operations c) the fault file, which is a status file: i) if this file exists when the transaction starts, the data file is in a fault/exception state ii) before any modifications are made to the data file, the data file's EOF is written to this file, three times iii) this file is removed upon a successful modification to the data file iv) this file and the data file are never open simultaneously v) this file contains mandatory information in case of a catastrophic system fault, and should be placed in the same directory as the data file d) the data file, formatted as above 3) the transaction process is as follows: a) open the file to be appended to the data file, in read only mode, and acquire a read lock on the entire file, including the file's EOF i) if the open and/or lock fails, abort b) open the lock file, in write mode, and acquire a write lock on the entire file, including the file's EOF i) wait for the lock, unless doing so would create a deadlock situation, in which case abort ii) if the open and/or lock fails for any other reason, abort c) open the data file, in read mode, and acquire a read lock on the entire file, including the file's EOF i) if the open and/or lock fails, abort d) get the data file's EOF, possibly using fseek(1) i) if the data file's EOF can not be obtained, abort e) close the data file i) if the data file can not be closed, abort f) open the fault file, in write mode, and acquire a write lock on the entire file, including the file's EOF i) if this file exists (eg., the open fails because the file exists) then the data file is in a fault/exception state-verify that the three numbers in the only record in the fault file are identical, and truncate the data file to this length * note that this situation was created by a failed transaction (probably a catastrophic system failure during a transaction being appended to the data file, below); the data file will have to be reopened in write mode, and a write lock obtained on the data file's EOF-the write lock on the lock file has precluded any other write processes from accessing the data file-the index system has not updated the indices (since this would have been initiated after a successful completion of the data file operation,) so any currently running data file read operations can not access the section of the data file that will be truncated * note that if any of this process fails, the fault file should be left in place, and the process aborted ii) otherwise, truncate the fault file to zero length * note that this situation was created by a failed transaction (probably a catastrophic system fault during the writing, three times, of the data file's EOF to the fault file)-the data file was not open when this situation ensued; the data file's physical EOF is valid-this is the reasoning for writing the data file's EOF three times to the fault file * note that if any of this process fails, the fault file should be left in place, and the process aborted iii) if the open and/or lock fails for any other reason, abort g) since the data file's EOF will change, the current EOF index (offset) must be stored in stable storage to facilitate truncation of the data file record(s) appended by the write process in case of catastrophic system failure during the write process-write the data file's EOF three times to the fault file i) if the write fails, attempt to remove the fault file, and abort h) synchronize and close the fault file i) if the synchronize and/or close fails, attempt to remove the fault file, and abort i) open the data file, in write mode, and acquire a write lock on the data file's EOF i) note that this constitutes an exclusive write lock on the data file-note that read operations are still permitted, since any data file records are appended to the data file's current physical EOF (ie., the data file records that are added will not be "recognized" by the index system until the indices are updated-which is the last step of this procedure) ii) if the open and/or lock fails, attempt to remove the fault file, and abort j) append the file that is to be appended to the to the data file i) if the append fails: * attempt to truncate the appended record(s) from the data file (eg., truncate the data file to its previous length, which was determined when this process began); if this was successful, attempt to remove the fault file, if it was not, leave the fault file in place, then abort k) synchronize and close the data file 1) if the synchronize and/or close failed: * attempt to truncate the appended record(s) from the data file (eg., truncate the data file to its previous length, which was determined when this process began); if this was successful, attempt to remove the fault file, if it was not, leave the fault file in place, then abort l) remove the fault file i) the successful conclusion of this step commits the transaction ii) if the remove failed, it does not make any difference m) close the lock file i) if the close and/or lock failed, it does not make any difference n) notify the index system to update the indices to include the added records between the data file's old physical EOF, and the data file's new physical EOF i) if the index system failed to update the indices, then the data file should not be truncated (eg., the transaction was committed, above) E) synopsis for a robust implementation for writing to the data file: 1) read lock acquired on file to be appended to data file a) file to be appended to the data base opened 2) write lock acquired on lock file a) precludes all other process from accessing lock file, data file, and fault file 3) read lock acquired on data file a) data file open for read, fault file does not exist 4) read data file's EOF 5) read lock released on data file a) data file closed, fault file does not exist 6) fault file created a) if the fault file existed prior to the open: i) write lock acquired for the fault file ii) read the 3 data file EOF values iii) if they are identical: * close the fault file * open the data file * truncate the data file to these values * close the data file * write lock acquired for the fault file * truncate the fault file to zero length iv) if they are not identical * truncate the fault file to zero length b) fault file open 7) write lock acquired on fault file 8) data file's EOF written three times to fault file 9) write lock released on fault file a) fault file closed 10) write lock acquired on data file a) data file open 11) records appended to the data file 12) write lock released on data file a) data file closed 13) fault file removed a) fault file does not exist, transaction committed 14) write lock released on lock file a) other processes may access data file for write 15) read lock released on file to be appended to the data file a) file to be appended to the data base closed 16) note that the section of the data file that existed prior to the invocation of this process was never locked (except under fault/exception conditions,) so data file read access to this section was not restricted 17) note that the vulnerability is in the location of the data file's EOF, and either the data file's EOF could be queried, or the EOF was contained in the fault file-neither were open simultaneously; one or the other was/is correct; which one is correct can be found by a complete record, (eg., the only record in the fault file contains three identical values for the data file's EOF) 18) note that a previous catastrophic system failure during the physical append of the records to the data file can be detected by the existence of a fault file, and this file contains the data file's valid EOF-which was the value before the system failed (eg., truncating the data file to this length, restores the data file to its state before the failure) 19) interruption procedure (from steps 1 through 15, above): a) steps 1 through, and including step 10, remove the fault file, (if it exists,) and abort i) steps 6)a)i through and including 6)ai)v, leave the fault file intact, and abort b) steps 11 through, and including step 12, truncate the data file, and if that is successful, remove the fault file, otherwise leave the fault file, and abort c) steps 13 through, and including step 15, remove the fault file (if it exists,) and abort F) a robust implementation for updating the indices: 1) it is assumed that the index system has a provision for maintaining the index (offset) of the last record of the last transaction for which the indices were updated 2) acquire a lock on the data file's physical EOF if a lock does not already exist, (perhaps acquired during the write process, above) a) if the data file's physical EOF is locked by another process: i) wait until the lock can be acquired, unless, ii) waiting will create a deadlock situation, in which case, abort the locking process (and perhaps try again, later) 3) acquire write locks on all indices a) if any of the indices can not be write locked: i) wait until the write lock can be acquired, unless, ii) waiting will create a deadlock situation, in which case, abort the locking process (and perhaps try again, later) 4) update the indices through the data file's physical EOF: a) scanning forward from the last record indexed: 5) release all locks acquired by this indices update operation G) a robust implementation for reading data from the data file: 1) read operations are unrestricted, provided they are consistent with the requirements imposed by the index system a) note that the above write operations (to both data file and indices) placed no write locks on any area in the data file that could be referenced by the index system IV) testing procedures: A) to test locking protocol: 1) make a short text file of three or four records 2) running two concurrent emacs sessions under X-Windows: a) start gdb on one emacs session, on this program i) set a break at main ii) run the program on the text file b) start gdb on the other emacs session, on this program i) set a break at main ii) run the program on the text file c) switching between the two sessions, single step first one session, then the other to step through locking statements B) to test the recovery procedures: 1) make a large text file, perhaps from /usr/lib/dict/words a) permissions should be user read and write 2) using emacs, make a bogus fault file, words.Fault, etc. a) this file should contain a single record, with 3 numbers, separated by tab characters i) the numbers can be different if testing the recovery of a corrupt fault file ii) the numbers must be the same (but smaller than the length/size of the test file) to test the recovery of a corrupt data file b) the tests can be run under emacs/gdb for single step verification