Introduction to Computer Information Systems/Database - Wikibooks, open books for an open world
Programmers and designers began to treat the data in their databases as objects. That is to say that if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data.
This allows for relations between data to be relations to objects and their attributes and not to individual fields. Object databases and object-relational databases attempt to solve this problem by providing an object-oriented language sometimes as extensions to SQL that programmers can use as alternative to purely relational SQL. On the programming side, libraries known as object-relational mappings ORMs attempt to solve the same problem. XML databases are mostly used in applications where the data is conveniently viewed as a collection of documents, with a structure that can vary from the very flexible to the highly rigid: NoSQL databases are often very fast, do not require fixed table schemas, avoid join operations by storing denormalized data, and are designed to scale horizontally.
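The object-to-row mapping that ORM libraries automate can be sketched by hand. The following is a minimal illustration (using Python's bundled SQLite; the `Person` class, table, and helper names are invented for the example), showing a person's attributes treated as belonging to an object rather than as loose fields:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    phone: str
    age: int

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT, age INTEGER)")

def save(p: Person) -> None:
    """Map the object's attributes onto the row's columns."""
    conn.execute("INSERT INTO person VALUES (?, ?, ?)", (p.name, p.phone, p.age))

def load(name: str) -> Person:
    """Map a row back into an object."""
    row = conn.execute(
        "SELECT name, phone, age FROM person WHERE name = ?", (name,)
    ).fetchone()
    return Person(*row)

save(Person("Ada", "555-0100", 36))
ada = load("Ada")  # Person(name='Ada', phone='555-0100', age=36)
```

Real ORMs generate the `save`/`load` plumbing automatically from class definitions, which is precisely the repetitive code this sketch writes by hand.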
In recent years, there has been a strong demand for massively distributed databases with high partition tolerance, but according to the CAP theorem it is impossible for a distributed system to simultaneously provide consistency, availability, and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason, many NoSQL databases use what is called eventual consistency to provide both availability and partition tolerance guarantees with a reduced level of data consistency.
NewSQL is a class of modern relational databases that aims to provide the same scalable performance as NoSQL systems for online transaction processing (read-write) workloads while still using SQL and maintaining the ACID guarantees of a traditional database system.
Databases are used to support internal operations of organizations and to underpin online interactions with customers and suppliers (see Enterprise software). Databases are used to hold administrative information and more specialized data, such as engineering data or economic models.
Examples include computerized library systems, flight reservation systems, computerized parts inventory systems, and many content management systems that store websites as collections of webpages in a database.

Classification

One way to classify databases involves the type of their contents; another is by their application area; a third is by some technical aspect, such as the database structure or interface type.
This section lists a few of the adjectives used to characterize different kinds of databases. An in-memory database is a database that primarily resides in main memory, but is typically backed up by non-volatile computer data storage. Main memory databases are faster than disk databases, and so are often used where response time is critical, such as in telecommunications network equipment.
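The in-memory-with-non-volatile-backup pattern can be demonstrated with Python's bundled SQLite, whose `:memory:` databases live entirely in RAM; the table and file names here are invented for the example:

```python
import sqlite3
import tempfile, os

# The working database lives entirely in RAM:
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE calls (subscriber TEXT, seconds INTEGER)")
mem.execute("INSERT INTO calls VALUES ('alice', 42)")

# Periodically copy the volatile data to non-volatile storage:
backup_path = os.path.join(tempfile.gettempdir(), "calls_backup.db")
disk = sqlite3.connect(backup_path)
mem.backup(disk)  # sqlite3.Connection.backup, available since Python 3.7

row = disk.execute("SELECT subscriber, seconds FROM calls").fetchone()
```

Queries run against `mem` never touch the disk, which is the source of the speed advantage; only the explicit backup incurs I/O.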
An active database includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization. Many databases provide active database features in the form of database triggers.
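Many SQL engines expose this event-driven behavior through triggers. As an illustrative sketch (using Python's bundled SQLite; the table and trigger names are invented), an `AFTER UPDATE` trigger can gather statistics on every balance change without any application code asking for it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

-- Event-driven rule: react to every UPDATE on accounts by recording the change.
CREATE TRIGGER log_balance_change AFTER UPDATE ON accounts
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO accounts VALUES (1, 100);
UPDATE accounts SET balance = 250 WHERE id = 1;
""")
log = conn.execute("SELECT * FROM audit_log").fetchall()  # [(1, 100, 250)]
```

The application only issued an `UPDATE`; the database itself reacted to the event, which is the defining trait of an active database.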
A cloud database relies on cloud technology. Both the database and most of its DBMS reside remotely, "in the cloud", while its applications are both developed by programmers and later maintained and used by end-users through a web browser and Open APIs. Data warehouses archive data from operational databases and often from external sources such as market research firms. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data.
For example, sales data might be aggregated to weekly totals and converted from internal product codes to UPCs so that they can be compared with ACNielsen data. Some basic and essential components of data warehousing include extracting, analyzing, and mining data, and transforming, loading, and managing data so as to make it available for further use. A deductive database combines logic programming with a relational database.
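The transform step described above (internal codes to UPCs, daily sales to weekly totals) can be sketched in a few lines; the product code, UPC, and figures below are invented for the example:

```python
from collections import defaultdict
from datetime import date

# Hypothetical lookup table from internal product codes to UPCs:
code_to_upc = {"X17": "036000291452"}

# (internal code, sale date, units sold) -- as extracted from an operational database
sales = [
    ("X17", date(2004, 3, 1), 120),
    ("X17", date(2004, 3, 3), 80),
    ("X17", date(2004, 3, 9), 50),   # falls in the next ISO week
]

# Transform: convert codes to UPCs and aggregate to weekly totals.
weekly = defaultdict(int)
for code, day, units in sales:
    week = day.isocalendar()[1]      # ISO week number
    weekly[(code_to_upc[code], week)] += units
```

The resulting `(UPC, week) -> units` totals are in the external vendor's vocabulary and granularity, ready to load into the warehouse for comparison.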
A distributed database is one in which both the data and the DBMS span multiple computers. A document-oriented database is designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of NoSQL databases. An embedded database system is a DBMS which is tightly integrated with application software that requires access to stored data in such a way that the DBMS is hidden from the application's end-users and requires little or no ongoing maintenance.
Examples of these are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support such databases. A federated database system comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (in which case it would also be a heterogeneous database system), and provides them with an integrated conceptual view.
Sometimes the term multi-database is used as a synonym for federated database, though it may refer to a less integrated (e.g., without an integrated FDBMS and a managed integrated schema) group of cooperating databases. In this case, middleware is typically used for distribution, and typically includes an atomic commit protocol (ACP), such as the two-phase commit protocol, to allow distributed (global) transactions. A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information.
General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.
In a hypertext or hypermedia database, any word or piece of text representing an object (e.g., another piece of text, an article, a picture, or a film) can be hyperlinked to that object. Hypertext databases are particularly useful for organizing large amounts of disparate information.
For example, they are useful for organizing online encyclopedias, where users can conveniently jump around the text. The World Wide Web is thus a large distributed hypertext database. A related concept is the knowledge base: a collection of data representing problems with their solutions and related experiences.
A mobile database can be carried on or synchronized from a mobile computing device. Operational databases store detailed data about the operations of an organization. They typically process relatively high volumes of updates using transactions. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; enterprise resource planning systems that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting and financial dealings.
A parallel database seeks to improve performance through parallelization for tasks such as loading data, building indexes, and evaluating queries. The major parallel DBMS architectures, which are induced by the underlying hardware architecture, are: shared-memory architecture, where multiple processors share the main memory space as well as other data storage; shared-disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage; and shared-nothing architecture, where each processing unit has its own main memory and other storage. Probabilistic databases employ fuzzy logic to draw inferences from imprecise data.
Real-time databases process transactions fast enough for the result to come back and be acted on right away. A spatial database can store data with multidimensional features. Queries on such data include location-based queries, like "Where is the closest hotel in my area?" A temporal database has built-in time aspects, for example a temporal data model and a temporal version of SQL.
More specifically, the temporal aspects usually include valid time and transaction time. A terminology-oriented database builds upon an object-oriented database, often customized for a specific field.
An unstructured-data database is intended to store, in a manageable and protected way, diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading, since some objects can be highly structured; however, the entire possible object collection does not fit into a predefined structured framework.

Database interaction

Database management system

Connolly and Begg define a database management system (DBMS) as a "software system that enables users to define, create, maintain and control access to the database".
Other extensions can indicate some other characteristic, such as DDBMS for a distributed database management system. The functionality provided by a DBMS can vary enormously. The core functionality is the storage, retrieval and update of data. Codd proposed a set of functions and services that a fully-fledged general-purpose DBMS should provide. Often DBMSs will have configuration parameters that can be statically and dynamically tuned, for example the maximum amount of main memory on a server the database can use.
The trend is to minimise the amount of manual configuration, and for cases such as embedded databases the need to target zero administration is paramount. The large enterprise DBMSs have tended to increase in size and functionality and have involved thousands of person-years of development effort over their lifetimes. The client-server architecture was a development in which the application resided on a client desktop and the database on a server, allowing the processing to be distributed.
This evolved into a multitier architecture incorporating application servers and web servers with the end user interface via a web browser with the database only directly connected to the adjacent tier.
For example, an email system performs many of the functions of a general-purpose DBMS, such as message insertion, message deletion, attachment handling, blocklist lookup, and associating messages with an email address; however, these functions are limited to what is required to handle email.

Application

External interaction with the database will be via an application program that interfaces with the DBMS.

Application program interface

A programmer will code interactions to the database (sometimes referred to as a datasource) via an application program interface (API) or via a database language.
Database languages

Database languages are special-purpose languages, which allow one or more of the following tasks, sometimes distinguished as sublanguages: data control language (DCL), which controls access to data; data definition language (DDL), which defines data types (such as creating, altering, or dropping) and the relationships among them; data manipulation language (DML), which performs tasks such as inserting, updating, or deleting data occurrences; and data query language (DQL), which allows searching for information and computing derived information.
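Three of these sublanguages can be seen side by side in one short session with Python's bundled SQLite (table and column names invented for the example; SQLite has no user model, so DCL statements like `GRANT` are not shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the data types and structure
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# DML: insert and update data occurrences
cur.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Ada", 52000.0))
cur.execute("UPDATE employee SET salary = salary * 1.1 WHERE name = ?", ("Ada",))

# DQL: search and compute derived information
avg = cur.execute("SELECT AVG(salary) FROM employee").fetchone()[0]
```

SQL blends all of these roles in a single language, which is one reason the sublanguage distinction is often invisible in practice.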
Database languages are specific to a particular data model. SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the first commercial languages for the relational model, although it departs in some respects from the relational model as described by Codd (for example, the rows and columns of a table can be ordered).
The standard has been regularly enhanced since then and is supported, with varying degrees of conformance, by all mainstream commercial relational DBMSs. Database languages may also incorporate features such as DBMS-specific configuration and storage engine management; computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing; and constraint enforcement (e.g., in an automotive database, only allowing one engine type per car).

Storage

Database storage is the container of the physical materialization of a database.
It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the conceptual and external levels from the internal level when needed. Putting data into permanent storage is generally the responsibility of the database engine, a.k.a. the storage engine. Though typically accessed by a DBMS through the underlying operating system (and often using the operating system's file systems as intermediates for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators.
A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits.
Data typically reside in the storage in structures that look completely different from the way the data look at the conceptual and external levels, but in ways that attempt to optimize the reconstruction of these levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).
Some DBMSs support specifying which character encoding was used to store data, so multiple encodings can be used in the same database.
Various low-level database storage structures are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as indexing may be used to improve performance.

As a result, dbXML only supports these modes of access. This project has evolved quite a bit since version 1. It is already a mature product, with some rather high-profile users, and is in a very good position to become the dominant open source XML database, if not one of the more popular XML databases in general.

Natural version 6 also allows developers working in Windows to access Natural programs running in UNIX or on a mainframe -- a capability the company calls Single-Point-of-Development.
Both capabilities are designed to increase the speed and convenience of using Natural in an open systems environment. Thanks to an expanded XML tool kit and new language constructs, users of Natural version 6 can process XML documents with greater ease and flexibility. The Single-Point-of-Development interface allows a Windows PC running Natural version 6 to access Natural programs running on Unix and mainframes -- thus combining the flexible development potential found on a Windows operating system with the stability and performance of mainframe and Unix.
Using Single-Point-of-Development, programs created in Windows can be modified directly on the server platform, thereby addressing the versioning and synchronizing issues that flow from the need to save code separately on multiple platforms. Single-Point-of-Development is available not only for the core Natural system, but also for four Natural add-ons; these additional Natural engineering tools can therefore also be used via the Single-Point-of-Development interface.
That deadline was pushed out to the second half of next year after customers said they expected Yukon to fit hand-in-glove with the next version of .Net. The Yukon beta was released in July to some 2, customers and partners. The .Net integration that customers demanded, along with upcoming features such as native XML and Web services support, will benefit enterprises.
XML is ultimate interoperability -- it's an industry-standard format, and it's self-describing. You know both the schema of the data as well as the data itself. You don't lose the context when you pass your data around. We upped the level of XML support in Yukon through a number of things. Previously we had XML support, but it was shredding.
Shredding is the parsing of XML tag components into corresponding relational table columns. In Yukon the key thing is we have an XML type. Although we had XML support before, and many leveraged it and were happy with it, now we have native support. You can take the relational sorts of queries you're used to in the database world, where people select things from tables with filters on that data.
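Shredding as defined above can be sketched concretely: parse an XML document and land each tag component in a relational column. This minimal illustration uses Python's bundled ElementTree and SQLite (the purchase-order document, table, and column names are invented for the example):

```python
import sqlite3
import xml.etree.ElementTree as ET

# A hypothetical purchase-order document:
doc = """<order id="42">
  <item sku="A1" qty="3"/>
  <item sku="B7" qty="1"/>
</order>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_item (order_id INTEGER, sku TEXT, qty INTEGER)")

# "Shred" the hierarchy: each XML attribute lands in a relational column.
root = ET.fromstring(doc)
order_id = int(root.get("id"))
for item in root.findall("item"):
    conn.execute("INSERT INTO order_item VALUES (?, ?, ?)",
                 (order_id, item.get("sku"), int(item.get("qty"))))

rows = conn.execute("SELECT * FROM order_item ORDER BY sku").fetchall()
# rows == [(42, 'A1', 3), (42, 'B7', 1)]
```

Note what shredding loses: the rows no longer remember the document's nesting or tag order, which is exactly the gap a native XML type is meant to close.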
You can combine XQuery statements with such relational queries.

Don't expect a simple rerun of the last movie, though. We've always known that most of the information that runs our businesses resides in the documents we create and exchange, and those documents have rarely been kept in our enterprise databases. Now that XML can represent both the documents that we see and touch -- such as purchase orders -- and the messages that exchange those documents on networks of Web services, it's more critical than ever that our databases can store and manage XML documents.
A real summer blockbuster is in the making. No one knows exactly how it will turn out, but we can analyze the story so far and make some educated guesses.
The traditional approach required programmatic access to the result set and programmatic construction of the Web page. Most of the information in an enterprise lives in documents kept in file systems, not in relational databases.
There have always been reasons to move those documents into databases -- centralized administration, full-text search -- but in the absence of a way to relate the data in the documents to the data in the database, those reasons weren't compelling. XML cinches the argument. As business documents morph from existing formats to XML -- admittedly a long, slow process that has only just begun -- it becomes possible to correlate the two flavors of data.

A new query language developed by SQL veterans is promising to smooth things over and get everything talking again.
It's impossible to discuss the future of the software industry without discussing XML. XML has become so important that SQL is no longer the stock reply to the question, 'What query language is supported by all the major database software companies?' Some XML documents are shredded or decomposed before their content is inserted into an SQL database. Others are stored in native XML format, with no decomposition. XML documents are hierarchical or tree-structured data.
They're self-describing in that they consist of content and markup tags that identify the content. In SQL databases, such as DB2, individual rows don't contain column names or types because that information is in the system catalog. The XML model is different. As with SQL, schemas that are external to the content they describe define names and type information.
However, it's possible to process XML documents without using schemas. XML documents contain embedded tags that label the content. The nesting and order of elements in a document must be preserved in XML documents. Many queries against documents require positional logic to navigate to the correct node in a document tree.
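The positional logic described above is visible even in a tiny example. Using Python's bundled ElementTree (the document below is invented), selecting a node depends on where it sits in document order, something that has no analogue in an unordered relational row set:

```python
import xml.etree.ElementTree as ET

doc = "<po><line><sku>A1</sku></line><line><sku>B7</sku></line></po>"
root = ET.fromstring(doc)

# Order matters: pick a <line> element by its position in the document.
second = root.findall("line")[1]

# ElementTree also accepts limited XPath with positional predicates (1-based):
first_sku = root.find("line[1]/sku").text
second_sku = root.find("line[2]/sku").text
```

Because the two `<line>` elements are distinguishable only by position, any storage scheme for this document must preserve order, which is the requirement the paragraph above names.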
When shredding documents and mapping them to columns, it's necessary to store information about the document structure. Other requirements for querying XML documents include pattern matching, calculations, expressions, functions, and working with namespaces and schemas.

The database supports XPath 1.0. It offers flexible indexing, giving application developers the ability to control query performance and tune data retrieval. Having Berkeley DB as the base engine for the XML offering means that the new product will inherit advanced database features such as concurrent access, transactions, recovery and replication, officials said.
It will scale up to terabytes for the database and up to 4GB for individual keys and values. The release of the open source code heralds the end of a month beta program that comprised some 5, companies, many of them huge names such as 3M Co. Those big names are testimony to the traction XML is gaining in the enterprise, said Sleepycat officials, in Lincoln, Mass.
Sleepycat's software is sold using a typical open-source scheme: the company has paying customers, according to officials. The new product enables businesses to abstract a common data model across data and content sources and to access and manipulate them as though they were a single source. IBM's DB2 software helps businesses increase efficiencies by enabling them to centrally manage data, text, images, photos, video and audio files stored in a variety of databases.
The new IBM product is most appropriate for projects whose primary data sources are relational data augmented by other XML, Web, or content sources. The federated data server allows administrators to use integrated graphical tools to configure data source access and define integrated views across diverse and distributed data; XML schema can be automatically mapped into relational schema.
But SQL DBMS products will increasingly be judged on how well they support traditional tasks such as transaction processing while evolving to provide new capabilities such as integrated business analytics. The latest releases of data management software from the big three vendors unite SQL with multidimensional and document-centric XML data and grid computing. Whether an organization follows a best-of-breed approach or taps a single vendor to build an IT infrastructure, problems can arise with interoperability, data aggregation, and data and application integration.
But XML is only one of the fields on which the database software giants are competing. Each company tries to gain an edge over the others by complementing their database platforms with broad-spectrum software offerings such as vertical market applications and developer tools. The products diverge when it comes to programming database server plug-ins, querying multidimensional data sets, persisting message queues, orchestrating the flow of Web services, and processing audio, video, and other rich data types.
Ipedo, for example, late this month will release Version 3. The auto-organization component organizes, merges and transforms inbound content according to business rules, said Ipedo officials, in Redwood City, Calif.
The third new piece, a universal XML Query engine, provides local and remote content and data source searching and updating using the XQuery standard. Some of the Ipedo upgrade's new features are compelling for user Thor Anderson, who is manager of program development at Collegis Inc.

Sleepycat next month will release Version 4. For many years the open source Berkeley DB libraries have been a popular choice for embedded database applications.
It has been so ubiquitously used that chances are, you rely on some software product that embeds Berkeley DB. It is therefore pretty exciting when SleepyCat, the maintainers of Berkeley DB, announce that they will be releasing an XML-aware version of their database software.
It's built on top of Berkeley DB, a 'key-value' database which provides record storage and transaction management. Stored documents can then be matched and retrieved, either as complete documents or as fragments, via the XML query language XPath.
An XML database has several advantages over key-value, relational, and object-oriented databases. According to [John] Merrells, it is being evaluated by "several serious commercial enterprises."
SleepyCat allows for commercial licensing of their open source tools, which may make this solution attractive for corporations that are skittish about open source. This combination may provide a strong alternative to relational and object-oriented databases Since any data storage technology requires a significant investment in time and effort, this strong level of community and corporate support is encouraging; Berkeley DB XML, currently in its infancy, seems likely to be around for a long time, and by offering a standard embedded interface it may provide a very useful tool for programmers in need of robust data storage who want to avoid the overhead of a relational database.
The tool has some growing to do, but even in its current form many programmers will find it a useful tool with a logical, powerful interface.

Reading reference for the course "Modern Database Management Systems", Winter Term: "this course covers research topics in advanced database management systems as well as emerging database technologies, with emphasis on XML data and XML support for object-oriented database management systems."

However, the same problem of distinguishing well designed databases from poorly designed ones arises in other data models, in particular, XML.
While in the relational world the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints.
Finally, we look at information-theoretic criteria for justifying normalization algorithms. Several [other] papers attempted a more formal evaluation of normal forms, by relating them to the elimination of update anomalies.
Another criterion is the existence of algorithms that produce good designs.

Our [research] goal was to find criteria for good data design, based on the intrinsic properties of a data model rather than tools built on top of it, such as query and update languages. We were motivated by the justification of normal forms for XML, where usual criteria based on update anomalies or existence of lossless decompositions are not applicable until we have standard and universally acceptable query and update languages.
We proposed to use techniques from information theory, and measure the information content of elements in a database with respect to a set of constraints. We tested this approach in the relational case and showed that it works. In general, the approach is very robust.

It has also acquired derivation by extension and attribute groups, adding to its existing W3C schemas support.
The database now has full-text search that features LIKE wildcard matching, found word marking, phrase search and proximity search.
The database also supports XQuery with update extensions, full-text search, shared resource management and XML schemas with automatic validation. That standard indicates that the database supports "all or nothing" transactions -- those that either work to their conclusion or refrain from changing data.

The book is divided into five parts, each containing a coherent and closely related set of chapters; these are self-contained and can be read in any order. See also the online Table of Contents.
The update describes new work as of June. The first part provides a mapping from a single table, all tables in a schema, or all tables in a catalog to an XML document. The second of these parts includes the creation of an XML data type in SQL and adds functions that create values of this new type. Finally, the 'infrastructure' work that we described in our previous article included the mapping of SQL's predefined data types to XML Schema data types.
This mapping has been extended to include the mapping of domains, distinct types, row types, arrays, and multisets.

Yes, an XSLT processor, just like any other application which expects legitimate XML as input, will choke on ampersands, less-than symbols, and so on instead of their entity-reference forms. If you examine the raw source behind the GUI cosmetics, though, you'll find entity references scattered around even though you well know you didn't key them in yourself.
The editor is in effect mediating between the markup- and non-markup-based worlds in the same way that your preprocessor would need to do. (Dr. Dobb's Journal, Volume 28, Issue 3, March.)
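The entity-reference mediation described above is exactly what standard-library escaping helpers do. A minimal sketch using Python's `xml.sax.saxutils` (the sample string is invented):

```python
from xml.sax.saxutils import escape, unescape

raw = "Fish & Chips < 5"

# Ampersand and less-than become entity references, making the
# text safe to embed in XML that an XSLT processor will accept:
markup = escape(raw)        # "Fish &amp; Chips &lt; 5"

# The mediation is reversible, so no information is lost:
roundtrip = unescape(markup)
```

An editor or preprocessor sitting between the user and the markup applies `escape` on the way in and `unescape` on the way out, which is why the entity references appear in the raw source even though the user never typed them.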