Data Warehouse Issues
There are certain issues surrounding data warehouses that companies need to
be prepared for. A failure to prepare for these issues is one of the key reasons
why many data warehouse projects are unsuccessful. One of the first issues
companies need to confront is that they are going to spend a great deal of time
loading and cleaning data.
A D V E R T I S E M E N T
Data Warehousing Security Issues
Data warehousing systems present special security issues including:
- The degree of security appropriate to summaries and aggregates of data
- The security appropriate for the exploration data warehouse, specifically designed for browsing and ad hoc queries
- The uses and abuses of data encryption as a method of enhancing privacy
Many data structures in the data warehouse are completely devoid of sensitive individual identities by design and, therefore, do not require protection appropriate for the most private and sensitive data. For example, when data has been aggregated into summaries by brand or region, as is often the case with data warehousing, the data no longer presents the risk of compromising the private identities of individuals. However, the data can still have value as competitive intelligence of market trends, and thus requires careful handling to keep it out of the hands of rival firms. Relaxed security does not mean a lack of commitment to security. The point is that differing levels of security requirements ought to remind us that one-size-fits-all solutions are likely to create trouble.
Another special security problem presented by data warehousing is precisely the reason why such systems exist. Data warehouses are frequently used for browsing and exploring vast reams of data – undirected exploration and knowledge discovery is provided by an entire class of data mining tools. The point is to find novel combinations of products and issues. Whether authentic or mythical, the example of market basket analysis whereby diapers are frequently purchased with beer is now a classic case. The father going to the convenience store for "emergency" disposable diapers and picking up a six-pack on the way out suggests a novel product placement. The point is that it is hard to say in advance what restrictions would disable such an exploratory data warehouse; therefore, the tendency is to define an unrestricted scope to the exploration. A similar consideration of undirected knowledge discovery applies to simple ad hoc access to the data warehouse. Examples where a business analyst uses end-user self-service tools such as those by Business Objects, Information Builders, Cognos or Oracle to issue queries directly against the data without intermediate application security give the end user access to all the data in the data warehouse. Given privacy and security imperatives, it may be necessary to render the data anonymous prior to unleashing such an exploratory, ad hoc process. That will create complexity where the goal is (sanctioned) cross-selling and up-selling. The identity must be removed in such a way that it can be recovered, as the purpose is often to make an offer to an individual customer.
Encryption of data has its uses, especially if the data must be transmitted over an insecure media such as the Internet. There, Secure Sockets Layer (SSL), which is an implementation of X.509 public-private key cryptography, serves well in transmitting credit card numbers, passwords and other identifying data. However, encryption is a poor method of access control. An employee, his or her manager and the human resources clerk all require access to the employee's record. Therefore, encrypting the data will not distinguish between their access levels. It is misguided to believe that if encrypting some data improves security, encrypting all the data improves security even more. Blanket, global encryption degrades performance, lessens availability and requires complex encryption key administration. Encryption is a computationally intense operation. It may not impact performance noticeably when performed for one or two data elements; but when performed arbitrarily for an entire table, the result may very well be a noticeable performance impact. It might make sense to encrypt all the data on a laptop PC that is being taken off site if the data is extremely sensitive. If the PC is lost or stolen, only encryption will guarantee that the data is not compromised. However, an even better alternative would be selective encryption and organizational steps to make sure the physical site is secure and the media containing the data is handled diligently.
As a general rule, proven security practices and solutions developed to secure networks will be appropriate and extended to protect the data warehouse. In other cases, data warehouses present special challenges and situations because the data is likely to be the target that encourages hackers to try to gain access to the system. These practices extend from organizational practices to high-technology code. The requirement for authentication implies certain behavior – on-site staff should wear their corporate identification badges and be required to sign an agreement never to share a user ID or password with anyone. Based on your enterprise's specific confidentiality rules, make selective use of new areas where technologies are still emerging. It is essential that database administrators work together with their security colleagues to define policies and implement them using the role-based access control provided with the standard relational database data control language (DCL).
Maintenance Issues for Data Warehousing Systems
Another important aspect of data warehousing is maintenance of these systems. Adequate knowledge about business and feeder system changes that will affect the DW/DSS systems is of utmost concern for anyone doing systems maintenance. In data warehousing environment data is fed from more sources than typical transaction processing system. Though intelligent use of the data extraction, cleaning, and loading tools and the information catalogs can greatly ease the burden here, many changes will require a fair amount of effort. By the way, keeping informed and assessing the impact of technically driven changes to the feeder systems may be more difficult than keeping track of the business driven changes.
You will have to figure out if, when, and how to purge data. There comes a point when it does not make business sense to hold certain data in the warehousing system. This usually happens because of some type of capacity limit or restructuring of data and it is not worth the effort to restructure certain data. At this point, purging of data is necessay with less expensive, alternative means of storage.
You will have to determine which queries and reports should be IS written and which should be user written.
You will be motivated to store data in the data warehouse "for data’s sake"
You will find endless opportunities to tune DW/DSS system databases
You will have to balance the need for building aggregate structures for processing efficiency with the desire not to build a maintenance nightmare
You will be uncertain whether to create certain reports/queries in the data warehousing system or in the "feeder" transaction processing system.
You will be pressured to implement a means to interactively correct data in the data warehouse (and perhaps send back corrections to the transaction processing system)
You will be uncertain which tools are most appropriate for a certain task.
You will have to figure out how to test the effect of structure changes on end user written queries and reports.
You will have to determine how problems with feeder system update processing affect DW/DSS system update processing.
You will find that maintaining a data warehouse architecture may be much harder than establishing the architecture.
You will find that the business changes the meanings of attributes over time and that these changes can be overlooked.
You will have to rework how you have implemented security.
You will have to keep reconciling feeder systems with the DW/DSS systems.
You will have to perform euthanasia on some DW/DSS systems.
You will find it is far more expensive (and complex) to maintain a data warehouse than to build one.