Last week in my post, I discussed why you would choose to govern data outside of the database. Today I will discuss how you do it. Most conversations about Big Data Governance focus on why it is necessary but rarely on how it is done. The activities performed to set up a Data Governance program, whether it involves structured or unstructured data, still include defining an organizational structure, creating a charter, defining common business terms, and identifying the Data Stewards in the organization. In fact, when you hear Data Governance described by a document-oriented organization, there is almost no difference between document governance and the governance of data found in databases.

Data Steward Responsibilities by Data Store

Data Governance is implemented by Data Stewards who are responsible for one or more data stores against which they perform the following initial and on-going activities:

Organizing Data Stewards in the Organization

Since Data Stewardship is usually organized by data store, and structured and unstructured data are kept in separate data stores, Data Stewards are usually also divided by data type, in addition to the other breakdowns of responsibility. For example, the Data Steward focused on the governance of documents in a business area may be different from the Data Steward responsible for customer master data. Data Stewards are usually organized across various dimensions:

Data Stewardship Tools

The tools used for Data Stewardship include:
The tool sets for unstructured data stewardship may be significantly different from those used to manage data in databases. The tools for unstructured data management will also include:

NoSQL Data Stewardship

Most NoSQL (non-relational) databases can be governed in the same way as relational databases, although the profiling tools used on relational databases will usually have to be replaced by profiling utilities specific to the database in question. Text search tools work on document databases and Hadoop data structures.

Managing Data In Motion

Traditional Data Governance programs may not include managing the non-persistent data passing through the organization, or the “data in motion”. Even Data Governance programs that focus on the data in databases, and certainly Big Data Governance programs, should establish responsibility for the rules that govern the movement and transformation of data in the organization. These rules are not just technical “code”; they are business decisions about how critical data in the organization is transformed and calculated.
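
One way to picture the point about ownership of transformation rules is a small sketch in which each rule carries the name of the steward accountable for it, alongside the code that applies it. Everything here is an assumption made for illustration: the `TransformationRule` structure, the `net_revenue` rule, the field names, and the steward title are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class TransformationRule:
    """A data-in-motion rule plus the business owner accountable for it (hypothetical structure)."""
    name: str
    steward: str       # who owns the business decision behind the rule
    description: str   # the decision stated in plain business language
    apply: Callable[[Record], Record]


def net_revenue(record: Record) -> Record:
    # Hypothetical business rule: net revenue is gross revenue minus refunds.
    out = dict(record)  # copy so the incoming record is left unchanged
    out["net_revenue"] = record["gross_revenue"] - record["refunds"]
    return out


RULES: List[TransformationRule] = [
    TransformationRule(
        name="net_revenue",
        steward="Finance Data Steward",
        description="Net revenue excludes refunds.",
        apply=net_revenue,
    ),
]


def run_rules(record: Record, rules: List[TransformationRule]) -> Tuple[Record, List[str]]:
    """Apply each rule in order and keep an audit trail of which rule and owner touched the record."""
    audit: List[str] = []
    for rule in rules:
        record = rule.apply(record)
        audit.append(f"{rule.name} (owner: {rule.steward})")
    return record, audit


result, audit = run_rules({"gross_revenue": 100.0, "refunds": 15.0}, RULES)
```

The design choice worth noting is only that the owner and the plain-language description travel with the rule, so a change to the calculation is visibly a business decision with a named steward rather than an anonymous code edit.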
See my new book on Data Integration for more on “Managing Data In Motion.”

Data Governance Maturity

For organizations that are focused on the creation or management of unstructured data or documents, such as mortgage companies, publishers, media companies, and pharmaceutical companies that file drug submissions, the governance of unstructured data is crucial to their franchise, and their Data Governance of this data is very mature. Most other organizations probably have some policies and tools for the governance of email and documents, but a full Big Data Governance program is usually limited to organizations with very mature Data Governance capabilities.
