What Do Geologists Need to Know about Metadata?*
By
Donald W. Downey1
Search and Discovery Article #40235 (2007)
Posted May 10, 2007
*Adapted from poster presentation at AAPG Annual Convention, Long Beach, CA, April 1-4, 2007
1Chevron Corporation, San Ramon, CA
Over the life of an average interpretation project, gigabytes of information are stored. A geologist may generate an average of 2 to 3 important documents each week, resulting in hundreds of files created yearly. The problem is how to find the critically important files needed for the next project. What is the solution? Metadata: searchable information about a data resource. For an example, look at how the government manages data; it had to get organized, and it created geospatial metadata standards.
What are the metadata elements and standards needed by geologists? The goal is to populate the metadata, make the metadata searchable, and maintain metadata elements for ownership and data retention time. Properties of the datasets such as title, author and creation date are harvested and recorded in metadata documents automatically, but the most useful search items will be manually entered through a metadata editor program. Theme codes and keywords from the metadata standards can be added in a special metadata profile that allows entry of metadata elements useful for hydrocarbon exploration. Custom metadata editors, metadata templates and XML style sheets are especially useful for populating multiple datasets within a project.
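The split between harvested and manually entered properties can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`title`, `format`, `size_bytes`, `modified`) and the helper functions are hypothetical, and the file name stands in for a title because the file system itself knows nothing about authors or themes.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def harvest_properties(path):
    """Collect properties that can be read from the file system without user input."""
    p = Path(path)
    info = p.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "title": p.stem,  # assumption: the file name stands in for a title
        "format": p.suffix.lstrip(".").lower(),
        "size_bytes": info.st_size,
        "modified": modified.date().isoformat(),
    }


def build_record(path, manual_fields):
    """Merge harvested properties with manually entered fields; manual entries win."""
    record = harvest_properties(path)
    record.update(manual_fields)
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "well_locations.shp"
        sample.write_bytes(b"placeholder")
        manual = {"owner": "project geologist", "keywords": ["wells", "locations"]}
        print(build_record(sample, manual))
```

Letting manual entries override harvested ones reflects the point above that the most useful search items come from the person who knows the data.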
Metadata standards and formats differ among image metadata, geospatial metadata for ArcGIS, and the “document properties” of Microsoft Office files and Adobe Acrobat PDFs; however, the basic principles of data ownership and responsibility apply to all of them. When geologists adhere to standards, metadata becomes the key to sharing the data needed for geologic interpretation projects.
Introduction to Purpose of Poster
I wanted to update metadata for some feature classes that I created to analyze the precision of IHS well locations in Iran. I was sending the files back to IHS, and I thought I should enter some metadata to show them that I realized just whose data I was using. After investigating how to use metadata within ArcCatalog, I started to develop workflows for my projects. I then started to look at current implementations of metadata workflows and became frustrated with the lack of tools and usage of metadata in the earth science community.
Metadata
Metadata is... searchable information about a data resource.
or… “Metadata is hidden information in a computer file that may contain potentially dangerous or embarrassing information or lead to an accidental disclosure.”
http://blogs.adobe.com/acrolaw/2005/10/metadata_and_pd.html
Metadata is used to... Catalog/Search/Determine Usability/Document
For spatial datasets, provide enough information for users to work with those datasets in a GIS
Metadata philosophy: Taggers vs. Searchers
• The taggers believe in adding complete metadata so we can search
• The searchers believe in searching everything using powerful search tools
Neither viewpoint gets the job done... We are getting overwhelmed by data! Many datasets have legal restrictions!

• How to edit metadata in ArcCatalog
• What are the key metadata elements?
• Auto-updating: When does it happen?
• How to auto-generate keywords
• Using .xml templates and enclosures

Remember...
• Metadata can get lost when re-saving files.
• Don’t put all the metadata into a single document.
• Don’t put metadata into all of your documents.
• Do use metadata .xml templates.

Metadata Completeness
• What is the type of data being documented? Is it a geodatabase, shapefile, coverage, folder?
• What is the purpose of the data? Is it for internal use only, will it be made public, will it be shared internationally?
• Who is the audience for this dataset? Specialists or generalists? How much do they need to know?
• What is your organization's policy regarding metadata completeness?

Metadata Workplan
A metadata workplan is essential as we need to work together to create a successful implementation.
• Create metadata
• Maintenance
• QC and Validation
• Sharing permissions

What is the problem?
• Complicated metadata editors
• Generic style sheets
• Numerous metadata fields
• Poor synchronization
• Poor metadata persistence
• Very few geologists are actively entering metadata!

What is the desired state?
• Author and owner recognized
• Administration of data enhanced
• Legal restrictions complied with
• Metadata entered for all of our important data files!
What is my solution?
• Remember that AAPG Ethics requires citation and protection of others' datasets and intellectual property.
• Auto-fill higher-level information by cascading directory path information from enclosing folders and project work orders.
• Auto-fill lower-level metadata fields using data analysis.
• Improve synchronization between bibliographic and spatial metadata.
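Cascading higher-level information from enclosing folders could work roughly as follows. This is a minimal sketch, not an ArcCatalog feature: the folder paths, the field names, and the `FOLDER_METADATA` table are all hypothetical, and the rule that a nearer folder overrides a more distant one is an assumption.

```python
from pathlib import PurePosixPath


def cascade_metadata(file_path, folder_metadata):
    """Merge metadata from every enclosing folder, outermost first.

    Values set on a folder closer to the file override values inherited
    from folders higher up the tree.
    """
    merged = {}
    parents = list(PurePosixPath(file_path).parents)  # nearest folder first
    for folder in reversed(parents):                  # walk from the root down
        merged.update(folder_metadata.get(str(folder), {}))
    return merged


# Hypothetical folder-level metadata, e.g. entered once when a project is set up.
FOLDER_METADATA = {
    "/projects": {"owner": "Exploration team", "restrictions": "internal use only"},
    "/projects/basin_study": {"project": "Basin study", "charge_code": "WO-0001"},
    "/projects/basin_study/licensed": {"restrictions": "vendor licence applies"},
}

if __name__ == "__main__":
    path = "/projects/basin_study/licensed/wells.shp"
    print(cascade_metadata(path, FOLDER_METADATA))
```

With this rule, a legal restriction recorded on a subfolder of licensed data replaces the project-wide default for every file stored beneath it, without anyone editing the individual files.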
• Members shall not use or divulge any employer's or client's confidential information without their permission and shall avoid conflicts of interest that may arise from information gained during geological investigations.
• Members shall freely recognize the work done by others, avoid plagiarism, and avoid the acceptance of credit due others.
• Members shall endeavor to cooperate with others in the profession and shall encourage the ethical dissemination of geological knowledge.
http://www.aapg.org/business/codethic.cfm
Maintain the spirit of the standard bibliographic citation for digital metadata!
Citation Examples
California Division of Mines and Geology, 1992c, Geologic Atlas of California: San Luis Obispo, compiled by Jennings, C.W., California Department of Conservation, Sacramento, CA, GAM015, 1:250,000.
Crowell, J.C., 1974, Origin of late Cenozoic basins in southern California, in Tectonics and sedimentation, Dickinson, W.R., ed., Society of Economic Paleontologists and Mineralogists Special Publication 22: Tulsa, Oklahoma, SEPM, p. 190-204.
Graham, S.A., Ingersoll, R.V. and Dickinson, W.R., 1976, Common provenance for lithic grains in Carboniferous sandstone from Ouachita Mountains and Black Warrior Basin: Journal of Sedimentary Petrology v. 46(3), p. 620-632.
Hall, C.A., Jr., 1973b, Geology of the Arroyo Grande 15' quadrangle, San Luis Obispo County, California, California Division of Mines and Geology Map Sheet 24: Sacramento, CA, California Division of Mines and Geology, 8 p.
(Figures 1 and 2)
Metadata creation during...
• Project approval
• Initial image generation
• Image processing
• Georeferencing
• Vector feature editing
• Project results report

These programs all have their own unique metadata! (Figure 2):
• Exchangeable Image File Format (EXIF)
• International Press Telecommunications Council (IPTC)
• Adobe Photoshop Document Properties
• ACDSee database
• ArcGIS (ArcCatalog) .xml
• Microsoft Windows NTFS File Properties
• Microsoft Office File Properties
• Adobe Acrobat Document Properties
Metadata is fragile, hard to transport and is input by individuals with a variety of software tools, languages, and formats.
(Figure 3)
Models are geoprocessing workflows.
Model metadata for casual users
• Primary purpose is to search for data
• Does not need details of processing
• Create metadata after processing steps are done

Model metadata for specialists
• Primary purpose is for QC analysis
• Details of processing and results of analysis
• Need to create and edit while doing processing

Possible enclosures:
• project overview
• graphic index map
• model history files
Do geologists need tools to document the geoprocessing steps used in model building?
(Figures 4 - 6)
Digest existing metadata sources to create a database.
• Capture text in:
  • project summary*
  • attribute tables
  • text in document
  • similar datasets
• Sort unique words
• Sort capitalization
• placenames
• stratigraphy
• lithology
• paleontology
• Select keywords
• Validation with look-up tables
• Intersection with spatial features
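The text-based steps in this list (capture text, sort unique words, check capitalization, validate against look-up tables) can be sketched as below. The look-up tables here are tiny hypothetical stand-ins for a real gazetteer, stratigraphic lexicon and lithology vocabulary, and the intersection with spatial features is not attempted.

```python
import re

# Hypothetical look-up tables; in practice these would come from a gazetteer,
# a stratigraphic lexicon, and a lithology vocabulary.
LOOKUP = {
    "placenames": {"california", "tulsa", "sacramento"},
    "stratigraphy": {"carboniferous", "cenozoic"},
    "lithology": {"sandstone", "shale", "limestone"},
}


def unique_words(text):
    """Map each lower-cased word to the first spelling seen in the text."""
    seen = {}
    for word in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
        seen.setdefault(word.lower(), word)
    return seen


def select_keywords(text, lookup=LOOKUP):
    """Sort unique words into themes by validating them against look-up tables."""
    words = unique_words(text)
    keywords = {theme: [] for theme in lookup}
    candidates = []
    for key in sorted(words):
        themes = [theme for theme, vocab in lookup.items() if key in vocab]
        for theme in themes:
            keywords[theme].append(key)
        if not themes and words[key][0].isupper():
            # Capitalized but not in any table: a possible proper name
            # that a person should review before it becomes a keyword.
            candidates.append(words[key])
    return keywords, candidates


if __name__ == "__main__":
    title = ("Common provenance for lithic grains in Carboniferous sandstone "
             "from Ouachita Mountains and Black Warrior Basin")
    print(select_keywords(title))
```

Words that pass validation become keywords automatically; capitalized words that fail it are only flagged, since a sentence-initial word such as "Common" is capitalized without being a name.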
Metadata is data about data... Extract and generate it from the data!
*Project Summary Workflow
Project proposals (Work Orders) can serve as the basis for a metadata database containing the Project Name, Project Owner and Project Purpose fields. The internal Charge Code field can serve as a key field to interlink with project time-writing and with the data files generated for the project. Fill the project-level charge code information into the data files created during the project (auto-insert during creation or cascade from folder metadata) and then update Project Name, Project Owner and Project Purpose as needed.
(Figures 7 - 9)
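Using the charge code as a key field amounts to a simple join between the work-order table and each file's record. The sketch below assumes a hypothetical in-memory table and hypothetical field names; a real implementation would read the work orders from whatever system holds them.

```python
# Hypothetical work-order table keyed by charge code.
WORK_ORDERS = {
    "WO-0001": {
        "project_name": "Well location precision study",
        "project_owner": "Project geologist",
        "project_purpose": "Compare vendor well locations with surveyed positions",
    },
}


def fill_project_fields(file_record, work_orders=WORK_ORDERS):
    """Copy project-level fields into a file record via its charge code.

    Fields already present on the file record are kept, so later manual
    updates to the project name, owner or purpose are not overwritten.
    """
    project = work_orders.get(file_record.get("charge_code"), {})
    return {**project, **file_record}


if __name__ == "__main__":
    print(fill_project_fields({"file": "wells.shp", "charge_code": "WO-0001"}))
```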
ArcCatalog provides an excellent GIS metadata editor, geared towards the GIS Analyst. Specialized (simplified) style sheets allow geologists to view only the needed and customized fields. Theme codes and keywords from the metadata standards can be added in a special metadata profile that allows entry of standardized metadata elements useful for hydrocarbon exploration.
ISO metadata standards
• ISO metadata has more "optional" elements that address deficiencies in the FGDC standard.
• Developed by ISO Technical Committee with FGDC, European, Australian Data Agencies.
• Extensible and allows metadata profiles customized for a particular user community.
• Includes Data Dictionary which characterizes the dataset, including its intended use and limitations.
• Models - Unified Modeling Language (.uml).
• Exports - eXtensible Markup Language (.xml).

FGDC Metadata Standards
The Federal Geographic Data Committee (FGDC) is an interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data on a national basis. This nationwide data publishing effort is known as the National Spatial Data Infrastructure (NSDI). The NSDI is a physical, organizational, and virtual network designed to enable the development and sharing of this nation's digital geographic information resources. FGDC activities are administered through the FGDC Secretariat, hosted by the National Geospatial Programs Office (NGPO) of the U.S. Geological Survey.
http://www.fgdc.gov/

Definitions from the ESRI Dictionary
XML: EXtensible Markup Language. Developed by the World Wide Web Consortium (W3C), XML is a standard for designing text formats that facilitates the interchange of data between computer applications. XML is a set of rules for creating standard information formats using customized tags and sharing both the format and the data across applications.
Style Sheet: A file or form that provides style and layout information, such as margins, fonts, and alignment, for tagged content within an XML or HTML document. Style sheets are frequently used to simplify XML and HTML document design, since one style sheet may be applied to several documents. Transformational style sheets may also contain code to transform the structure of an XML document and write its content into another document.
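To make the idea of a metadata .xml template concrete, here is a minimal sketch that fills one template for a dataset using Python's standard `xml.etree.ElementTree` module. The element names are FGDC short names (`idinfo`, `citeinfo`, `origin`, `pubdate`, `title`, `themekt`, `themekey`), but the template is deliberately incomplete and is not a valid FGDC record, nor is it the file ArcCatalog writes; the `fill_template` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# A small template using FGDC short element names.
# Assumption: only a handful of elements are shown; a real record needs many more.
TEMPLATE = """
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin></origin>
        <pubdate></pubdate>
        <title></title>
      </citeinfo>
    </citation>
    <descript>
      <purpose></purpose>
    </descript>
    <keywords>
      <theme>
        <themekt>None</themekt>
      </theme>
    </keywords>
  </idinfo>
</metadata>
"""


def fill_template(values, keywords):
    """Return a new element tree with the given paths filled and keywords appended."""
    root = ET.fromstring(TEMPLATE.strip())
    for path, text in values.items():
        element = root.find(path)
        if element is None:
            raise KeyError(f"template has no element at {path!r}")
        element.text = text
    theme = root.find("idinfo/keywords/theme")
    for word in keywords:
        ET.SubElement(theme, "themekey").text = word
    return root


if __name__ == "__main__":
    record = fill_template(
        {
            "idinfo/citation/citeinfo/origin": "Project geologist",
            "idinfo/citation/citeinfo/title": "Well locations",
            "idinfo/descript/purpose": "Precision analysis of well locations",
        },
        ["wells", "precision"],
    )
    print(ET.tostring(record, encoding="unicode"))
```

Because the template is parsed afresh on every call, the same function can be run in a loop over all the datasets in a project, which is where templates save the most typing.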
What can we do?
Workflow Suggestion
The plan must be easier than the present metadata editing workflow!
Raw data versus interpreted data
We can also think about raw data versus interpretations. We may think that the most important data is interpreted data, but for new-ventures hydrocarbon exploration, the most important data is the original raw data. Even excellent previous interpretations become data for the next round of interpretations, because most big new discoveries come from interpretation of NEW PLAY CONCEPTS.
Geologists are not geographers.
In my first job, with Gulf Oil before the Chevron merger, I indexed maps; that is, I created metadata. They would wheel in map racks full of maps, and we counted how many maps we indexed. We were using metadata as an administrative tool. During the latest downturn, the well-file personnel moved to IT positions and the earth scientists were left to enter the metadata. Those of us who maintain metadata for geologic projects know that it hasn’t worked out so well. Geologists are creating data, shooting seismic lines and coring wells, but it is difficult to enter all the metadata these projects require when you work for a business. I have contacted other GIS analysts, including some who have presented metadata management solutions at the ESRI Users Forum, and I found that metadata solutions that are not managed, and that depend too much on casual user input, are not successful. I believe we need 80-90% compliance with metadata standards across the industry to succeed in long-term management of our geologic data.
Earth scientists are not GIS experts. They may like to look at maps, and they may author a lot of maps, but the vast majority of geologists do not know how to use ArcGIS beyond displaying data. They ARE very intelligent, and they do get excited when they see a pretty map, but they don’t REALLY want to spend their time doing ArcGIS, or they would have majored in GEOGRAPHY, not GEOLOGY. ArcGIS is a tool that they may use to investigate a problem, but it is not the center of their work, and side issues like data management and metadata editing are going to get short shrift. What we need are digital data librarians or GIS analysts who know the specialized nomenclature of the petroleum industry and who like to index and file things! Asking a geologist to be your GIS expert is cheaper in the short term, but in the long term it is inherently inefficient.
The easier the metadata workflow, the more it will be adopted. I believe we will need a heavily automated solution, no matter how complex the programming is.
ESRI Petroleum User’s Group Metadata Working Group Wiki