Natural Hazards Archive
HAZHub, Michigan Tech's Natural Hazards Archive, was developed as the Data Archive Plan of the NSF project "Remote Sensing for Hazard Mitigation and Resource Protection in Pacific Latin America". This project was sponsored by the National Science Foundation's Office of International Science and Engineering (OISE). The initial repository data was directly related to the research performed by grad students and faculty involved in the project, but the repository is a proactive archive system that will continue to maintain archives of data sets from current and future geologic hazards and resources research performed at Michigan Tech and other collaborating institutions. Geologic hazards and resources include research in the areas of geohydrology, volcanology, and seismology.
The backbone of the database is the open-source integrated Rule-Oriented Data System (iRODS), developed by the Data Intensive Cyber Environments Center (DICE Center) at the University of North Carolina at Chapel Hill. Development of the core iRODS system has been funded by the National Science Foundation and the National Archives and Records Administration.
HAZHub UserID/Password Request
Userids and passwords are required to access the data repository, and the level of use will determine each user's permission level. The root directory of the archive is \hazhub\home\ . From there, data is divided into "\PublicAccess\" , "\PrivateAccess\" , and "\Users\" directories.
\PublicAccess\ contains datasets that are accessible by the general public. Data should still be cited correctly when referenced in publications and presentations, but it is freely accessible and usable for research and publication. All users will be given access to view and download data in this directory.
\PrivateAccess\ contains proprietary data that is only accessible by Michigan Tech researchers and students, and is limited in use by copyright and licensing restrictions. General public users will be able to see directory and file names, but will not have permissions to view or download data.
\Users\ contains a subdirectory for each user with access to the HAZHub archive. This directory is used when Michigan Tech researchers and collaborators need to upload new data into the archive. The repository administrators will review the data and make sure the file & directory names are in compliance with the HAZHub naming conventions, and will also verify that a complete and accurate set of metadata has been provided for each file before transferring data sets to the correct public or private access directory.
If you are requesting access to the \PrivateAccess\ directory, please provide a Michigan Tech advisor or referral name.
Access to HAZHub Data
Access to HAZHub: Natural Hazards Archive
iRODS is the open-source software installed on Michigan Tech's server that manages the data archive. HAZHub iDROP and HAZHub Online are two web-based user interface programs that allow users to view and access the iRODS-managed database. Both interfaces access the same data, but they each provide unique features that will be helpful to users, depending on their use of the Natural Hazards Archive.
- HAZHub iDROP - is a Java Web Start (.jnlp) application, run directly from Mozilla Firefox
For off-campus users: if accessing the database from outside Michigan Tech's campus, users must create a VPN connection to Michigan Tech before they can access HAZHub iDROP or HAZHub Online. Use your MTU ISO UserID/password for your "Remote Access Logon for Michigan Tech" at vpn.mtu.edu. For users who are not part of Michigan Tech, an MTU ISO UserID/password will be created for you when you receive your HAZHub UserID/password.
This double password security is required by Michigan Tech's secure network policies.
1. HAZHub iDROP
Works with: Mozilla Firefox
Link to idrop.jnlp: http://iren-web.renci.org/idrop-release/idrop.jnlp
Allow the iDrop Java program to run
(Currently, Firefox will run idrop.jnlp, but other browsers download the idrop.jnlp file and then do not execute correctly because of missing support files)
Please log in to your iDrop data grid
User Name: UserID
Login Mode: Standard
Check the 'Advanced Login Settings' to change your password or to verify that the Default Resource is set to 'demoResc' (necessary if uploading data files).
The iDROP suite of programs is the newest data interface under development with the iRODS organization. By using the "iDrop (stable)" link on the RENCI GForge website, the most recent version of the program will be accessed without updates on the Michigan Tech server.
- The directory structure is similar to the Mac environment
- Multiple files may be uploaded or downloaded at once
- Searches can be done on file names, but the metadata searches are still under development
- Text searches are case sensitive
- Clicking on a file in the Search results box shows the selected file in the directory structure
- Files can be renamed if needed
- Metadata can be copied to the clipboard to paste in other applications (select all metadata values, or use Ctrl-click to select specific items)
- Metadata can be added and deleted, but values cannot be modified.
Sign on to iRODS
- The directory structure is similar to the Windows environment
- Only one file at a time may be uploaded or downloaded
- Searches can be done on file names and on metadata values (ex. Lat/Long, Keywords, etc.)
- Text searches are case sensitive
- Files can be opened or downloaded from the Search Results box
- Files cannot be renamed (use HAZHub iDROP for this)
- Metadata can only be copied/pasted one value at a time
- Metadata can be added, deleted, and modified
- Includes a Map feature for locating data sets and searching by Latitude and Longitude
HAZHub Archive Content
The Natural Hazards Archive is a collection of all types of data collected, generated, and processed by faculty and students in support of their research in the fields of volcanology, seismology, and hydrology. The data includes, but is not limited to, remote sensing, GIS, thermal infrared imaging, and seismic data, along with video and photograph recordings of events. The following outlines the types of data expected to be uploaded into the repository:
1. Basic data or raw information used for a project that is otherwise not easily accessible, and that we have permission to put in the repository, either for access by the general public or for only a restricted group of people. Proprietary data will only be accessible by those who have permissions to use it, such as ASTER data downloaded under Michigan Tech’s account. Michigan Tech does not have permissions to redistribute ASTER data to the general public, but it is available for use by Michigan Tech researchers. Imagery data such as Landsat and MODIS will not be included in the repository since it is available for free on the Internet.
Typical datasets uploaded to the repository include:
- Satellite remote sensing datasets in GIS raster format (e.g. geotiff, .img, etc. files).
- GIS datasets acquired for the research purpose (e.g. shapefiles, raster datasets, etc.).
- Instrumental and measurement datasets directly collected during field work, including geophysical (e.g. seismic, acoustic, thermal (FLIR), VLF, 2D_ER, differential GPS, etc.), geochemical (volcanic SO2, water chemistry, etc.), structural rock-mechanics (Schmidt hammer, joint orientation, etc.) and geo-hydrological (e.g. flow rate, well level, etc.) datasets.
- Visual (photos and video), and traditional field (notebook, hand-held GPS, etc.) recorded datasets.
- Analytical and laboratory results from the analysis of samples collected in the field, including geochemical (e.g. rock composition, thin section photographs, etc.).
- Interviews, surveys, and other human subject research datasets, restricted by Institutional Review Board (IRB) constraints.
2. Diverse products generated from the different research projects (e.g. theses). This should include information in a variety of formats. The most condensed level will be the documents (theses and papers) describing and presenting the research results, but a variety of other datasets are also expected. Such datasets include:
- Datasets and tables in spreadsheet or other general purpose database format (e.g. Excel, dBase, etc.).
- GIS files (raster files, shapefiles, .mxd ArcGIS document files) and associated metadata files, including final maps and GIS layers.
- Datasets in specific electronic file formats associated with each discipline (e.g. outputs from seismic source modeling, corrected SO2 maps/images from the UV camera, etc.).
- Computer code and programming outputs (e.g. MATLAB SO2 software analysis, other code written for specific data processing routines, etc.).
- Documents published in conferences and congresses, including abstracts, posters, presentations, etc.
- Papers published in peer reviewed journals, congress memoirs, etc.
- Other presentations given at workshops, internal (GMES) activities, etc.
- Thesis and report documents.
HAZHub Directory and File Name Conventions
Directories in the repository are referred to as Collections. The archived data is stored in the root directory of "hazhub/home/" and is divided into 2 main collections, "PublicAccess" and "PrivateAccess". The PrivateAccess collection contains proprietary data with use restricted to Michigan Tech researchers and collaborators, under the guidelines of the data's use policies. The PublicAccess collection will be made available to all archive users, with the request that data be cited correctly in any published documents and presentations.
The structures of the PublicAccess and PrivateAccess directories are set up in the following way:
File Name Structure
The file naming conventions were planned to assist users in searches of the data, either by using the search features or just by browsing the collections of data. Many data sets are a combination of multiple files and directories that have been tar'd and zipped. To keep individual file sizes between 1 and 2 GB, some data sets consist of multiple tar.gz files that end with a set number to indicate multiple files for the same data set.
- Basic filename format: TTTTTT_Geologic_Feature_yyyy-mm-dd_description_Set#.tar.gz
- Example: FLIR-SEQ_Pacaya_2009-01_BuenaVista_Set01.tar.gz
- TTTTTT: general type of data (FLIR, FLIR-SEQ, DOAS, ASTER, TIME-LAPSE, etc.)
- Geologic_Feature: name of volcano or other specific geologic feature location if data is not volcano-related
- yyyy-mm-dd: date of data; drop dd and mm if unknown. For a range of dates, the form 2010.05.15-31 has also been used.
- description: extra description to help identify data
- Set01: if data is a set of related files, they will be tar’d and zipped in groups of files 1-2 GBs in size. Some data sets will be a set of tar.gz files.
If there are 10 or more sets, they should be numbered with 2 digits for file sorting within directories. Drop the Set portion of the name if there is only one file.
- Filename for thesis: ThesisMSorPhD_Volcano_PublYear_LastName.pdf
- Filename for data directly supporting a thesis:
- Other Filename conventions:
Do not leave blank spaces in file names. Use “-” inside dates. Use “_” for other spaces, including those in volcano names (e.g. Santa_Ana).
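The basic filename format above can be sketched in a few lines of Python. This is only an illustration of the convention, not a tool provided by the archive: the function name `build_filename` and its parameters are hypothetical, the date check accepts only the yyyy, yyyy-mm, and yyyy-mm-dd forms (not the date-range form), and the set number is always written with two digits, as in the example name above.

```python
import re

def build_filename(data_type, feature, date, description, set_number=None, ext="tar.gz"):
    """Assemble TTTTTT_Geologic_Feature_yyyy-mm-dd_description_Set#.tar.gz (hypothetical helper)."""
    # Accept yyyy, yyyy-mm, or yyyy-mm-dd; dd and mm may be dropped if unknown.
    if not re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", date):
        raise ValueError("date must be yyyy, yyyy-mm, or yyyy-mm-dd")
    # No blank spaces are allowed: replace them with underscores in every part.
    parts = [p.strip().replace(" ", "_") for p in (data_type, feature, date, description)]
    # The Set portion is dropped when there is only one file.
    if set_number is not None:
        parts.append(f"Set{set_number:02d}")
    return "_".join(parts) + "." + ext

print(build_filename("FLIR-SEQ", "Pacaya", "2009-01", "BuenaVista", 1))
# FLIR-SEQ_Pacaya_2009-01_BuenaVista_Set01.tar.gz
print(build_filename("ASTER", "Santa Ana", "2010", "night scene"))
# ASTER_Santa_Ana_2010_night_scene.tar.gz
```

The second call uses a made-up description to show how spaces in a feature name and description become underscores.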
Each file uploaded into the Natural Hazards Archive will have a set of associated Metadata. The Metadata will describe the data in more detail than the file name, and will allow the users to search for data on other values, such as key words and latitude/longitude locations.
- Contributor Name: Person contributing the data to the archive. This is usually the faculty member responsible for the data.
- Contributor Organization: Will most often be Michigan Technological University
- Country: Name of country where data was collected
- DataID: Identification number from the database. This will identify individual files in the repository, and will also be included in the Reference Citation field. This field is automatically filled in when new files are uploaded into the system.
- Data Description (only populated when more detailed information than the keywords has been given)
- Data Use Permissions: Reminder that data used for presentations and publication should be cited correctly, and proprietary data cannot be distributed.
- Field of Study: (i.e., Volcanology, Geohydrology, Seismology)
- Interest Level: (Technical, General Public). If a data set is labeled 'General Public', it will contain photos or video footage that would be of interest to the general population as well as researchers. If it is labeled 'Technical', then it is a set of data, mostly of interest to researchers.
- Keywords: list of key words that describe the data, and will be available for searches
- Latitude: decimal format, positive values for N, negative values for S
- Longitude: decimal format, positive values for E, negative values for W
- Reference Citation: citation to be used when referencing this data in publications. Include "DataID: nnnnn" at the end of the citation so the reference data can be easily found again.
- Geologic Feature/Location Name: If the data pertains to a volcano, this will be the volcano name. Otherwise it will be the name of the feature or location the data pertains to, such as the Quito Aquifer System.
- zLatitude (x1000): Latitude times 1000
- zLongitude (x1000): Longitude times 1000
These metadata values will be automatically created/updated by a script. They were added to the system because of a database issue discovered in iRODS where numeric comparisons of decimal metadata gave inaccurate results. With this fix, searches will be accurate for Lat/Long values with precision to 3 digits.
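The scaling behind the zLatitude and zLongitude fields can be illustrated as follows. This is a sketch, not the archive's actual script: the function name `scaled` is hypothetical, the coordinates are illustrative, and rounding to the nearest integer is an assumption (the source only says the values are multiplied by 1000).

```python
def scaled(value):
    """Return a decimal coordinate times 1000 as an integer (rounding is an assumption)."""
    return round(value * 1000)

# Illustrative coordinates: positive for N/E, negative for S/W.
lat, lon = 14.381, -90.601
print(scaled(lat), scaled(lon))  # 14381 -90601

# Values that differ only beyond the third decimal place map to the same
# integer, which is why searches are precise to 3 digits.
print(scaled(14.3811) == scaled(14.3814))  # True
```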
HAZHub Search Tips
For searches by File Name and other text fields
- The current text searches are case sensitive. Volcano names and Country names are named with the first letter as uppercase, and remaining letters in lowercase (ex. El_Salvador, Santiaguito, Guatemala).
- Only English alphabet characters are used in file names and metadata.
- Both programs (HAZHub iDROP and HAZHub Online) will look for the search text in any part of the File Name field. Searching for 'Pacaya' will find every file with Pacaya in the File Name.
- Since file names are constructed with no spaces, search for 'Santa Ana' as 'Santa_Ana'.
Searches on Metadata fields (HAZHub Online only)
- To open the Advanced Search, click on the "Adv Search" toolbar icon.
- To limit a search to just the PublicAccess files,
Select the "PublicAccess" directory in the Collections window
In the Advanced Search dialog, check the "Under Current Collection" box
- When the 'like' operator is used in metadata searches, the search text is found anywhere within the metadata value field. When the compare operators (=, <, <=, >, >=) are used, the comparison is performed with the first characters of the search text and the metadata value.
Keywords like FLIR will find all files with 'FLIR' listed anywhere in the Keywords field
Keywords = FLIR will only find the files where 'FLIR' is the only value listed in Keywords
Keywords > FLIR will only find files where the 1st Keyword follows 'FLIR' in the alphabet
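The difference between 'like' and the compare operators can be mimicked in Python. This is only a model of the behavior described above, not the iRODS query engine: the function `matches` is hypothetical, Python's case-sensitive string comparison stands in for whatever ordering iRODS applies, and the comma-separated keyword values are made up for illustration.

```python
import operator

# Compare operators work on the whole value, starting from its first characters.
COMPARE = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def matches(value, op, text):
    """Model of a metadata test: 'like' is a substring match, the rest compare whole strings."""
    if op == "like":
        return text in value
    return COMPARE[op](value, text)

print(matches("ASTER, FLIR", "like", "FLIR"))   # True: FLIR appears somewhere in the value
print(matches("ASTER, FLIR", "=", "FLIR"))      # False: FLIR is not the only value
print(matches("FLIR", "=", "FLIR"))             # True
print(matches("Seismic, Fuego", ">", "FLIR"))   # True: the first keyword follows FLIR
print(matches("ASTER, FLIR", ">", "FLIR"))      # False: the first keyword precedes FLIR
```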
Search by Latitude/Longitude (HAZHub Online only)
- Latitude and Longitude are metadata fields.
- If values are entered in the "Search by Latitude/Longitude" section of the Advanced Search, Latitude or Longitude search parameters entered in the Metadata list above will be ignored.
Search with Map (HAZHub Online only)
- Click "Map" icon on HAZHub Online toolbar
- A new window opens where all map icons represent dataset locations in the Natural Hazards Archive
- Double-clicking the icon shows the map feature name, Country, and lat/long coordinates
- Select the "Draw a Rectangle" box (top, center of map) and draw a search area
- Click the "Search" button to send the selected Latitude/Longitude coordinates to the Advanced Search dialog in HAZHub Online
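Conceptually, a rectangle drawn on the map becomes a pair of latitude and longitude ranges. The sketch below shows that test in Python, under the sign convention used for the metadata (positive N and E, negative S and W). The function name, record names, and coordinates are all hypothetical, and the sketch does not handle a rectangle that crosses the 180 degree meridian.

```python
def in_rectangle(lat, lon, south, north, west, east):
    """True if (lat, lon) lies inside the rectangle, edges included."""
    return south <= lat <= north and west <= lon <= east

# Hypothetical records: (name, latitude, longitude). Coordinates are illustrative only.
records = [
    ("site_A", 14.4, -90.6),
    ("site_B", 13.9, -89.6),
    ("site_C", -0.2, -78.5),
]

# Rectangle spanning 13 to 15 degrees N and 92 to 89 degrees W.
hits = [name for name, lat, lon in records
        if in_rectangle(lat, lon, 13.0, 15.0, -92.0, -89.0)]
print(hits)  # ['site_A', 'site_B']
```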
Contributing Data to HAZHub
Contributions to the Natural Hazards Archive are encouraged, especially from current and former grad students involved in the PIRE project, "Remote Sensing for Hazard Mitigation and Resource Protection in Pacific Latin America".
- Data Review
Check with Carol Asiala (firstname.lastname@example.org) or John Gierke (email@example.com) before uploading data. We would like to know what type of data is being added to the archive and make sure contributors follow the archive guidelines we have set up.
- Upload to your user directory
Once you have received a userid and password, a directory under /hazhub/home/Users/<your userid>/ will be set up for you. This is the only place you will have permissions to upload data to the archive.
- Combining datasets
Please combine files that belong together using tar and zip. (ex. files & directories needed for an ArcInfo Map project)
- Keep file sizes under 2 GB
Files over 2 GB may be difficult for some users to download.
If a combined dataset will be over 2 GB, then create several files with the same naming convention, but add "Set01", "Set02", etc. to the end of the file names. Users will know the data files belong together, and will download them together.
- Project Files, layers, and links
If you are uploading a project that contains links to layer files, please include all of the files needed to open and view your project.
When you upload a file in your Users directory, a set of blank metadata will be generated. Please fill in this data (access through HAZHub Online), including Latitude and Longitude values. If your data is focused on a particular volcano, use the lat/long coordinates from the Smithsonian Institution Global Volcanism Program database.
- Private or Public Access
Please indicate the type of data you are submitting. If it can be freely accessible by the general public, it will be moved to /hazhub/home/PublicAccess/. If the data is proprietary and should only be used by Michigan Tech Researchers, then we will move it to /hazhub/home/PrivateAccess/.
- Notify Carol Asiala (firstname.lastname@example.org) when your upload is complete, and after reviewing, we will move it to the correct directory.
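The "Combining datasets" and "Keep file sizes under 2 GB" guidelines above can be followed with standard tools. The following Python sketch is one possible approach, not a script supplied by the archive: the function names are hypothetical, the inputs are assumed to be regular files, and the grouping uses uncompressed file sizes as a rough estimate of the final archive size. A single file larger than the limit still ends up in its own set.

```python
import tarfile
from pathlib import Path

MAX_BYTES = 2 * 1024**3  # 2 GB guideline

def plan_sets(sizes, limit=MAX_BYTES):
    """Greedily group (name, size) pairs so each group's total stays within limit."""
    groups, current, total = [], [], 0
    for name, size in sizes:
        # Start a new set when adding this file would exceed the limit.
        if current and total + size > limit:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        groups.append(current)
    return groups

def write_sets(paths, base_name, out_dir="."):
    """Write one or more .tar.gz files; add Set01, Set02, ... only if there are several."""
    groups = plan_sets([(p, Path(p).stat().st_size) for p in paths])
    outputs = []
    for i, group in enumerate(groups, start=1):
        suffix = f"_Set{i:02d}" if len(groups) > 1 else ""
        target = Path(out_dir) / f"{base_name}{suffix}.tar.gz"
        with tarfile.open(target, "w:gz") as archive:
            for p in group:
                archive.add(p)
        outputs.append(target)
    return outputs

print(plan_sets([("a", 5), ("b", 6), ("c", 4)], limit=10))  # [['a'], ['b', 'c']]
```

For example, `write_sets(files, "FLIR-SEQ_Pacaya_2009-01_BuenaVista")` would produce names in the archive's convention, with the Set suffix added only when the files do not fit in one archive.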