ADT - Australasian Digital Thesis Program

Member Information

Notes

  1. Tags/symbols
    HTML tags, angle brackets and double quotes are invalid in DC metadata, and their use causes the metadata gathering to fail. The gathered metadata forms the central ADT metadata database, so when gathering fails, critical data (eg the whole abstract, the title or even the author) is missing from the database, which adversely affects the ability to search this information.

    The following conventions need to be followed in order to maintain the integrity of the metadata database:
    • HTML tags: if these are used to improve the formatting (eg of the abstract) then they should be removed from the metadata. This will prevent the gathering from failing.
    • Common symbols - greater/less than brackets [< >], double quotes ["..."]: these are commonly used in scientific/mathematical material, for example. They should be represented as code in the Submission Form and thus in the metadata. See examples below:

      Example:
      " [double quotes] use code &quot;
      < [less than] use code &lt;
      > [greater than] use code &gt;
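
The symbol convention above can be sketched in a few lines of Python. This is only an illustration, not part of the ADT Submission Form or gathering software: the function name `encode_symbols` and the mapping `SYMBOL_CODES` are hypothetical, and the sketch assumes any HTML tags have already been removed from the text.

```python
# Hypothetical helper for illustration; not part of any ADT tool.
# Maps each invalid symbol to the code recommended in the list above.
SYMBOL_CODES = {
    '"': "&quot;",
    "<": "&lt;",
    ">": "&gt;",
}


def encode_symbols(text: str) -> str:
    """Replace double quotes and angle brackets with their codes."""
    return text.translate(str.maketrans(SYMBOL_CODES))


abstract = 'For x < 10 and y > 2 the "stable" regime holds.'
print(encode_symbols(abstract))
# prints: For x &lt; 10 and y &gt; 2 the &quot;stable&quot; regime holds.
```

Tags should be removed before this step; otherwise their angle brackets would be encoded too and would remain in the metadata as visible text.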
  2. Diacritics
    The use of these is problematic when it comes to searching the metadata database. The use of diacritics is most likely to be language specific, and probably relates to names in particular.

    For ADT purposes, the recommendation is that the author or agent (library staff, etc.) be responsible for either using code in the Submission Form if the absolutely correct spelling is required (eg acute, grave, umlaut, etc.) or determining the appropriate and acceptable anglicised form of the word or name.

    Again, the code could be stripped out of the metadata, but this would have to be thought through if desired.

    It is important to keep in mind that names can have different acceptable anglicised spellings, which makes global changes difficult. Eg Schäfer has two acceptable versions: Schafer and Schaefer. Stripping out the code for the umlaut would not necessarily reflect the preferred choice of the author and therefore the common anglicised use of the name.

    Example:
    Author name: Concepción
    • Concepci&oacute;n (with code, in order to reflect the exact spelling in the metadata view). This makes the metadata database searchable either by truncating the name or by using the full name with code, provided the exact use of the code is known to the searcher.
    • Concepcion (the anglicised version could be used in place of the exact name for both the metadata and the public HTML view; alternatively, the code could be stripped out of the metadata, so that the metadata database view reflects the anglicised form while the public HTML reflects the correct spelling, which includes the acute).
    Either of the above approaches is acceptable.
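
The two approaches can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an ADT tool: the function names `encode_diacritics` and `strip_diacritics` are hypothetical, and the fallback to a numeric code for characters without a named code is an assumption of this sketch. Only Python standard library modules are used.

```python
import unicodedata
from html.entities import codepoint2name


def encode_diacritics(text: str) -> str:
    """Write each non-ASCII character as a code, eg ó becomes &oacute;."""
    # Compose first so that a letter plus a combining accent is one character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            # Assumption of this sketch: use a numeric code if no name exists.
            out.append(f"&#{code};")
    return "".join(out)


def strip_diacritics(text: str) -> str:
    """Drop the accent marks, leaving the base letters only."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(encode_diacritics("Concepción"))  # prints: Concepci&oacute;n
print(strip_diacritics("Concepción"))   # prints: Concepcion
print(strip_diacritics("Schäfer"))      # prints: Schafer
```

Note that automatic stripping turns Schäfer into Schafer and never into Schaefer, which is why the anglicised form is best chosen by the author or agent rather than by a global change.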

    See examples:
    • UNSW - [author] wilson + [title words] bibliometric analysis
    • Griffith - [author] hellsten + [title words] educational discourses

If other code is required please refer to the Standard ISO/IEC 8859-1 ("Information technology -- 8-bit single-byte coded graphic character sets -- Part 1") - see: http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html

The two items above (tags/symbols/quotes and diacritics) may well be fixable in the future, but for the moment the issue remains problematic. Following the above conventions will at least make the metadata loadable, viewable and therefore searchable. It's not pretty but it works.

The ADT team at UNSW has deliberately steered away from making global changes, as these are not necessarily the best option when it comes to maintaining faithful copies of significant documents such as theses. The situation will be monitored and options investigated for a better way of handling tags and symbols within metadata.

Copyright © Council of Australian University Librarians 1997 - Updated: Wednesday March 26, 2008 14:33 - Web Co-ordinator