[KATALIST] New ISO standard: WARC File Format Published as an International Standard
Berke
berke.barnabas at chello.hu
Wed Jun 3 15:53:00 CEST 2009
Dear colleagues,
I am forwarding this to those for whom it is professionally relevant.
Best regards,
Berke Barnabásné
----- Original Message -----
From: "Abbie M Grotke" <abgr at loc.gov>
To: "Abbie M Grotke" <abgr at loc.gov>
Sent: Tuesday, June 02, 2009 10:33 PM
Subject: [DIGLIB] WARC File Format Published as an International Standard
> The International Internet Preservation Consortium is pleased to
> announce the publication of the WARC file format as an international
> standard: ISO 28500:2009, Information and documentation -- WARC file
> format.
> [http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=44717]
>
> For many years, heritage organizations have tried to find the most
> appropriate ways to collect and keep track of World Wide Web material
> using web-scale tools such as web crawlers. At the same time, these
> organizations were concerned with the requirement to archive very large
> numbers of born-digital and digitized files. There was a need for a container
> format that permits one file simply and safely to carry a very large
> number of constituent data objects (of unrestricted type, including many
> binary types) for the purpose of storage, management, and exchange.
> Another requirement was that the container need only minimal knowledge
> of the nature of the objects.
>
> The WARC format is expected to be a standard way to structure, manage
> and store billions of resources collected from the web and elsewhere. It
> is an extension of the ARC format
> [http://www.archive.org/web/researcher/ArcFileFormat.php ], which has
> been used since 1996 to store files harvested from the web. The WARC format
> offers new possibilities, notably the recording of HTTP request headers,
> the recording of arbitrary metadata, the allocation of an identifier for
> every contained file, the management of duplicates and of migrated
> records, and the segmentation of the records. WARC files are intended to
> store every type of digital content, whether retrieved by HTTP or another
> protocol.
>
> The motivation to extend the ARC format arose from the discussion and
> experiences of the International Internet Preservation Consortium [
> http://netpreserve.org/ ], whose core mission is to acquire, preserve
> and make accessible knowledge and information from the Internet for
> future generations. The IIPC Standards Working Group put forward to ISO
> TC46/SC4/WG12 a draft presenting the WARC file format. The draft was
> accepted as a new Work Item by ISO in May 2005.
>
> Over a period of four years, the ISO working group, with the
> Bibliothèque nationale de France [http://www.bnf.fr/ ] as convener,
> collaborated closely with IIPC experts to improve the original draft.
> WG12 will continue to maintain [http://bibnum.bnf.fr/WARC/ ] the
> standard and prepare its future revision.
>
> Standardization offers a guarantee of durability and evolution for the
> WARC format. It will help web archiving enter the mainstream
> activities of heritage institutions and other branches, by fostering the
> development of new tools and ensuring the interoperability of
> collections. Several applications are already WARC compliant, such as
> the Heritrix [http://crawler.archive.org/ ] crawler for harvesting, the
> WARC tools [http://code.google.com/p/warc-tools/ ] for data management
> and exchange, the Wayback Machine
> [http://archive-access.sourceforge.net/projects/wayback/ ], NutchWAX
> [http://archive-access.sourceforge.net/projects/nutch/ ] and other
> search tools [http://code.google.com/p/search-tools/ ] for access. The
> international recognition of the WARC format and its applicability to
> every kind of digital object will provide strong incentives to use it
> within and beyond the web archiving community.
>
> A press release is available on the IIPC website:
> http://netpreserve.org/press/pr20090601.php
>
> General information about the IIPC can be found at:
> http://netpreserve.org
>
> ---------------------
> Abbie Grotke
> Library of Congress
> IIPC Communications Officer
> netpreserve.org
More information about the Katalist mailing list