Text corpus

    From Consumerium development wiki R&D Wiki

    Latest revision as of 04:46, 28 December 2003

    A text corpus is a body of text that follows certain standards, e.g. markup standards such as a wikitext standard or an XML DTD, or even just the use of a common glossary. The entire World Wide Web can be thought of as a single text corpus divided by language and by documentation license. Tools specific to the analysis of large text corpora are increasingly available for public use.
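
    As a minimal sketch of the idea of one corpus divided by language and by license, the example below groups documents into sub-corpora. The Document record, its field names, the function names, and the sample entries are all hypothetical and invented for illustration; they do not describe any existing corpus tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one text in a corpus."""
    title: str
    language: str  # e.g. "en"
    license: str   # e.g. "GFDL", "CC", "public domain"
    text: str


def partition_corpus(documents):
    """Group documents into sub-corpora keyed by (license, language)."""
    parts = defaultdict(list)
    for doc in documents:
        parts[(doc.license, doc.language)].append(doc)
    return dict(parts)


def word_count(documents):
    """Count whitespace-separated tokens across a list of documents."""
    return sum(len(doc.text.split()) for doc in documents)


# Sample records are made up for this sketch; they are not real data.
SAMPLE = [
    Document("Text corpus", "en", "GFDL", "A text corpus is a body of text"),
    Document("Travel guide", "en", "CC", "How to get there"),
    Document("Old novel", "en", "public domain", "It was a dark night"),
    Document("Glossary", "en", "GFDL", "Common terms"),
]

parts = partition_corpus(SAMPLE)
for (license_name, language), docs in sorted(parts.items()):
    print(license_name, language, len(docs), word_count(docs))
```

    Keying each sub-corpus on the (license, language) pair keeps material under different licenses separate, so that size or vocabulary can be compared per license rather than across the whole collection.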

    The public domain text corpus is quite large, including all US federal government material, Project Gutenberg, and almost anything published before the 20th century. Most of this is now visible and searchable.

    A lot of the material posted to Netnews during the 1980s and 1990s is under ambiguous copyright. Most of it is copyrighted by the original author, but some is explicitly in the public domain.

    The GFDL text corpus is also quite large and somewhat more robust, although under the influence of several enemy projects it is becoming less so, and more likely to reflect the viewpoints of those projects than those of people who actually understand the subjects. Contributions to the R&D wiki are part of this corpus, but that does not mean that other contributions to the GFDL corpus, notably those to Wikipedia or Disinfopedia, should be taken as equally credible to those made via the Consumerium interface. As it stands, this wiki is the most credible of the various mediawiki front ends, as it has the least autocratic and least overtly politically biased editorial policy, most likely because its value system promotes transparency and accountability, as opposed to being a "hobby" or "lobbying" project. In time, the standards of the GFDL text corpus will likely be set here, not in the larger projects, which will become less trustworthy over time.

    The Creative Commons text corpus is also quite large, and some think it may eventually eclipse the GFDL corpus. If so, it is the editorial standards of the CC, not those of the GFDL, that should succeed, as the former are rigorously defined by Lawrence Lessig and others, with an integrity that the operators of GFDL editing sites simply cannot claim. Wikitravel, for instance, uses both the mediawiki user interface and the CC license.