GFDL corpus

From Consumerium development wiki R&D Wiki

The GFDL corpus is the body of all material licensed under GFDL. It includes at least the GFDL text corpus material made available by:

A GFDL corpus access provider permits retrieval of, and editing of, this material. The GFDL itself imposes numerous requirements on such providers; these are covered in that article.

There are some proponents of a true unified GFDL Corpus with a single set of editing and forking and reintegration rules. They claim that this would require bypassing organizations like Wikimedia, and perhaps bringing in more ethical players like FSF and an independent board for the corpus itself.

The Consumerium Governance Organization will probably need to take some interest in this, as it is not going to be possible to integrate input from all the above without some way of making clear who believes what, and why. The Research Wiki may or may not be part of the corpus. In any case it will have to define itself as being wholly independent of any Wikimedia interference or harassment.


I do feel strongly that the purpose of the Wikipedia community is to build an open content encyclopedia; I don't know what it would mean for it to be the other way around. I thought that the intent of the project was to build a free and open encyclopedia that would not be 'owned' by anyone in a restrictive sense. The fact that the license allows use by others and forks means that the total amount of free and open encyclopedia that can be built is bigger than Wikipedia itself. Also, the GFDL material contributed by Internet Encyclopedia, WikiTravel*, WikiQuote and Wiktionary, to name but a few, is part of the GFDL corpus, but not of Wikipedia. I don't think that pointing that out is disrespectful at all. Perhaps I was misunderstood.

* Note: Wikitravel is not GFDL but CC-SA. - Huttite

I don't really understand your points about the relationship between the GFDL text and the community and software. Props are due to the founders, their vision, the contributors and all involved, but the fact is that when someone presses 'submit', they maintain copyright over the material, and grant generous terms of use to anyone who wants to use it under those terms. That includes, but is not exclusive to, Wikipedia. Wikipedia is an effective and good way to build this corpus of open and free content, and it is a great front end to edit and view it. It is also an excellent group of people who do this, but it is not the same thing as the material licensed under the GFDL, which is not licensed exclusively to it (that's what 'free' means). I don't understand how pointing out this fact about the license is disrespectful or an affront. It's there in black and white, and it's not a bad thing. Mark Richards 02:15, 5 Jul 2004 (UTC)

Mark Richards 16:39, 4 Jul 2004 (UTC)

No time for a careful essay at the moment, but in short:

There is much released under GFDL, and a good deal of it is not encyclopedia articles. In particular, documentation and books for FSF-related software are generally released under GFDL. It is a disjoint set of materials authored under widely varying circumstances. Referring to this as a "corpus," which, I believe, is Latin for "body," is misleading because it implies a degree of cohesion that is simply not present. In particular, there is no common indexing, no unified means of access, and no uniform standards of authorship and editorial review. Wikipedia (the content), on the other hand, is deserving of distinct treatment, because it does have a method of authorship and review that, in practice, provides some degree of uniformity of quality and style. Projects like Fred Bauder's (internet-encyclopedia), though they have some parallel content, diverge from Wikipedia in important ways. Those who offer read-only access to a Wikipedia snapshot are not "GFDL text corpus access providers" because they do not provide access to the totality of material released under said license. They are best described as Wikipedia mirrors, because they mirror content from Wikipedia in particular, and not, generally, content from other sources. While we cannot and do not wish to force the rest of the world to do so, it is appropriate for us to give due credit to Wikipedia (the project) for producing Wikipedia (the content). The effort by so many hands at building the community and the encyclopedia in ways that mutually reinforce is considerable. The articles are not mere aggregations of individually authored prose. UninvitedCompany 04:15, 5 Jul 2004 (UTC)

For sure - I agree with you. We should absolutely give credit to Wikipedia for producing a lot of good content. I really don't think I am suggesting that we don't. I agree, using the term to refer to ALL GFDL materials wouldn't be very useful very often, but using 'GFDL encyclopedia/dictionary corpus' to refer to the text and images generated by Internet Encyclopedia, Wikipedia and Wiktionary could be, and it wouldn't be disrespectful to any of them. There is no common indexing, no unified means of access, no uniform standards of authorship, but that isn't the point. The point is that one could download all of this material, print it out, and publish it as 'printopedia'. What would you call the text and images? Well, it seems like GFDL encyclopedia corpus would be a useful term, since it isn't Wikipedia (although, of course, it would be appropriate to give credit to Wikipedia for parts of it). What you call mirrors are offering access of a kind to some of the GFDL corpus - if they allowed edits on their own dumps then they would be forks. They are mirrors in a sense, because they regularly update, but they are also GFDL text corpus access providers. I agree that they are next to useless, and pretty irritating, but Wikipedia does not give access to the entire GFDL encyclopedia corpus either (Internet Encyclopedia, Wikiquote, Wikitravel etc). Using the term GFDL corpus emphasises the freedom of the text, and the ability of people to use it under broad terms, and is inclusive of other projects' contributions to this corpus. I'm not belittling the contribution by Wikipedia, and I'm not suggesting that you should use the term to refer to the Wikipedia project, just that there are times when one is talking about just the material generated, and not only the material generated by Wikipedia. Mark Richards 15:59, 5 Jul 2004 (UTC)