Tamil Wikipedia: A case study
The author thanks Natkeeran, Ravishankar, Selvakumar, Karthickbala, and fellow Tamil Wikipedia editors for feedback and other inputs.
Three distinct growth phases have been identified in this case study of Tamil Wikipedia since late 2003. Several distinct characteristics of the Wikipedia and its editors are identified. Outreach efforts and sibling projects are also discussed in this study. Challenges and future plans are outlined.
Tamil is a classical language spoken by more than 78 million people across the world with a rich literary tradition spanning millennia. Significant acclaimed Tamil literature has existed for over two thousand years. The Tamil language Wikipedia has 18,021 articles (as of writing), a number of them of good quality. This case study attempts to characterise the Tamil Wikipedia, its editorial team, growth trends, challenges faced, and plans to take it to the next important stage.
Extant Tamil literature consists of works on poetics, philosophy, ethics, grammar, etc. Notable among the early Tamil encyclopaedias were Abidhaanakosam, written by Muthuthambiyaar and published in Jaffna in 1902, and Abidhaana Chindhaamani, a 1050-page work that took Singaravelanar 42 years of determined effort and was published in Chennai in 1910. Later, an 18-volume encyclopaedia on science and a 15-volume work on humanities were published by the Thanjavur Tamil University, in an intended series of 20 and 15 volumes respectively. The first comprehensive modern encyclopaedia was published from 1954 to 1968 as a 10-volume set. It was a collaborative effort by scholars, philanthropists and the Government of Tamil Nadu. More recently, in 2007, a collection of 28,000 articles from the concise edition of Encyclopaedia Britannica was translated and published in Tamil by Vikatan Publishers.
Tamil Wikipedia was started on September 30, 2003, when an anonymous person posted on the Main Page a link to their Yahoo! Group and the text manitha maembaadu (மனித மேம்பாடு), fittingly a phrase that means human development. However, for several weeks after that, the site had an all-English interface with little activity. Mayooranathan, in response to a request posted on a mailing list, completed 95% of the localisation between November 4, 2003 and November 22, 2003. He made some anonymous edits alongside. On November 12, 2003, Amala Singh from the United Kingdom wrote the first article in Tamil, but with an English title, Shirin Ebadi. The earliest editor who continues to edit actively, Mayooranathan, has written more than 2760 articles and kept the project alive during an intervening period when practically nobody else was editing. Around five active editors, including the author, joined the project in the second half of 2004. Some occasional editors went on to become regular editors and the wiki started growing steadily. Bugs were reported to fix the interface, policies partially deriving from the English Wikipedia were initiated, and editors started to specialise in tasks like stub sorting, creating templates, copyediting, wikifying, translation, original writing, etc. Even at this early stage, the Tamil Wikipedia had a global editorial team representing almost every continent.
After registering a period of high linear growth in several metrics on a lower base, the Tamil Wikipedia started witnessing, around April 2007, a low linear growth on a higher base in several quantitative metrics. This period, however, also showed a perceivably super-linear growth in article quality aspects like length, standard of prose, image use, inline citation usage, etc. Late 2008 to early 2009 was a period characterised by a near constant number of active and very active editors, a steady influx of new and occasional editors, a healthy, enthusiastic and continuity-preserving churn, and, above all, optimism for a promising future.
The three distinct phases noted in the History section are shown in the accompanying chart. The number of very active Wikipedians (not in chart) has also grown well. With the recent workshops and the planned events, we hope to hit a hockey-stick growth phase in the second half of 2009.
The premise behind the hope is the following: a linear growth in active editors results in a super-linear growth in the number of articles due to the cumulative effect. Other metrics, such as article length, might improve at a greater rate. Given this, if the number of active users increases super-linearly due to the recent outreach efforts and the consequent mainstream media attention, content growth will move to a distinctly higher plane.
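The cumulative effect behind this premise can be illustrated with a small sketch. It rests on simplifying assumptions that are not part of the study: the number of active editors grows by a fixed amount each month, and every active editor writes the same number of articles per month. The function name, parameters and numbers are hypothetical.

```python
def cumulative_articles(months, start_editors, new_editors_per_month,
                        articles_per_editor):
    """Return the running article total at the end of each month.

    Assumption (illustrative only): editors grow linearly and each active
    editor contributes a constant number of articles per month.
    """
    totals = []
    total = 0
    for month in range(months):
        # Linear growth in the number of active editors.
        editors = start_editors + new_editors_per_month * month
        # Each month's output is added to everything written so far.
        total += editors * articles_per_editor
        totals.append(total)
    return totals


totals = cumulative_articles(months=6, start_editors=10,
                             new_editors_per_month=2, articles_per_editor=5)
# Differences between consecutive monthly totals.
increments = [later - earlier for earlier, later in zip(totals, totals[1:])]
print(totals)
print(increments)
```

Under these assumptions the monthly increments themselves keep rising, so the article total grows quadratically rather than linearly.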
Nirojan, from Canada, one of the youngest editors, wrote more than two thousand articles on Tamil films, ancient Tamil kings, theatre and drama.
Prof. VK, the senior-most editor, has so far written 188 articles on Mathematics, Astronomy and Philosophy, contributing from the US and India.
Tamil Wikipedia has had a diverse set of editors from the beginning. Editors came from various disciplines like Architecture, Biotechnology, Economics, Electronics, Information Technology, Mathematics, Music, Social Welfare, etc. The editors are from various professions: engineers, scientists, academics, students, administrators, self-employed people, etc. Editors are aged between 15 and 85 years, with an age distribution that is non-uniform but, remarkably, does not follow a power law. Educational qualifications and income levels also vary across the spectrum.
More information regarding the profiles of editors as well as visitors to Tamil Wikipedia will come out when the results of the UNU-MERIT survey are published. Based on some available monitoring tools, it has been identified that there are approximately 60,000 page requests each day.
Distinct characteristics
- General cordiality and assumption of good faith among regular editors
- Quality focus from early on (concern about article diversity when Ganeshbot, a bot similar to Rambot of the English Wikipedia, was proposed)
- Early emphasis on citing sources
- Individual editors writing full-length articles later copyedited by others
- Specialist roles chosen by editors even when a handful of editors were actively editing
- 'In the news' and 'Selected anniversaries' sections meticulously updated, almost on a daily basis, by a dedicated user
- Several topics, on diverse areas, are being covered for the first time in Tamil. Tamil Wikipedia editors endeavour to attain currency of knowledge, by writing articles on topics that are emerging in science, technology, politics etc. As is customary, especially in agglutinative languages, suitable terminologies are coined as needed from existing words and roots.
- In English Wikipedia, the primary and nearly the sole motivation for editors is to document and spread knowledge; English as a medium is incidental. In the case of Tamil Wikipedia, however, most editors view their work as a way to spread precious knowledge in Tamil. Many editors are motivated by the opportunity to enrich the modern Tamil corpus by adding quality content in Tamil.
Challenges
- Low internet penetration among the majority of the population
- Low awareness about Tamil typing tools
- Low awareness about Tamil Wikipedia
- Fewer than 2% of editors are female
- Disconnect between skilled writers and internet access
- Critical mass of tech-savvy editors who can fix interface issues not yet reached
Apart from a small initiative to display Wikipedia badges on blogs in late 2004 and one instance of media outreach, there had been no planned activities to bring more readers and editors to Tamil Wikipedia. Since the beginning of 2009, however, Wikipedians have organised three workshops in which participants were introduced to the Tamil Wikipedia, its philosophy and its usefulness, and were tutored in Tamil typing and basic editing. Half a dozen introductory talks were also delivered at meetups of other groups. These events have been held in colleges, including the prestigious Indian Institute of Science, as well as in workplaces and special interest clubs. The workshops and talks have had a good impact, bringing in new active editors from various backgrounds.
Based on the feedback from each workshop the following have been observed:
- Tutor-learner ratio should be around 1:5 for useful practical training. Having multiple tutors handling different aspects of editing is helpful.
- A classroom is good, a computer science lab environment is better.
- Asking an uninitiated person from the audience to come forward and edit is a good approach: it convinces others about the ease of use and gives the tutors feedback about difficulties faced by new editors.
- If a remote editor leaves a message of appreciation at the new user's talk page as soon as they make the first trial edit, it encourages them a lot.
- Articles to cite as examples should be picked based on audience composition.
- Emailing all those who attended, thanking them as well as inviting them to edit, leads to more conversions.
- In the Indian Wikipedia context, the first session after introduction should be about typing in the Indian language concerned.
Following is the agenda of a typical workshop:
- Introductions by the host and the Tamil Wikipedia member who acted as an interface with the host
- A short presentation on what Wikipedia is, its history, philosophies, software, etc.
- A tutorial on Tamil typing tools
- Tea break
- Tutorial on editing through someone from the audience. The newbie picks the topic and content.
- Q & A session
Other Tamil Wiki projects are Wiktionary, Wikinews, Wikisource, Wikibooks, and Wikiquote. Of these, Tamil Wiktionary is the one project that has matured and grown well. Mainly seeded by an automated bot adding entries from technical dictionaries, the Tamil Wiktionary reached more than 100,000 entries and was featured on the main Wiktionary page for some time. It has attracted more editors since then and, at this stage, its sustenance and future growth are assured. Tamil, with a long and rich literary tradition, has numerous public domain works available, so there is ample scope for Wikisource to grow. The other Tamil Wiki projects are still at the bootstrapping stage, and there is also some new-found interest in starting a Wikispecies project in Tamil.
|Language||Official count||> 200 chars||Mean bytes||Length > 0.5 KB||Length > 2 KB||Size||Words||Images|
|Tamil||16 k||16 k||1619||81%||21%||74 MB||3.0 M||3.0 k|
|Bengali||19 k||12 k||1113||49%||11%||61 MB||3.1 M||8.5 k|
|Marathi||21 k||6.4 k||623||20%||5%||44 MB||1.8 M||769|
|Telugu||42 k||13 k||578||16%||5%||64 MB||3.0 M||2.6 k|
|Hindi||24 k||14 k||1128||35%||11%||76 MB||4.6 M||1.4 k|
|Malayalam||8.3 k||7.8 k||2425||78%||30%||58 MB||2.1 M||5.4 k|
|Kannada||6.1 k||5.3 k||1282||53%||14%||23 MB||0.965 M||211|
Table: Comparison of the top Indian-language Wikipedias (as of Nov 2008)
Tamil and Malayalam Wikipedias top the quality metrics. Tamil Wikipedians monitor the changes regularly.
Future plans
- firming up policies and guidelines
- media outreach
- bringing out an offline collection of wiki articles
- The 28,000 articles in the Tamil edition of the concise Britannica, currently being sold in the market, are of stub-quality. A collection of 5,000 selected articles from Tamil Wikipedia, published after manual perusal, will definitely have a number of takers. In fact, a collection of wildlife articles for school children and an assorted collection of good articles given to scientific research students have been well-received.
- liaising with the Indian Wikimedia Chapter being formed and other bodies
- conducting article-writing contests, local conferences, etc.
A case study on Tamil Wikipedia has revealed three distinct growth phases so far. Important characterisations of the editors as well as the wiki itself have been made. The main problems standing in the way of its growth have been identified and future plans are outlined. Conducting similar studies on other language Wikipedias that are in a similar phase of growth could reveal commonalities as well as distinct characteristics.
- Kamil V. Zvelebil (1992). Companion Studies to the History of Tamil Literature. BRILL Academic. p. 12. ISBN 9004093656. "p12 - ...the most acceptable periodisation which has so far been suggested for the development of Tamil writing seems to me to be that of A Chidambaranatha Chettiar (1907–1967): 1. Sangam Literature - 200BC to AD 200; 2. Post Sangam literature - AD 200 - AD 600; 3. Early Medieval literature - AD 600 to AD 1200; 4. Later Medieval literature - AD 1200 to AD 1800; 5. Pre-Modern literature - AD 1800 to 1900..."
- Abidhaanakosam in the Noolaham archive
- Author Jeyamohan on Abidhaana Chindhaamani
- Ma. Po. Sivagnanam (1978). The history of Tamil Development after (Indian) independence. Chennai: Poongodi Publications.
- "Karunanidhi releases Encyclopaedia Brittanica in Tamil". The Hindu. 2007-04-29. http://www.hindu.com/2007/04/29/stories/2007042902840300.htm. Retrieved 2009-05.
- The article titled in English was moved to the Tamil title, and the redirect page was subsequently deleted. It has been recently restored for the record.
- Karthikeyan, a school student from Singapore, wrote several articles on herbs from this user account and anonymously prior to that.
- Möller, Erik (2008-10-24). "Multilingual Wikipedia Survey Launched". Wikimedia Foundation. Retrieved 2009-04-16.
- Tamil Wikipedia quality monitor
- "Wikipedia discussion prior to bot approval". Retrieved 2009-04-16.
- Citation guidelines
- "Articles using "Cite journal" template". Retrieved 2009-04-16.
- Kanags maintains these two sections
- Homepage for workshops
- Details of the workshop held at the IISc
- "Wikipedia Academy in Bangalore". My Bangalore. 2009-02-05. http://mybangalore.com/article/wikipedia-academy-in-bangalore.html. Retrieved 2009-04-25.
- SundarBot project page
- Booklet given to participants of the workshop held at the Indian Institute of Science
A paragraph (from the Ancient Greek παράγραφος paragraphos, "to write beside" or "written beside") is a self-contained unit of a discourse in writing dealing with a particular point or idea. Though not required by the syntax of any language, paragraphs are usually an expected part of formal writing, used to organize longer prose.
The oldest classical Greek and Latin writing had little or no space between words and could be written in boustrophedon (alternating directions). Over time, text direction (left to right) became standardized, and word dividers and terminal punctuation became common. The first way to divide sentences into groups was the original paragraphos, similar to an underscore at the beginning of the new group. The Greek paragraphos evolved into the pilcrow (¶), which in English manuscripts in the Middle Ages can be seen inserted inline between sentences. The hedera leaf (e.g. ☙) has also been used in the same way.
In ancient manuscripts, another means to divide sentences into paragraphs was a line break (newline) followed by an initial at the beginning of the next paragraph. An initial is an oversized capital letter, sometimes outdented beyond the margin of the text. This style can be seen, for example, in the original Old English manuscript of Beowulf. Outdenting is still used in English typography, though not commonly. Modern English typography usually indicates a new paragraph by indenting the first line. This style can be seen in the (handwritten) United States Constitution from 1787. For additional ornamentation, a hedera leaf or other symbol can be added to the inter-paragraph whitespace, or put in the indentation space.
A second common modern English style is to use no indenting, but add vertical white space to create "block paragraphs." On a typewriter, a double carriage return produces a blank line for this purpose; professional typesetters (or word processing software) may put in an arbitrary vertical space by adjusting leading. This style is very common in electronic formats, such as on the World Wide Web and email.
Widows and orphans occur when the first line of a paragraph is the last line in a column or page, or when the last line of a paragraph is the first line of a new column or page.
A more recent convention in English typography is to leave the first paragraph unindented and indent those that follow. For example, Robert Bringhurst states that we should "Set opening paragraphs flush left." Bringhurst explains as follows:
The function of a paragraph is to mark a pause, setting the paragraph apart from what precedes it. If a paragraph is preceded by a title or subhead, the indent is superfluous and can therefore be omitted.
The Elements of Typographic Style states that "at least one en [space]" should be used to indent paragraphs after the first, noting that this is the "practical minimum". An em space is the most commonly used paragraph indent. Miles Tinker, in his book Legibility of Print, concluded that indenting the first line of paragraphs increases readability by 7% on average.
See also: Newline
In word processing and desktop publishing, a hard return or paragraph break indicates a new paragraph, to be distinguished from the soft return at the end of a line internal to a paragraph. This distinction allows word wrap to automatically re-flow text as it is edited, without losing paragraph breaks. The software may apply vertical whitespace or indenting at paragraph breaks, depending on the selected style.
How such documents are actually stored depends on the file format. For example, HTML uses the <p> tag as a paragraph container. In plaintext files, there are two common formats. Pre-formatted text will have a newline at the end of every physical line, and two newlines at the end of a paragraph, creating a blank line. An alternative is to only put newlines at the end of each paragraph, and leave word wrapping up to the application that displays or processes the text.
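As a rough illustration of the pre-formatted plaintext convention described above, the following sketch splits text on blank lines, joins the physical lines inside each paragraph, and wraps the result in `<p>` containers. The function names `split_paragraphs` and `to_html` are hypothetical, and treating any run of whitespace-only lines as a paragraph break is an assumption rather than a rule of any file format.

```python
import html
import re


def split_paragraphs(text):
    """Split pre-formatted plain text into paragraphs.

    A paragraph break is one or more blank lines; single newlines inside a
    paragraph are treated as soft line ends and replaced by spaces.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def to_html(paragraphs):
    """Wrap each paragraph in a <p> element, escaping special characters."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


sample = "First line of one\nparagraph.\n\nSecond paragraph,\nalso wrapped."
paragraphs = split_paragraphs(sample)
print(paragraphs)
print(to_html(paragraphs))
```

Text stored in the alternative format, with a single newline per paragraph, would need a different splitting rule, so a real converter has to know which convention its input uses.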
A line break that is inserted manually, and preserved when re-flowing, may still be distinct from a paragraph break, although this is typically not done in prose. HTML's <br /> tag produces a line break without ending the paragraph; the W3C recommends using it only to separate lines of verse (where each "paragraph" is a stanza), or in a street address.
Paragraphs are commonly numbered using the decimal system, where (in books) the integral part of the decimal represents the number of the chapter and the fractional parts are arranged in each chapter in order of magnitude. Thus in Whittaker and Watson's 1921 A Course of Modern Analysis, chapter 9 is devoted to Fourier Series; within that chapter §9.6 introduces Riemann's theory, the following section §9.61 treats an associated function, following §9.62 some properties of that function, following §9.621 a related lemma, while §9.63 introduces Riemann's main theorem, and so on. Whittaker and Watson attribute this system of numbering to Giuseppe Peano on their "Contents" page, although this attribution does not seem to be widely credited elsewhere.
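The ordering implied by this decimal scheme can be sketched in a few lines. The example treats each label as a decimal number (with the § sign stripped), so that 9.621 falls between 9.62 and 9.63; the function name `sort_section_numbers` is hypothetical.

```python
from decimal import Decimal


def sort_section_numbers(labels):
    """Order section labels such as '9.61' by their value as decimals."""
    return sorted(labels, key=Decimal)


labels = ["9.63", "9.621", "9.6", "9.62", "9.61"]
print(sort_section_numbers(labels))
```

Comparing the labels as decimals rather than as plain strings also keeps chapters in order once chapter numbers reach two digits, since "10.1" would otherwise sort before "9.6".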
Main article: Section (typography)
Many published books use a device to separate certain paragraphs further when there is a change of scene or time. This extra space, especially when co-occurring at a page or section break, may contain an asterisk, three asterisks, a special stylistic dingbat, or a special symbol known as an asterism.
Purpose and style advice
A common English usage misconception is that a paragraph has three to five sentences; single-word paragraphs can be seen in some professional writing, and journalists often use single-sentence paragraphs.
The crafting of clear, coherent paragraphs is the subject of considerable stylistic debate. Forms generally vary among types of writing. For example, newspapers, scientific journals, and fictional essays have somewhat different conventions for the placement of paragraph breaks.
English students are sometimes taught that a paragraph should have a topic sentence or "main idea", preferably first, and multiple "supporting" or "detail" sentences which explain or supply evidence. One technique of this type, intended for essay writing, is known as the Schaffer paragraph. For example, in the following excerpt from Dr. Samuel Johnson's Lives of the English Poets, the first sentence is the main idea: that Joseph Addison is a skilled "describer of life and manners". The succeeding sentences are details that support and explain the main idea in a specific way.
As a describer of life and manners, he must be allowed to stand perhaps the first of the first rank. His humour, which, as Steele observes, is peculiar to himself, is so happily diffused as to give the grace of novelty to domestic scenes and daily occurrences. He never "o'ersteps the modesty of nature," nor raises merriment or wonder by the violation of truth. His figures neither divert by distortion nor amaze by aggravation. He copies life with so much fidelity that he can be hardly said to invent; yet his exhibitions have an air so much original, that it is difficult to suppose them not merely the product of imagination.
This advice differs from stock advice for the construction of paragraphs in Japanese (translated as danraku 段落).
- The American Heritage Dictionary of the English Language. 4th ed. New York: Houghton Mifflin, 2000.
- Johnson, Samuel. Lives of the Poets: Addison, Savage, etc.. Project Gutenberg, November 2003. E-Book, #4673.
- Rozakis, Laurie E. Master the AP English Language and Composition Test. Lawrenceville, NJ: Peterson's, 2000. ISBN 0-7645-6184-7 (10). ISBN 978-0-7645-6184-9 (13).