Template talk:New articles

This section strikes me as rather important. The title suggests that it is generated in some automatic way, whereas we know that it must be changed by hand. I hope that users will keep this section constantly in motion, and not be shy about adding new contributions. If a user clicks on an item that is disappointing, fine. Perhaps he or she will feel inspired to add to it. --Jatucker 15:41, 14 November 2010 (CST)
 * I agree. I've edited the link title at the bottom right, and will directly encourage users who create articles to add their articles to this box as I see them. --Forgottenman (talk) 17:12, 14 November 2010 (CST)
 * I would agree with important, as do at least three recent vandal editors. Do you think it is time to protect this page? Pestergaines 05:01, 21 November 2010 (CST)
 * I'd prefer not to, if possible. I'm not sure what the threshold is for "new" users... on WP I think it's four days or so but I'm not sure here.  I'd rather be able to encourage new users to add their texts here, without having to worry about that restriction. --Forgottenman (talk) 13:28, 23 November 2010 (CST)
 * This is getting rather ridiculous. I've protected the page to prevent anonymous users and new users from editing.  We need to find out what the definition is for "new user". --Forgottenman (talk) 15:49, 27 November 2010 (CST)

Automation
It's probably possible to automate this. The code to automate a simple bulleted list of the newest articles is already available, and I've deployed it on several of my wikis; see mw:Extension:Recent Pages. However, I think we would want more advanced functionality; specifically, the first sentence of the article should be displayed, much like what we have made available manually at Template:New articles. What I have in mind is an extension to go through each of the new pages and search for and display that first sentence: Nathan Larson (talk) 23:34, 5 October 2012 (MSD)
 * Skip past all opening templates (e.g. infoboxes).
 * A cue that the sentence is over would be a punctuation mark followed by a whitespace character.
 * Wikilink the first item that is bolded in that first sentence; that's the page title.
 * If no suitable sentence is found, ignore that page.
 * Unfortunately, it looks like it could be technically difficult. If you google around, you'll find that a lot of people have tried to tackle this without finding a perfect solution. Supposedly we might be able to get the error rate down to 0.25% using Splitta or Tactful Tokenizer or the like. The problem was summed up: "There is no simple trick. To do this properly, you need to do a syntactic analysis of the text. Nobody can do that. At least not yet. At least not 100% of the time. Mainly because it also entails a semantic analysis of the text. You see, contrary to what the type of linguists that taught you grammar in school think, what makes up a sentence is pretty hard to sum up in a set of rules a computer could follow without understanding the text."
 * There are all sorts of situations in which a punctuation mark followed by whitespace is not the end of sentence, such as when there are ellipses in the middle or abbreviations such as "e.g." or "Mr." or "Murray N. Rothbard" or the like.
 * Some first sentences will end not with a space but with a reference. I suppose we could strip those from the string before interpreting it.
 * Now if we didn't mind having it show up as a feed that resembles, say, Special:Newpages (except a bit more aesthetically appealing and without all the "created page with..." openings), or a google search listing, both of which break off results in the middle of sentences, we'd be okay. It all depends on what we want. Clearly we don't yet have the labor to keep this page updated manually, so it's a question of whether we want to let it go stale, or use an automated solution that won't give us exactly what we have now.
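
For what it's worth, the steps and caveats above can be roughed out in a few lines. This is only a minimal sketch in Python, not the actual extension (which would presumably be PHP) and not how Splitta or Tactful Tokenizer work. The function names, the abbreviation list, and the rule that a lone capital letter plus period is an initial are all assumptions made for illustration; a heuristic this simple will still misjudge plenty of real first sentences.

```python
import re

# Hypothetical and deliberately incomplete; a real list would be much longer.
ABBREVIATIONS = {"e.g.", "i.e.", "mr.", "mrs.", "dr.", "etc.", "vs."}


def strip_templates(text):
    """Drop {{...}} templates (infoboxes etc.), tracking depth for nesting."""
    out, depth, i = [], 0, 0
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif text.startswith("}}", i) and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def strip_refs(text):
    """Remove <ref .../> and <ref>...</ref> so they cannot hide a sentence end."""
    text = re.sub(r"<ref[^>]*/>", "", text)
    return re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)


def first_sentence(wikitext):
    """Return the first sentence, or None if no suitable sentence is found."""
    text = strip_refs(strip_templates(wikitext)).strip()
    # Candidate end: ., ! or ? followed by whitespace or the end of the text.
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        candidate = text[:match.end()]
        last_word = candidate.split()[-1].lstrip("('\"")
        if last_word.lower() in ABBREVIATIONS:
            continue  # "e.g.", "Mr." and friends
        if re.fullmatch(r"[A-Z]\.", last_word):
            continue  # assumed to be an initial, as in "Murray N. Rothbard"
        if candidate.endswith(".."):
            continue  # assumed to be an ellipsis in the middle of a sentence
        return candidate
    return None


def link_title(sentence):
    """Turn the first '''bolded''' item into a wikilink."""
    return re.sub(r"'''(.+?)'''", r"[[\1]]", sentence, count=1)


def teaser(wikitext):
    """First sentence with the title wikilinked, or None to skip the page."""
    sentence = first_sentence(wikitext)
    return link_title(sentence) if sentence else None


sample = (
    "{{Infobox person|name=Murray {{small|N.}} Rothbard}}\n"
    "'''Murray N. Rothbard''' was an economist, e.g. of the Austrian School."
    "<ref>Some source.</ref> He wrote many books."
)
print(teaser(sample))
```

On the made-up sample page, the infobox and the reference are dropped, the "N." and "e.g." are skipped as false sentence ends, and the bolded title comes back as a wikilink. A page with nothing but templates returns None and would simply be left out of the list.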


 * Then of course there's the simple fact that any new page and its first sentence, regardless of quality, would immediately appear on the main page. Usually the main page is for content that's been thoroughly reviewed to make sure it's ready for prime time. On the other hand, it would be great for showing people the sort of content that our new articles tend to consist of, for better or worse. Nathan Larson (talk) 13:41, 6 October 2012 (MSD)