Evaluation Category Enter Your Score
Appropriateness (1-5)
Does this paper fit in NAACL HLT?

The focus of NAACL HLT is all areas of natural language and computational linguistics. This includes research aiming to enable intelligent systems to interact with humans using natural language, to understand computational and other linguistic properties of language, and to enhance human-human communication through services such as speech recognition, automatic translation, information retrieval, text summarization, and information extraction. Both empirical and theoretical results are welcome; see the Call for Papers.
  • 5 = Appropriate for NAACL HLT. (Most submissions)
  • 4 = Computational linguistics/NLP, IR or Speech though not typical NAACL HLT material.
  • 3 = Possibly relevant to the audience, though it's not quite computational linguistics/NLP, IR or Speech.
  • 2 = Only marginally relevant.
  • 1 = Inappropriate.
Originality/Innovativeness (1-5)
How original is the approach or problem presented in this paper? Does this paper break new ground in topic, methodology, or content? How exciting and innovative is the research it describes? (Note that a paper could score high for originality even if the results did not show a convincing benefit. A low score here does not necessarily imply a low overall score in the case of some short submissions.)
  • 5 = Surprising: Noteworthy new problem, technique, methodology, or insight.
  • 4 = Creative: Relatively few people in our community would have put these ideas together.
  • 3 = Somewhat conventional: A number of people could have come up with this if they thought about it for a while.
  • 2 = Rather uninspiring: Obvious, or a minor improvement on familiar techniques.
  • 1 = Significant portions have actually been done before or done better. (Be sure to include references justifying this rating.)
Soundness/Correctness (1-5)
First, is the technical approach sound and well-chosen? Second, can one trust the claims of the paper -- are they supported by proper experiments, proofs, or other argumentation?
  • 5 = The approach is very apt, and the claims are convincingly supported.
  • 4 = Generally solid work, though I have a few suggestions about how to strengthen the technical approach or evaluation.
  • 3 = Fairly reasonable work. The approach is not bad, and at least the main claims are probably correct, but I am not entirely ready to accept them (based on the material in the paper).
  • 2 = Troublesome. There are some ideas worth salvaging here, but the work should really have been done or evaluated differently, or justified better.
  • 1 = Fatally flawed.
Impact of Ideas/Results (1-5)
How significant is the work described? If the ideas are novel, will they also be useful or inspirational? If the results are sound, are they also important?
  • 5 = Will affect the field by altering other people's choice of research topics or basic approach.
  • 4 = Some of the ideas or results will substantially help other people's ongoing research.
  • 3 = Interesting but not too influential. The work will be read and used, but mainly for comparison or as a source of minor contributions.
  • 2 = Marginally interesting. May or may not be read or used.
  • 1 = Will have no impact on the field, or will have negative impact.
Meaningful Comparison (1-5)
Are any experimental results meaningfully compared with appropriate prior approaches or other baselines? Do the authors place their work well with respect to prior work? Do the authors do a good job of explaining why their method is better or worse than alternatives, and in what circumstances?
  • 5 = Comparisons to prior approaches and baselines are superbly carried out.
  • 4 = Comparisons are mostly solid, and I have a reasonable idea of limitations, but I have some suggestions for alternative approaches or baselines.
  • 3 = Comparisons are somewhat helpful, but it could be hard for a reader to determine exactly how this work relates to previous work.
  • 2 = This paper does not sufficiently compare to alternatives, or has an uninformative comparison.
  • 1 = Little awareness of related work, or lacks necessary empirical comparison.
Thoroughness (1-5)
Does this paper have enough substance, or would it benefit from more ideas or results? Do the authors identify potential limitations of their work? (Note that this question mainly concerns the amount of work; its quality is evaluated in other categories.)
  • 5 = Contains more ideas or results than most publications of this length at NAACL and is clear about potential limitations.
  • 4 = Represents an appropriate amount of content for a NAACL paper of this length (most submissions).
  • 3 = Leaves open one or two natural questions that should have been pursued within the paper.
  • 2 = Work in progress. There are enough good ideas, but perhaps not enough results yet.
  • 1 = Seems thin. Not enough ideas here for a full-length/short paper.
Replicability (1-5)
Will members of the NAACL community be able to reproduce or verify the results described in this paper? A lower score might be assigned if an insufficient amount of detail has been provided, if there is a highly subjective component to the setting of certain parameters, or if proprietary data have been used in the experiments.

Members of the NAACL community...
  • 5 = could easily reproduce the results and verify the correctness of the results described here.
  • 4 = could mostly reproduce the results described here, although there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
  • 3 = could possibly reproduce the results described here with some difficulty. The settings of parameters are underspecified or very subjectively determined; the training/evaluation data required are not widely available.
  • 2 = could not reproduce the results described here no matter how hard they tried. The author simply has not provided a sufficient amount of detail nor access to resources for us to do anything more than accept their conclusions without question.
  • 1 = not applicable (please use this very sparingly, such as for short submissions that are opinion pieces).
A low score here does not necessarily imply a low overall recommendation.
Clarity (1-5)
For the reasonably well-prepared reader, was the paper sufficiently clear to understand what was done and why? (Keep in mind that minor writing issues can be fixed before final submissions.) Is the paper well-written and well-structured, or does it need some cleaning up or additional examples/pictures?
  • 5 = Readers would have no problem understanding what was done and why.
  • 4 = Most readers would be able to understand what was done and why.
  • 3 = Most readers would understand this paper with some effort.
  • 2 = Important questions were hard to resolve even with effort.
  • 1 = Much of the paper is confusing.
Overall (1-5)
Will people learn a lot by reading this paper or seeing it presented? There are many good submissions to NAACL HLT 2013; how important is it to feature this one?

In deciding on your ultimate recommendation, please think over all your scores above. But remember that no paper is perfect, and remember that we want a conference full of interesting, diverse, and timely work. If a paper has some weaknesses, but you really got a lot out of it, feel free to fight for it. If a paper is solid but you could live without it, let us know that you're ambivalent. Remember also that the author has a couple of weeks to address reviewer comments before the camera-ready deadline.

Please do take the length of the submission into account. Rank short submissions relative to other short submissions, and full-length submissions relative to other full-length submissions. Acceptable short submissions include small, focused contributions, works in progress, negative results, opinion pieces and interesting application notes.
  • 5 = This paper changed my thinking on its topic and I would fight for it to be accepted.
  • 4 = I learned a lot from this paper and would like for it to be accepted.
  • 3 = Borderline: I'm ambivalent about this one.
  • 2 = Mediocre: I'd rather not see it in the conference.
  • 1 = Poor: I'd fight to have it rejected.
Confidence (1-5)
  • 5 = Positive that my evaluation is correct. I read the paper very carefully and am very familiar with related work.
  • 4 = Quite sure. I tried to check the important points carefully, and checked for uncited prior work. It's unlikely, though conceivable, that I missed something that should affect my ratings.
  • 3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math, experimental design, or novelty.
  • 2 = Willing to defend evaluation, but it is fairly likely that I missed some details, didn't understand some central points, or can't be sure about the novelty of the work.
  • 1 = Not my area, or paper is very hard to understand. My evaluation is just an educated guess.
Suggested Presentation Mode
Papers at NAACL HLT can be presented either as a poster or as an oral presentation, depending on which is more likely to convey the paper's ideas to its audience. If this paper were selected for presentation, which form of presentation would you find more appropriate (this is not a measure of quality)?

A poster is not a second-rate paper. Papers that make good posters make substantial and effective use of graphical elements, provide large amounts of information as results that can be quickly assimilated in poster form, and do not require lengthy, abstract arguments to contextualize or motivate the contribution. Posters are also appropriate for papers that lend themselves better to interactive, one-on-one discussions that an oral presentation cannot provide.
Best Paper?
Choose 'Yes' to nominate this paper for the NAACL HLT 2013 best paper award. Keep in mind that there will be an award for Best Full-length Paper and an award for Best Short Paper.