Abstract
Menzerath’s law, the tendency of Z (the mean size of the parts) to decrease as X (the number of parts) increases, is found in language, music and genomes. Recently, it has been argued that the presence of the law in genomes is an inevitable consequence of the fact that Z=Y/X, which would imply that Z scales with X as Z∼1/X. That scaling is a very particular case of the Menzerath-Altmann law and has been rejected by means of a correlation test between X and Y in genomes, where X is the number of chromosomes of a species, Y is its genome size in bases and Z is the mean chromosome size. Here we review the statistical foundations of that test and consider three non-parametric tests based upon different correlation metrics and one parametric test to evaluate whether Z∼1/X in genomes. The most powerful test is a new non-parametric one based upon the correlation ratio, which is able to reject Z∼1/X in nine out of 11 taxonomic groups and to detect a borderline group. Rather than a fact, Z∼1/X is a baseline that real genomes do not meet. The view of the Menzerath-Altmann law as inevitable is seriously flawed.
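The logic of a correlation-ratio test can be illustrated with a short sketch. Since Z=Y/X, the scaling Z∼1/X would hold if the mean of Y did not depend on X, so one way to challenge it is to ask whether the correlation ratio of Y given X is larger than expected when the Y values are shuffled across species. The Python code below is a generic Monte Carlo permutation test written under that assumption; the function names, the number of permutations and the toy data are hypothetical, and the procedure is not necessarily the one used in the article.

```python
import random
from collections import defaultdict


def correlation_ratio(x, y):
    """Return eta^2: between-group sum of squares of y (grouped by x)
    divided by the total sum of squares of y."""
    n = len(y)
    mean_y = sum(y) / n
    total = sum((v - mean_y) ** 2 for v in y)
    if total == 0:
        return 0.0  # y is constant, so there is nothing to explain
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    between = sum(
        len(g) * (sum(g) / len(g) - mean_y) ** 2 for g in groups.values()
    )
    return between / total


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Estimate the p-value of the observed eta^2 under random
    reassignment of the y values to the x values."""
    rng = random.Random(seed)
    observed = correlation_ratio(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if correlation_ratio(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): x plays the role of chromosome number,
# y the role of genome size; here y clearly grows with x.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [10, 11, 9, 20, 21, 19, 30, 31, 29, 40, 41, 39]
p = permutation_p_value(x, y)
```

A small p-value on such data indicates that the mean of Y changes with X, which is incompatible with Z∼1/X as a baseline. The correlation ratio is used here rather than a linear or rank correlation because it responds to any dependence of the conditional mean on X, not only to monotonic trends.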
Acknowledgments
This article has benefited enormously from the comments of anonymous reviewers. We are grateful to P. Delicado, R. Gavaldà and E. Pons for their valuable mathematical insights. We owe the counterexample showing that uncorrelation does not imply mean independence to P. Delicado. We are also grateful to G. Bel-Enguix and N. Forns for helpful discussions. This work was supported by the grant Iniciació i reincorporació a la recerca from the Universitat Politècnica de Catalunya, the grants BASMATI (TIN2011-27479-C04-03) and OpenMT-2 (TIN2009-14675-C03) from the Spanish Ministry of Science and Innovation and the grant 2/0038/12 from the VEGA funding agency (JM).
©2014 by De Gruyter
Articles in this issue
- Frontmatter
- Research Articles
- When is Menzerath-Altmann law mathematically trivial? A new approach
- Covariate adjusted differential variability analysis of DNA methylation with propensity score method
- P-value calibration for multiple testing problems in genomics
- Robust methods to detect disease-genotype association in genetic association studies: calculate p-values using exact conditional enumeration instead of simulated permutations or asymptotic approximations
- Markovianness and conditional independence in annotated bacterial DNA
- Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories
- Corrigendum
- Biological pathway selection through Bayesian integrative modeling