Analyzing natural human language from the point of view of dynamic of a complex network

Wachs-Lopes G.A.; Rodrigues P.S.

dc.contributor.author	Wachs-Lopes G.A.
dc.contributor.author	Rodrigues P.S.
dc.date.accessioned	2019-08-17T20:00:30Z
dc.date.available	2019-08-17T20:00:30Z
dc.date.issued	2016
dc.description.abstract	© 2015 Elsevier Ltd. With the increasing amount of information, mainly due to the explosive growth of the Internet, the demand for automatic text analysis applications has also grown. Complex networks are one of the tools that have gained importance in understanding problems in this area. They combine graph theory and statistical methods to model important problems. Across several research fields, complex networks are studied from various points of view, such as network topology, extraction of physical features and statistics, specific applications, comparison of metrics, and the study of physical phenomena. Linguistics is one area that has received great attention, particularly because of its close relationship with issues arising from the emergence of large text databases. Many studies have therefore emerged on modeling complex networks in this area, increasing the demand for efficient algorithms for feature extraction, observation of network dynamics, and comparison of behavior across different types of languages. Some works on specific languages, such as English, Chinese, French, Spanish, Russian and Arabic, have discussed the semantic aspects of these languages. An important feature of a network is its average clustering coefficient. This measure has a physical impact on studies of network topology and, consequently, on conclusions about the semantics of a language. However, its computational time is O(n^3), which makes computing it prohibitive for today's large databases. The main contribution of this paper is the modeling of two complex networks: the first, in English, is constructed from a specific medical database; the second, in Portuguese, from a manually annotated journalistic database. The paper then studies the dynamics of these two networks. We show their small-world behavior and the influence of hubs, suggesting that these databases have a high degree of modularity, which indicates specific contexts of words. We also present a method for efficient computation of the clustering coefficient that can be applied to today's large databases. Other features, such as the fraction of reciprocal connections and the average connection density, are also calculated and discussed for both networks.
dc.description.firstpage	8
dc.description.issuenumber	1
dc.description.lastpage	22
dc.description.volume	45
dc.identifier.citation	LOPES, Guilherme; RODRIGUES, Paulo. Analyzing natural human language from the point of view of dynamic of a complex network. Expert Systems with Applications, v. 45, n.1, p. 8-22, 2015.
dc.identifier.doi	10.1016/j.eswa.2015.09.020
dc.identifier.issn	0957-4174
dc.identifier.uri	https://repositorio.fei.edu.br/handle/FEI/1009
dc.identifier.url	https://doi.org/10.1016/j.eswa.2015.09.020
dc.relation.ispartof	Expert Systems with Applications
dc.rights	Restricted access
dc.subject.otherlanguage	Clustering coefficient
dc.subject.otherlanguage	Complex networks
dc.subject.otherlanguage	Physical measures
dc.subject.otherlanguage	Textual information retrieval
dc.title	Analyzing natural human language from the point of view of dynamic of a complex network
dc.type	Article
fei.scopus.citations	33
fei.scopus.eid	2-s2.0-84944474694
fei.scopus.subject	Amount of information
fei.scopus.subject	Annotated database
fei.scopus.subject	Clustering coefficient
fei.scopus.subject	Computational time
fei.scopus.subject	Important features
fei.scopus.subject	Physical measures
fei.scopus.subject	Physical phenomena
fei.scopus.subject	Specific languages
fei.scopus.updated	2025-02-01
fei.scopus.url	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84944474694&origin=inward
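
The average clustering coefficient named in the abstract can be illustrated with a minimal sketch of the textbook, definition-based computation. This is not the efficient method the paper proposes, which this record does not describe. The sketch assumes an undirected, unweighted network stored as a dictionary of neighbour sets; the function names and the toy word network are hypothetical. Each node checks every pair of its neighbours, so in a dense graph with n nodes the work per node is on the order of n^2 and the total is on the order of n^3, which matches the cost quoted in the abstract.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adj[node] - {node}  # ignore self-loops
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to test
    # Count the neighbour pairs that are themselves connected.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy word network (assumption: symmetric adjacency sets).
toy = {
    "heart": {"attack", "rate", "disease"},
    "attack": {"heart", "rate"},
    "rate": {"heart", "attack"},
    "disease": {"heart"},
}

if __name__ == "__main__":
    print(average_clustering(toy))
```

In the toy network, "heart" has three neighbours of which only one pair is linked (coefficient 1/3), "attack" and "rate" each have coefficient 1, and "disease" has a single neighbour (coefficient 0), so the average is 7/12.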