Analyzing natural human language from the point of view of dynamic of a complex network

dc.contributor.author: Wachs-Lopes G.A.
dc.contributor.author: Rodrigues P.S.
dc.date.accessioned: 2019-08-17T20:00:30Z
dc.date.available: 2019-08-17T20:00:30Z
dc.date.issued: 2016
dc.description.abstract: © 2015 Elsevier Ltd. With the increasing amount of information, mainly due to the explosive growth of the Internet, the demand for automatic text analysis applications has also grown. Complex networks are one of the tools that have gained importance in understanding problems in this area. They combine graph theory and statistical methods to model important problems. Across several research fields, complex networks are studied from various points of view, such as network topology, extraction of physical features and statistics, specific applications, comparison of metrics, and the study of physical phenomena. Linguistics is one area that has received great attention, particularly because of its close relationship with issues arising from the emergence of large text databases. Many studies have therefore emerged on modeling complex networks in this area, increasing the demand for efficient algorithms for feature extraction, observation of network dynamics, and comparison of behavior across different types of languages. Some works on specific languages, such as English, Chinese, French, Spanish, Russian, and Arabic, have discussed the semantic aspects of these languages. An important feature of a network is its average clustering coefficient. This measure has a physical impact on network topology studies and consequently on conclusions about the semantics of a language. However, its computational time is O(n^3), which makes its computation prohibitive for large current databases. The main contribution of this paper is the modeling of two complex networks: the first, in English, is constructed from a specific medical database; the second, in Portuguese, from a manually annotated journalistic database. The paper then studies the dynamics of these two networks. We show their small-world behavior and the influence of hubs, suggesting that these databases have a high degree of modularity, indicating specific contexts of words. A method for efficient computation of the clustering coefficient is also presented, which can be applied to large current databases. Other features, such as the fraction of reciprocal connections and the average connection density, are also calculated and discussed for both networks.
dc.description.firstpage: 8
dc.description.issuenumber: 1
dc.description.lastpage: 22
dc.description.volume: 45
dc.identifier.citation: LOPES, Guilherme; RODRIGUES, Paulo. Analyzing natural human language from the point of view of dynamic of a complex network. Expert Systems with Applications, v. 45, n. 1, p. 8-22, 2015.
dc.identifier.doi: 10.1016/j.eswa.2015.09.020
dc.identifier.issn: 0957-4174
dc.identifier.uri: https://repositorio.fei.edu.br/handle/FEI/1009
dc.identifier.url: https://doi.org/10.1016/j.eswa.2015.09.020
dc.relation.ispartof: Expert Systems with Applications
dc.rights: Restricted Access
dc.subject.otherlanguage: Clustering coefficient
dc.subject.otherlanguage: Complex networks
dc.subject.otherlanguage: Physical measures
dc.subject.otherlanguage: Textual information retrieval
dc.title: Analyzing natural human language from the point of view of dynamic of a complex network
dc.type: Article
fei.scopus.citations: 33
fei.scopus.eid: 2-s2.0-84944474694
fei.scopus.subject: Amount of information
fei.scopus.subject: Annotated database
fei.scopus.subject: Clustering coefficient
fei.scopus.subject: Computational time
fei.scopus.subject: Important features
fei.scopus.subject: Physical measures
fei.scopus.subject: Physical phenomena
fei.scopus.subject: Specific languages
fei.scopus.updated: 2024-11-01
fei.scopus.url: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84944474694&origin=inward
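
The abstract in this record names three network measures: the average clustering coefficient, the fraction of reciprocal connections, and the connection density. As a rough illustration only, the sketch below computes the standard textbook versions of these measures on a toy word network. It is not the efficient method the paper proposes, and the paper's exact definitions may differ. The function names, the adjacency-set representation, and the toy data are all assumptions made for this example.

```python
from itertools import combinations


def average_clustering(adj):
    """Average local clustering coefficient of an undirected graph.

    `adj` maps each node to the set of its neighbours (assumed symmetric,
    no self-loops). Nodes with fewer than two neighbours contribute 0.
    This is the naive definition-based computation: each node costs
    O(k^2) pair checks, which approaches O(n^3) overall on dense graphs.
    """
    if not adj:
        return 0.0
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists.

    Assumes no self-loops.
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def density(n_nodes, edges):
    """Directed edges present divided by the n * (n - 1) possible ones."""
    if n_nodes < 2:
        return 0.0
    return len(set(edges)) / (n_nodes * (n_nodes - 1))


# Hypothetical toy word network: a triangle a-b-c with d attached to c.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
# Hypothetical directed word-succession edges.
toy_edges = [("a", "b"), ("b", "a"), ("b", "c")]

avg_c = average_clustering(toy_adj)
recip = reciprocity(toy_edges)
dens = density(3, toy_edges)
```

On the toy graph, nodes a and b have coefficient 1, c has 1/3, and d has 0, so the average is 7/12. The per-node pair enumeration is what makes the naive approach expensive on large text networks, which is the cost the abstract refers to.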