Browsing Computer Science by Issue Date
Now showing items 21-40 of 40
Vujičić Stanković, Staša (Beograd, 2016)
Abstract: The basic goal of this doctoral thesis is research into the different techniques and models applied in information extraction, and the provision of informatics support for processing natural language texts from the culinary and gastronomy domain. Information extraction is a subfield of computational linguistics which includes techniques for natural language processing used to find relevant pieces of information, define their meaning and establish relations between them. Special attention is given to ontology-based information extraction, which consists of the following: recognition of instances of ontology concepts in unstructured or semi-structured texts written in natural language, reasoning over the identified instances based on the rules defined in the ontology, and the use of recognized instances to instantiate the proper ontology concepts. The main result of the thesis is a new model for ontology-based information extraction. Besides solving information extraction tasks, the new model covers both the upgrading of existing lexical resources and ontologies and the creation of new ones. Its application resulted in a system for extracting information related to the culinary domain, but the model can be used in other fields as well. In addition, a food ontology has been developed, the Serbian WordNet has been extended with 1,404 new synsets from the culinary domain, and the electronic dictionary of Serbian has been enlarged with 1,248 entries. The significance of the model comes from the fact that the new and enriched linguistic resources can be used in other systems for natural language processing. The opening chapter of the thesis elaborates the need for an informatics model for processing a large linguistic corpus from the culinary and gastronomy domain through a methodologically precise and solid approach that integrates information about the domain. The formalization of the basic research subject, text in electronic form, is also presented. The chapter further describes the approximations of natural languages introduced to enable modern information technologies to process texts written in natural languages, and it emphasizes the need to characterize the language of a text by a corresponding corpus and sublanguage. The first chapter then defines the task of information extraction and the models for processing unstructured or semi-structured texts that a computer uses to interpret the meaning the author (not necessarily a human) intended to convey while writing the text. Additionally, this chapter describes the methods used in the information extraction field — methods based on rules and methods based on machine learning. Their advantages and shortcomings are listed, as well as the reasons why this thesis uses techniques based on linguistic knowledge. The introductory chapter concludes with special attention to ontologies, WordNet, and the significance of using WordNet as an ontology. The second chapter presents the linguistic resources and tools exploited in this thesis. It describes the morphological dictionaries and local grammars used for solving the problem of information extraction from texts written in Serbian. A review of information extraction systems is given subsequently.
At the end of the second chapter, the stages of processing texts written in Serbian during information extraction in the software systems Unitex and GATE are described. The main result of the thesis is presented in the third chapter: a model for solving the problem of information extraction by integrating linguistic resources and tools, which includes the creation of a text corpus, the definition of information extraction tasks, the establishment of finite-state models for information extraction and their application, the iterative enlarging of electronic morphological dictionaries, the enrichment and enhancement of WordNet, and the creation of new ontologies. Each of these steps is described thoroughly. Even though the model was first conceived as a solution for processing Serbian, it can equally be applied to texts written in other languages, given the development of suitable language resources. The implementation of these steps is described in the fourth chapter through a system for information extraction from culinary texts written in Serbian. The chapter then describes how lexical resources were developed and mutually complemented through the steps of creating a domain corpus, identifying the culinary lexicon, expanding and upgrading WordNet and the electronic morphological dictionaries, and developing domain ontologies: the food ontology, the approximate-measure ontology, and the ontology of ingredients that can be used as mutual replacements in the culinary domain. The information extraction system has served as the basis of an advanced search system which, based on a corpus of culinary texts, generates all possible answers to user queries. Within this system, a specific method creates links between different recipes; it is used when the user, reviewing the text of a recipe, notices that some part of the preparation description already appeared in another recipe, but with additional or different explanation. Another contribution of the thesis is the application of the developed ontologies to tasks that convert approximate measures into standard measures and establish similarities among recipes. Recipe similarity is defined as the similarity of the texts that describe the preparation process according to a specific recipe. The last chapter contains final conclusions and directions for future research. URI: http://hdl.handle.net/123456789/4410 Files in this item: 1
teza_Stasa.pdf ( 10.38Mb )
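A toy illustration of the rule-based extraction style discussed above (a regular-expression "local grammar" for quantities in recipe text; the pattern, units and sample sentence are my assumptions, not resources from the thesis):

    import re

    # A tiny rule: number + unit + ingredient word, e.g. "200 g flour"
    RULE = re.compile(r"(?P<qty>\d+)\s*(?P<unit>g|kg|ml|l)\s+(?P<ing>\w+)")

    text = "Mix 200 g flour with 100 ml milk."
    for m in RULE.finditer(text):
        print(m.group("qty"), m.group("unit"), m.group("ing"))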
Elfaghine, Halima (Beograd, 2016)
Abstract: The subject of this thesis belongs to the area of quality control, which represents the practical usage of statistics in monitoring and improving the production process. In 1930 Walter Shewhart started studying quality control, based on control charts and statistical principles. Later on, after World War II, W. Edwards Deming took this discipline to Japan, where it flourished. The design and the performance of control charts are the most important problems in this area. The purpose of studying the characteristics of control charts in this thesis is to compare the existing and the suggested control charts. The thesis is divided into four chapters. The first chapter is introductory and contains motivation and basic definitions related to this subject. In this study it is always assumed that the data are normally distributed, and that the in-control process data are stationary and uncorrelated. Shewhart control charts and the corresponding limits are constructed in order to meet the given specifications for the quality characteristic under investigation. Quality control applied to a production process always has costs related to the control. The important parameters connected to the cost of quality control are the width of the control limits k, the sample size n and the interval between samples h. In Chapter 2 a new loss function is given, connected to the production process and to the X-bar quality control chart. Using a Matlab program for optimization, the values k̂, n̂ and ĥ which minimize the loss function for given costs are found. For given values of cost, a non-linear regression model is built using the package SigmaPlot, and the obtained values are compared to those obtained by numerical optimization. In Chapter 3, the time series model Y_i = λX_i + (1 − λ)Y_{i−1} is investigated, where 0 < λ ⩽ 1 is a constant and the X_i are N(μ, σ²) distributed. Exponentially Weighted Moving Average (EWMA) control charts for this model are presented, and type I and type II errors are calculated in the case when i is large. For different sample sizes, a new comparison between the optimal designs of the X-bar and EWMA control charts for a normally distributed quality characteristic is given, comparing the corresponding cost-loss functions, power functions and average run lengths. The process of calibration is one of the methods in statistical process control, introduced for improving the quality of the products and for reducing the production costs. In Chapter 4, two new models of non-symmetrical loss function are introduced. Here, the loss function is connected to one product under control (not to the whole sample). Using our program, written in the statistical software R, the value which minimizes the expected loss for the Shewhart X chart is found. This value is used as the new central target value of the quality characteristic, that is, the production process is calibrated with this new value. The thesis ends with Conclusions, where the results of the thesis are summarized, and with some open problems to be investigated. URI: http://hdl.handle.net/123456789/4340 Files in this item: 1
Disertacija_Halima_Elfaghihe.pdf ( 1.104Mb )
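A minimal sketch of the EWMA recursion and control limits described in the abstract above, assuming illustrative values λ = 0.2 and limit width L = 3 (both are my assumptions, not taken from the thesis):

    import numpy as np

    def ewma_chart(x, lam=0.2, L=3.0, mu0=0.0, sigma=1.0):
        """Compute Y_i = lam*X_i + (1-lam)*Y_{i-1} and the control limits."""
        y = np.empty(len(x))
        prev = mu0
        for i, xi in enumerate(x):
            prev = lam * xi + (1 - lam) * prev
            y[i] = prev
        i = np.arange(1, len(x) + 1)
        # Exact variance of the EWMA statistic; tends to sigma^2*lam/(2-lam) for large i
        var = sigma**2 * lam / (2 - lam) * (1 - (1 - lam) ** (2 * i))
        return y, mu0 - L * np.sqrt(var), mu0 + L * np.sqrt(var)

    rng = np.random.default_rng(0)
    y, lcl, ucl = ewma_chart(rng.normal(size=100))
    print(np.any((y < lcl) | (y > ucl)))  # any out-of-control signal?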
Vučković, Bojan (Beograd, 2017)
Abstract: We present original results from the following fields of discrete mathematics: chromatic graph theory, extremal set theory and Boolean matrix theory. From chromatic graph theory we investigate edge and total colorings satisfying the condition that neighboring vertices of a graph receive different values of the multi-set, set or sum induced by the given coloring. A multi-set neighbor-distinguishing edge coloring of a graph is an assignment of colors to edges such that, for every edge uv of the graph, the multi-set of colors on the edges incident with the vertex u differs from the multi-set on the edges incident with the vertex v. The previous best result concerning the minimum number of colors required for such a coloring of an arbitrary graph states that four colors are sufficient. The author's contribution is a proof that such a coloring is always possible with only three colors, which is in the general case the optimal number of colors. We construct a graph for which we subsequently prove that a different number of colors is required to obtain a multi-set neighbor-distinguishing coloring and a neighbor-distinguishing coloring by sum. As far as we know, this is the first example of such a graph. A few results concerning the neighbor expanded sum distinguishing coloring are given. The main contribution is a proof that for an arbitrary graph there exists a total coloring with colors from the set {1, 2, 3} such that every two adjacent vertices have different sums over their adjacent vertices and incident edges. Also, for certain classes of graphs it is proved that such a coloring exists using only the colors from the set {1, 2}. A neighbor-distinguishing edge coloring of a graph G requires that every two adjacent edges receive different colors, while the sets of colors on the edges incident with u and v differ for every edge uv of G. The author presents an edge coloring procedure for an arbitrary graph without isolated edges that uses a smaller number of colors than all known results. For an adjacent-vertex-distinguishing total coloring of a graph G the condition is that every two adjacent or incident elements of V(G) ∪ E(G) receive different colors, while for every edge uv of G the set composed of the colors assigned to the edges incident with u, together with the color of u, differs from the corresponding set for v. The author improves the upper bound on the minimum number of colors needed for such a coloring, in terms of the maximum degree of the graph. Frankl's conjecture from extremal set theory states that for every family of sets closed under union there exists an element contained in at least half of the sets of the family. We give a proof that Frankl's conjecture holds for every family whose universe contains 12 elements, while it was previously known to hold for families over 11 or fewer elements. Our proof is based on an efficient algorithm that exhausts all the possibilities, while using results, obtained in a number of consecutive steps, about subfamilies that an eventual counterexample cannot contain. A family of sets G is an FC-family if for every union-closed family F containing G there exists an element of ∪G that appears in at least half of the sets of F. A NonFC-family is every family that is not FC. The author's contribution is the complete classification of all families over 6 or fewer elements into FC- and NonFC-families. From Boolean matrix theory we present our results concerning the row space cardinality.
Boolean matrices are matrices whose components are all from the set {0, 1}, and the row space of a Boolean matrix is the set of vectors that can be obtained as disjunctions (component-wise ORs) of rows of the matrix. We determine the set of all values a from the interval [2^(n−2) + 2^(n−3), 2^(n−1)] such that there exists a matrix of dimension n × n whose row space cardinality equals a. For the least positive integer a_n for which there exists no n × n matrix with row space cardinality equal to a_n, the author gives a lower bound that improves on the previously known results. All proofs of the main results in the dissertation are constructive. Some of the proofs require the use of computers to examine a great number of possibilities. For the others this was not a necessity, though algorithms following the steps of the proofs can be implemented to obtain a graph coloring or a matrix with the desired properties. URI: http://hdl.handle.net/123456789/4661 Files in this item: 1
Disertacija_-_Bojan_Vuckovic.pdf ( 1.143Mb )
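A small illustration (mine, not from the thesis) of the row space of a Boolean matrix: collect all disjunctions of subsets of rows and count the distinct vectors. Rows are encoded as Python integers used as bitmasks.

    def row_space_cardinality(rows):
        """Rows are bitmask ints; the row space is the set of all ORs of subsets
        of rows (conventions differ on whether the zero vector is included)."""
        space = {0}
        for r in rows:
            space |= {v | r for v in space}
        return len(space)

    # 3x3 identity matrix: rows 100, 010, 001 -> row space has 2^3 = 8 vectors
    print(row_space_cardinality([0b100, 0b010, 0b001]))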
Simić, Danijela (Beograd, 2017)
Abstract: This thesis presents an interactive formalization of various models of geometry and algebraic methods for automated proving of geometry theorems. We present our work on formalizing analytic (Cartesian) plane geometries within the proof assistant Isabelle/HOL. We give several equivalent definitions of the Cartesian plane and show that it models synthetic plane geometries (using both Tarski's and Hilbert's axiom systems). We also discuss several techniques used to simplify and automate the proofs. As one of our aims is to advocate the use of proof assistants in mathematical education, our exposition tries to remain simple and close to standard textbook definitions, while being completely formal and mechanically verifiable. This formalization develops the necessary infrastructure for implementing decision procedures based on analytic geometry within proof assistants. Furthermore, we investigate complex numbers. Deep connections between complex numbers and geometry have been well known and carefully studied for centuries. The fundamental objects investigated are the complex plane (usually extended by a single infinite point), its objects (points, lines and circles), and the groups of transformations that act on them (e.g., inversions and Möbius transformations). In this thesis we treat the geometry of complex numbers formally and present a fully mechanically verified development within the theorem prover Isabelle/HOL. We discuss different approaches to the formalization and the major advantages of the more algebraically oriented approach. Apart from applications in formalizing mathematics and in education, this work serves as a ground for formally investigating various non-Euclidean geometries and their intimate connections. We also present a formalization of a part of Tarski's axiom system within the Poincaré disc model in Isabelle/HOL. Further on, we analyze connections between geometry and polynomials and the use of these connections. In Euclidean geometry, objects and relations between them can be expressed as polynomials. Furthermore, any geometry construction can be expressed by a set of polynomials, and geometry statements can be proved by algebraic methods (e.g. the Gröbner bases method or Wu's method) over that set of polynomials. We describe an implementation of an algorithm in Isabelle/HOL that accepts a term representation of a geometry construction and returns its corresponding set of polynomials. Our further work will be to use the method of Gröbner bases within the Isabelle system on the generated polynomials, in order to prove the correctness of the given construction. Furthermore, we investigate how spatial geometry constructions can be presented using polynomials. We investigate two different approaches to deriving those polynomials and compare the efficiency of algebraic provers depending on the approach used. We present a fully automated system for transforming geometry constructions into sets of polynomials. Our further work will be to relate these geometry provers to dynamic geometry software and thus make it easier for students to use them. URI: http://hdl.handle.net/123456789/4499 Files in this item: 1
06062017danijela_doktorat.pdf ( 1.158Mb )
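An illustration (my example, not the thesis code) of translating a geometric relation into polynomials, in the spirit described above: three points are collinear iff a determinant polynomial vanishes, and the statement can be checked by reduction modulo a Gröbner basis of the hypothesis polynomials.

    import sympy as sp

    ax, ay, bx, by, cx, cy = sp.symbols('ax ay bx by cx cy')

    # Collinearity of A, B, C as the vanishing of a 2x2 determinant polynomial
    collinear = sp.expand((bx - ax)*(cy - ay) - (by - ay)*(cx - ax))

    # Hypothesis: C is the midpoint of AB, written as polynomials equal to zero
    hyps = [2*cx - ax - bx, 2*cy - ay - by]
    g = sp.groebner(hyps, ax, ay, bx, by, cx, cy, order='lex')
    print(g.reduce(collinear)[1])  # remainder 0 -> collinearity follows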
Alshafah, Samira (Beograd, 2018)
Abstract: Proteins with intrinsically disordered regions are involved in a large number of key cell processes, including signaling, transcription, and chromatin remodeling. On the other hand, such proteins have been observed in people suffering from neurological and cardiovascular diseases, as well as various malignancies. The process of experimentally determining disordered regions in proteins is very expensive and long-term. As a consequence, various computer programs for predicting the position of disordered regions in proteins have been developed and are constantly being improved. In this thesis a new method for determining amino acid sequences that characterize ordered/disordered regions is presented. The material used in the research includes 4,076 viruses with more than 190,000 proteins. The proposed method is based on defining a correspondence between the characteristics of n-grams (including both repeats and palindromic sequences) and their belonging to ordered/disordered protein regions. The positions of ordered/disordered regions are predicted using three different predictors. The features of the repetitive strings used in the research include mole fractions, fractional differences, and z-values. Also, the data mining techniques of association rules and classification were applied to both repeats and palindromes. The results obtained by all techniques show a high level of agreement for short lengths of less than 6, while the level of agreement grows up to the maximum as the length of the sequences increases. The high reliability of the results obtained by the data mining techniques shows that there are n-grams, both repeating sequences and palindromes, which uniquely characterize the disordered/ordered regions of proteins. The obtained results were verified by comparison with results based on n-grams from the DisProt database, which contains the positions of experimentally verified disordered regions of proteins. The results can be used both for the fast localization of disordered/ordered regions in proteins and for further improvement of existing programs for their prediction. URI: http://hdl.handle.net/123456789/4746 Files in this item: 1
ThesisSamira_Alshafah.pdf ( 3.106Mb )
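A minimal sketch of one feature from the abstract above, as I read it (the "mole fraction" of an n-gram taken as its share among all n-grams; the sample sequences are invented):

    from collections import Counter

    def ngram_mole_fractions(seqs, n=3):
        """Fraction of each n-gram among all n-grams in the given sequences."""
        counts = Counter()
        for s in seqs:
            counts.update(s[i:i+n] for i in range(len(s) - n + 1))
        total = sum(counts.values())
        return {g: c / total for g, c in counts.items()}

    print(ngram_mole_fractions(["MKVLAAGMKV", "AAGMK"], n=3))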
Đenić, Aleksandar (Beograd, 2018)
Abstract: This thesis considers two discrete location problems: the Bus Terminal Location Problem (BTLP) and the Long-term Care Facility Location Problem (LTCFLP). A Variable Neighborhood Search (VNS) method for solving BTLP and LTCFLP is presented. VNS is a single-solution metaheuristic based on a systematic change of neighborhoods while searching for an optimal solution of the problem. It consists of two main phases: a shake phase and a local search phase. BTLP is a discrete location problem of locating bus terminals so as to provide the highest possible quality of public service to the clients. Clients are represented by public transportation stations, such as bus or metro stations. The VNS algorithm used for solving BTLP employs an improved local search based on an efficient neighborhood interchange. The VNS is parallelized (PVNS), which leads to a significant time improvement as a function of the processor core count. Computational results show that the proposed PVNS method improves existing results from the literature in terms of quality. Larger instances, based on instances from the Traveling Salesman Problem library, are presented and computational results for those instances are reported. LTCFLP arose as a part of health care infrastructure planning in South Korea. Clients are groups of patients in need of long-term health care, while established facilities represent locations where centers providing health care services should be built. There are n predefined locations where centers can be established. The problem seeks at most K locations at which to establish health centers so that they are equally loaded with client demand. For solving LTCFLP with the VNS algorithm, a data structure based on fast interchange is presented. It reduces the time complexity of one iteration of the local search algorithm to O(n · max(n, K²)), compared to the known time complexity from the literature of O(K² · n²). The reduced time complexity of the presented VNS leads to better quality solutions, due to the larger number of VNS iterations that can be performed in less computational time. The thesis presents computational results that outperform the best known results from the literature. URI: http://hdl.handle.net/123456789/4744 Files in this item: 1
Aleksandar_Djenic_phd.pdf ( 2.183Mb )
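A generic VNS skeleton matching the shake / local search structure described above (an illustrative sketch; the neighborhood structures and objective are placeholders to be supplied by the caller, not the thesis implementation):

    import random

    def local_search(sol, neighborhood, objective):
        improved = True
        while improved:
            improved = False
            for nb in neighborhood(sol):
                if objective(nb) < objective(sol):
                    sol, improved = nb, True
                    break
        return sol

    def vns(initial, neighborhoods, objective, max_iters=1000):
        """Basic VNS: shake in the k-th neighborhood, improve by local search,
        restart from the first neighborhood on improvement."""
        best, it = initial, 0
        while it < max_iters:
            k = 0
            while k < len(neighborhoods) and it < max_iters:
                candidate = random.choice(neighborhoods[k](best))        # shake
                candidate = local_search(candidate, neighborhoods[0], objective)
                if objective(candidate) < objective(best):
                    best, k = candidate, 0                               # move, restart
                else:
                    k += 1
                it += 1
        return best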
Čukić, Ivan (Beograd, 2018)
Abstract: There is a big class of problems that require software systems with asynchronously executed components. For example, distributed computations have distributed nodes that process data asynchronously to one another, service-oriented architectures need to process separate requests asynchronously, and multi-core and heterogeneous systems need multiple separate tasks running concurrently to best utilize the hardware. Even ordinary GUI applications need asynchronous components — the user interface needs to be responsive at all times, which means that no matter what state the program is in, it needs to process and react to the input events coming from the user. The necessity of concurrency and asynchronous execution brings the added complexity of Inversion of Control (IoC) into the system, either through message passing or through event processing. IoC makes code difficult to develop and reason about, increases component coupling, and inhibits clean functional or object-oriented software design. In this dissertation, a method for solving the problems that IoC introduces is presented. It shows a way to model both synchronous and different types of asynchronous tasks with the continuation monad. The continuation monad serves as a primitive for building more complex control flow structures that mimic the control flow structures of the host programming language. It also allows building control structures specialized for parallelism and transactional execution, and for simulating functional programming idioms with asynchronous tasks, through a generalization of the continuation monad that allows asynchronous tasks to generate results one at a time. This makes it possible to write software systems with asynchronously executed components as seemingly synchronous imperative or functional code, while leaving it up to the compiler to do all the heavy lifting and convert the written code to an asynchronously executed set of tasks. Another benefit of the presented method is that it allows easier automatic handling of data lifetime without the need for garbage collection. The method has been successfully applied and tested in several Free/Libre Open Source Software and proprietary real-world software projects used by hundreds of millions of people around the world. In this dissertation, an example of a secure project management system is described, based on a similar system implemented as a part of the KDE Plasma project. The dissertation also contains the important parts of the implementation of the AsynQt library, which extends the Qt library and its concurrency primitive — the QFuture class — with functional reactive programming patterns based on the method proposed in this dissertation. URI: http://hdl.handle.net/123456789/4738 Files in this item: 1
ivan_cukic_phd.pdf ( 1.328Mb )
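A toy continuation monad in Python (my sketch of the general idea, not the thesis's C++ formulation): a task is a function that takes a continuation, and bind chains tasks sequentially without inverting control in user code.

    class Cont:
        """A computation that will call its continuation with a result."""
        def __init__(self, run):
            self.run = run  # run: (callback) -> None

        @staticmethod
        def unit(value):
            return Cont(lambda k: k(value))

        def bind(self, f):
            # f: value -> Cont; chains asynchronous steps one after another
            return Cont(lambda k: self.run(lambda v: f(v).run(k)))

    # Two "async" steps written as ordinary-looking chained code:
    fetch = Cont(lambda k: k("42"))        # in practice, an asynchronous read
    parse = lambda s: Cont.unit(int(s))
    fetch.bind(parse).bind(lambda n: Cont.unit(n + 1)).run(print)  # prints 43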
Radojčić, Nina (Beograd, 2018)
Abstract: In this dissertation, three NP-hard optimization problems are studied and various computational intelligence methods are considered for solving them, with special emphasis on the possibilities of applying fuzzy logic in order to improve the performance of the proposed methods. In addition, it is shown how fuzzy logic can be incorporated into a model to make it more adequate for real-world applications. The first problem considered is the Risk-Constrained Cash-in-Transit Vehicle Routing Problem (RCTVRP), a special case of the vehicle routing problem (VRP). As in the classical VRP, the aim is to determine the collection routes from one depot to a number of customers in order to minimize the overall travel distance (or cost). Additionally, in the RCTVRP the safety aspect is modeled by introducing risk constraints on the routes. The RCTVRP concerns the issue of security during the transportation of cash or valuable goods (e.g. in the cash-in-transit industry). The other two problems studied in this dissertation belong to the class of location problems: the Load Balancing Problem (LOBA) and the Max-Min Diversity Problem (MMDP). The goal of the LOBA problem is to locate a fixed number of facilities such that the difference between the maximum and minimum number of customers served by each facility is balanced. The LOBA model is useful in cases where customers naturally choose the closest facility. The MMDP consists of selecting a subset of a fixed number of elements from a given set in such a way that the diversity among the selected elements is maximized. This problem also arises in real-world situations encompassing a variety of fields, particularly the social and biological sciences. In order to solve the RCTVRP, a fuzzy GRASP (Greedy Randomized Adaptive Search Procedure) is hybridized with the Path Relinking (PR) methodology. A carefully adjusted fuzzy modification incorporated into the proposed GRASP for the RCTVRP improved its performance. Moreover, in this dissertation a new PR structure is implemented which can also be used for other vehicle routing problems. To improve the algorithm's time complexity, a new data structure for the RCTVRP is incorporated. The proposed fuzzy GRASP with PR hybrid shows better computational performance than its non-fuzzy version. Furthermore, computational results on publicly available data sets indicate that the proposed algorithm outperforms all existing methods from the literature for solving the RCTVRP. For solving the LOBA problem two efficient hybrid metaheuristic methods are proposed: a combination of reduced and standard variable neighborhood search methods (RVNS-VNS) and a hybridization of an evolutionary algorithm and the VNS approach (EA-VNS). The proposed hybrid methods are first benchmarked and compared to all other methods on existing test instances for the LOBA problem with up to 100 customers and potential suppliers. In order to test the effectiveness of the proposed methods, we modify several large-scale instances from the literature with up to 402 customers and potential suppliers. Exhaustive computational experiments show that the proposed hybrid methods quickly reach all known optimal solutions while providing solutions on large-scale problem instances in short CPU times. Regarding solution quality and running times, we conclude that the proposed EA-VNS approach outperforms the other considered methods for solving the LOBA problem. An EA approach is also proposed for solving the MMDP.
Computational experiments on a smaller benchmark data set showed that the classic EA quickly reached all optimal solutions obtained previously by an exact solver. However, some of the larger instances of the MMDP were challenging for the classic EA. Although researchers have established commonly used parameter settings for EAs that perform well on most problems, it is still challenging to choose adequate values for the parameters of the algorithm. One approach to overcoming this is changing parameter values during the algorithm run. As part of this dissertation, this problem was addressed by extending the evolutionary algorithm with a fuzzy rule formulated from EA experts' knowledge and experience. The implemented fuzzy rule changes the mutation parameter during the algorithm run. The results on tested instances indicate that the proposed fuzzy approach is more suitable for solving the MMDP than the classic EA. For all three problems addressed, on the smaller instances that CPLEX was able to solve, the obtained optimal solutions were used for comparison with the proposed methods, and all of the proposed methods reached these optimal solutions. Moreover, in this dissertation it has been shown that fuzzy logic is a successful tool in modeling the RCTVRP. In this problem the risk constraints are set by using a risk threshold T on each route, so that routes with risk larger than T are forbidden. However, in this dissertation the aim is to take into account the probability of being robbed along each route, instead of just allowing solutions whose routes satisfy the risk constraints. A new fuzzy model for the RCTVRP is developed which takes into account the value of the risk index of each route, and solutions with lower values of the risk indexes on their routes are considered superior. To achieve this, fuzzy numbers are used in the improved model. Moreover, two mixed integer programming formulations of the new fuzzy model are developed and presented in this dissertation. The introduced fuzzy model is compared with the model from the literature on an adequate example, and the advantages of the newly proposed fuzzy RCTVRP are demonstrated. Computational experiments are performed, and the comparison of the two models shows that the newly presented approach leads to safer routes. URI: http://hdl.handle.net/123456789/4737 Files in this item: 1
tezaNinaRadojicic.pdf ( 1.665Mb )
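A bare-bones GRASP loop of the kind hybridized above, sketched for a routing-style problem where a solution is an ordering of customers (the scoring, cost and local search functions are placeholders of my own, not the dissertation's):

    import random

    def grasp(customers, greedy_score, route_cost, local_search, alpha=0.3, iters=100):
        """GRASP: greedy randomized construction from a restricted candidate
        list (RCL), followed by local search; the best route found is kept."""
        best, best_cost = None, float('inf')
        for _ in range(iters):
            route, remaining = [], sorted(customers, key=greedy_score)
            while remaining:
                # (a real construction would re-score candidates as the route grows)
                rcl = remaining[:max(1, int(alpha * len(remaining)))]
                route.append(remaining.pop(remaining.index(random.choice(rcl))))
            route = local_search(route)
            if route_cost(route) < best_cost:
                best, best_cost = route, route_cost(route)
        return best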
Spasić, Mirko (Beograd, 2020)
Abstract: The query containment problem is a very important computer science problem, originally defined for relational queries. With the growing popularity of the SPARQL query language, it became relevant and important in this new context, too. This thesis introduces a new approach for solving this problem, based on a reduction to satisfiability in first order logic. The approach covers containment under the RDF SCHEMA entailment regime, and it can deal with the subsumption relation, as a weaker form of containment. The thesis proves soundness and completeness of the approach for a wide range of language constructs. It also describes an implementation of the approach as an open source solver, SPECS. The experimental evaluation on relevant benchmarks shows that SPECS is efficient and, compared to state-of-the-art solvers, gives more precise results in a shorter amount of time, while supporting a larger fragment of SPARQL constructs. An application of query language modeling can also be useful during the refactoring of database-driven applications, where simultaneous changes that include both a query and host language code are very common. These changes can preserve the overall equivalence without preserving the equivalence of the two parts considered separately. Because of the ability to guarantee the absence of differences in behavior between two versions of the code, tools that automatically verify code equivalence have great benefits for reliable software development. With this motivation, a custom first-order logic modeling of SQL queries is developed and described in the thesis. It enables an automated approach for reasoning about the equivalence of C/C++ programs with embedded SQL. The approach is implemented within a publicly available and open source framework, SQLAV. URI: http://hdl.handle.net/123456789/5095 Files in this item: 1
doktoratMirkoSpasic.pdf ( 1.258Mb )
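The general reduction idea, illustrated with the Z3 Python bindings on a deliberately tiny example (this is not the SPECS encoding; the one-column "queries" are my toy stand-ins): Q1 is contained in Q2 exactly when phi1 ∧ ¬phi2 is unsatisfiable.

    from z3 import Int, Solver, And, Not, unsat

    x = Int('x')
    phi1 = x > 2       # rows returned by Q1
    phi2 = x > 0       # rows returned by Q2

    s = Solver()
    s.add(And(phi1, Not(phi2)))   # a model here would witness non-containment
    print(s.check() == unsat)     # True -> Q1 is contained in Q2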
Šandrih, Branislava (Beograd, 2020)
Abstract: The main goal of this dissertation is to put different text classification tasks in the same frame, by mapping the input data into a common vector space of linguistic attributes. Subsequently, several classification problems of great importance for natural language processing are solved by applying the appropriate classification algorithms. The dissertation deals with the problem of validation of bilingual translation pairs; the final goal is to construct a classifier which provides a substitute for human evaluation and which decides whether a pair is a proper translation between the appropriate languages, by means of applying a variety of linguistic information and methods. In dictionaries it is useful to have a sentence that demonstrates use of a particular dictionary entry. This task is called classification of good dictionary examples. In this thesis, a method is developed which automatically estimates whether an example is good or bad for a specific dictionary entry. Two cases of short message classification are also discussed in this dissertation. In the first case, the classes are the authors of the messages, and the task is to assign each message to its author from a fixed set. This task is called authorship identification. The other considered classification of short messages is called opinion mining, or sentiment analysis. Starting from the assumption that a short message carries a positive or negative attitude about a thing, or is purely informative, the classes can be: positive, negative and neutral. These tasks are of great importance in the field of natural language processing, and the proposed solutions are language-independent, based on machine learning methods: support vector machines, decision trees and gradient boosting. For all of these tasks, the effectiveness of the proposed methods is demonstrated for the Serbian language. URI: http://hdl.handle.net/123456789/5090 Files in this item: 1
BranislavaSandrihPhd_konacno.pdf ( 9.053Mb )
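A minimal text classification pipeline of the kind the abstract describes (an SVM over vectorized text; the toy data and feature choice are mine, not the dissertation's):

    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    texts = ["great product", "terrible service", "arrives on tuesday"]
    labels = ["positive", "negative", "neutral"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(texts, labels)
    print(clf.predict(["service was great"]))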
Tanasijević, Ivana (Beograd, 2020)
Abstract: The motivation for writing this doctoral dissertation is a multimedia collection that is the result of many years of field research conducted by researchers from the Institute for Balkan Studies of the Serbian Academy of Sciences and Arts. The collection consists of materials in the form of recorded interviews, various recorded customs, associated textual descriptions (protocols) and numerous other documents. The subject of research of this dissertation is the study of possibilities and the development of new methods that could be used as a starting point in solving the problem of managing the intangible cultural heritage of the Balkans. The subtasks that emerge in this endeavor are: the development of an adequate design and implementation of a multimedia database of intangible cultural heritage that would meet the needs of different types of users; automatic semantic annotation of protocols using natural language processing methods, as a basis for semi-automatic annotation of the multimedia collection; successful search by metadata that complies with the CIDOC CRM standard; the study of additional search possibilities for this collection in order to gain new knowledge; and the development of the selected methods. The main problem with the available methods is that there is still not enough developed infrastructure in the context of natural language processing, organization and management in the field of cultural heritage in the Balkans, especially for the Serbian language, that could be effectively used to solve the proposed problem. There is thus a strong need to develop methods to reach an appropriate solution. For the semi-automatic annotation of multimedia materials, automatic semantic annotation of the protocols associated with the materials was used. It was carried out by methods of information extraction, named entity recognition and topic extraction, using rule-based techniques with the help of additional resources such as electronic dictionaries, thesauri and vocabularies from the specific domain. To classify textual protocols in relation to their topic, research was conducted on methods that can be used to solve the problem of classifying texts in the Serbian language, and a method was offered that is adapted to the specific domain being processed (intangible cultural heritage), to the specific problems being solved (classification of protocols in relation to topic) and to the Serbian language, as one of the morphologically rich languages. To work with spatial data, a spatial model has been developed that is suitable for displaying results on a map, as well as for creating spatial queries through an interactive graphical display of a map of locations. The results of experiments conducted with the developed methods show that the use of a rule-based approach, in combination with additional language resources and with a reasonable amount of effort put in, gives very good results for the task of information extraction.
An F measure of 0.87 was reached for the extraction of named entities, while an F measure of 0.90 was reached for the extraction of topics, which is in the range of measures from published research on similar problems and domains. The results of the text classification indicate that the selected statistical methods of machine learning in their basic form, when applied to the protocols, although generally successful, give a poor F measure of 0.44, while significant improvement is achieved with the use of semantic techniques, in which case an F measure of 0.88 is reached. Some of the results presented in this dissertation are contained in the papers [266], [265], [94], [264], [267], which have been published or accepted for publication. The conclusions drawn from the research are that solving the given problem requires engaging experts from several fields; that the needs of different groups of users are complex, which complicates the task of organizing and managing the multimedia collection; that the domain of cultural heritage is very rich in semantics; that context plays a major role in the tasks of information extraction and text classification; and finally that for these tasks the developed rule-based methods of natural language processing, as well as statistical techniques of machine learning, prove to be successful. URI: http://hdl.handle.net/123456789/5093 Files in this item: 1
IvanaTanasijevicdr.pdf ( 4.674Mb )
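For reference, the F measure quoted above is the harmonic mean of precision P and recall R: F = 2PR / (P + R); for example, P = 0.85 and R = 0.90 give F ≈ 0.87.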
Grbić, Milana (Beograd, 2020)
Abstract: In this dissertation some current problems of bioinformatics and computational biology are explored, together with methods for solving them. The following problems are considered: partitioning of sparse biological networks into k-plex subnetworks, prediction of the role of metabolites in metabolic reactions, partitioning of biological networks into highly connected components, and the problem of identification of significant groups of proteins by adding new edges to the weighted protein interaction network. The aforementioned problems have theoretical importance in the areas of machine learning and optimization, and practical application in biological research. In addition to solving the aforementioned problems from the computational aspect, the dissertation explores further application of the obtained results in the fields of biology and biochemistry, as well as the integration of the results within existing bioinformatics tools. The problem of predicting the role of metabolites in metabolic reactions is solved by a predictive machine learning method based on conditional random fields, while for the remaining three problems algorithms based on variable neighbourhood search are developed. For solving the problem of identification of significant groups of proteins by adding new edges to the weighted protein interaction network, variable neighbourhood search is only the first phase of the proposed solution, while in the second and third phases of the proposed method, integration with additional biological information and bioinformatics tools is performed. The proposed computational methods of partitioning and grouping in biological networks confirm existing findings in a new manner and lead to new discoveries about biological elements and the connections between them. By solving these problems and by interpreting the obtained results, this dissertation makes a scientific contribution to the field of computer science, particularly to the scientific disciplines of bioinformatics and computational biology. URI: http://hdl.handle.net/123456789/5088 Files in this item: 1
grbic_Milana_disertacija.pdf ( 8.740Mb )
Pejović, Aleksandar (University of Novi Sad, Faculty of Technical Sciences, 2020)
Abstract: This dissertation is about the development of a parallel software system for representing and solving problems of finite model theory and its application. The theoretical foundation of the system is presented, as well as an in-depth explanation of the implementation in Python. In particular, a parallel method for computing Boolean expressions based on the properties of finite free Boolean algebras is developed. It is also shown how various finite combinatorial objects can be coded in the formalism of Boolean algebras and counted by this procedure. Specifically, using a translation of first order predicate formulas to propositional formulas, we developed a technique for constructing and counting finite models of first order theories. Finally, we have developed some general techniques that enable more effective use of our system. We illustrate these techniques on two examples. The first one deals with partial orders, while the other one is about random graphs. URI: http://hdl.handle.net/123456789/4857 Files in this item: 3
APejovicDissertation.pdf ( 1.253Mb )
APejovicPhDPresentation.pdf ( 1.103Mb )
APejovicPhDSoftwareSources.zip ( 25.25Kb )
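A sketch in the spirit of the system described above (counting satisfying assignments of a Boolean expression by splitting the truth table across processes; my illustration, not the dissertation's code):

    from multiprocessing import Pool
    from itertools import product

    def formula(a, b, c):
        return (a or b) and (not c)   # example Boolean expression

    def count_block(first_bit):
        # Each worker enumerates the half of the truth table where a == first_bit.
        return sum(1 for b, c in product([False, True], repeat=2)
                   if formula(first_bit, b, c))

    if __name__ == "__main__":
        with Pool(2) as pool:
            print(sum(pool.map(count_block, [False, True])))  # 3 models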
Maljković, Mirjana (Beograd, 2021)
Abstract: Proteins are linear biological polymers composed of amino acids, whose structure and function are determined by the number and order of the amino acids. Protein structure has three levels: primary, secondary and tertiary (three-dimensional, 3D) structure. Since the experimental determination of protein 3D structure is expensive and time-consuming, it is important to develop predictors of protein 3D structure properties from the amino acid sequence (primary structure), such as the 3D structure of the protein backbone. The 3D structure of the backbone can be described using prototypes of local protein structure, i.e. prototypes of protein fragments a few amino acids long. A set of local structure prototypes constitutes a library of local protein structures, also called a structural alphabet. A structural alphabet is defined as a set of N prototypes of length L amino acids. The subject of this dissertation is the development of models for the prediction of structural alphabet prototypes for a given amino acid sequence using different data mining approaches. Protein Blocks (PBs), one of the best-known structural alphabets, was used in one part of the doctoral research. The structural alphabet PBs consists of 16 prototypes that are defined using fragments of 5 consecutive amino acids. The amino acid sequence is combined with structural properties of a protein that can be determined from the amino acid sequence (occurrence of repeats in the amino acid sequence) and with the outputs of predictors of protein structural properties (backbone angles, secondary structures, occurrence of disordered regions, accessible surface area of amino acids) as input to the prediction model for structural alphabet prototypes. Besides the development of models for the prediction of prototypes of an existing structural alphabet, the possibility of developing new structural alphabets is researched by applying the TwoStep clustering algorithm and constructing models for the prediction of the prototypes of the new structural alphabets. Several structural alphabets, which differ in the length and the number of prototypes, have been constructed and analyzed. Fragments of a large number of proteins whose structure has been experimentally determined were used to construct the new structural alphabets. URI: http://hdl.handle.net/123456789/5236 Files in this item: 1
dthesis.Matf.Mirjana.Maljkovic.pdf ( 47.43Mb )
Džamić, Dušan (Beograd, 2021)
Abstract: The theory of complex networks has proven to be very important in the study of the characteristics and structure of various complex systems. In the last two decades, a large number of studies have been directed towards the development of methods for clustering in complex networks. In 2012, the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), a well-known consortium of prestigious academic institutions (Rutgers University, Princeton, Columbia) and research laboratories (Microsoft, IBM, AT&T, NEC), included the problem of clustering in complex networks on the list of the most important problems and challenges in computer science. Clustering in complex networks can be applied in a variety of contexts to achieve different goals, and therefore there is no generally accepted definition of a cluster. For this reason, different approaches are used in developing clustering methods. The approach that has attracted the most attention of researchers involves two subproblems: defining a function to measure the quality of a partition, and constructing methods to find a partition that has the maximum value of the defined quality function. In this approach, the problem of clustering is formulated as a combinatorial optimization problem, and various methods of mathematical optimization can be used to solve it. One of the most commonly used quality functions is modularity. Clustering by modularity maximization, i.e., finding a partition with the maximum value of modularity, is an NP-hard problem; thus, only heuristic algorithms are capable of processing large datasets. In this dissertation, a novel method for modularity maximization based on the variable neighborhood search heuristic is proposed. For the purpose of efficient application to large-scale complex networks, a procedure for decomposition of the problem into smaller subproblems is developed. In addition, a mechanism for overcoming local maxima of modularity is improved using criteria for the occasional acceptance of a solution which is worse than the current one. DIMACS instances are used to test the proposed method, and the obtained results are compared with the best ones presented in the literature, obtained by two methods developed in the DIMACS implementation challenge in 2012. In addition, the obtained results are compared with the results of six methods from the literature developed after 2012. A comparative analysis shows that the proposed method outperforms the existing methods for modularity maximization and improves the best known solutions on 9 out of 13 hard instances. Clustering by modularity maximization is not suitable for detecting small clusters in large networks. For this reason, a new function for measuring the quality of a partition is proposed in the dissertation. Through three theorems, it is shown that the new measure, called the E-function, has the potential to identify clusters in a network and overcome the limits of modularity. For testing the proposed E-function and comparing it with the modularity function, a generic variable neighborhood search method is developed to optimize the considered quality function. Computational experiments are performed on generated and real instances from the literature for which the correct division into clusters is known. The results show that the expected clusters can be identified, both on artificial and real instances, by maximizing the E-function. URI: http://hdl.handle.net/123456789/5214 Files in this item: 1
teza_dzamic.pdf ( 3.074Mb )
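For reference, the modularity Q maximized by the methods above is commonly written (in the standard Newman–Girvan form; the notation here is mine) as:

    \[ Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j), \]

where A is the adjacency matrix, k_i is the degree of node i, m is the number of edges, and δ(c_i, c_j) = 1 exactly when nodes i and j belong to the same cluster.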
Jelović, Ana (Beograd, 2022)
Abstract: In the first part of this dissertation different repeat types are defined, as well as repeats that satisfy motif masks. A method for precise repeat finding in input sequences of arbitrary length is described. As the input sequences can be very long, the number of found repeats can also be large. For that reason it is important that the method also includes filtering of the found repeats based on the expected number of their occurrences. The method was first applied to protein sequences in which experimentally confirmed T-cell epitopes from the IEDB database were registered. Association rules were applied to the found repeats in order to construct a model that would enable the prediction of the positions of T-cell epitopes in protein sequences. In this way, it would indicate to researchers a region in the protein sequence where an epitope can be expected with high confidence. In the case of T-cell epitopes, a large number of rules with high confidence was found. These rules can be considered reliable predictors of the positions of T-cell epitopes within protein sequences. Based on the results found, association rules were formed that characterize in more detail the epitopes and the repeats associated with them. As a large number of results were found, only a part of them is presented in this dissertation. Based on the strings that determine a repeat, a motif mask that the repeat needs to satisfy was searched for. The entire procedure was applied both to direct non-complementary repeats and to indirect non-complementary repeats. With similar results, the entire procedure was applied to B-cell epitopes on data from the IEDB database. Data on experimentally confirmed short linear motifs were taken from the ELM database. In protein sequences where short linear motifs were registered, repeats were searched for and association rules were applied to them. The rules with high confidence were singled out in particular. Based on the results found, motif masks that repeats would satisfy with high confidence were searched for. URI: http://hdl.handle.net/123456789/5442 Files in this item: 1
Ana_Jelovic_tekst_doktorata.pdf ( 6.127Mb )
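For reference, the confidence of an association rule A ⇒ B used above is conf(A ⇒ B) = supp(A ∪ B) / supp(A); for example, if a repeat occurs in 50 sequences and an epitope overlaps it in 45 of them, the rule "repeat ⇒ epitope" has confidence 45/50 = 0.9.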
Jovanović, Jasmina (Beograd, 2022)
Abstract: The analysis of biological sequence similarity between different species is significant for identifying functional, structural or evolutionary relationships among the species. Biological sequence similarity and the analysis of newly discovered nucleotide and amino acid sequences are demanding tasks in bioinformatics. As biological data grows exponentially, new and innovative algorithms need to be constantly developed for faster and more effective data processing. The challenge for sequence similarity analysis algorithms is that a sequence does not always have obvious features, and the dimension of sequence features may be too high for applying regular feature selection methods. It is important to have a simple and effective algorithm for determining biological sequence relationships. This thesis proposes two new methods for transforming sequences into feature vectors that take into consideration statistically significant repetitive parts of the analyzed sequences, as well as different approaches for determining nucleotide sequence similarity and for sequence classification to predict the taxonomic groups of biological sequences. The first method is based on information theory and the fact that neither the position nor the frequency of repeated sequences is expected to occur with identical presence in a random sequence of the same length. The second method includes building signatures of biological sequences and profiles of taxonomic classes based on repetitive parts of sequences and the distances between these repeats. The proposed methods have been validated on multiple data sets and compared with results obtained using well-known and accepted methods in this field such as BLAST, Clustal Omega and methods based on k-mers. The resulting precision of the proposed methods is close to the values provided by existing methods for the majority of tested data sets, while time performance depends strictly on the infrastructure used and the sequence type. The methods provide results that are comparable with other commonly used methods focused on the same problem, taking into consideration statistically significant repetitive parts of sequences with different characteristics. URI: http://hdl.handle.net/123456789/5440 Files in this item: 1
JasminaJovanovic.pdf ( 3.984Mb )
Protić, Danijela (Beograd, 2023)
Abstract: Anomaly detection is the recognition of suspicious computer network behavior by comparing unknown network traffic to a statistical model of normal network behavior. Binary classifiers based on supervised machine learning are good candidates for normality detection. This thesis presents five standard binary classifiers: k-nearest neighbors, weighted k-nearest neighbors, decision trees, support vector machines and a feedforward neural network. The main problem with supervised learning is that it takes a lot of data to train high-precision classifiers. To reduce the training time with minimal degradation of model accuracy, a two-phase pre-processing step is performed. In the first phase, numeric attributes are selected to reduce the dataset. The second phase is a novel normalization method based on the hyperbolic tangent function and the damping strategy of the Levenberg-Marquardt algorithm. The Kyoto 2006+ dataset, the only publicly available dataset of real-world network traffic intended solely for anomaly detection research in computer networks, was used to demonstrate the positive impact of such pre-processing on classifier training time and accuracy. Of all the selected classifiers, the feedforward neural network has the highest processing speed, while the weighted k-nearest neighbors model proved to be the most accurate. The assumption is that classifiers working concurrently should both detect either an anomaly or normal network traffic; occasionally this is not the case, and the classifiers reach different decisions about an anomaly, i.e. a conflict arises. The conflicting-decision detector performs a logical exclusive OR (XOR) operation on the outputs of the classifiers. If both classifiers simultaneously detect an anomaly or recognize traffic as normal, no conflict has occurred; otherwise a conflict is detected. The number of conflicts detected provides an opportunity for additional detection of changes in computer network behavior. URI: http://hdl.handle.net/123456789/5599 Files in this item: 1
Danijela Protic - Doktorska Disertacija.pdf ( 3.143Mb )
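A small sketch of the two ideas above as I read them (the centering and scaling in the normalization are my assumptions, not the thesis's exact formula):

    import numpy as np

    def tanh_normalize(x):
        """Squash features into (-1, 1) with the hyperbolic tangent;
        the standardization inside is illustrative."""
        x = np.asarray(x, dtype=float)
        return np.tanh((x - x.mean()) / (x.std() + 1e-12))

    def conflict(decision_a, decision_b):
        """XOR of two binary classifier outputs: True when they disagree."""
        return bool(decision_a) != bool(decision_b)

    print(conflict(1, 0))  # classifiers disagree -> conflict detected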
Carić, Marko (Beograd, 2023)
Abstract: This dissertation discusses the problem of calculating the number of equivalence classes of Boolean functions. The difficulty of determining the number of equivalence classes increases sharply with the number of variables n. The motivation for choosing this topic lies in the fact that concrete numbers have been known so far only for relatively small values of n, although the problem itself has long been theoretically solved. Let G be a permutation group acting on the set B_n = {0,1}^n. The group G acts on scalar Boolean functions, B_n → B_1, as well as on vector invertible Boolean functions, B_n → B_n. Two scalar Boolean functions f(k) and g(k) defined on B_n are considered equivalent with respect to G if there exists σ ∈ G such that f(k) = g(σ(k)) holds for every k ∈ B_n. Two vector invertible Boolean functions f(k) and g(k) are considered equivalent with respect to G if there exist σ, τ ∈ G such that f(k) = σ(g(τ(k))) for every k ∈ B_n. The equivalence relation ∼ decomposes the set of all Boolean functions into equivalence classes. The equivalence of Boolean functions has significant applications in the logic synthesis of combinational circuits and in cryptography, especially in connection with the design of S-boxes. Let U_n(G) and V_n(G) denote the number of equivalence classes of scalar, i.e. vector invertible, Boolean functions of n variables with respect to the group G. The numbers U_n(G) and V_n(G) can be calculated relatively simply if the cycle index of the permutation group induced by G on B_n is known. Four groups of transformations are considered:
• the group S_n of permutations of the coordinates of elements k = (k_1, k_2, ..., k_n) ∈ B_n,
• the group G_n, induced by permutations and complementations of coordinates,
• the group GL_n of linear invertible transformations of the vector space B_n, and
• the group AGL_n of affine invertible transformations of B_n.
If a permutation σ ∈ G has i_k cycles of length k, its cycle structure is i(σ) = (i_1, i_2, ...). The cycle index of the group G is the generating polynomial Z_G(f_1, f_2, ...) = (1/|G|) Σ_{σ∈G} Π_k f_k^{i_k(σ)}. The cycle structures of the permutations induced on B_n by these four groups are known, but the cycle indices themselves, and with them the numbers U_n(G) and V_n(G), had in practice been calculated only for relatively small values, for example n ≤ 10. The dissertation presents original results in the enumeration of equivalence classes of Boolean functions with respect to these four groups of transformations. For all four groups a similar expression for the cycle index is derived, in the form of a sum over the partitions of the number n. Based on these expressions, the numbers U_n(G) and V_n(G) can be computed for considerably larger n. An overview of the previously known results and the new results of the thesis (largest n reached) is shown in the following table:
Number \ G:  S_n;  G_n;  GL_n;  AGL_n
U_n(G):  11 → 33;  10 → 32;  8 → 31;  10 → 31
V_n(G):  6 → 30;  7 → 27;  6 → 26;  6 → 26
A method for calculating the number of equivalence classes that does not use the cycle index is shown and described in the third paper from the introductory chapter. The second part of the dissertation refers to monotone Boolean functions — scalar Boolean functions that satisfy the monotonicity condition (if k ≤ l componentwise, then f(k) ≤ f(l)). Let r_n denote the number of equivalence classes of monotone Boolean functions of n variables. The difficulty of calculating r_n increases rapidly with n, so that until recently the last term of the sequence to be calculated was r_7. The procedure described in the dissertation is based on Frobenius' theorem, and on its basis the number r_8 was determined. The dissertation consists of the first, introductory chapter and the following three chapters.
In the second chapter, theoretical concepts related to the material of Chapters 3 and 4 are introduced; they concern discrete mathematics, combinatorics and the cycle index of a permutation group. The third chapter derives the expressions for the cycle indices of the four considered groups of permutations, as well as the numbers U_n(G) and V_n(G) of equivalence classes of Boolean functions with respect to these groups. First, the improvements common to all four groups are discussed, and then the specific accelerations are presented group by group. In Chapter 4, the problem of finding the number of equivalence classes of monotone Boolean functions is solved. First, a general expression for calculating that number is given, based on Frobenius' theorem, in the form of a sum over the partitions of the number n of the numbers of monotone functions fixed by permutations with the corresponding cycle structure. For the different partitions, different ways of calculating the number of fixed points for n ≤ 8 are shown. The procedure by which the number r_8 was calculated, which represents an original contribution of this dissertation, is presented in the first paper from the list in the introductory chapter; practically at the same time, the same result was obtained independently elsewhere, in agreement with the result described in the dissertation. URI: http://hdl.handle.net/123456789/5571 Files in this item: 1
MarkoCaricDisertacija.pdf ( 1.467Mb )
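For orientation, the orbit-counting identity behind these computations (the Cauchy–Frobenius, or Burnside, lemma), specialized to scalar Boolean functions in the abstract's notation:

    \[ U_n(G) = \frac{1}{|G^*|} \sum_{\sigma \in G^*} 2^{c(\sigma)} = Z_{G^*}(2, 2, \ldots),
       \qquad
       Z_{G^*}(f_1, f_2, \ldots) = \frac{1}{|G^*|} \sum_{\sigma \in G^*} \prod_k f_k^{i_k(\sigma)}, \]

where G* is the group induced on B_n and c(σ) = Σ_k i_k(σ) is the number of cycles of σ; a function is fixed by σ exactly when it is constant on each cycle, which is why each cycle contributes a factor of 2.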
Radosavljević, Jovan (Beograd, 2023)
Abstract: A graph G = (V, E) is an ordered pair of a set of nodes V and a set of edges E. The order of a graph G is the number of nodes |V|, and its size is the number of edges |E|. Nodes u, v ∈ V are adjacent if there is an edge uv ∈ E between them. The distance dist(u, v) between nodes u and v of G is the length of a shortest path from u to v. The diameter of the graph G is the largest distance dist(u, v) over all pairs of nodes u, v. The dissertation discusses graphs of diameter 2. Intuitively, graphs of diameter 2 are simple structures; however, it is known that asymptotically almost all graphs have diameter 2. That is why a narrower class is interesting — the class D2C of diameter-2-critical graphs, i.e. graphs of diameter 2 in which the removal of any edge leads to an increase of the diameter. In addition, a still narrower class of primitive D2C (PD2C) graphs is considered, i.e. D2C graphs that do not have two nodes with the same set of neighbors. In the introductory Chapter 2, the basic concepts, algorithms and findings used in the dissertation are presented. The original results concerning graphs of diameter 2 are presented in the following chapters. Chapter 3 describes the procedure for obtaining the list of all D2C graphs of order up to 13. With built-in parallelization, the creation of the list of D2C graphs of order up to 13 took a month. This was a step forward, because previously there existed a list of all graphs of diameter 2 of order up to 10. The obtained results were used for testing several known hypotheses about graphs of diameter 2. In Chapter 4 it is shown that for every m ⩾ 3 a D2C graph containing a clique of size m must have at least 2m nodes. Moreover, up to isomorphism, there is exactly one such graph of order 2m that contains a clique of size m. Chapter 5 discusses PD2C graphs with the smallest number of edges. From the list of all PD2C graphs of order n ⩽ 13, the PD2C graphs of size at most 2n − 4 are selected. Only three of the isolated graphs are of size 2n − 5, which is in accordance with the Erdős–Rényi theorem on the lower bound for the size of graphs of diameter 2 that do not contain a node adjacent to all other nodes (that bound is 2n − 5). The PD2C graphs of size 2n − 4 and order up to 13 are sorted into three groups: • The first group belongs to the family Z, defined in the dissertation, which for each n ⩾ 6 contains exactly one PD2C graph of order n and size 2n − 4. • The second group consists of seven Hamiltonian PD2C graphs of order at most 9 and size 2n − 4. In the dissertation it is proved that there is no such Hamiltonian graph of order greater than 11, i.e. that the seven graphs found are the only Hamiltonian PD2C graphs of size 2n − 4. • The third group consists of a single graph that does not belong to either of the first two groups. Based on these results, the hypothesis was formulated that all PD2C graphs of order n ⩾ 10 and size 2n − 4 belong to the family Z. Keywords: graphs, diameter-2-critical graphs, primitive graphs. Scientific field: Computing and informatics. Narrower scientific field: Graph theory. UDC number: 004.415.5(519.1) URI: http://hdl.handle.net/123456789/5594 Files in this item: 1
disertacijaJovanRadosavljevic.pdf ( 746.0Kb )
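A direct check of the D2C property as defined above (diameter 2, and deleting any edge raises the diameter), written with networkx; brute force, suitable for small graphs only:

    import networkx as nx

    def is_d2c(G):
        """G is diameter-2-critical: diam(G) == 2 and removing any edge increases it."""
        if nx.diameter(G) != 2:
            return False
        for u, v in list(G.edges()):
            G.remove_edge(u, v)
            # removal may disconnect G; treat that as infinite diameter
            still_diam2 = nx.is_connected(G) and nx.diameter(G) == 2
            G.add_edge(u, v)
            if still_diam2:
                return False
        return True

    print(is_d2c(nx.cycle_graph(5)))  # the 5-cycle is a classic D2C graph -> True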