INFORMATION RETRIEVAL BASED ON LATENT DIRICHLET ALLOCATION FOR INTELLECTUAL PROPERTY DATA

Hashri Hayati, Muhammad Riza Alifi

Abstract


The shift toward a knowledge-based economy underscores the importance of intellectual property (IP) management. Conventional keyword-based search often fails to capture the semantic relationships between concepts in documents, particularly in complex ones such as patents and copyrights. This study proposes a topic modeling approach based on Latent Dirichlet Allocation (LDA) to improve the relevance and accuracy of information retrieval on IP data. A total of 76 models were built across four scenarios (with or without language translation, combined with or without n-gram tokenization), each with the number of topics ranging from 1 to 19. The best model from each of the four scenarios had a coherence score between 0.4411 and 0.4581. Evaluation using Mean Average Precision (MAP) on the top 10 retrieved documents showed that the model without translation and with unigram tokenization (10 topics) performed best, with an average MAP of 78%. The findings indicate that neither language translation nor n-gram tokenization has a significant effect on the coherence score. However, models without n-gram tokenization (that is, without bigram and trigram combinations) returned search results that were more semantically relevant according to MAP. Models built on automatically translated text obtained lower MAP scores than models without translation.
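As an illustration of the experimental design and the evaluation metric described in the abstract, the following minimal Python sketch enumerates the scenario grid (2 translation settings × 2 tokenization settings × 19 topic counts = 76 configurations) and computes MAP over the top 10 results. It is not the authors' implementation: the function names, the dictionary keys, and the choice to normalize average precision by min(number of relevant documents, k) are assumptions made for this example, and the abstract does not state which MAP variant was used.

```python
from itertools import product


def scenario_grid(max_topics=19):
    """Enumerate every (translation, n-gram, topic count) configuration.

    Hypothetical helper: the keys below are illustrative names only.
    """
    return [
        {"translated": translated, "ngrams": ngrams, "num_topics": k}
        for translated, ngrams in product([False, True], repeat=2)
        for k in range(1, max_topics + 1)
    ]


def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """Average precision over the top-k ranked document ids.

    Assumption: ranked_ids contains no duplicates, and the sum of
    precision values at relevant ranks is divided by
    min(len(relevant_ids), k).
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision(runs, k=10):
    """Mean of AP@k over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    print(len(scenario_grid()))  # number of model configurations
    toy_runs = [
        (["a", "b", "c", "d"], {"a", "c"}),  # toy query 1
        (["x", "y"], {"y"}),                 # toy query 2
    ]
    print(mean_average_precision(toy_runs, k=10))
```

The toy queries are invented purely to exercise the functions. In a real evaluation, each ranked list would come from the retrieval model and each relevant set from human relevance judgments.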







DOI: https://doi.org/10.31884/jtt.v11i2.793



Copyright (c) 2025 JTT (Jurnal Teknologi Terapan)




Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)