Channel: SCN: Message List

Issues about test analysis of customizing extraction rules in simplified Chinese


I am doing some research in the medical industry based on Simplified Chinese text, so I need to create custom extraction rules to meet my requirements. However, the following questions came up when I ran a simple test of custom extraction rules in Simplified Chinese.

 

1. Target

My goal is to extract the content after "姓名" ("name"). In other words, I want to extract the person's name.

The sample text looks like "姓名 张三 性别 男 年龄 78岁" ("Name Zhang San, Gender Male, Age 78") etc.
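
To make the target concrete, here is a minimal Python sketch of the output I expect for the sample text. It is not CGUL and does not use any SAP API; the helper name `extract_name` and the regular expression are only assumptions for illustration, and it assumes the value is the first run of non-whitespace characters after the label.

```python
import re

# Sample text from the post: alternating field labels and values.
SAMPLE = "姓名 张三 性别 男 年龄 78岁"

# Hypothetical pattern (illustration only): capture the first run of
# non-whitespace characters that follows the label "姓名".
NAME_AFTER_LABEL = re.compile(r"姓名\s*(\S+)")


def extract_name(text):
    """Return the value following the label "姓名", or None if the label is absent."""
    match = NAME_AFTER_LABEL.search(text)
    return match.group(1) if match else None


print(extract_name(SAMPLE))  # prints: 张三
```

For the sample above, the value I want the custom rule to return is "张三".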

 

2. Scenario

CGUL rules that fail to extract the person's name:

① #group name:  <姓名>  [OD name="name"] <>+[/OD]

② #subgroup item: <姓名>   

    #group name: %(item)[OD name="myname"] <>*[/OD]

 

 

After those failures I wondered whether extraction rules can work at all in Simplified Chinese, so I tried the following CGUL rule, and it works:

CGUL rule:  #group myname:[OD name="myname"] < POS :Nn>*[/OD]

 

You can find the custom type "myname" in the result table as follows:

image001.png

 

Here is another simple test:

I wrote two extraction rules as follows:

    ①#group test8: <><>

    ②#group test9: <姓名> <>

 

The result of group test8 looks like this: two tokens are extracted and my custom type "test8" is displayed.

image002.png

 

But group test9 returns no extraction result, even though the only change is that I added the Chinese word "姓名" inside the first token <>.
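
One possibility I have not verified is that the tokenizer does not emit "姓名" as a single token, in which case a token-level literal such as <姓名> would never match. The Python toy below only illustrates that hypothesis: both token lists are made-up segmentations (the real tokenizer output is not shown here), and `match_literal_then_any` is a hypothetical stand-in for a rule shaped like `<literal> <>`, not an actual CGUL implementation.

```python
def match_literal_then_any(tokens, literal):
    """Toy stand-in for a rule like <literal> <>.

    Scans for a token exactly equal to `literal` that is followed by
    any one token, and returns that pair, or None if there is no match.
    """
    for i in range(len(tokens) - 1):
        if tokens[i] == literal:
            return (tokens[i], tokens[i + 1])
    return None


# Two hypothetical segmentations of the same text (assumptions only).
as_one_token = ["姓名", "张三", "性别", "男"]
split_tokens = ["姓", "名", "张三", "性别", "男"]

print(match_literal_then_any(as_one_token, "姓名"))  # ('姓名', '张三')
print(match_literal_then_any(split_tokens, "姓名"))  # None
```

If this is what happens, a pattern with no literal, such as test8, would still match, while test9 would not, which is consistent with what I see. I have not confirmed how the tokens are actually split.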

 

So I really wonder why this happens. Can anyone who has worked on the same scenario share some experience with me?

