I am doing some research on simplified Chinese text in the medical industry, so I need to create custom extraction rules to meet my requirements. However, the following questions came up when I ran a simple test of custom extraction rules on simplified Chinese text.
1. Target
My goal is to extract the content that follows "姓名" ("name"). In other words, I want to extract the person's name.
The sample text looks like "姓名 张三 性别 男 年龄 78岁" (name: Zhang San, sex: male, age: 78 years).
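For reference only, here is a minimal sketch of the intended result expressed as a plain Python regular expression, not as CGUL. The function name `extract_name` is hypothetical, and the sketch assumes the fields are separated by spaces as in the sample; on unspaced text it would capture everything after "姓名".

```python
import re
from typing import Optional


def extract_name(text: str) -> Optional[str]:
    """Hypothetical helper: return the whitespace-delimited token after "姓名"."""
    # "姓名" is the label; \s* skips the separator; (\S+) captures the name itself.
    match = re.search(r"姓名\s*(\S+)", text)
    return match.group(1) if match else None


sample = "姓名 张三 性别 男 年龄 78岁"
print(extract_name(sample))  # prints 张三
```

This only shows the expected output ("张三"). The actual rule still needs to be written in CGUL, which works on the tokens produced by the language module rather than on raw characters.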
2. Scenario
The following CGUL rules fail to extract the person's name:
① #group name: <姓名> [OD name="name"] <>+[/OD]
② #subgroup item: <姓名>
#group name: %(item)[OD name="myname"] <>*[/OD]
After these failures, I wondered whether extraction rules work for simplified Chinese at all, so I tried the following CGUL rule, and it works:
CGUL rule: #group myname: [OD name="myname"] <POS:Nn>*[/OD]
The custom type "myname" appears in the result table.
Here is another simple test:
I wrote two extraction rules, as follows:
① #group test8: <><>
② #group test9: <姓名> <>
Group test8 works: it extracts two tokens, and my custom type "test8" appears in the result.
But group test9 returns no extraction result, even though the only difference is that I put the Chinese word "姓名" inside a token <>.
So I really wonder why this happens. Could anyone who has worked on a similar scenario share some experience with me?