
Timeline & beta testing #24

Open
piskvorky opened this issue Dec 20, 2014 · 23 comments

@piskvorky
Member

Hello gentlemen,

what is the expected timeline for releasing semanticizest?

I have a client eager to try it out (because semanticizer is so slow, it's a bottleneck for them).

So I'm wondering whether there's a way to share (human) resources -- maybe they could do some beta testing and benchmarking on live data as soon as you declare the new semanticizest production-ready?

@larsmans
Contributor

This was planned for last month, but the deadline slipped because of other projects that had priority. My own plan is to release a bare-bones version in the second week of January.

@piskvorky
Member Author

Great, thanks Lars. Please ping me when you think it's ready (not sure what bare-bones means, perhaps that's enough).

@larsmans
Contributor

Bare-bones means we can semanticize with baseline metrics. No count-min sketch needed (you only need that for fitting more complicated models on the output of semanticizest).
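For background, a count-min sketch is a small, fixed-size table of hashed counters used to approximate large count tables in bounded memory; a generic minimal sketch (not semanticizest's implementation) looks roughly like this:

```python
import hashlib

class CountMinSketch:
    """Approximate counter: queries return an over-estimate of the true
    count, with error bounded by the chosen width and depth."""

    def __init__(self, width=2 ** 18, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # One hashed bucket per row of the table.
        for row in range(self.depth):
            digest = hashlib.md5(("%d:%s" % (row, key)).encode("utf-8")).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, key, count=1):
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def query(self, key):
        # The minimum over all rows is the tightest over-estimate.
        return min(self.table[row][col] for row, col in self._buckets(key))
```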

@larsmans
Contributor

Hi @piskvorky, I think what we have now is ready for beta-testing. Would you like to have a try?

I wanted to merge #22 before releasing a beta, but it needs tests and I'm not going to postpone any further. The current functionality should be close to what the old semanticizer could do.

@piskvorky
Member Author

Excellent, thanks! Will check the situation with the client and report back.

By the way, I remember we trained an extra model with David Graus, on Yahoo! queries. Is a similar thing possible here? Or does it not make sense? CC @graus.

@larsmans
Contributor

There's no re-ranking model in here. If you want that, you'd need to stack a model on top.

@graus

graus commented Jan 30, 2015

(Which I will definitely work on if it does not magically appear -- I don't recall if or where this item ended up on the roadmap, nor whether there is a roadmap.)

@larsmans
Contributor

The idea was to provide all the information necessary for feature extraction for such a model. The problem is that we can't ship models, training data or anything of the kind, so we can't test this stuff and it will go stale.
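To make that concrete, stacking a re-ranking model on top of the candidate output could look roughly like the sketch below. This is purely illustrative: the candidate fields (`commonness`, `senseprob`, `target`) and the scikit-learn classifier are assumptions, not part of semanticizest.

```python
# Hypothetical re-ranker stacked on semanticizest output; the candidate
# field names and the labelled training data are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(cand):
    # Turn one candidate dict into a small feature vector.
    return [cand["commonness"], cand["senseprob"], len(cand["target"])]

def train_reranker(labelled):
    # labelled: iterable of (candidate_dict, is_correct_link) pairs.
    X = np.array([features(c) for c, _ in labelled])
    y = np.array([label for _, label in labelled])
    return LogisticRegression().fit(X, y)

def rerank(model, candidates):
    # Score each candidate and return them best-first.
    scores = model.predict_proba(np.array([features(c) for c in candidates]))[:, 1]
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in ranked]
```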

@piskvorky
Member Author

Thanks guys.

One question: skimming the docs, I can't see an obvious answer (I'll have to check the API in more detail later). I remember we had some disambiguation issues and needed semanticizer to return "unnormalized" statistics too, for some local post-filtering of results.

We ended up adding something like `link['unnormed'] = self.wpm.get_sense_data(ngram, sense_str)` to our fork of semanticizer.

Is this already included in semanticizest, or will we have to add it again ourselves (+ pull request)? Or does this question not even make sense for the new code/approach?

@larsmans
Contributor

We're not returning that, but we should. There's an XXX in `semanticizest/_semanticizer.py` where it needs to be added...
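To illustrate what is being asked for: besides the normalized sense probability, each candidate would also carry the raw ("unnormalized") link count, so downstream code can post-filter on it. The field names and values below are hypothetical, not the actual semanticizest output format:

```python
# Hypothetical candidate entry once raw statistics are returned; the keys
# and values are made up for illustration only.
candidate = {
    "ngram": "apple",
    "target": "Apple_Inc.",
    "linkcount": 120,    # raw count of links ngram -> target ("unnormalized")
    "senseprob": 0.4,    # linkcount divided by the ngram's total link count
}

def keep(cand, min_count=5):
    # Local post-filtering on the raw count, as discussed above.
    return cand["linkcount"] >= min_count
```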

@c-martinez
Contributor

Speaking of shipping models -- does anyone have an English model they wouldn't mind sharing with me? I've tried building it myself, but my laptop crashes after processing 3,920,000 articles. ;-(

@larsmans
Contributor

larsmans commented Feb 3, 2015

I'm building one right now, ready in three hours. Ping me by email tomorrow morning.

@c-martinez
Contributor

Cool thanks :-)

@piskvorky
Member Author

We ran some initial checks, created the EN model, and I want to make sure I understand correctly:

semanticizEST only has a single API method, `all_candidates(tokens)`. This returns all candidates for the given tokens, with no context and no disambiguation.

There's no API like in semanticizER where we send in a text (string) and get back the detected entities.

Is that correct?

Are you planning on extending the pipeline? What is the timeline for reaching roughly semanticizER-level functionality? The README says that is the goal.

Thanks! CC @tgalery @graus @larsmans
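For reference, calling that single method would look roughly like the sketch below; only `all_candidates(tokens)` comes from the discussion above, while the class name, model file and result handling are assumptions:

```python
# Hypothetical usage sketch of the single-method API discussed above.
from semanticizest import Semanticizer

sem = Semanticizer("enwiki_model.sqlite3")   # model file name is an assumption
tokens = "the prime minister of the netherlands".split()

for cand in sem.all_candidates(tokens):
    # Each candidate covers a token span and points to a Wikipedia target,
    # along with whatever statistics the model stores for that pair.
    print(cand)
```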

@piskvorky
Member Author

Ping @graus @larsmans: clarification of the project goals would be welcome. Please let us know what the status is.

Thanks a lot!

@larsmans
Contributor

The status is that we have a working replacement at http://github.com/semanticize/st that is only lacking:

  1. disambiguation
  2. REST API (currently being written)
  3. Python wrapper (that will be this repo, I guess)
  4. documentation

I plan to have this finished this week (and I'm working on Saturday). You're welcome to test this new version. People at UvA are already using it.

@piskvorky
Member Author

Thanks Lars.

@tgalery can you keep an eye on this? Once we know how to apply semanticizest at a level where it can replace semanticizer (i.e. an API for linking entities from plain text), let's evaluate.

@tgalery

tgalery commented Apr 30, 2015

Will do @piskvorky !

@larsmans
Contributor

larsmans commented May 3, 2015

The REST API now works; simple disambiguation is in the works but not yet finished.

@larsmans
Contributor

larsmans commented May 7, 2015

@tgalery The package is now ready for beta-testing, AFAIC.

@tgalery

tgalery commented May 7, 2015

Thanks @larsmans, I will have a go when I have the time.

@tgalery

tgalery commented Jun 18, 2015

Hi @larsmans @piskvorky, I finally had time to take a look at this. I have been playing with the Danish model, and it seems that the st project has pretty much the same functionality as the semanticizest project. The only difference is that instead of having a single endpoint where you get all the candidates, one can also get the candidates matching a best path through the text, or an exact match of the string (as documented here: https://github.com/semanticize/st/blob/68465fe840a6087698df8963af5980373c5cedb4/cmd/semanticizest/webserver.go).

Although this is functionality added on top of semanticizest, it seems that there is no spotter per se (i.e. something that determines which surface forms in the whole text are worth extracting candidates from), nor any robust incorporation of context. Am I right? Are there any plans to incorporate those in the project?
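For anyone following along, querying the st webserver could look roughly like the sketch below; the host, port and endpoint path are assumptions here, and the authoritative routes are in the webserver.go file linked above:

```python
# Hypothetical client for the st REST API; check the linked webserver.go
# for the actual routes -- the endpoint name and port below are guesses.
import json
import requests

text = "København er hovedstaden i Danmark."
resp = requests.post("http://localhost:5002/all", data=text.encode("utf-8"))
resp.raise_for_status()

for cand in resp.json():
    # One entry per candidate link found in the posted text.
    print(json.dumps(cand, indent=2))
```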

@larsmans
Contributor

Candidate entities are determined by semanticizest itself; this is a consequence of the hash representation. There are also no context features. We don't have plans to add them, but if @dodijk agrees that we're missing them, they could be added.

The plan was to have semanticizest do basic entity linking, and do it fast, without too many dependencies in terms of training sets, with enough useful output for downstream code to improve its results.
