Update the parse function to accept an entity id #189
Changes from 14 commits
```diff
@@ -45,6 +45,10 @@ pipeline:
 author: .//meta[@name="author"]/@content
 publishedAt: .//*[@class="date"]/text()
 description: .//meta[@property="og:description"]/@content
+keys:
+- title
+- author
+- publishedAt
 handle:
 store: store
 fetch: fetch
```

Review comments on `keys:`

Maybe the method needs to be changed to use the built-in

Sure. The problem with doing that, though, is that the scraper won't be able to extract the body of the article, which is why the custom script exists. Since it's only an example, it probably doesn't matter much, but that is why the two differ. Personally, I don't have an issue with the documentation not matching the example in the repo.
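The new `keys` list names fields that the `meta` xpaths in this hunk already extract (`title`, `author`, `publishedAt`). One plausible use, in line with the PR's goal of passing an entity id to the parse function, is to hash those values into a stable id. The sketch below assumes that behaviour; `make_entity_id`, the SHA-1 digest and the `key=value` encoding are illustrative choices, not code from this PR or its framework.

```python
import hashlib
from typing import Any, Iterable, Mapping, Optional


def make_entity_id(meta: Mapping[str, Any], keys: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: hash the values of the configured key fields.

    Returns None when none of the key fields has a value, so callers can
    skip records that cannot be identified.
    """
    digest = hashlib.sha1()
    found = False
    for key in keys:
        value = meta.get(key)
        if value is None:
            continue  # missing fields are skipped rather than hashed as empty
        found = True
        # Include the field name so {"title": "a"} and {"author": "a"} differ.
        digest.update(f"{key}={value}".encode("utf-8"))
        digest.update(b"\x00")  # separator between key/value pairs
    return digest.hexdigest() if found else None


meta = {
    "title": "Example headline",
    "author": "Jane Doe",
    "publishedAt": "2021-05-01",
    "description": "Not part of the id",
}
entity_id = make_entity_id(meta, ["title", "author", "publishedAt"])
print(entity_id)
```

Skipping missing fields is a design choice: an article whose `author` xpath matches nothing still gets an id, at the cost of that id changing if the author later becomes extractable.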
@@ -0,0 +1,24 @@

```python
# from typing_extensions import TypedDict
from typing import Any, Optional, TypedDict  # Any (not builtin any) for untyped fields


class MetaBase(TypedDict):
    crawler: Optional[str]
    foreign_id: Optional[str]
    source_url: Optional[str]
    title: Optional[str]
    author: Optional[str]
    publisher: Optional[str]
    file_name: Optional[str]
    retrieved_at: Optional[str]
    modified_at: Optional[str]
    published_at: Optional[str]
    headers: Any
    keywords: Any


class Meta(MetaBase, total=False):
    parent: Any
    languages: Any
    countries: Any
    mime_type: Any
```

Review comment on `class MetaBase(TypedDict):`

The distinction between `MetaBase` and `Meta` is no longer used, so I guess it's safe to merge the two now?
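`Meta` subclasses `MetaBase` with `total=False`, so the keys declared on `MetaBase` are required while the four added on `Meta` may be omitted. A trimmed-down sketch (only a few of the fields, using `typing.Any`) makes that split visible through the `__required_keys__` and `__optional_keys__` attributes that `typing` provides from Python 3.9 onward. This is the required/optional distinction the review comment suggests is no longer needed.

```python
from typing import Any, Optional, TypedDict


class MetaBase(TypedDict):
    # total=True (the default): every key declared here is required.
    crawler: Optional[str]
    title: Optional[str]
    headers: Any


class Meta(MetaBase, total=False):
    # total=False applies only to the keys declared in this class body.
    parent: Any
    mime_type: Any


# Optional[str] means the value may be None, not that the key may be absent.
meta: Meta = {"crawler": "example", "title": None, "headers": {}}

print(sorted(Meta.__required_keys__))  # ['crawler', 'headers', 'title']
print(sorted(Meta.__optional_keys__))  # ['mime_type', 'parent']
```

At runtime a `TypedDict` value is an ordinary `dict`; the required/optional split is enforced only by static type checkers.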
I would suggest we use syntax similar to the ftm mappings for consistency, something like https://github.com/alephdata/aleph/blob/main/mappings/md_companies.yml#L15-L17
So the `keys` section would look like:

The `keys` section needs to be updated now, I think?
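Whatever YAML shape is settled on for `keys`, the parse stage has to turn it into a list of field names. A minimal sketch, assuming, in the spirit of the ftm mappings the reviewer links, that both a singular `key` and a plural `keys` entry are accepted; `read_keys` and the `params` dict layout are hypothetical, not part of this PR.

```python
from typing import Any, List, Mapping


def read_keys(params: Mapping[str, Any]) -> List[str]:
    """Hypothetical: return the configured key fields as a list of names.

    Accepts `key: title` (a single name) as well as `keys: [title, author]`,
    and drops duplicates while keeping the configured order.
    """
    raw: List[Any] = []
    for name in ("key", "keys"):
        value = params.get(name)
        if value is None:
            continue
        if isinstance(value, str):
            raw.append(value)  # a single field name
        else:
            raw.extend(value)  # a list of field names
    seen = set()
    keys: List[str] = []
    for item in raw:
        item = str(item).strip()
        if item and item not in seen:
            seen.add(item)
            keys.append(item)
    return keys


print(read_keys({"keys": ["title", "author", "publishedAt"]}))  # ['title', 'author', 'publishedAt']
print(read_keys({"key": "title"}))  # ['title']
```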