Project of the Week: Dat
This week's featured project is Dat, a grant-funded, open source, decentralized tool for distributing data sets. Dat is built and maintained by a geographically distributed team, many of whom helped write this post.
First off, what is Dat?
We wanted to bring the best parts of peer-to-peer and distributed systems to data sharing. We started with scientific data sharing and then began branching out into research institutions, government, public service, and open source teams as well.
Another way to think about it is as a sync and upload app like Dropbox or BitTorrent Sync, except Dat is open source. Our goal is to be powerful, open source, non-profit data sharing software for big, small, medium, small-batch, and big-batch data.
To use the dat CLI tool, all you have to type is:
dat share path/to/my/folder
dat will then create a link that you can use to send that folder to someone else; no central servers or third parties get access to your data. Unlike with BitTorrent, it is also impossible to sniff who is sharing what (see the Dat Paper draft for more details).
Now we know what Dat is. How does Dat Desktop fit in?
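Dat Desktop is a way to make Dat accessible to people who can't or don't want to use the command line. You can host multiple dats on your machine and serve the data over the network.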
Can you share some cool use cases?
DataRefuge + Project Svalbard
We're working on a thing codenamed Project Svalbard that is related to DataRefuge, a group working to back up government climate data at risk of disappearing. Svalbard is named after the Svalbard Global Seed Vault in the Arctic, which has a big underground backup library of plant DNA. Our version of it is a big version-controlled collection of public scientific datasets. Once we know and can trust the metadata, we can build other cool projects like a distributed volunteer data storage network.
California Civic Data Coalition
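CACivicData is an open source archive that serves daily downloads from CAL-ACCESS, California's database for tracking money in politics. They publish daily releases, which means their zip files contain a lot of duplicate data. We're working on hosting their data as a Dat repository, which will reduce the hassle and bandwidth needed to refer to a specific version or update to a newer one.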
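To illustrate why near-identical daily releases waste bandwidth when shipped as whole archives, here is a minimal sketch that hashes fixed-size chunks of two snapshots and counts the chunks the second one actually adds. It is not Dat's actual algorithm or wire format; the helper names `chunkHashes` and `chunksToFetch`, the fixed chunk size, and the sample buffers are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: split a buffer into fixed-size chunks and hash each one.
function chunkHashes(buffer, chunkSize) {
  const hashes = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    const chunk = buffer.subarray(offset, offset + chunkSize);
    hashes.push(crypto.createHash('sha256').update(chunk).digest('hex'));
  }
  return hashes;
}

// Hypothetical helper: count the chunks in the new snapshot whose hashes
// do not already appear in the old snapshot.
function chunksToFetch(oldBuffer, newBuffer, chunkSize) {
  const known = new Set(chunkHashes(oldBuffer, chunkSize));
  return chunkHashes(newBuffer, chunkSize).filter((h) => !known.has(h)).length;
}

// Two made-up "daily releases" that differ only in their last 4 bytes.
const monday = Buffer.from('AAAABBBBCCCCDDDD');
const tuesday = Buffer.from('AAAABBBBCCCCEEEE');

console.log(chunksToFetch(monday, tuesday, 4));
```

With whole-file zips, every release would be downloaded in full; under this simplified model only the chunks that changed need to move.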
Electron Updates
This one isn't concrete yet, but we think a fun use case would be putting a compiled Electron app in a Dat repository, then using a Dat client in Electron to pull the latest deltas of the built app binary. That would save on download time and also reduce bandwidth costs for the server.
Who should be using Dat Desktop?
Anyone who wants to share and update data over a p2p network: data scientists, open data hackers, researchers, developers. We're super receptive to feedback if anyone has a cool use case we haven't thought of yet. You can drop by our Gitter Chat and ask us anything!
What's coming next in Dat and Dat Desktop?
User accounts and metadata publishing. We are working on a Dat registry web app to be deployed at datproject.org, which will basically be an 'NPM for datasets', with the caveat that we are only going to be a metadata directory and the data can live anywhere online (as opposed to NPM or GitHub, where all the data is centrally hosted because source code is small enough to fit in one system). Since many datasets are huge, we need a federated registry (similar to how BitTorrent trackers work). We want to make it easy for people to find or publish datasets with the registry from Dat Desktop, to make the data sharing process frictionless.
Another feature is multi-writer/collaborative folders. We have big plans for collaborative workflows, maybe with branches, similar to git but designed around dataset collaboration. But right now we're still working on overall stability and standardizing our protocols!
Why did you choose to build Dat Desktop on Electron?
Dat is built using Node.js, so it was a natural fit for our integration. Beyond this, our users use a variety of machines, since scientists, researchers, and government officials may be forced to use certain setups for their institutions. This means we need to be able to target Windows and Linux as well as Mac. Dat Desktop, built on Electron, gives us that quite easily.
What are some challenges you've faced while building Dat and Dat Desktop?
Figuring out what people want. We started with tabular datasets, but we realized that it was a bit of a complicated problem to solve and that most people don't use databases. So halfway through the project, we redesigned everything from scratch to use a filesystem and haven't looked back.
We also ran into some general Electron infrastructure problems, including:
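- Telemetry - how to capture anonymous usage statistics
- Updates - setting up automatic updates is kind of piecemeal and takes some tricks
- Releases - Xcode signing, building releases on Travis, and doing beta builds were all challenges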
We also use Browserify and some cool Browserify Transforms on the 'front end' code in Dat Desktop (which is kind of weird, because we still bundle even though we have native require, but we do it because we want the Transforms). To better manage our CSS, we switched from Sass to sheetify. It has greatly helped us modularize our CSS and made it easier to move our UI to a component-oriented architecture with shared dependencies. For example, dat-colors contains all of our colors and is shared between all our projects.
We've always been big fans of standards and minimal abstractions. Our whole interface is built using regular DOM nodes with just a few helper libraries. We've started to move some of these components into base-elements, a library of low-level reusable components. As with most of our technology, we keep iterating on it until we get it right, but as a team we have a feeling we're heading in the right direction here.
In what areas should Electron be improved?
We think the biggest pain point is native modules. Having to rebuild your modules for Electron with npm adds complexity to the workflow. Our team developed a module called prebuild which handles pre-built binaries. It worked well for Node, but Electron workflows still required a custom step after installing, usually npm run rebuild. It was annoying. To address this, we recently switched to a strategy where we bundle the compiled binaries for all platforms inside the npm tarball. This means tarballs get larger (though this can be optimized with .so files, i.e. shared libraries), but the approach avoids having to run post-install scripts and also avoids the npm run rebuild pattern completely. It means npm install does the right thing for Electron the first time.
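As a rough illustration of the bundling strategy described above, a package could choose a matching binary at load time from whatever it shipped in the tarball. This is only a sketch under stated assumptions: the `prebuilds/<platform>-<arch>/<runtime>.node` layout and the `prebuiltPath` helper are hypothetical, not the team's actual layout or the prebuild module's API.

```javascript
const path = require('path');

// Hypothetical layout (an assumption for this sketch):
//   <packageDir>/prebuilds/<platform>-<arch>/<runtime>.node
// process.versions.electron is only defined when running inside Electron.
function prebuiltPath(
  packageDir,
  platform = process.platform,
  arch = process.arch,
  runtime = process.versions.electron ? 'electron' : 'node'
) {
  return path.join(packageDir, 'prebuilds', `${platform}-${arch}`, `${runtime}.node`);
}

// Print the path this machine would load for a made-up package directory.
console.log(prebuiltPath(path.join('node_modules', 'my-addon')));
```

Because the choice happens when the module is required, nothing needs to run after npm install, which is the point of shipping every binary in the tarball.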
What are your favorite things about Electron?
The APIs seem fairly well thought out, Electron is relatively stable, and it does a pretty good job of keeping up to date with upstream Node releases. There's not much else we can ask for!
Any Electron tips that might be useful to other developers?
If you use native modules, give prebuild a shot!
What's the best way to follow Dat developments?
Follow @dat_project on Twitter, or subscribe to our email newsletter.



