Run multiple jobs in Flink local environment

There is actually little reason to do this: the Flink local environment is generally used for testing and debugging, not for scalable stream processing.

Still, here is how to run multiple jobs in the local environment.

First, configure and start the mini-cluster.

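import org.apache.flink.configuration.ConfigConstants;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster;

Configuration configuration = new Configuration();
configuration.setLong(TaskManagerOptions.MANAGED_MEMORY_SIZE, -1L);
configuration.setInteger(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, 2);
configuration.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, 10);

// create and start the cluster
LocalFlinkMiniCluster exec = new LocalFlinkMiniCluster(configuration, true);
exec.start();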

Then, create and add jobs to the mini-cluster.

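import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.graph.StreamGraph;

StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
DataStream stream = env.addSource(...);

// translate the pipeline into a JobGraph and submit it to the running mini-cluster
StreamGraph streamGraph = env.getStreamGraph();
JobGraph jobGraph = streamGraph.getJobGraph();
exec.submitJobAndWait(jobGraph, true);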

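In the last step, you can call submitJobDetached instead of submitJobAndWait. Because submitJobAndWait blocks until the job finishes, the detached variant is the one to use when several jobs should run at the same time.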

XXE attack and mitigation

Recently I got a security violation report from Sonar about an XXE (XML External Entity) attack. This is indeed a scary scenario: an attacker can read internal files on the server with ease.

The simplest mitigation is to disable external entities and DTD support.

XMLInputFactory xif = XMLInputFactory.newFactory();
xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
XMLStreamReader xsr = xif.createXMLStreamReader(source);
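
One way to convince yourself that these settings help is a small self-check: write a marker into a temporary file, reference that file as an external entity, and verify that the marker never shows up in the parsed text. The following is a minimal sketch using only the JDK's StAX API; the class name XxeHardeningDemo and its helper methods are made up for illustration, and the exact behaviour (rejecting the document versus leaving the reference unexpanded) may differ between StAX implementations.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of any library.
class XxeHardeningDemo {

    // Factory configured as in the snippet above: no external entities, no DTDs.
    static XMLInputFactory hardenedFactory() {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        return xif;
    }

    // Concatenates all character data the parser reports for the document.
    static String readText(XMLInputFactory xif, String xml) throws XMLStreamException {
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(xsr.getText());
                }
            }
        } finally {
            xsr.close();
        }
        return text.toString();
    }

    // Returns true if parsing a document that references `target` as an
    // external entity exposes `marker` (the file's content) in the parsed text.
    static boolean leaksFile(XMLInputFactory xif, Path target, String marker) {
        String xml = "<!DOCTYPE r [<!ENTITY xxe SYSTEM \"" + target.toUri() + "\">]>"
                + "<r>&xxe;</r>";
        try {
            return readText(xif, xml).contains(marker);
        } catch (XMLStreamException e) {
            // The parser refused the document, so no file content reached the caller.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.write(secret, "TOP-SECRET-TOKEN".getBytes(StandardCharsets.UTF_8));
            boolean leaked = leaksFile(hardenedFactory(), secret, "TOP-SECRET-TOKEN");
            System.out.println("leaked: " + leaked);
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

Passing a plain XMLInputFactory.newFactory() to leaksFile instead of the hardened one gives a before/after comparison for the parser you actually run.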

However, in my case I actually need the feature, because the documents reference XML files that are packaged with the application.

So one way to mitigate the risk is to use a customized XMLResolver. Inside this XMLResolver, whitelist only the essential resources (or better, resolve them in memory, if possible).

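XMLInputFactory xif = XMLInputFactory.newFactory();
// resolver: a customized XMLResolver that whitelists the allowed resources
xif.setXMLResolver(resolver);
XMLStreamReader xsr = xif.createXMLStreamReader(source);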
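
A whitelisting resolver that answers from memory could look roughly like the sketch below. It is an illustration under assumptions, not the resolver from my project: the class name InMemoryResolverDemo, the whitelist helper, the file name shared.xml, and the choice to match on the last path segment of the system ID are all made up here. Non-whitelisted entities are rejected by throwing, because the XMLResolver contract lets the parser fall back to its default resolution when the resolver returns null.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of any library.
class InMemoryResolverDemo {

    // Builds a resolver that serves whitelisted entities from the given
    // in-memory map (file name -> content) and rejects everything else.
    static XMLResolver whitelist(Map<String, String> packed) {
        return (publicId, systemId, baseUri, namespace) -> {
            // Assumption: entities are identified by the last path segment.
            String name = systemId == null
                    ? ""
                    : systemId.substring(systemId.lastIndexOf('/') + 1);
            String content = packed.get(name);
            if (content == null) {
                // Throw instead of returning null, so the parser cannot
                // fall back to fetching the resource itself.
                throw new XMLStreamException("External entity not whitelisted: " + systemId);
            }
            return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        };
    }

    // Parses the document with the resolver installed and returns its text.
    static String readText(String xml, XMLResolver resolver) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.SUPPORT_DTD, true);
        xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, true);
        xif.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, true);
        xif.setXMLResolver(resolver);
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(xsr.getText());
                }
            }
        } finally {
            xsr.close();
        }
        return text.toString();
    }

    public static void main(String[] args) {
        XMLResolver resolver = whitelist(Map.of("shared.xml", "packed-value"));
        String allowed = "<!DOCTYPE doc [<!ENTITY shared SYSTEM \"shared.xml\">]>"
                + "<doc>&shared;</doc>";
        String blocked = "<!DOCTYPE doc [<!ENTITY evil SYSTEM \"file:///etc/passwd\">]>"
                + "<doc>&evil;</doc>";
        for (String xml : new String[] {allowed, blocked}) {
            try {
                System.out.println("parsed text: " + readText(xml, resolver));
            } catch (XMLStreamException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

Because the resolver only ever hands back byte arrays it already holds, it never touches the file system or the network, whatever system ID the document supplies.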

Online course recommendation: Astrophysics: The Violent Universe

edX online course: Astrophysics: The Violent Universe

The Astrophysics series consists of 4 courses:
Greatest Unsolved Mysteries of the Universe
Exploring Exoplanets
The Violent Universe

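A simplified version of the Hangzhou endurance hike route. Starting from Longjing Village, take Shili Langdang up to Zhenji Temple, descend to Jiuxi Yanshu, climb again to Guiren Pavilion, and finish with the descent to Hupao Spring. The whole route covers two hills and a bit over ten kilometers.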

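Database Design Guidelines from an Oracle project

  1. Tables that hold user data must have a USER_ID column, mainly for sharding. Accordingly, tables and indexes are partitioned by USER_ID.
  2. Use basic constructs: heap tables, B-tree indexes, VARCHAR2 (the database as a whole should already be Unicode-encoded).
  3. No triggers. This logic belongs in the middle tier.
  4. No PL/SQL. Same reason as above.
  5. Data lifecycle. Mainly, pay attention to data cleanup.
  6. No parallel operations. Most of the time, more than one module is running.
  7. No DDL at runtime. DDL is executed only during downtime.
  8. Use global temporary tables where possible.
  9. No foreign keys. This one is rather surprising; it is mainly to make migration easier.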

Ozymandias by Percy Shelley

I met a traveller from an antique land
Who said: Two vast and trunkless legs of stone
Stand in the desert… near them, on the sand,
Half sunk, a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them and the heart that fed:

And on the pedestal these words appear:
‘My name is Ozymandias, king of kings:
Look on my works, ye Mighty, and despair!’
Nothing beside remains. Round the decay
Of that colossal wreck, boundless and bare
The lone and level sands stretch far away.

